Table of Contents

Keynote lectures ... S6
The perception of others' goal-directed actions ... S6
Body ownership, self-location, and embodied cognition ... S6
Life as we know it ... S6
Elements of extreme expertise ... S7
Dynamic Field Theory: from the sensory-motor domain to embodied higher cognition ... S7
How t(w)o perform actions together ... S7
Symposia ... S8
DRIVER COGNITION ... S8
The CSB model: A cognitive approach for explaining speed behavior ... S8
Validation of the Driving by Visual Angle car following model ... S8
The effects of event frequency and event predictability on driver's attention allocation ... S9
Integrated modeling for safe transportation (IMoST 2): driver modeling & simulation ... S9
Simulating the influence of event expectancy on drivers' attention distribution ... S9
PROCESSING LANGUAGE IN CONTEXT: INSIGHTS FROM EMPIRICAL APPROACHES ... S10
Investigations into the incrementality of semantic interpretation: the processing of quantificational restriction ... S10
When the polar bear fails to find a referent: how are unmet presuppositions processed? ... S10
Deep or surface anaphoric pronouns? Empirical approaches ... S10
Comparing presuppositions and scalar implicatures ... S11
The time course of referential resolution ... S11
COGNITION OF HUMAN ACTIONS: FROM INDIVIDUAL ACTIONS TO INTERACTIONS ... S11
Signaling games in sensorimotor interactions ... S11
Perceptual cognitive processes underlying the recognition of individual and interactive actions ... S11
Neural theory for the visual processing of goal-directed actions ... S11
From individual to joint action: representational commonalities and differences ... S12
Neural mechanisms of observing and interacting with others ... S12
CORTICAL SYSTEMS OF OBJECT GRASPING AND MANIPULATION ... S12
Influences of action characteristics and hand used on the neural correlates of planning and executing object manipulations ... S13
Attention is needed for action control: evidence from grasping studies ... S13
Effects of object recognition on grasping ... S13
The representation of grasping movements in the human brain ... S13
Avoiding obstacles without a ventral visual stream ... S14
Action and semantic object knowledge are processed in separate but interacting streams: evidence from fMRI and dynamic causal modelling ... S14
EYE TRACKING, LINKING HYPOTHESES AND MEASURES IN LANGUAGE PROCESSING ... S14
Conditional analyses of eye movements ... S14
Rapid small changes in pupil size index processing difficulty: the index of cognitive activity in reading, visual world, and dual task paradigms ... S15
Measures in sentence processing: eye tracking and pupillometry ... S15
Improving linking hypotheses in visually situated language processing: combining eye movements and event-related brain potentials ... S16
Oculomotor measurements of abstract and concrete cognitive processes ... S16
MANUAL ACTION ... S16
The Bremen-Hand-Study@Jacobs: effects of age and expertise on manual dexterity ... S17
Planning anticipatory actions: on the interplay between normative and mechanistic models ... S17
Identifying linguistic and neural levels of interaction between gesture and speech during comprehension using EEG and fMRI ... S17
Neural correlates of gesture-syntax interaction ... S18
Interregional connectivity minimizes surprise responses during action perception ... S18
The development of cognitive and motor planning skills in young children ... S18
PREDICTIVE PROCESSING: PHILOSOPHICAL AND NEUROSCIENTIFIC PERSPECTIVES ... S18
Bayesian cognitive science, unification, and explanation ... S19
The explanatory heft of Bayesian models of cognition ... S19
Predictive processing and active inference ... S19
Learning sensory predictions for perception and action ... S19
Layer resolution fMRI to investigate cortical feedback and predictive coding in the visual cortex ... S19
HOW LANGUAGE AND NUMERICAL REPRESENTATIONS CONSTITUTE MATHEMATICAL COGNITION ... S20
Influences of number word inversion on multi-digit number processing: a translingual eye-tracking study ... S20
On the influence of linguistic and numerical complexity in word problems ... S21
Linguistic influences on numerical understanding: the case of Welsh ... S21
Reading space into numbers: an update ... S21
How language and numerical representations constitute mathematical cognition: an introductory review ... S21
Language influences number processing: the case of bilingual Luxembourg ... S21
Language differences in basic numerical tasks ... S22
Cognitive components of the mathematical processing network in primary school children: linguistic and language independent contributions ... S22
It does exist! A SNARC effect amongst native Hebrew speakers is masked by the MARC effect ... S22
MODELING OF COGNITIVE ASPECTS OF MOBILE INTERACTION ... S22
Creating cognitive user models on the basis of abstract user interface models ... S22
Expectations during smartphone application use ... S22
Evaluating the usability of a smartphone application with ACT-R ... S23
Simulating interaction effects of incongruous mental models ... S24
"Special offer! Wanna buy a trout?"—Modeling user interruption and resumption strategies with ACT-R ... S24
Tutorials ... S25
Introduction to probabilistic modeling and rational analysis ... S25
Modeling vision ... S25
Visualization of eye tracking data ... S25
Introduction to cognitive modelling with ACT-R ... S25
Dynamic Field Theory: from sensorimotor behaviors to grounded spatial language ... S25
Poster presentations ... S27
The effect of language on spatial asymmetry in image perception ... S27
Towards formally founded ACT-R simulation and analysis ... S27
Identifying inter-individual planning strategies ... S28
Simulating events. The empirical side of the event-state distinction ... S29
On the use of computational analogy-engines in modeling examples from teaching and education ... S30
Brain network states affect the processing and perception of tactile near-threshold stimuli ... S31
A model for dynamic minimal mentalizing in dialogue ... S32
Actions revealing cooperation: predicting cooperativeness in social dilemmas from the observation of everyday actions ... S33
The use of creative analogies in a complex problem situation ... S34
Yes, that's right? Processing yes and no and attention to the right vs. left ... S35
Perception of background color in head mounted displays: applying the source monitoring paradigm ... S36
Continuous goal dynamics: insights from mouse-tracking and computational modeling ... S37
Looming auditory warnings initiate earlier event-related potentials in a manual steering task ... S38
The creative process across cultures ... S38
How do human interlocutors talk to virtual assistants? A speech act analysis of dialogues of cognitively impaired people and elderly people with a virtual assistant ... S40
Effects of aging on shifts of attention in perihand space ... S41
The fate of previously focused working memory content: decay or/and inhibition? ... S41
How global visual landmarks influence the recognition of a city ... S42
Explicit place-labeling supports spatial knowledge in survey, but not in route navigation ... S44
How important is having emotions for understanding others' emotions accurately? ... S45
Prosody conveys speakers' intentions: acoustic cues for speech act perception ... S46
On the perception and processing of social actions ... S46
Stage-level and individual-level interpretation of multiple adnominal adjectives as an epiphenomenon—theoretical and empirical evidence ... S47
What happened to the crying bird? Differential roles of embedding depth and topicalization modulating syntactic complexity in sentence processing ... S48
fMRI-evidence for a top-down grouping mechanism establishing object correspondence in the Ternus display ... S48
Event-related potentials in the recognition of scene sequences ... S49
Sensorimotor interactions as signaling games ... S50
Subjective time perception of verbal action and the sense of agency ... S51
Memory disclosed by motion: predicting visual working memory performance from movement patterns ... S52
Role and processing of translation in biological motion perception ... S53
How to remember Tübingen? Reference frames in route and survey knowledge of one's city of residency ... S53
The effects of observing other people's gaze: faster intuitive judgments of semantic coherence ... S54
Towards a predictive processing account of mental agency ... S55
The N400 ERP component reflects implicit prediction error in the semantic system: further support from a connectionist model of word meaning ... S56
Similar and differing processes underlying carry and borrowing effects in addition and subtraction: evidence from eye tracking ... S57
Simultaneous acquisition of words and syntax: contrasting implicit and explicit learning ... S58
Towards a model for anticipating human gestures in human-robot interactions in shared space ... S59
Preserved expert object recognition in a case of unilateral visual agnosia ... S60
Visual salience in human landmark selection ... S60
Left to right or back to front? The spatial flexibility of time ... S61
Smart goals, slow habits? Individual differences in processing speed and working memory capacity moderate the balance between habitual and goal-directed choice behavior ... S62
Tracing the time course of n - 2 repetition costs ... S62
Language cues in the formation of hierarchical representation of space ... S63
Processing of co-articulated place information in lexical access ... S64
Disentangling the role of inhibition and emotional coding on spatial stimulus devaluation ... S65
The role of working memory in prospective and retrospective motor planning ... S66
Temporal preparation increases response conflict by advancing direct response activation ... S67
The flexibility of finger-based magnitude representations ... S68
Object names correspond to convex entities ... S69
The role of direct haptic feedback in a compensatory tracking task ... S71
Comprehending negated action(s): embodiment perspective ... S71
Effects of action signaling on interpersonal coordination ... S72
Physiological changes through sensory augmentation in path integration: an fMRI study ... S73
Do you believe in Mozart? The influence of beliefs about composition on representing joint action outcomes in music ... S73
Processing sentences describing auditory events: only pianists show evidence for an automatic space pitch association ... S74
A free energy approach to template matching in visual attention: a connectionist model ... S75
ORAL PRESENTATIONS ... S77
Analyzing psychological theories with F-ACT-R: an example F-ACT-R application ... S79
F-ACT-R: defining the ACT-R architectural space ... S81
Defining distance in language production: extraposition of relative clauses in German ... S81
How is information distributed across speech and gesture? A cognitive modeling approach ... S84
Towards formally well-founded heuristics in cognitive AI systems ... S87
Action planning is based on musical syntax in expert pianists. ERP evidence ... S89
Motor learning in dance using different modalities: visual vs. verbal models ... S90
A frontotemporoparietal network common to initiating and responding to joint attention bids ... S93
Action recognition and the semantic meaning of actions: how does the brain categorize different social actions? ... S95
Understanding before language ... S95
An embodied kinematic model for perspective taking ... S97
The under-additive effect of multiple constraint violations ... S100
Strong spatial cognition ... S103
Inferring 3D shape from texture: a biologically inspired model architecture ... S105
An activation-based model of execution delays of specific task steps ... S107
How action effects influence dual-task performance ... S110
Introduction of an ACT-R based modeling approach to mental rotation ... S112
Processing linguistic rhythm in natural stories: an fMRI study ... S114
Numbers affect the processing of verbs denoting movements in vertical space ... S115
Is joint action necessarily based on shared intentions? ... S117
A general model of the multi-level architecture of mental phenomena. Integrating the functional paradigm and the mechanistic model of explanation ... S119
A view-based account of spatial working and long-term memories: Model and predictions ... S120
Systematicity and Compositionality in Computer Vision ... S123
Control and flexibility of interactive alignment: Möbius syndrome as a case study ... S125
Efficient analysis of gaze-behavior in 3D environments ... S127
The role of the posterior parietal cortex in relational reasoning ... S129
How to build an inexpensive cognitive robot: Mind-R ... S131
Crossed hands stay on the time-line ... S134
Is the novelty-P3 suitable for indexing mental workload in steering tasks? ... S135
Modeling perspective-taking by forecasting 3D biological motion sequences ... S137
Matching quantifiers or building models? Syllogistic reasoning with generalized quantifiers ... S139
What if you could build your own landmark? The influence of color, shape, and position on landmark salience ... S142
Does language shape cognition? ... S144
Ten years of adaptive rewiring networks in cortical connectivity modeling. Progress and perspectives ... S146
Bayesian mental models of conditionals ... S148
Visualizer verbalizer questionnaire: evaluation and revision of the German translation ... S151
AUTHOR INDEX ... S155
Disclosure: This issue was not sponsored by external commercial interests.
Cogn Process (2014) 15 (Suppl 1):S1–S158 DOI 10.1007/s10339-014-0632-2
ABSTRACTS
Special Issue: Proceedings of KogWis 2014, 12th Biennial Conference of the German Cognitive Science Society (Gesellschaft für Kognitionswissenschaft)
Edited by Anna Belardinelli and Martin V. Butz
Keynote lectures

The perception of others' goal-directed actions
Harold Bekkering
Donders Institute for Brain, Cognition and Behavior, Radboud University Nijmegen, The Netherlands
It is widely assumed that perception of the world is based on internal models of that world and that these models are shaped via prior experiences that modulate the likelihood of a certain action given a certain context. In this talk, I will outline some experimental and theoretical ideas about how humans perceive goal-directed actions of others on the basis of object and movement knowledge. I will also discuss a potential role for language in improving our world model, including a better perception of other agents' goal-directed actions.
Body ownership, self-location, and embodied cognition
H. Henrik Ehrsson
Department of Neuroscience, Karolinska Institutet, Stockholm, Sweden
Ask any child if his hands belong to him and the answer will be "Of course!" However, how does the brain actually identify its own body? In this talk, Dr. Ehrsson will describe how cognitive neuroscientists have begun to address this fundamental question. One key idea is that parts of the body are distinguished from the external world by the patterns of the correlated information they produce from different sensory modalities (vision, touch and muscle sense). It is hypothesized that these correlations are detected by neuronal populations in premotor and posterior parietal areas that integrate multisensory information from the space near the body. Dr. Ehrsson and his team have recently used a combination of functional magnetic resonance imaging (fMRI) and human behavioral experiments to present experimental results that support these predictions. To change the feeling of body ownership, perceptual illusions were used so that healthy individuals experienced a rubber hand as their own, their real hand being 'disowned', or that a mannequin was their body. Dr. Ehrsson will also describe recent experiments that investigate how we come to experience our body as being located at a specific place in the world, and how this sense of self-location depends on body ownership.
To this end, an 'out-of-body' illusion was used to perceptually 'teleport' participants' bodily self to different locations during high-resolution fMRI acquisition. It was found that activity patterns in the hippocampus, retrosplenial, posterior cingulate, and posterior parietal cortices reflected the sense of self-location, and that the functional interplay between self-location and body ownership was mediated via the posterior cingulate cortex, suggesting a key role of this structure in generating the coherent experience of the bodily self in space. In the final part of his talk Dr. Ehrsson will discuss recent studies that have investigated how the central construct of the bodily self influences other higher cognitive functions such as the visual perception of the world and the ability to remember personal events (embodied cognition). These experiments suggest that the representation of one's own body affects visual perception of object size by rescaling the visual representation of external space, and that efficient hippocampus-based episodic-memory encoding requires a first-person perspective of the spatial relationship between the body and the world. Taken together, the studies reviewed in this lecture advance our understanding of how we come to experience ownership of a body located at a single place, and unravel novel basic links between central body representation, visual perception of the world and episodic memory.
Life as we know it
Karl Friston
Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, UK
How much about our interaction with—and experience of—our world can be deduced from basic principles? This talk reviews recent attempts to understand the self-organized behavior of embodied agents—like ourselves—as satisfying basic imperatives for sustained exchanges with our world. In brief, one simple driving force appears to explain nearly every aspect of our behavior and experience. This driving force is the minimization of surprise or prediction error. In the context of perception, this corresponds to (Bayes-optimal) predictive coding that suppresses exteroceptive prediction errors. In the context of action, simple reflexes can be seen as suppressing proprioceptive prediction errors. We will look at some of the phenomena that emerge from this formulation, such as hierarchical message passing in the brain and the perceptual inference that ensues. I hope to illustrate these points using simple simulations of how life-like behavior emerges almost inevitably from coupled dynamical systems—and how this behavior can be understood in terms of perception, action and action observation.
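[Editorial note] The two routes to suppressing prediction error can be caricatured in a few lines. This is a toy sketch, not Friston's simulations: a scalar agent reduces the error between observation and prediction both by revising its prediction (perception) and by acting on the hidden state so that observations match its prior (action). All variables and constants are assumptions for illustration.

```python
import numpy as np

# Toy prediction-error minimization (illustrative only): the agent
# suppresses (o - mu) by updating its prediction mu ("perception"),
# and suppresses (o - prior) by changing the hidden state x ("action").
rng = np.random.default_rng(0)
x = 5.0        # hidden world state
mu = 0.0       # the agent's prediction of its observation
prior = 2.0    # the observation the agent "expects" to make

lr = 0.1
for _ in range(500):
    o = x + rng.normal(scale=0.01)   # noisy exteroceptive observation
    mu += lr * (o - mu)              # perception: track the observation
    x -= lr * (o - prior)            # action: pull the world toward the prior

print(round(x, 2), round(mu, 2))     # both settle near the prior (2.0)
```

After convergence the world state and the prediction coincide with the prior, so prediction error vanishes on both routes: the single quantity being minimized accounts for perception and action alike.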
Elements of extreme expertise
Wayne D. Gray
Rensselaer Polytechnic Institute, Troy, NY, USA
We are studying the acquisition and deployment of extreme expertise during the real-time interaction of a single human with complex, dynamic decision environments. Our dilemma is that people who have the specific skills we wish to generalize to (such as helicopter piloting, laparoscopic surgery, and air traffic control) are very rare in the college population and too expensive to bring into our lab. Our solution has been to study expert and novice video game players. Our approach takes the position that Cognitive Science has been overly fixated on isolating small components of individual cognition. That approach runs the danger of overfitting theories to paradigms. Our way out of this dilemma is to bring together (a) powerful computational models, (b) machine learning techniques, and (c) microanalysis techniques that integrate analyses of cognitive, perceptual, and action data collected from extreme performers to develop, test, and extend cognitive theory. Since our January 2013 start, we have built our experimental paradigm, collected naturalistic and laboratory data, published journal and conference papers, won Rensselaer Undergraduate research prizes, developed "single-piece optimizers" (SPOs, i.e., machine learning systems), compared machine performers to human performers, and begun analyzing eye and behavioral data from two 6 h human studies. Our tasks have been the games of Tetris and Space Fortress. Future plans include (a) using our SPOs to tutor piece-by-piece placement, (b) developing integrated cognitive models that account for cognition, action, and perception, and (c) continued exploration of the differences between good players and extreme experts in Tetris and Space Fortress. Games such as Tetris and Space Fortress are often dismissed as "merely requiring reflex behavior." However, with an estimated total number of board configurations of 2^199 (approx. 8 followed by 59 zeroes), Tetris cannot be "merely reflex behavior." Our preliminary analyses show complex goal hierarchies, dynamic "two-piece" plans that are updated after every episode, sophisticated use of subgoaling, and the gradual adaptation of strategies and plans as the speed of play increases. These are very sophisticated human strategies, beyond our current capability to model, and are a challenging topic for the study of the Elements of Extreme Expertise.
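[Editorial note] Reading the configuration estimate as 2^199 (the exponent was garbled in typesetting), the quoted magnitude checks out:

```latex
2^{199} = 10^{199\,\log_{10} 2} \approx 10^{59.9} \approx 8.0 \times 10^{59}
```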
Dynamic Field Theory: from the sensory-motor domain to embodied higher cognition
Gregor Schöner
Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany
The embodiment stance emphasizes that cognitive processes are closely linked to the sensory and motor surfaces. This implies that cognitive processes share with sensory-motor processes fundamental properties including graded state variables, continuous time dependence, stability, and continuous metric contents. According to the embodiment hypothesis these properties are pervasive throughout cognition. This poses the challenge to understand how seemingly categorical states emerge, on which cognitive processes seem to operate at discrete event times. I will review Dynamic Field Theory, a theoretical framework that is firmly grounded in the neurophysiology of population activation in the higher nervous system. Dynamic Field Theory has its origins in the sensory-motor domain where it has been used to understand movement preparation, sensory-motor decisions, and motor memory. In the meantime, however, the framework has been extended to understand elements of visual cognition such as scene representations, object recognition, change detection, and binding. Sequences of cognitive or motor operations can be understood in this framework, which begins to reach into language by providing simple forms of grounding of spatial and action concepts. Discrete events emerge from instabilities in the underlying neural dynamics. Categories emerge from inhomogeneities in the underlying neural populations that are amplified into macroscopic states by dynamic instabilities. I will illustrate how the framework makes contact with psychophysical and neural data, but can also be used to create artificial cognitive systems that act and think based on their own sensory and motor systems.
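[Editorial note] For orientation, Dynamic Field Theory builds on neural field dynamics of the kind introduced by Amari (1977); a standard form (stated here for reference, not quoted from the talk) is:

```latex
\tau\,\dot{u}(x,t) = -u(x,t) + h + s(x,t) + \int w(x - x')\, g\big(u(x',t)\big)\, dx'
```

where u(x,t) is the activation field over a feature dimension x, h < 0 a resting level, s(x,t) external input, g a sigmoidal threshold function, and w a local-excitation/lateral-inhibition interaction kernel. The detection, selection, and memory instabilities mentioned in the abstract arise as bifurcations of this dynamics.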
How t(w)o perform actions together
Natalie Sebanz
SOMBY LAB, Department of Cognitive Science, Central European University, Budapest, Hungary
Humans are remarkably skilled at coordinating their actions with one another. Examples range from shaking hands or lifting a box together to dancing a tango or playing a piano duet. What are the cognitive and neural mechanisms that enable people to engage in joint actions? How does the ability to perform actions together develop? And why is it so difficult to have robots engage in smooth interactions with humans and with each other? In this talk, I will review recent studies addressing two key ingredients of joint action: how individuals include others in their action planning, and how they achieve the fine-grained temporal coordination that is essential for many different types of joint action. This research shows that people have a strong tendency to form representations of others' tasks, which affects their perception and attention, their action planning, and their encoding of information in memory. To achieve temporal coordination of their actions, people reduce the variability of their movements, predict the actions of their partners using their own motor system, and modulate their own actions to highlight critical information to their partner. I will discuss how social relations between individuals and groups and the cooperative or competitive character of social interactions modulate these processes of action planning and coordination. The next challenge for joint action research will be to understand how joint action enables learning. This will allow us to understand what it takes for people to become experts in particular joint actions, and how experts teach individual skills through performing joint actions with novices.
Symposia

DRIVER COGNITION
Convenor: Martin Baumann
Ulm University, Germany
From a psychological point of view, driving is a highly complex task, despite the fact that millions of people perform it safely and efficiently each day. It involves many mental processes and structures, such as perception, attention, memory, knowledge, manual control, decision making, and action selection. These processes and structures need to work in close integration to master the challenges of driving a vehicle in a highly dynamic task environment—our daily traffic. On the other hand, despite all advances in traffic safety in recent years, still about 31,000 people were killed on European roads in 2010. A high percentage of these fatalities are due to human error, which reflects a breakdown of the interplay between the aforementioned cognitive processes. Therefore, understanding the cognitive processes that underlie driver behavior is not just a highly interesting academic endeavor to learn how the human mind masters highly dynamic tasks but is also vital for further improvement of traffic safety. The papers presented in this symposium address different aspects of driver cognition, demonstrating the variety of processes relevant in the study of driver cognition. They all have in common that their empirical work is based on models of the underlying mental processes, ranging from conceptual models to quantitative and computational models. Two papers present recent results on models addressing the driver's longitudinal control behavior. Whereas Käthner and Kuhl present results on the validation of a specific car following model that is based on those input variables that are actually available to the human driver, Brandenburg and Thüring present empirical results on the validation of a general model of speed behavior based on the interplay of bottom-up and top-down processes. Weber presents the results of a joint research project that aimed at developing an integrated driver model within a computational cognitive architecture, called CASCaS, allowing simulations of driver behavior in a real-time driving simulation environment. The papers of Wortelen and of Kaul and Baumann both investigate factors influencing the distribution of attention while driving. Wortelen implemented a computational model of attention distribution within the cognitive architecture CASCaS to model the effects of expectations about event frequencies and of information value on attention distribution. Kaul and Baumann investigated the effects of event predictability in comparison to event frequency on attention distribution to explain related findings on rear-end accidents.
The CSB model: A cognitive approach for explaining speed behavior
Stefan Brandenburg, Manfred Thüring
Cognitive Psychology and Cognitive Ergonomics, TU Berlin, Germany
Based on Daniel Kahneman's (2012) distinction between highly automated, fast processes (system 1) and conscious, slower cognition (system 2), the Components of Speed Behavior (CSB) Model explains the driver's longitudinal control of a vehicle by the interplay of bottom-up and top-down processes.
System 1 is active in common and uncritical situations. The regulation of speed is determined by sensory data from the environment that are processed automatically without demanding many resources. The resulting visual, auditive, haptic and kinesthetic sensations are integrated into a subjective speed impression. An unconscious and automated control process continuously matches this impression against the driver's skills and resources. When his capabilities are exceeded, the driver decelerates the vehicle; when they are underchallenged, he accelerates it. When both components are balanced, he keeps the speed constant. The driver's behavior determines the objective speed of the vehicle, which in turn impacts his sensations and thus his subjective speed impression. Hence, in the dynamic situation of driving, system 1 is considered a closed-loop process that requires little attention and controls the speed of the car in an automated way. This process is monitored by system 2, which is responsible for tactical and strategic actions. It takes over control when a critical situation demands specific maneuvers under attention or when decisions for wayfinding and navigation are required. The assumptions of the CSB Model with respect to system 1 were tested in four experiments using a simple driving simulator. Their results support the basic characteristics of the model. In the most complex study, features of the environment were varied together with the drivers' mental workload. As predicted by the model, these variables influenced the subjective impression of speed as well as the objective speed. Besides such supporting evidence, additional influences were detected which served to state some components more precisely and to expand the model. The final CSB version is published in Brandenburg (2014).
References
Brandenburg S (2014) Geschwindigkeitswahl im Straßenverkehr: Theoretische Erklärung und empirische Untersuchungen. SVH-Verlag, Saarbrücken
Kahneman D (2012) Schnelles Denken, langsames Denken. Siedler Verlag, München
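[Editorial note] A minimal sketch of the system 1 closed loop described above; all functions, gains, and numbers are illustrative assumptions, not the published CSB implementation.

```python
# Closed-loop speed regulation: a subjective speed impression is matched
# against the driver's capability; mismatches drive (de)celeration.
def subjective_speed(v, workload):
    # sensations (visual, auditive, haptic, kinesthetic) integrated into
    # one impression; higher workload attenuates the impression (assumed)
    return v * (1.0 - 0.3 * workload)

def system1_step(v, capability, workload, dt=0.1, gain=0.5):
    impression = subjective_speed(v, workload)
    if impression > capability:      # capabilities exceeded -> decelerate
        v -= gain * dt * (impression - capability)
    elif impression < capability:    # underchallenged -> accelerate
        v += gain * dt * (capability - impression)
    return v                         # balanced -> speed held constant

v = 20.0                             # m/s
for _ in range(500):
    v = system1_step(v, capability=25.0, workload=0.4)
print(round(v, 1))                   # settles where impression == capability
```

The loop reproduces the model's qualitative prediction that raising workload (which dampens the speed impression) lets the objective speed drift upward before the loop re-balances.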
Validation of the Driving by Visual Angle car following model
David Käthner, Diana Kuhl
Deutsches Zentrum für Luft- und Raumfahrt, Braunschweig, Germany
Development and validation of Advanced Driver Assistance Systems require both an understanding of driver behavior and the means to quickly test systems under development at various stages. Quantitative models of human driver behavior offer both capabilities. Conventional models attempt to reproduce or predict behavior based on arbitrary input, whereas psychological models seek to emulate human behavior based on assumed cognitive functions. One common driving task is car following. As this is a straightforward control problem, a plethora of control models exist (Brackstone, McDonald 1999). But typical car following models use input variables that are not directly accessible to human drivers, such as speed of or distance to a lead vehicle. One example of such a model is the classic Helly car following model (Helly 1959). Andersen and Sauer (2007) argued that to a human driver the only available input parameter is the visual angle of a lead vehicle. They substituted both velocities and distances in Helly's model with the visual angle, changing the properties of the controller considerably. They showed their Driving by Visual Angle (DVA) model to be superior to other car following models but did not compare it directly to the Helly model. In a simulator pre-study, we recreated Andersen and Sauer's experimental setting to gather information on the DVA parameter properties and compared them to the original findings. To test the model's usability in real world settings, we conducted an extensive data collection in real traffic. On a 70 km course through urban and rural settings as well as on a motorway, 10 subjects were instructed to follow a lead vehicle driven by a confederate. We will present findings on the model's quality, properties of the model's parameters such as their stability, and compare them to similar models of car following.
References
Andersen GJ, Sauer CW (2007) Optical information for car following: the driving by visual angle (DVA) model. Human Factors 49:878–896
Brackstone M, McDonald M (1999) Car-following: a historical review. Transp Res Part F 2(4):181–196
Helly W (1959) Simulation of bottlenecks in single lane traffic flow. In: International symposium on the theory of traffic flow, New York, NY, USA
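[Editorial note] To make the substitution concrete, here is a minimal visual-angle follower in the spirit of the DVA model: the controller sees only the optical angle subtended by the lead vehicle and its rate of expansion. The gains, desired angle, sign conventions, and scenario are illustrative assumptions, not Andersen and Sauer's published parameterization.

```python
import math

W = 1.8                      # assumed lead-vehicle width (m)
k_angle, k_rate = 2.0, 8.0   # illustrative controller gains
theta_des = 2 * math.atan(W / (2 * 30.0))   # angle at a 30 m desired gap

def dva_accel(gap, closing_speed):
    """Acceleration command from optical variables only."""
    theta = 2 * math.atan(W / (2 * gap))                      # visual angle (rad)
    theta_dot = (W * closing_speed) / (gap**2 + (W / 2)**2)   # expansion rate
    # a too-large angle and an expanding image (closing in) both demand braking
    return k_angle * (theta_des - theta) - k_rate * theta_dot

# follower 20 m behind a lead car that is 2 m/s slower
print(dva_accel(gap=20.0, closing_speed=2.0))   # negative: brake gently
```

Unlike a Helly-style controller, no distance or relative speed enters the control law directly; both are folded into the single optical variable, which is what changes the controller's dynamics.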
The effects of event frequency and event predictability on driver's attention allocation
Robert Kaul 1, Martin Baumann 2
1 Deutsches Zentrum für Luft- und Raumfahrt e.V., Institut für Verkehrssystemtechnik, Braunschweig, Germany; 2 Department Human Factors, Ulm University, Germany
Safe driving requires the appropriate allocation of visual attention to the relevant objects and events of a traffic situation. According to the SEEV model (e.g., Horrey, Wickens, Consalus 2006) the allocation of visual attention to a visual information source is influenced by four parameters: i) the salience of the visual information, ii) the effort to allocate attention to this source, iii) the expectancy, i.e. the expectation that at a given location new relevant information will occur, and iv) the value or importance of the piece of information perceived at an information source. Whereas the first two reflect more or less bottom-up processes of attention allocation, the latter two reflect top-down processes. According to the SEEV model the expectancy itself is mainly determined by the frequency of events at that information source or location. But it seems plausible to assume that these top-down processes, represented in the expectancy parameter of the model, are also influenced by the predictability of events at a certain information source. That is, many predictable events in a channel cause less attention allocation than a single but unexpected event. In a driving simulator experiment, conducted within the EU project ISi-PADAS, we compared the effects of event frequency and event predictability on the allocation of visual attention. 20 participants took part in this experiment. They had to drive in an urban area with a lead car that changed its speed either frequently or not at all on a straight section before a crossing, braking either predictably (stop sign) or unpredictably (priority sign) at the crossing, while simultaneously performing a visual secondary task with either high frequency or low frequency stimulus presentation. Drivers' gaze behavior was recorded while driving. The results show that drivers' allocation of visual attention is mainly determined by the predictability of the lead car's behavior, demonstrating the importance of the driver's ability to predict events as a major determinant of driving behavior.
References
Horrey WJ, Wickens CD, Consalus KP (2006) Modeling drivers' visual attention allocation while interacting with in-vehicle technologies. J Exp Psychol Appl 12(2):67–78
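[Editorial note] The SEEV weighting is additive, which a few lines make explicit. The coefficients, areas, and component values below are illustrative assumptions, not the study's parameters.

```python
# SEEV: predicted attention to an area of interest is proportional to
# salience - effort + expectancy + value (Wickens' additive form).
areas = {
    # area: (salience, effort, expectancy, value) -- assumed example values
    "lead_vehicle":   (0.6, 0.1, 0.8, 1.0),
    "secondary_task": (0.4, 0.3, 0.5, 0.3),
    "mirror":         (0.2, 0.4, 0.2, 0.6),
}
s, ef, ex, v = 1.0, 1.0, 1.0, 1.0   # model coefficients (here: uniform)

raw = {a: s*S - ef*E + ex*EX + v*V for a, (S, E, EX, V) in areas.items()}
total = sum(raw.values())
for area, w in raw.items():
    print(f"{area}: predicted attention share = {w/total:.2f}")
```

The abstract's point can be read against this form: the expectancy term is classically driven by event frequency, whereas the reported data suggest it should also reflect event predictability.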
Integrated modeling for safe transportation (IMoST 2): driver modeling & simulation
Lars Weber
OFFIS, Institute for Information Technology, Oldenburg, Germany
IMoST 2 (Integrated Modeling for Safe Transportation 2, funded by MWK Niedersachsen (VW Vorab)) is an interdisciplinary research project between the three partners C.v.O. University of Oldenburg, OFFIS and the DLR Brunswick (2010–2013). The project addresses the question of extending the scope of model-based design to also incorporate human behavior. The application area is the design of advanced driver assistance systems (ADAS) in the automotive domain. Compared to the predecessor project IMoST 1, which addressed a single driving maneuver only (entering the 'autobahn'), IMoST 2 increased the scope of the scenario and deals with typical driving maneuvers on the autobahn, including lane changes as well as free-flow and car-following. The presentation will give an overview of the final state of driver modeling and simulation activities conducted in the project. During the 3 years of the project a driver model was implemented based on the cognitive architecture CASCaS. The architecture incorporates several psychological theories about human cognition and offers a flexible, component-based approach to integrate various human modeling techniques. The presentation will provide a brief overview of the various submodels, such as multimodal perception, situation representation, decision making and action selection/execution, and of how this architecture can be used to model and simulate human-machine interaction in the domain of driver modeling. Additionally, some of the empirical study results will be presented which were used to parameterize the model.
Simulating the influence of event expectancy on drivers' attention distribution
Bertram Wortelen
OFFIS, Institute for Information Technology, Oldenburg, Germany
The distribution of attention is a critical aspect of driving.
The increased use of assistance and automation systems as well as new infotainment systems changes the distribution of attention. This work presents the Adaptive Information Expectancy (AIE) model, a new model of attention distribution, which is based on Wickens' SEEV model. It can be integrated into cognitive architectures which are used to simulate task models. The AIE model enables a very detailed simulation of the distribution of attention in close interaction with the simulation of a task model. Unlike the SEEV model, simulations using the AIE model allow one to derive several measures of human attention distribution besides the percentage gaze distribution, such as gaze frequencies and gaze transition probabilities. Due to the tight integration with the simulation of task models, it is also possible to simulate the resulting consequences for the operator's behavior (e.g. steering behavior of drivers). The AIE model considers two factors which have a great impact on drivers' attention: the expectancy of events and the value of information. The main focus is on the expectancy of events. The AIE model provides a new method to automatically determine the event expectancy from the simulation of a task model. It is shown how the AIE model is integrated in the cognitive architecture CASCaS. A driving simulator study was performed to analyze the AIE model in a realistic driving environment. The simulation scenario is driven by human drivers as well as by a driver model developed with CASCaS using the AIE model. This scenario investigates the effects of both factors on drivers' attention distribution: event expectancy and information value. Comparing the behavior of the human drivers to model behavior shows a good model fit for the percentage distribution of attention as well as for gaze frequencies and gaze transition probabilities.
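[Editorial note] The three measures named above can be illustrated on a toy gaze sequence. The area weights and the sampling scheme are assumptions for illustration; this is not the CASCaS/AIE implementation.

```python
import random
from collections import Counter

# Derive percentage dwell, gaze frequencies, and transition probabilities
# from a simulated gaze sequence sampled from assumed attention weights.
random.seed(1)
weights = {"road": 0.7, "display": 0.2, "mirror": 0.1}
areas, probs = zip(*weights.items())
gaze = random.choices(areas, probs, k=10_000)   # one sample per gaze step

dwell = Counter(gaze)                           # percentage gaze distribution
switches = [(a, b) for a, b in zip(gaze, gaze[1:]) if a != b]
freq = Counter(a for a, _ in switches)          # gaze (shift) frequencies
trans = Counter(switches)                       # transition counts

print({a: c / len(gaze) for a, c in dwell.items()})
print({t: c / freq[t[0]] for t, c in trans.items()})  # P(next area | leaving)
```

The point of the AIE model is that such sequence-level measures fall out of simulating the task model itself, whereas the static SEEV weighting only yields the first, percentage-level quantity.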
PROCESSING LANGUAGE IN CONTEXT: INSIGHTS FROM EMPIRICAL APPROACHES
Convenors: Christian Brauner, Gerhard Jäger, Bettina Rolke
Project B2, SFB 833, University of Tübingen, Germany
Discourse understanding does not only mean integrating semantic knowledge along syntactic rules. It rather needs a Theory of Mind, entails the inclusion of context information, and presupposes that pragmatic principles are met. Moreover, data from brain imaging studies suggest that language is embodied within the motor and sensory processing systems of the brain. Thus, it seems clear that the faculty of language does not constitute a single, encapsulated processing module. Instead it requires the interoperation of several different processing modules serving to aid an unambiguous discourse understanding. Important processing prerequisites for successful discourse understanding are the ability to make references to previously established knowledge and to integrate new information into a given context. There are several linguistic tools which help to signal the requirement for suitable referents in a given discourse and which provide additional meaning aspects. One example is presuppositions. These carry context assumptions aside from the literal meaning of the words. For example, the sentence "The cat ran away" asserts that some cat ran away, whereas it presupposes that there exists a cat and that the cat that is mentioned is unique in the discourse. This symposium will have its main focus on the cognitive processing of such semantic and pragmatic phenomena. The interconnection of the faculty of language with different cognitive processing modules confronts us with questions that seem to escape a uniform analysis by one single academic discipline. Hence, research into cognitive language processing, and pragmatics in particular, is a fruitful interdisciplinary interface between linguistics and cognitive psychology. While linguists have mainly focused on theoretical aspects of pragmatics, cognitive psychologists have aimed to identify the cognitive processing functions involved. The symposium will provide an interdisciplinary platform for linguists and cognitive psychologists to discuss questions pertaining to the cognitive processing of language. Our speakers will present their research obtained by means of different empirical approaches.
Investigations into the incrementality of semantic interpretation: the processing of quantificational restriction
Petra Augurzky, Oliver Bott, Wolfgang Sternefeld, Rolf Ulrich
SFB 833, University of Tübingen, Germany
Language comprehenders have the remarkable ability to restrict incoming language, seemingly without effort, in a way that optimally fits the referential domain of discourse. We present a study which investigates the incremental nature of this update process, in particular, whether the semantic processor immediately takes into account the context of the utterance to incrementally compute and, if necessary, reanalyze the semantic value of yet partial sentences.
When the polar bear fails to find a referent: how are unmet presuppositions processed?
Christian Brauner, Bettina Rolke
SFB 833, University of Tübingen, Germany
123
Discourse understanding fails when presuppositions, i.e., essential context information, are not given. We investigated the time-course of presupposition processing by presenting presupposition triggers such as the definite article or the iterative ‘‘again’’ in a context which contrasted with the presupposed content of the trigger or was compatible with it. By means of reading-time studies and event-related brain potentials we show that readers notice semantic inconsistencies at the earliest time point during reading. Our results additionally suggest that different presupposition processing strategies were employed depending on the type of required reference process.
Deep or surface anaphoric pronouns?: Empirical approaches Pritty Patel-Grosz, Patrick Grosz SFB 833, University of Tübingen, Germany Anaphoric expressions, such as pronouns ('he/she/it/they'), which generally retrieve earlier information (e.g. a discourse referent), are typically taken to be central in establishing textual coherence. From a cognitive/processing perspective, the following question has been posed by Hankamer, Sag (1976) and Sag, Hankamer (1984): do all anaphoric expressions involve the same cognitive mechanisms, or is there a difference between 'deep anaphora' (which retrieves information directly from the context) vs. 'surface anaphora' (which operates on structural information/syntactic principles)? This question has largely been investigated for phenomena such as do it anaphora vs. VP ellipsis, but it also bears relevance for pronouns proper: in the course of categorizing pronouns into weak vs. strong classes (cf. Cardinaletti, Starke 1999), Wiltschko (1998) argues that personal pronouns are deep anaphoric (by lacking an elided NP), whereas demonstrative pronouns are surface anaphoric (and contain an elided NP). We present new empirical evidence and argue that a distinction between deep anaphoric vs. surface anaphoric pronouns must be rejected, at least in the case of personal vs. demonstrative pronouns, and that the observed
differences between these classes can be deduced at the level of pragmatics, employing economy principles in the spirit of Cardinaletti, Starke (1999) and Schlenker (2005). References Cardinaletti A, Starke M (1999) The typology of structural deficiency: a case study of three classes of pronouns. In: van Riemsdijk H (ed) Clitics in the languages of Europe. Mouton, Berlin, pp 145–233 Hankamer J, Sag I (1976) Deep and surface anaphora. Linguist Inq 7:391–426 Sag I, Hankamer J (1984) Toward a theory of anaphoric processing. Linguist Philos 7:325–345 Schlenker P (2005) Minimize restrictors! (Notes on definite descriptions, condition C and epithets). In: Proceedings of Sinn und Bedeutung 2004, pp 385–416 Wiltschko M (1998) On the syntax and semantics of (relative) pronouns and determiners. J Comp German Linguist 2:143–181
Comparing presuppositions and scalar implicatures Jacopo Romoli University of Ulster, UK In a series of experiments, we used sentences containing a presupposition that was either compatible or incompatible with a context sentence. These were compared to sentences in a context containing either a compatible or an incompatible scalar implicature. The talk will draw some conclusions on the cognitive cost of presuppositions in relation to the putative cost of scalar implicatures.
The time course of referential resolution Petra Schumacher University of Mainz, Germany Referential expressions are essential ingredients for speaker-hearer interactions. During reference resolution incoming information must be linked with prior context and also serves information progression. Speakers use different referential forms and other means of information packaging (e.g., linear order, prosody) to convey additional meaning aspects. Using event-related brain potentials, we can investigate the time course of reference resolution and examine how comprehenders exploit multiple cues during the construction of a mental representation. In this talk, I present data that indicate that reference resolution is guided by two core mechanisms associated with i) referential accessibility and expectation (N400) and ii) accommodation and mental model updating (Late Positivity).
COGNITION OF HUMAN ACTIONS: FROM INDIVIDUAL ACTIONS TO INTERACTIONS Convenor: Stephan de la Rosa Max Planck Institute for Biological Cybernetics, Tübingen, Germany Previous research has focused on the perceptual cognitive processes involved in the execution and observation of individual actions, such as a person walking. Only more recently has research started to investigate the perceptual-cognitive processes involved in the interaction of two or more people. This symposium provides an interdisciplinary
view regarding the relationship between individual actions and interactions. It will provide new insights from several research fields including decision making, neuroscience, philosophy of neuroscience, computational neuroscience, and psychology. The aim of the symposium is to give a state-of-the-art overview of commonalities and differences in the perceptual cognitive processes underlying individual actions and social interactions.
Signaling games in sensorimotor interactions Daniel Braun Max Planck Institute for Biological Cybernetics, Tübingen, Germany In our everyday lives, humans not only signal their intentions through verbal communication, but also through body movements, for instance when doing sports to inform team mates about one's own intended actions or to feint members of an opposing team. Here, we study such sensorimotor signaling in order to investigate how communication emerges and on what variables it depends. In our setup, there are two players with different aims who have partial control in a joint motor task, and one of the two players possesses private information that the other player would like to know about. The question then is under what conditions this private information is shared through a signaling process. We manipulated the critical variables given by the costs of signaling and the uncertainty of the ignorant player. We found that the dependency of both players' strategies on these variables can be modeled successfully by a game-theoretic analysis.
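To make the game-theoretic setting concrete, here is a minimal sketch of a costly sender-receiver signaling game. The payoff structure, the simplifying assumption of aligned interests, and all numbers are illustrative inventions, not the study's actual task or analysis.

```python
# Toy signaling game: a sender privately observes a binary state and may
# pay `cost` to send a signal; the receiver then picks a binary action.
# Both earn 1 when the action matches the state (aligned interests here,
# purely for simplicity; the study used players with partly different aims).

def expected_payoffs(sender_rule, receiver_rule, cost, p=0.5):
    """sender_rule: dict state -> signal; receiver_rule: dict signal -> action;
    p: prior probability of state 1. Returns (sender, receiver) payoffs."""
    pay_s = pay_r = 0.0
    for state, prob in ((0, 1 - p), (1, p)):
        signal = sender_rule[state]
        action = receiver_rule[signal]
        match = 1.0 if action == state else 0.0
        pay_s += prob * (match - cost * signal)
        pay_r += prob * match
    return pay_s, pay_r

informative = {0: 0, 1: 1}   # signal if and only if the state is 1
pooling = {0: 0, 1: 0}       # never signal
trusting = {0: 0, 1: 1}      # receiver follows the signal
ignoring = {0: 0, 1: 0}      # receiver defaults to action 0

for cost in (0.1, 0.6, 1.1):  # informative signaling pays only at low cost
    s_info, _ = expected_payoffs(informative, trusting, cost)
    s_pool, _ = expected_payoffs(pooling, ignoring, cost)
    print(f"cost={cost}: signaling {s_info:.2f} vs. not signaling {s_pool:.2f}")
```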
Perceptual cognitive processes underlying the recognition of individual and interactive actions Stephan de la Rosa Max Planck Institute for Biological Cybernetics, Tübingen, Germany Humans are social beings whose physical interactions with other people require rapid recognition of the other person's actions, for example when shaking hands. Previous research has investigated the perceptual cognitive processes involved in action recognition using open-loop experiments, in which participants passively view actions during recognition. These studies identified several important bottom-up mechanisms in action recognition. However, in daily life people often recognize actions for, or during, their own action production. In order to fully understand action recognition under more realistic conditions, we examined visual action perception in classical open-loop (participants observe actions), semi-closed-loop (participants interact with an avatar which carries out prerecorded actions), and closed-loop experiments (two participants interact naturally with each other using feedback loops). Our results demonstrate the importance of considering high-level factors that are under top-down control in action recognition.
Neural theory for the visual processing of goal-directed actions Martin A. Giese Section for Computational Sensomotorics, Dept. of Cognitive Neurology, HIH and CIN, University Clinic Tübingen, Germany The visual recognition of biological movements and actions is an important visual function that involves computational processes linking neural representations for action perception and execution.
This fact has made this topic highly attractive for researchers in cognitive neuroscience, and a broad spectrum of partly highly speculative theories has been proposed about the computational processes that might underlie action vision in primate cortex. In spite of this very active discussion about hypothetical computational and conceptual theories, our detailed knowledge about the underlying neural processes is quite limited, and a broad spectrum of critical experiments that would narrow down the relevant computational key steps remains yet to be done. I will present a physiologically inspired neural theory for the processing of goal-directed actions, which provides a unifying account of existing neurophysiological results on the visual recognition of hand actions in monkey cortex. At the same time, the model accounts for several new experimental results, where a part of these experiments was motivated by testing aspects of the proposed neural theory. Importantly, the present model accounts for many basic properties of cortical action-selective neurons by simple physiologically plausible mechanisms that are known from visual shape and motion processing, without necessitating a central computational role of motor representations. The same model also provides an account of experiments on the visual perception of 'causality', suggesting that simple forms of causality perception might be a side-effect of computational processes that mainly subserve the recognition of goal-directed actions. Extensions of the model might provide a basis for the investigation of neurodynamic phenomena in the visual processing of action stimuli. Acknowledgments Research supported by the EC FP7 projects AMARSi, Koroibot, ABC, and the Human Brain Project, and by the BMBF and the DFG.
From individual to joint action: representational commonalities and differences Hong Yu Wong CIN, University of Tübingen, Germany To what extent do the structures underpinning individual action differ from those underpinning joint action? What are the representational commonalities and differences between individual and joint action? Can an individual account of planning intentions be extended to cover the case of joint action (as suggested by Bratman)? What is the phenomenology of acting together? Is it an adequacy condition on a theory of action that it must account for the action of an arbitrary number of agents (as suggested by Butterfill)? This talk will approach these questions from the point of view of the philosophy of action. We will draw on recent empirical studies of joint action to reflect on prominent philosophical accounts of joint action, using this as an opportunity to reflect on the significance of a philosophy of action for the science of action (and vice versa).
Neural mechanisms of observing and interacting with others Kai Vogeley University Hospital Cologne, Germany Over the last decade, cognitive neuroscience has started to systematically study the neural mechanisms of social cognition or
social information processing. Essentially, two different neural systems have been established in this research domain that appear to constitute two different routes of processing underlying our social cognitive capacities in everyday social encounters, namely the so-called "mirror neuron system" (MNS) and the "social neural network" (SNN, also theory of mind network or mentalizing network). The functional roles of both systems appear to be complementary. The MNS serves comparatively "early" stages of social information processing that are more related to spatial or bodily signals expressed in the behaviour of others and supports the "detection" of potential social salience, including the observation of other persons' actions. Complementary to the functional role of the MNS, the SNN serves comparatively "late" stages of social information processing that are more related to the "evaluation" of emotional and psychological states of others that have to be inferred as inner mental experience from the behaviour of this person. Empirical studies on the neural mechanisms of ongoing social interactions with others show that it is essentially SNN components that are recruited during the experience of social encounters, together with the reward system of the brain.
CORTICAL SYSTEMS OF OBJECT GRASPING AND MANIPULATION Convenor: Marc Himmelbach Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, Centre for Integrative Neuroscience, University of Tübingen, Germany Reaching for objects, grasping them, and finally using or manipulating these objects are typical human capabilities. Although several non-human species are able to do these things, the anatomical adaptation of our hands for an extraordinarily precise and flexible use in the interaction with an infinite number of different target objects makes humans unique among the vertebrate species. The unique anatomy of our hands is matched by a cortical sensorimotor control system connecting multiple areas in the frontal and parietal lobes of the human cortex, which underwent a considerable enlargement across the primate species. Although our hands by themselves, their flexible and precise use, and the capacities of our cortical hand motor systems already distinguish us from all other species, the use of objects as tools to act on further objects, and thereby mediate and transform our actions, makes us truly human. Although various non-human species use tools in some situations, the versatility of human tool use is totally unrivalled. Neuropsychological and neuroimaging research showed that dedicated cortical tool use systems overlap partially with the arm/hand sensorimotor systems but include additional frontal, parietal, and temporal cortical structures. While most of the structures that seem to be relevant for tool use beyond the arm/hand sensorimotor system have been identified, we are still missing a satisfactory description of their individual functional contributions. Across the whole range from simple grasping to the use of objects as tools on other objects, investigations of interdependencies and interactions between these cortical system components are still at the beginning. The speakers of this symposium together cover the range from simple grasping to tool use and will present their current behavioral, neuropsychological, and neuroimaging findings that further specify the functional description of the human object grasping and manipulation systems.
Influences of action characteristics and hand used on the neural correlates of planning and executing object manipulations Joachim Hermsdörfer1, Marie-Luise Brandi1,2, Christian Sorg2, Georg Goldenberg3, Afra Wohlschläger2 1 Department of Sport and Movement Science, Technical University Munich, Germany; 2 Department of Neurology, Technical University Munich, Germany; 3 Department of Neuropsychology, Bogenhausen Hospital, Germany Studies using functional magnetic resonance imaging (fMRI) techniques have revealed a wide-spread neural network active during the naming or imagination of tool actions as well as during pantomimes of tool use. Actual tool use has, however, only rarely been investigated, due to methodological problems. We have constructed a 'tool carousel' to enable the controlled and quick presentation and use of a variety of everyday tools and corresponding recipients, while restricting body movements to the lower arm and hand. In our paradigm we compared the use of tools as well as the goal-directed manipulation of neutral objects with simple transportation. We tested both hands in 17 right-handed healthy subjects. An action network including parietal, temporal as well as frontal areas was found. Irrespective of the exact characteristics of the action, planning was strongly lateralized to the left brain and involved similar areas, which remained active during actual task execution. Handling a tool versus a neutral bar and using an object versus simple transportation strengthened the lateralization of the action network towards the left brain. The results support the assumption that a dorso-dorsal stream is involved in the online manipulation of objects according to orientation and structure, independent of object knowledge. Regions of a ventro-dorsal pathway process and code the specific knowledge of how a common tool is used. Temporal-ventral areas identify objects and may code semantic tool information. Use of the left hand led to a larger recruitment of action areas, possibly to compensate for the lack of routine and automatism when using the non-dominant hand.
Effects of object recognition on grasping Marc Himmelbach Division of Neuropsychology, Hertie-Institute for Clinical Brain Research, Centre for Integrative Neuroscience, University of Tübingen, Germany Grasping a manipulable object requires action programming and object recognition, two processes that were supposed to be anatomically segregated in a dorsal and a ventral visual subsystem. Our studies investigated interactions between these proposed subsystems, studying the influence of familiar everyday objects on grasp programming and its cortical representation in humans. Our behavioral studies revealed an effect of learned identity-size associations on reach-to-grasp movements under binocular viewing conditions, counteracting veridical binocular depth and size information. This effect of object recognition on grasp programming was further supported by differences in the scaling of the maximum grip aperture between grasping featureless cuboids and grasping recognizable everyday objects in healthy humans. A subsequent fMRI experiment showed that during grasping of everyday objects, relative to grasping of featureless cuboids, BOLD signal levels were increased not only at the lateral occipital cortex but also at the anterior intraparietal sulcus, suggesting that object-identity information is represented in the dorsal subsystem. Measuring reach-to-grasp kinematics in two patients with lateral occipito-temporal brain damage, we observed significant behavioral deficits in comparison to a large healthy control group, suggesting a causal link between visual processing in the ventral system and grasp programming. In conclusion, our work shows that the recognition of a particular object not only affects grasp planning, i.e. the selection of a broad motor plan, but also the parameterization of reach-to-grasp movements.
Attention is needed for action control: evidence from grasping studies Constanze Hesse School of Psychology, University of Aberdeen, UK It is well known that during movement preparation, attention is allocated to locations which are relevant for movement planning. However, until now, very little research has examined the influence of distributed attention on movement kinematics. In our experiments, we investigated whether the execution of a concurrent perceptual task that requires attentional resources interferes with movement planning (primarily mediated by the ventral stream) and/or movement control (primarily mediated by the dorsal stream) in grasping. Participants had to grasp objects of varying sizes whilst simultaneously performing a perceptual identification task. Movement kinematics and perceptual identification performance in the dual-task conditions were compared to the baseline performance in both tasks (i.e. performance levels in the absence of a secondary task). Furthermore, movement kinematics were measured continuously such that interference effects could also be detected at early stages of the movement. Our results indicate that both movement planning (as indicated by prolonged reaction times) as well as movement control (as indicated by a delayed adjustment of the grip to the object's size) are altered when attention has to be shared between a grasping task and a perceptual task. These findings suggest that the dorsal and the ventral stream share common attentional processing resources and that even simple motor actions such as grasping are not completely automated.
The representation of grasping movements in the human brain Angelika Lingnau Center for Mind/Brain Sciences, Department of Psychology and Cognitive Science, University of Trento, Italy Daily life activities require skillful object manipulations. Whereas we are beginning to understand the neural substrates of hand prehension in monkeys at the level of single-cell spiking activity, we still have a limited understanding of the representation of grasping movements in the human brain. With recent advances in human neuroimaging, such as functional magnetic resonance imaging (fMRI) repetition suppression (fMRI-RS) and multi-variate pattern (MVP) analysis, it has become possible to characterize some of the properties represented in different parts of the human prehension system. In this talk, I will present several studies using fMRI-RS and MVP analysis that investigated the representation of reach direction, wrist orientation, grip type and effector (left/right hand) of simple non-visually guided reach-to-grasp movements. We observed a preference for reach direction along the dorsomedial pathway, and overlapping representations for reach direction and grip type along the dorsolateral pathway, in line with a growing literature that casts doubt on a clear-cut distinction between separate pathways for the reach and grasp components. Moreover, we were able to distinguish between premotor areas sensitive to grip type, wrist orientation and effector, and parietal areas that are sensitive to grip type across wrist orientation. Our results support the view of a hierarchical representation of movements within the prehension system.
Avoiding obstacles without a ventral visual stream Thomas Schenk Department of Neurology, University of Erlangen-Nuremberg, Germany When reaching for a target it is important to avoid knocking over objects that stand in the way. We do this without thinking about it. Experiments in a hemiblind patient demonstrate that obstacles that are not perceived can be avoided. To account for such dissociations, the two visual-streams model suggests that perception is handled in the ventral visual stream while visually-guided action depends on visual input from the dorsal stream. The model also assumes that the dorsal stream cannot store visual information. Consequently, it is predicted that patients with dorsal stream damage will fail in the obstacle-avoidance task, but succeed when a short delay is introduced between obstacle presentation and response onset. This has been confirmed in patients with optic ataxia. In contrast, ventral stream damage should allow normal obstacle avoidance but destroy the patient's ability to avoid obstacles in a delayed condition. We tested these predictions in DF. As expected, we found that she can avoid obstacles in the standard condition. More surprisingly, she is equally good in the delayed condition, and a subtle change in the standard condition is sufficient to impair her obstacle-avoidance skills. The implications of these findings for the control of reaching will be discussed.
Action and semantic object knowledge are processed in separate but interacting streams: evidence from fMRI and dynamic causal modelling Peter H. Weiss-Blankenhorn Department of Neurology, University Hospital Cologne, Germany & Cognitive Neuroscience, Institute of Neuroscience & Medicine (INM-3), Research Centre Jülich, Germany While manipulation knowledge is differentially impaired in patients suffering from apraxia, function knowledge about objects is selectively impaired in patients with semantic dementia. These clinical observations fuelled the debate about whether manipulation and function knowledge about objects rely on differential neural substrates, as the processing of function knowledge may be based on either the action or the semantic system. By using new experimental tasks and effective connectivity analysis, fMRI studies can contribute to this debate. Behavioral data revealed that functional object knowledge (= motor-related semantic knowledge) and (non-motor) semantic object knowledge are processed similarly, while processing manipulation-related action knowledge took longer. For the manipulation task compared to the two (motor and non-motor) semantic tasks, a general linear model analysis revealed activations in the bilateral extra-striate body area and the left intra-parietal sulcus. The reverse contrast led to activations in the fusiform gyrus and inferior parietal lobe bilaterally as well as in the medial prefrontal cortex. Effective connectivity analysis demonstrated that action and semantic knowledge about objects are processed along two separate, but interacting, processing streams, with the inferior parietal lobe mediating the exchange of information between these streams.
EYE TRACKING, LINKING HYPOTHESES AND MEASURES IN LANGUAGE PROCESSING Convenors: Pia Knoeferle, Michele Burigo Bielefeld University, Germany The present symposium focuses on a core topic of eye tracking in language processing, viz. linking hypotheses (the attributive relationship between eye movements and cognitive processes). One central topic will be eye-tracking measures and their associated linking hypotheses in both language comprehension and production. The symposium will discuss both new and established gaze measures and their linking assumptions, as well as ambiguity in our linking assumptions and how we could begin to address this issue.
Conditional analyses of eye movements Michele Burigo, Pia Knoeferle Bielefeld University, Germany In spoken language comprehension, fixations guided by the verbal input have been interpreted as reflecting a referential link between words and corresponding objects (Tanenhaus, Spivey-Knowlton, Eberhard, Sedivy 1995). However, they cannot reveal other aspects of how comprehenders interrogate a scene (e.g., attention shifts from one object to another). Inspections, on the other hand, are, by definition, a good reflection of attentional shifts, much like saccades (see Altmann, Kamide 2004 for related discussion). One domain where attentional shifts and their direction are informative is spatial language (e.g., 'the plant is above the clock'). Some models predict that objects are inspected as they are mentioned, while others predict that attention must shift from a later-mentioned object (the clock) to the earlier-mentioned located object (the plant). To assess these model predictions, we examined in particular the directionality of attention shifts via conditional analyses. We ask where people look next as they hear the spatial preposition and after they have made one inspection to the clock. Will they continue to inspect the clock or do they shift attention back to the plant? Three eye tracking experiments were used to investigate the directionality of attention shifts during spatial language processing. The results from these conditional analyses revealed, for the first time, overt attentional shifts from the reference object (the clock) to the located object (the plant) in sentences such as 'The plant is above the clock'. In addition, conditional analyses of inspections may provide a useful approach for further refining the linking hypotheses between eye movements and cognitive processes (Fig. 1). References Altmann GTM (1999) Thematic role assignment in context. J Mem Lang 41:124–145 Regier T, Carlson L (2001) Grounding spatial language in perception: an empirical and computational investigation. J Exp Psychol Gen 130:273–298 Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268:1632–1634
Fig. 1 'The plant is above the clock'. The 5 × 6 grid was used to define the objects' locations and was invisible to participants
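As a rough illustration of what a conditional analysis of inspections might look like, the sketch below estimates where gaze goes next after the first inspection of the reference object. The data format, AOI labels, and function are hypothetical assumptions, not the authors' analysis code.

```python
from collections import Counter

def next_inspection_after(trials, reference='clock'):
    """trials: per-trial lists of inspected AOIs in temporal order, e.g.
    [['clock', 'plant'], ...]. Returns the distribution of the AOI
    inspected immediately after the first inspection of `reference`."""
    dest = Counter()
    for seq in trials:
        if reference in seq:
            i = seq.index(reference)
            if i + 1 < len(seq):
                dest[seq[i + 1]] += 1
    total = sum(dest.values())
    return {aoi: c / total for aoi, c in dest.items()}

trials = [['clock', 'plant'], ['plant', 'clock', 'plant'], ['clock', 'grid']]
print(next_inspection_after(trials))  # e.g. {'plant': 0.67, 'grid': 0.33}
```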
Rapid small changes in pupil size index processing difficulty: the index of cognitive activity in reading, visual world, and dual task paradigms Vera Demberg Saarland University, Saarbrücken, Germany The size of the pupil has long been known to reflect arousal (Hess, Polt 1960) and cognitive load in a variety of different tasks, such as arithmetic problems (Hess, Polt 1964), digit recall (Kahneman, Beatty 1966), attention (Beatty 1982) as well as language complexity (Schluroff 1982; Just, Carpenter 1993; Hyönä et al. 1995; Zellin et al. 2011; Frank, Thompson 2012), grammatical violations (Gutiérrez, Shapiro 2010) and context integration effects (Engelhardt et al. 2010). All of these studies have looked at the macro-level effect of the overall dilation of the pupil in response to a stimulus. Recently, a micro-level measure of pupil dilation has been proposed, called the "Index of Cognitive Activity" or ICA (Marshall 2000, 2002, 2007), which does not relate processing load to the overall change in size of the pupil, but instead counts the frequency of rapid small dilations, which are usually discarded as pupillary hippus (Beatty, Lucero-Wagoner 2000). Some aspects which make the ICA particularly interesting as a measure of cognitive load are that the ICA (a) is less sensitive to changes in ambient light and fixation position, (b) is more dynamic, which makes it easier to separate the effects of stimuli in close sequence, and (c) is faster than overall pupil size, i.e., it can usually be measured in the time window of 300–1,200 ms after the stimulus. If it reliably reflects (linguistic) processing load, the ICA could hence constitute a useful new method to assess processing load using an eye-tracker, in auditory experiments, visual world experiments, as well as in naturalistic environments which are not well suited for the use of EEG, e.g. while driving a car, and could therefore usefully complement the range of experimental paradigms currently used. In this talk I will report experimental results on the index of cognitive activity (ICA) in a range of reading experiments, auditory language plus driving experiments, as well as a visual world experiment, which all indicate that the ICA is a useful index of linguistic processing difficulty. References Beatty J (1982) Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychol Bull 91(2):276
Beatty J, Lucero-Wagoner B (2000) The pupillary system. Cambridge University Press, Cambridge Engelhardt PE, Ferreira F, Patsenko EG (2010) Pupillometry reveals processing load during spoken language comprehension. Quart J Exp Psychol 63:639–645 Frank S, Thompson R (2012) Early effects of word surprisal on pupil size during reading. In: Miyake N, Peebles D, Cooper RP (eds) Proceedings of the 34th annual conference of the Cognitive Science Society, pp 1554–1559 Gutiérrez RS, Shapiro LP (2010) Measuring the time-course of sentence processing with pupillometry. In: CUNY conference on human sentence processing Hess E, Polt J (1960) Pupil size as related to interest value of visual stimuli. Science Hess E, Polt J (1964) Pupil size in relation to mental activity during simple problem-solving. Science Hyönä J, Tommola J, Alaja A (1995) Pupil dilation as a measure of processing load in simultaneous interpretation and other language tasks. Quart J Exp Psychol 48(3):598–612 Just MA, Carpenter PA (1993) The intensity dimension of thought: pupillometric indices of sentence processing. Can J Exp Psychol 47(2) Kahneman D, Beatty J (1966) Pupil diameter and load on memory. Science Marshall S (2000) US patent no. 6,090,051 Marshall S (2002) The index of cognitive activity: measuring cognitive workload. In: Proceedings of the 7th conference on human factors and power plants, IEEE, pp 5–7 Marshall S (2007) Identifying cognitive state from eye metrics. Aviat Space Environ Med 78(Suppl 1):B165–B175 Schluroff M (1982) Pupil responses to grammatical complexity of sentences. Brain Lang 17(1):133–145 Zellin M, Pannekamp A, Toepel U, van der Meer E (2011) In the eye of the listener: pupil dilation elucidates discourse processing. Int J Psychophysiol
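For readers who want a feel for the micro-level measure, here is a crude, threshold-based stand-in that counts onsets of rapid small increases in a pupil-diameter trace. The published ICA is a patented, wavelet-based algorithm (Marshall 2000), so this sketch, its thresholds, and the simulated trace are illustrative assumptions only.

```python
import numpy as np

def micro_dilation_count(pupil, small=0.005, large=0.05):
    """Count onsets of rapid small dilations in a pupil-diameter trace (mm).
    A sample belongs to a candidate micro-dilation when the first difference
    is above `small` (excluding noise) but below `large` (excluding slow,
    light-driven dilations). Both thresholds are made up for illustration."""
    d = np.diff(pupil)
    in_event = (d > small) & (d < large)
    # Count rising edges of the boolean event mask, i.e. event onsets
    onsets = np.flatnonzero(in_event[1:] & ~in_event[:-1])
    return onsets.size + int(in_event[0])

# Simulated random-walk trace standing in for a real pupil recording
trace = np.cumsum(np.random.default_rng(1).normal(0, 0.004, 2000)) + 4.0
print(micro_dilation_count(trace))
```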
Measures in sentence processing: eye tracking and pupillometry Paul E. Engelhardt1, Leigh B. Fernandez2 1 University of East Anglia, UK; 2 University of Potsdam, Germany In this talk, we will present data from two studies that measured pupil diameter as participants heard temporarily ambiguous sentences. In the first study, we examined visual context. Tanenhaus et al. (1995) found that in the context of a relevant "visual world" containing an apple on a towel, an empty towel, and a box, listeners will often incorrectly parse an instruction such as put the apple on the towel in the box. The misinterpretation is that the apple must be moved on to the empty towel, and thus, the primary dependent measure is the rate of saccadic eye movements launched to the empty towel. Eye movements to the empty towel do not occur when the "visual world" contains more than one apple (Ferreira et al. 1995). In the first study, we examined the role that visual context plays in the processing effort associated with garden-path sentence processing (see example A below). Pupil diameter was measured from the key (disambiguating) word in the sentence (e.g. played). Our main hypothesis was that relevant visual context (e.g. a picture of a woman dressing herself) would be associated with reduced processing effort (i.e. no increase in pupil size). In contrast, when the visual context supported the garden-path misinterpretation (e.g. a picture of a woman dressing a baby), pupil diameter would reliably increase. (The prosodic boundary between clauses, marked (#) in example A, was also manipulated.) Results were consistent with both predictions.
A. While the woman dressed (#) the baby that was cute and cuddly played on the floor.
B. The superintendent learned [which schools/students] the proposal [that expanded/to expand] upon the curriculum would motivate ____ during the following semester. (The critical items were taken from Phillips (2006) and were simplified for auditory presentation.)
In the second study, we examined a special type of filler-gap dependency, called parasitic gap constructions. Filler-gap dependencies occur when a constituent within a sentence has undergone movement (e.g. What_i did the boy buy t_i?). In this sentence, what has moved from its canonical position as the object of buy, and thus, the parser must be able to keep track of moved constituents and correctly associate them with the correct verbs (or gap sites). Difficulty arises when (1) there are multiple verbs in the sentence, and (2) those verbs are optionally transitive (i.e. have the option of taking a direct object or not). Parasitic gaps are a special type of construction because a filler is associated with two gaps. An example is What did the attempt to fix _ ultimately damage _?. Even more interesting, from a linguistic perspective, is that the first gap occurs in an "illegal" position. Phillips (2006) used a self-paced word-by-word reading paradigm to test sentences containing parasitic-gap-like constructions (see example B above). He found slowdowns only in situations in which a parasitic gap dependency was allowed (i.e. with to expand). Furthermore, reading times were influenced by plausibility (i.e. it is possible to expand schools but not students). In our second study, we used similar materials to investigate parasitic gaps using changes in pupil diameter over time as an index of processing load. Our data indicate that the parser actively forms dependencies as soon as possible, regardless of semantic fit. In summary, this talk will compare and contrast findings from eye tracking and reading times with pupil diameter. Both of our studies showed similarities to the original works, but at the same time also showed novel dissociations. The relationship between discrete measures, such as saccadic eye movements, and continuous measures, such as pupil diameter and mouse tracking, will be discussed. References Ferreira F, Henderson JM, Singer M (1995) Reading and language processing: similarities and differences. In: Henderson JM, Singer M, Ferreira F (eds) Reading and language processing. Erlbaum, Hillsdale, pp 338–341 Phillips C (2006) The real-time status of island phenomena. Language 82:795–823 Tanenhaus MK, Spivey-Knowlton MJ, Eberhard KM, Sedivy JC (1995) Integration of visual and linguistic information in spoken language comprehension. Science 268(5217):1632–1634
Improving linking hypotheses in visually situated language processing: combining eye movements and event-related brain potentials
Pia Knoeferle Bielefeld University, Germany Listeners' eye movements to objects in response to auditory verbal input, as well as their event-related brain potentials (ERPs), have revealed that non-linguistic cues contribute rapidly towards real-time language comprehension. While the findings from these two measures have contributed important insights into context effects during real-time language comprehension, there is also considerable ambiguity in the linking between comprehension processes and each of these two measures (eye movements and ERPs). This naturally limits the conclusions we can draw from this research with regard to language comprehension (theory). In further refining our theories of sentence comprehension, better linking hypotheses would thus be an essential step. The present contribution argues that combining eye tracking and event-related brain potentials would improve the interpretation of these two individual measures, the associated linking hypotheses, and correspondingly insights into situated language comprehension processes (Knoeferle, in press). Acknowledgments This research was funded by the Cognitive Interaction Technology Excellence Center (DFG). References Knoeferle P (in press) Language comprehension in rich non-linguistic contexts: combining eye tracking and event-related brain potentials. In: Cognitive neuroscience of natural language use. Cambridge University Press, Cambridge
Oculomotor measurements of abstract and concrete cognitive processes Andriy Myachykov Northumbria University, Newcastle-upon-Tyne, UK Analysis of oculomotor behavior has long been used as a window into the cognitive processes underlying human behavior. Eye tracking allows recording of highly accurate categorical and chronometric data, which provides experimental evidence about various aspects of human cognition including, but not limited to, the retrieval and activation of information in memory and the allocation and distribution of visual attention. As a very diverse and accurate experimental tool, eye tracking has been used for the analysis of low-level perceptual processes as well as for the investigation of higher cognitive processes including mental arithmetic, language, and communication. One example of the latter is research using the "visual world" paradigm, which uses the eye movements of language users (listeners and speakers) in order to understand the cognitive processes underlying human linguistic communication. In the first part of my talk, I will offer a broad overview of eye tracking methodology with a specific focus on measurements and their evidential value in different cognitive domains. The second part will discuss results of a number of recent eye-tracking studies on sentence production and comprehension as well as number processing.
MANUAL ACTION
Convenor: Dirk Koester Bielefeld University, Germany The hand is one of our most important tools for interacting with the environment, both physically and socially. Manual actions and the associated processes of motor control, both sensorimotor and cognitive, have received much attention. This research strand has a focus on the complexity of movement details (e.g. kinematics, dynamics or degrees of freedom). At the same time, in a seemingly different research field, manual actions have been scrutinized for their communicative goals or functions, so-called co-speech gestures. Here, a focus is on what kind of information is supported by such actions; whether meaning is conveyed, but also synchronization, akin to
kinematics, is currently under investigation. A tight functional interrelation between manual action control and language has long been proposed (e.g. Steklis, Harnad 1976). Not only do hand movements have to be controlled; the environmental context (i.e., the situation) also has to be taken into account in order to fully understand manual actions. Furthermore, technical advances also permit a deeper investigation of the neural basis, in addition to the cognitive basis, of (manual) action control. Regarding other cognitive domains, recent evidence points towards a tight functional interaction of grasping with other cognitive domains such as working memory or attention (Spiegel et al. 2013; Logan, Fischman 2011). What's more, manual actions may be functional for abstract cognitive processing, e.g., numerical reasoning (as suggested by the phenomenon of finger counting). In this symposium we will bring together the latest research that explores the manifold functions and purposes of manual actions, such as exploring and manipulating objects, the development of such control processes for grasping, and the changes associated with aging. Different models of action control will be presented and evaluated. Also, evidence for the role of manual gestures in interacting and communicating with other people will be presented. That is, not only the (physical) effects of manual actions in the environment will be discussed, but also the interpretation of gestures, i.e., communicative goals, will be debated. The symposium will shed light on new concepts of and approaches to understanding the control of manual actions and their functions in a social and interactive world. References Logan SW, Fischman MG (2011) The relationship between end-state comfort effects and memory performance in serial and free recall. Acta Psychol 137:292–299 Spiegel MA, Koester D, Schack T (2013) The functional role of working memory in the (re-)planning and execution of grasping movements. J Exp Psychol Hum Percept Perform 39:1326–1339 Steklis HD, Harnad SR (1976) From hand to mouth: some critical stages in the evolution of language. Ann N Y Acad Sci 280(1):445–455
The Bremen-Hand-Study@Jacobs: effects of age and expertise on manual dexterity Ben Godde, Claudia Voelcker-Rehage Jacobs Center on Lifelong Learning and Institutional Development, Jacobs University, Bremen, Germany A decline in manual dexterity is common in older adults and has been demonstrated to account for much of the observed impairment in everyday tasks, like pouring milk into a cup, preparing meals, or retrieving coins from a purse. Aiming at an understanding of the underlying mechanisms, the investigation of the regulation and coordination of isometric fingertip forces has been given a lot of attention during the last decades. Tactile sensitivity is also increasingly impaired with older age, and deficits in tactile sensitivity and perception, and therefore in sensorimotor feedback loops, play an important role in age-related decline in manual dexterity. Within the Bremen-Hand-Study@Jacobs our main focus was on the question of how age and expertise influence manual dexterity during middle adulthood. In particular, we were interested in the capacity of older employees to enhance their fine motor performance through practice. To reach this goal, we investigated basic mechanisms responsible for age-related changes in precision grip control and tactile performance as well as learning capacities (plasticity) in different age and expertise groups at a behavioral and neurophysiological (EEG) level.
Our results confirmed a decline in basic components of manual dexterity, finger force control and tactile perception, with increasing age, already during middle adulthood. Age-related changes in the underlying neurophysiological correlates could also be observed in middle-aged adults. Performing manual tasks at a level comparable to younger adults required more frontal (i.e. cognitive) brain resources in older workers, indicating compensatory plasticity. Furthermore, in both the motor and tactile domains, expertise seemed to counteract age-related decline and to postpone age effects by about 10 years. Although older adults generally performed at a lower baseline performance level, they were able to improve motor and tactile functioning through short-term practice or stimulation interventions. Particularly in the tactile domain, such an intervention was well suited to attenuate age-related decline. Overall, our data suggest that the aging of manual dexterity starts slowly but continues throughout the working lifespan and can be compensated by continuous use (expertise) or targeted interventions.
Planning anticipatory actions: on the interplay between normative and mechanistic models Oliver Herbort Department of Psychology, University of Würzburg, Germany Actions frequently foreshadow subsequent actions. For example, the hand orientation used to grasp an object depends on the intended object manipulation. Here, I examine whether such anticipatory grasp selections can be described purely in terms of their function or whether the planning process also has to be taken into account. To test functional accounts, three posture-based cost functions were used to predict grasp selection. As an example for a model of the planning process, I evaluated the recently proposed weighted integration of multiple biases model. This model posits that grasp selection is heuristically based on the direction of the intended object rotation as well as other factors. The models were evaluated using two empirical datasets. The datasets were from two experiments, in which participants had to grasp and rotate a dial by various angles. The models were fitted to the empirical data of individual participants using maximum likelihood estimates of the models' free parameters. The model including the planning process provided a closer fit to the data of both experiments than the functional accounts. Thus, human actions can only be understood as the superimposition of their function and computational artifacts imposed by the limitations of the central nervous system.
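To make the model-fitting step concrete, here is a minimal sketch of fitting a single toy model to one participant's grasp angles by maximum likelihood. The model form, parameter names, Gaussian error assumption, and data are all invented for illustration; they are not the study's actual models or datasets.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, model, rotations, grasps):
    """Negative log-likelihood of observed grasp angles under a model
    with Gaussian residual noise (last parameter is the noise SD)."""
    pred = model(rotations, *params[:-1])
    sigma = abs(params[-1]) + 1e-6
    return -norm.logpdf(grasps, pred, sigma).sum()

# A toy "weighted biases" model: grasp angle as a weighted blend of a
# default orientation and a bias opposing the intended rotation.
def weighted_biases(rotation, default, weight):
    return weight * (-rotation) + (1 - weight) * default

rotation = np.radians([-90, -45, 0, 45, 90])
observed = np.radians([70, 40, 10, -20, -60])   # fabricated demo data

fit = minimize(neg_log_lik, x0=[0.0, 0.5, 0.2],
               args=(weighted_biases, rotation, observed))
print(fit.x, fit.fun)   # fitted parameters and the minimized -log likelihood
```

Competing models would each be fitted this way per participant and then compared on their likelihoods (or an information criterion that penalizes free parameters).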
Identifying linguistic and neural levels of interaction between gesture and speech during comprehension using EEG and fMRI Henning Holle Department of Psychology, University of Hull, UK Conversational gestures are hand movements that co-occur with speech but do not appear to be consciously produced by the speaker. The role that these gestures play in communication is disputed, with some arguing that gesture adds only little information over and above what is already transmitted by speech alone. My own work has provided strong evidence for the alternative view, namely that gestures add substantial information to the comprehension process. One level at which this interaction between gesture and speech takes place seems to be semantics, as indicated by the N400 of the Event Related Potential. I will also present findings from a more recent study that
has provided evidence for a syntactic interaction between gesture and speech (as indexed by the P600 component). Finally, fMRI studies suggest that areas associated with the detection of semantic mismatches (left inferior frontal gyrus) and audiovisual integration (left posterior temporal lobe) are crucial components of the brain network for co-speech gesture comprehension.
Neural correlates of gesture-syntax interaction
Leon Kroczek, Henning Holle, Thomas Gunter Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany In a communicative situation, gestures are an important source of information which also impacts speech processing. Gestures can, for instance, help when speech perception is troubled by noise (Obermeier et al. 2012) or when speech is ambiguous (Holle et al. 2007). Recently, we have shown that not only meaning, but also structural information (syntax) used during language comprehension is influenced by gestures (Holle et al. 2012). Beat gestures, which highlight particular words in a sentence, seem to be able to disambiguate sentences that are temporarily ambiguous with respect to their syntactic structure. Here we explored the underlying neural substrates of the gesture-syntax interaction with fMRI, using similar ambiguous sentence material as Holle et al. (2012). Participants were presented with two types of sentence structures which were either easy (Subject-Object-Verb) or more difficult (Object-Subject-Verb) in their syntactic complexity. A beat gesture was shown either at the first or the second noun phrase (NP). Activations related to syntactic complexity were primarily lateralized to the left (IFG, pre-SMA, pre-central gyrus, and MTG) and bilateral for the insula. A ROI-based analysis showed interactions of syntax and gesture in the left MTG, left pre-SMA, and in the bilateral insula activations. The pattern of the interaction suggests that a beat on NP1 facilitates the easy SOV structure and inhibits the more difficult OSV structure, and vice versa for a beat on NP2. Because the IFG was unaffected by beat gestures, it seems to play an independent/isolated role in syntax processing.
Interregional connectivity minimizes surprise responses during action perception Sasha Ondobaka, Marco Wittmann, Floris P de Lange, Harold Bekkering Donders Institute for Brain, Cognition and Behavior, Radboud University Nijmegen, Netherlands The perception of other individuals' goal-directed actions requires the ability to process the observed bodily movements and the surrounding environmental context at the same time. Both action and contextual processing have been studied extensively (Iacoboni et al. 2005; Shmuelof and Zohary 2005; Bar et al. 2008), yet the neural mechanisms that integrate action and contextual surprise remain elusive. The predictive account describes action perception in terms of a hierarchical inference mechanism which generates prior predictions to minimize the surprise associated with incoming action and contextual sensory input (Friston et al. 2011; Koster-Hale and Saxe 2013). Here, we used functional neuroimaging to establish which brain circuits represent action and contextual surprise and to examine the neural mechanisms that are responsible for minimizing surprise-related responses (Friston 2005). Participants judged whether an action was surprising or non-surprising, dependent on the context in which the action took place. They first viewed a surprising or non-surprising context, followed by a grasping action. The results showed greater activation for the surprising than for the non-surprising context in the parietal and temporal multi-modal association cortices (ACs) that are known to process context. The fronto-insular cortex (FIC) was more active for surprising actions compared to non-surprising actions. When the non-surprising action was perceived, functional connectivity between brain areas that represent action surprise and contextual surprise was enhanced. The findings suggest that the strength of the interregional neural coupling minimizes the surprising sensations necessary for perception of others' goal-directed actions and provide support for a hierarchical predictive model of brain function.
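In the predictive-account terminology used above, "surprise" can be read as the negative log-probability of an observation. A minimal sketch with fabricated conditional probabilities (not the study's stimuli or analysis) makes the quantity concrete:

```python
import numpy as np

# Fabricated P(action | context) values, for illustration only
p_action_given_context = {
    ('kitchen', 'grasp cup'): 0.80,
    ('kitchen', 'grasp drill'): 0.05,
    ('workshop', 'grasp cup'): 0.20,
    ('workshop', 'grasp drill'): 0.70,
}

def surprise(context, action):
    """Shannon surprise (in nats) of an action given a context."""
    return -np.log(p_action_given_context[(context, action)])

print(f"{surprise('kitchen', 'grasp cup'):.2f} nats")    # low: expected action
print(f"{surprise('kitchen', 'grasp drill'):.2f} nats")  # high: unexpected action
```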
The development of cognitive and motor planning skills in young children Kathrin Wunsch1, Roland Pfister2, Anne Henning3,4, Gisa Aschersleben4, Matthias Weigelt1 1 Department of Sport and Health, University of Paderborn, Germany; 2 Department of Psychology, University of Würzburg, Germany; 3 Developmental Psychology, University of Health Sciences Gera, Germany; 4 Department of Psychology, Saarland University, Germany The end-state comfort (ESC) effect signifies the tendency to avoid uncomfortable postures at the end of goal-directed movements and can be reliably observed during object manipulation in adults, but only little is known about its development in children. Therefore, the present study investigated the development of anticipatory planning skills in children and its interdependencies with the development of executive functions. Two hundred and seventeen participants in 9 age groups (3-, 4-, 5-, 6-, 7-, 8-, 9-, 10-year-olds, and adults) were tested in three different end-state comfort tasks and three tasks to assess executive functioning (Tower of Hanoi, Mosaic, and the D2 attention endurance task). Regression analysis revealed a robust developmental trend for each individual end-state comfort task across all age groups (all p < .01). Somewhat surprisingly, there was no indication of generalization across these tasks, as correlations between the three motor tasks failed to reach significance for all age groups (p > .05). Furthermore, we did not observe any systematic correlation between performance in the end-state comfort tasks and the level of executive functioning. Accordingly, anticipatory planning develops with age, but the impact of executive functions on this development seems to be rather limited. Moreover, motor planning does not seem to be a holistic construct, as performance in the three different tasks was not correlated. Further research is needed to investigate the interdependencies of sensory-motor skill development with other cognitive abilities.
PREDICTIVE PROCESSING: PHILOSOPHICAL AND NEUROSCIENTIFIC PERSPECTIVES Convenor: Alex Morgan CIN, University of Tübingen, Germany The idea that the brain makes fallible inferences and predictions in order to get by in a world of uncertainty is of considerable vintage, but it is now beginning to achieve maturity due to the development of a range of rigorous theoretical tools rooted in Bayesian statistics that are increasingly being used to explain various aspects of the brain's structure and function. The emerging 'Bayesian brain' approach in neuroscience introduces novel ways of conceptualizing perception, cognition, and action. It also arguably involves novel forms of
neuroscientific explanation, such as an emphasis on statistical optimality. The science is moving rapidly, but philosophers are attempting to keep up, in order to understand how these recent developments might shed light on their traditional concerns about the nature of mind and agency, as well as concerns about the norms of psychological explanation. The purpose of this symposium is to bring together leading neuroscientists and philosophers to discuss how the Bayesian brain approach might reshape our understanding of the mind-brain, as well as our understanding of mind-brain science.
Bayesian cognitive science, unification, and explanation Matteo Colombo Tilburg Center for Logic and Philosophy of Science, Tilburg University, Netherlands It is often claimed that the greatest value of the Bayesian framework in cognitive science consists in its unifying power. Several Bayesian cognitive scientists assume that unification is obviously linked to explanatory power. But this link is not obvious, as unification in science is a heterogeneous notion, which may have little to do with explanation. While a crucial feature of most adequate explanations in cognitive science is that they reveal aspects of the causal mechanism that produces the phenomenon to be explained, the kind of unification afforded by the Bayesian framework to cognitive science does not necessarily reveal aspects of a mechanism. Bayesian unification, nonetheless, can place fruitful constraints on causal-mechanical explanation.
Learning sensory predictions for perception and action Axel Lindner Hertie Institute for Clinical Brain Research, University of Tübingen, Germany Perception and action are informed not only by incoming sensory information but also by predictions about upcoming sensory events. Such sensory predictions make it possible, for instance, to perceptually distinguish self- from externally-produced sensations: by comparing action-based predictions with the actual sensory input, the sensory component that is produced by one's own actions can be isolated (attenuated, etc.). Likewise, action-based sensory predictions allow the motor system to react more rapidly to predictable events and, thus, to be less dependent on delayed sensory feedback. I will demonstrate that the cerebellum, a structure intimately linked to plasticity within the motor domain, accounts for learning action-based sensory predictions on a short time scale. I will further show that this plasticity is not solely related to the motor domain: it also influences the way we perceptually interpret the sensory consequences of our behavior. Specifically, I will present experiments in which we use virtual reality techniques to alter the visual direction subjects associate with their pointing movements. While we were able to change the predicted visual consequences of pointing in healthy individuals, such recalibration of a sensory prediction was dramatically compromised in patients with lesions of the cerebellum. Extending these results on sensory predictions for self-produced events, I will show that the cerebellum also underlies the learning of sensory predictions about external sensory events, independent of self-action. In contrast to healthy controls, cerebellar patients were significantly impaired in learning to correctly predict the re-occurrence of a moving visual target that temporarily disappeared behind an occluder. In summary, our research suggests that the cerebellum plays a domain-general role in fine-tuning predictive models, irrespective of whether sensory predictions are action-based (efference copies) or sensory-based, and irrespective of whether sensory predictions support action, perception, or both.
The explanatory heft of Bayesian models of cognition Frances Egan, Robert Matthews Department of Philosophy, Rutgers University, USA Bayesian models have had a dramatic impact on recent theorizing about cognitive processes, especially about those brain-environment processes directly implicated in perception and action. In this talk we critically examine the explanatory character of these models, especially in light of so-called 'new mechanist' claims to the effect that these models are not genuinely explanatory, or are at least little more than explanation sketches. We illustrate our points with examples drawn from both classical dynamics and cognitive ethology. We conclude with a discussion of the import of these models for the presumption, common among neuropsychologists, that commonsense folk-psychological concepts such as belief and desire have an important role to play in cognitive neuroscience.
Layer resolution fMRI to investigate cortical feedback and predictive coding in the visual cortex
Lars Muckli Institute of Neuroscience and Psychology, University of Glasgow, UK David Mumford (1991) proposed a role for reciprocal topographic cortical pathways in which higher areas send abstract predictions of the world to lower cortical areas. At the lower cortical areas, top-down predictions are then compared to the incoming sensory stimulation. Several questions arise within this framework: (1) do descending predictions remain abstract, or do they translate into concrete-level predictions, the 'language' of lower visual areas? (2) how is incoming sensory information compared to top-down predictions? Are input signals subtracted from the prediction (as proposed in the predictive coding framework) or are they multiplied (as proposed by other models, i.e. biased competition or adaptive resonance theory)? Contributing to the debate of abstract versus concrete-level information, we aim to investigate the information content of feedback projections with functional MRI. We have exploited a strategy in which feedforward information is occluded in parts of visual cortex: i.e. along the non-stimulated apparent motion path, behind a white square that we used to occlude natural visual scenes, or by blindfolding our subjects (Muckli, Petro 2013). By presenting visual illusions or contextual scene information, or by playing sounds, we were able to capture feedback signals within the occluded areas of the visual cortex. MVPA analysis of the feedback signals reveals that they are more abstract than the feedforward signal. Furthermore, using high-resolution MRI we found that feedback is sent to the outer cortical layers of V1. We also show that feedback to V1 can originate from auditory information processing (Vetter, Smith, Muckli 2014). We are currently developing strategies to reveal the precision and potential functions of cortical feedback. Our results link into the emerging paradigm shift that portrays the brain as a 'prediction machine' (Clark 2013). References Clark A (2013) Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav Brain Sci 36(3):181–204 Muckli L, Petro L (2013) Network interactions: non-geniculate input to V1. Curr Opin Neurobiol 23(2):195–201 Mumford D (1991) On the computational architecture of the neocortex. The role of the thalamo-cortical loop. Biol Cybern 65(2):135–145 Vetter P, Smith FW, Muckli L (2014) Decoding sound and imagery content in early visual cortex. Curr Biol 24(11):1256–1262
HOW LANGUAGE AND NUMERICAL REPRESENTATIONS CONSTITUTE MATHEMATICAL COGNITION Convenor: Hans-Christoph Nuerk University of Tübingen, Germany Mathematical or numerical cognition has often been studied with little consideration of language and linguistic processes. The most basic representation, the number magnitude representation, has been viewed as amodal and non-verbal. Only in recent years has the influence of linguistic processes again received more interest in cognitive research. We now have evidence that even the most basic tasks, like magnitude comparison and parity judgment, and even the most basic representations, such as the spatial representation of number magnitude, are influenced by language and linguistic processes. The symposium brings together international researchers from different fields (linguistics, psychology, and cognitive neuroscience) with at least three different foci within the general symposium topic: (i) How is the spatial representation of number influenced by reading and writing direction? (ii) How do the number word structures of different languages influence mathematical and numerical performance? (iii) How are the linguistic abilities of children and the linguistic complexity of mathematical tasks related to mathematical performance? After an overview given by the organizer, the symposium starts with a presentation by Fischer and Shaki, who have shaped the research on reading and writing influences on the relation between space and number in recent years. They give an update on explicit and implicit linguistic influences on spatial-numerical cognition. Tzelgov and Zohar-Shai partially challenge this view, because they show that a related linguistic effect, namely the linguistic markedness effect, may mask seemingly observed null effects of
number-space relations in Hebrew. Concluding the first part, Soltanlou, Huber, and Nuerk examine how different basic numerical effects, including the SNARC (spatial-numerical association of response codes) effect, are influenced by linguistic and other cultural properties. The next three talks are concerned with the question of how number word structure influences numerical and mathematical processing in children and adults. In recent years it has been shown repeatedly that non-transparent number word structures specifically interfere with mathematical performance. Schiltz, Van Rinsveld, and Ugen make use of the fact that all children in Luxembourg are taught bilingually (French, German). They are therefore able to examine the influence of different number word structures in within-participant designs. They observe linguistic influences on mathematical cognition, which, however, are mediated by a child's proficiency in a given language. Dowker, Lloyd, and Roberts compare performance in English and Welsh. The Welsh number word structure is fully transparent (13 = ten-three; 22 = two-tens-two) for all two-digit number words, a transparency otherwise commonly found only in Pacific Rim countries. Since some children in Wales are taught in Welsh and some in the less transparent English, the impact of the transparency of a number word system can be studied within one culture, thereby avoiding the confounds of language and culture common in cross-cultural studies. The results suggest that children benefit in specific numerical tasks, but not in arithmetic performance in general. Finally, Bahnmueller, Goebel, Moeller, and Nuerk used a translingual eye-tracking study to examine which processes underlie linguistic influences on numerical cognition. They show that, at least for a sub-group, language guides the attentional processing of multi-digit Arabic numbers in a way consistent with the number word structure. In the final part of the symposium, Szucs examines the differential contribution of language-related and language-independent skills to mathematical performance. He observed that, on the one hand, phonological decoding skills predict mathematical performance in standardized tests, but that, on the other hand, children with pure dyscalculia do not show deficits in verbal and language functions. Finally, Daroczy, Wolska, Nuerk, and Meurers used an interdisciplinary (linguistics, psychology) approach to study mathematical word problems. They systematically varied linguistic and numerical complexity within one study and examined how both factors contribute to mathematical performance in this task. The symposium concludes with a general discussion of how language and numerical representations constitute mathematical cognition.
Influences of number word inversion on multi-digit number processing: a translingual eye-tracking study Julia Bahnmueller1,2, Silke Goebel3, Korbinian Moeller1, Hans-Christoph Nuerk1,2 1 IWM-KMRC Tübingen, Germany; 2 University of Tübingen, Germany; 3 University of York, UK Differences in number word systems become most obvious for multi-digit numbers. Therefore, the investigation of multi-digit numbers is crucial to identify linguistic influences on number processing. One of the most common specificities of a number word system is the inversion of number words with respect to the digits of a number (e.g., the German number word for 27 is siebenundzwanzig, literally 'seven and twenty'). While linguistic influences of the number word system have been reliably observed over recent years, the specific cognitive processes underlying them are still unknown. Therefore, the present study aimed at investigating the underlying cognitive processes and language specificities of three-digit number
processing. More specifically, it was intended to clarify to what degree three-digit number processing is influenced by parallel and/or sequential processing of the involved digits and modulated by language. English- and German-speaking participants were asked to complete a three-digit number comparison task while their response latencies as well as their eye movements were recorded. Results showed that in both language groups there were indicators of both parallel and sequential processing, with clear-cut language-based differences being observed. Reasons for the observed language-specific differences, contributing to a more comprehensive understanding of mathematical cognition, are discussed.
On the influence of linguistic and numerical complexity in word problems Gabriella Daroczy1, Magdalena Wolska1, Hans-Christoph Nuerk1,2, Detmar Meurers1 1 University of Tübingen, Germany; 2 IWM-KMRC Tübingen, Germany Word problems, in which a mathematical problem is given as a text to be read before arithmetic calculation can begin, are among the most difficult mathematical problems for children and adults. Different linguistic factors, e.g., text complexity (nominalization vs. verbal phrases), numerical factors (carry or non-carry, addition or subtraction), and the relation between the linguistic text and the mathematical problem (order consistency) can all contribute to the difficulty of a word problem. Our interdisciplinary group systematically varied linguistic and numerical factors in a within-participant design. The results showed that both linguistic and numerical complexity, as well as their interrelation, contributed to mathematical performance.
Linguistic influences on numerical understanding: the case of Welsh Ann Dowker1, Delyth Lloyd2, Manon Roberts3 1 Dept of Experimental Psychology, University of Oxford, England; 2 University of Melbourne, Australia; 3 Worcester College, Oxford, England It is sometimes suggested that one reason why children in Pacific Rim countries excel in mathematics is that their counting systems are highly transparent: e.g. 13 is represented by the equivalent of 'ten-three', 22 by the equivalent of 'two-tens-two', etc. This may make both counting and the representation of place value easier to acquire than in many other languages. However, there are so many cultural and educational differences between, for example, the USA and China that it is hard to isolate the influence of any particular factor. In Wales, both a regular counting system (Welsh) and an irregular counting system (English) are used within a single region. Approximately 20 % of children in Wales receive their education in the Welsh medium, while following the same curriculum as those attending English-medium schools. This provides an exceptional opportunity for studying the effects of the regularity of the counting system in the absence of major confounding factors. Studies so far suggest that Welsh-speaking children do not outperform their English-speaking counterparts in all aspects of arithmetic, but that they do show superiority in some specific aspects: notably in reading and comparing 2-digit numbers, and in the precision of their non-verbal number line estimation.
Reading space into numbers: an update Martin H. Fischer1, Samuel Shaki2 1 University of Potsdam, Germany; 2 Ariel University, Israel
Number-space associations, and the SNARC effect in particular, have been extensively investigated over the past two decades. Still, their origin and directionality remain unclear. We will address the following questions: (a) Does the SNARC effect reflect recent spatial experience or long-standing directional habits? (b) Does the SNARC effect spill over from reading habits for numbers or from reading habits for words? (c) What is the contribution of other directionality cues (e.g., vertical grounding such as 'more is up'; cultural metaphors)? Finally, we will consider the impact of empirical findings from an Implicit Association Test.
How language and numerical representations constitute mathematical cognition: an introductory review Hans-Christoph Nuerk University of Tübingen and IWM-KMRC Tübingen, Germany Mathematical or numerical cognition has often been studied largely independently of language and linguistic processes. Only in recent years has the influence of such linguistic processes received more interest. It is now known that even the most basic tasks, like magnitude comparison and parity judgment, and even basic representations, such as the spatial representation of number magnitude, are influenced by language and linguistic processes. Within the general topic of language contributions to mathematical cognition and performance we can distinguish at least three different foci: (i) How is the spatial representation of number influenced by reading and writing direction? (ii) How do the number word structures of different languages influence mathematical and numerical performance? (iii) How are the linguistic abilities of children and the linguistic complexity of mathematical tasks related to mathematical performance? A short overview of the state of research on these topics is given, introducing the open questions addressed in this symposium.
Language influences number processing: the case of bilingual Luxembourg Christine Schiltz, Amandine Van Rinsveld, Sonja Ugen Cognitive Science and Assessment Institute, University of Luxembourg, Luxembourg In a series of studies we investigated how language affects basic number processing tasks in a German–French bilingual setting. The Luxembourg school system progressively educates pupils to become German–French bilingual adults, thanks to extensive language courses in both German and French, as well as a progressive transition of the teaching language from German (dominant in primary school) to French (dominant in secondary school). Studying numerical cognition in children and adults successfully going through the Luxembourg school system thus provides an excellent opportunity to investigate how progressively developing bilingualism impacts numerical representations and computations. Studying this question in Luxembourg's German–French bilingual setting is all the more interesting, since the decades and units of two-digit number words follow opposite structures in German (i.e. unit-decade) and French (decade-unit). In a series of experiments, pupils from grades 7, 8, 10, and 11, and adults, made magnitude comparisons and additions that were presented in different formats: Arabic digits and number words. Both tasks were performed in separate German and French testing sessions, and we recorded correct response rates and response times. The results obtained during magnitude comparison show that orally presented comparisons are performed differently by the same participants according to task language (i.e. different compatibility effects in German vs. French). For additions, it appears that the level of language proficiency is crucial for the computation of complex additions, even in adults. In contrast, adults tend to retrieve simple additions equally well in both languages. Taken together, these results support the view of a strong language influence on numerical representations and computations.
Language differences in basic numerical tasks Mojtaba Soltanlou, Stefan Huber, Hans-Christoph Nuerk University of Tübingen and IWM-KMRC Tübingen, Germany Connections between knowledge of language and knowledge of number have been suggested on theoretical and empirical grounds. Chomsky (1986) noted that both the sentences of a language and the numbers in a counting sequence have the property of discrete infinity, and he suggested that the same recursive device underlies both (Bloom 1994; Hurford 1987). Numerical researchers have therefore begun to examine the influence of linguistic properties. In this internet study, we tested adults from various countries on basic numerical tasks consisting of symbolic and non-symbolic magnitude comparison and parity judgment, and recorded their responses to obtain error rates and reaction times. The results suggest not only that distinct languages influence these kinds of tasks differentially, but also that other cultural and individual factors play an important role in numerical cognition.
Cognitive components of the mathematical processing network in primary school children: linguistic and language-independent contributions Denes Szucs University of Cambridge, UK We have tested the cognitive components of mathematical skill in more than one hundred 9-year-old primary school children. We aimed to separate the contributions of language-related and language-independent skills. We used 18 cognitive tests and 9 custom experiments. We identified phonological decoding efficiency and verbal intelligence as important contributors to mathematical performance (measured by standardized tests). In addition, spatial ability and visual short-term and working memory were also strong predictors of arithmetic performance. Further, children with pure developmental dyscalculia showed impaired visuo-spatial processing but no impairment in verbal and language function. The results shed light on the differing roles of language and visual function in arithmetic and on the co-morbidity of language and arithmetic disorders.
It does exist! A SNARC effect amongst native Hebrew speakers is masked by the MARC effect Joseph Tzelgov, Bar Zohar-Shai Ben-Gurion University of the Negev, Israel The SNARC effect has been found mainly with participants who speak Germanic languages. The effect in these studies implies that the mental number line spreads from left to right. It was therefore suggested that the effect derives from the experience of writing from left to right. Commonly, studies of spatial-numerical associations in Hebrew speakers report a null SNARC effect with standard designs in which participants are asked to perform the parity task twice, each time with a different parity-to-hand mapping. It has been argued that this is due to the different reading directions of words and numbers: Hebrew is written from right to left, while numbers are written by Hebrew writers from left to right, as in Germanic languages. In this paper, we show that a SNARC effect in native Hebrew speakers does exist when the design minimizes the MARC effect. Furthermore, even though Hebrew is written from right to left, the mental number line as estimated by the SNARC effect spreads from left to right, as in Germanic languages. These findings challenge the assumption that reading direction is the main source of the direction of spatial-numerical associations.
MODELING OF COGNITIVE ASPECTS OF MOBILE INTERACTION Convenors: Nele Russwinkel, Sabine Prezenski, Stefan Lindner TU Berlin, Germany Interacting with mobile devices is gaining more and more importance in our daily life. Using those devices provides great comfort, but nevertheless entails specific challenges. In contrast to the classical home computer setting, mobile device usage is more prone to disruptions, more influenced by time pressure, and more likely to be affected by earlier interaction experiences. An important issue in this context is designing interfaces that best fit the users' cognitive abilities. These abilities vary widely between different groups of users. How can developers and designers adapt an interface to meet the users' skills and preferences? For these purposes, cognitive modeling provides an appealing opportunity to gain insight into the users' skills and cognitive processes. It offers a theoretical framework as well as a computational platform for testing theories and deriving predictions. The scope of this symposium lies in introducing selected approaches to user modeling and showing their application to the domain of mobile interaction. In this context we are particularly interested in criteria like learnability and efficiency, from a cognitive as well as a technical point of view. Moreover, research concerning individual differences, interruption, and expectancy is presented. Overall, we aim to show that the mobile interaction scenario offers an interesting research area for testing modeling approaches in real-life applications, but also to discuss the cognitive processes that are relevant within those tasks. We will look at these different cognitive aspects of mobile interaction and the role of modeling in improving cognitively appropriate applications.
Creating cognitive user models on the basis of abstract user interface models Marc Halbrügge TU Berlin, Germany The recent explosion of mobile appliances creates new challenges not only for application developers and content creators, but also for usability professionals. Conducting a classical usability study of a mobile user interface (UI) on an exhaustive number of devices is more or less impossible. One approach to tackling the engineering side of the problem is model-based user interface development, where an abstract UI model is adapted to the target device at runtime (Calvary et al. 2003). When this method is applied, the application flow is modeled first, and user controls are abstractly identified by their roles therein (e.g. command, choice, output). The elements of the final UI as presented to the users (e.g. buttons, switches, labels) are all representations of those, enriched by physical properties like position, size, and textual content. While knowing the sizes and positions of the UI elements already allows predictions of completion times for previously specified tasks, e.g. by creating simple cognitive models using CogTool (John et al. 2004), the additional information encoded in the abstract UI model allows one to go much further. It contains machine-readable knowledge about the application logic and the UI elements that are to be visited to attain a specified goal, which creates a significant opportunity for machine translation into more precise cognitive models (Quade et al. 2014). In this talk, I will show how completion time predictions can be improved based on abstract UI model information. Data from two empirical studies with a kitchen assistance application are presented to illustrate the method and quantify the gain in prediction accuracy. References Calvary G, Coutaz J, Thevenin D, Limbourg Q, Bouillon L, Vanderdonckt J (2003) A unifying reference framework for multi-target user interfaces. Interact Comput 15(3):289–308 John BE, Prevas K, Salvucci DD, Koedinger K (2004) Predictive human performance modeling made easy. In: CHI '04: Proceedings of the SIGCHI conference on human factors in computing systems. ACM Press, pp 455–462 Quade M, Halbrügge M, Engelbrecht KP, Albayrak S, Möller S (2014) Predicting task execution times by deriving enhanced cognitive models from user interface development models. In: Proceedings of EICS 2014, Rome, Italy (in press)
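The abstract notes that element sizes and positions alone already allow completion-time predictions. A minimal sketch of how such a geometry-based estimate can work, combining KLM-style operators with Fitts' law; the element list and all coefficients below are illustrative assumptions, not values from the studies above:

```python
# Sketch: KLM/Fitts-style completion-time prediction from UI element
# geometry alone. Coefficients are typical textbook estimates, not
# fitted values from the abstract's studies.
import math

A, B = 0.1, 0.15   # Fitts' law intercept/slope (seconds), assumed
KEYSTROKE = 0.28   # KLM operator: one tap
MENTAL = 1.35      # KLM operator: mental preparation step

def pointing_time(dist, width):
    """Fitts' law (Shannon form): time to move to a target."""
    return A + B * math.log2(dist / width + 1)

# Hypothetical task: three buttons tapped in sequence, each given as
# (distance from previous position, target width) in mm.
task = [(40, 8), (25, 10), (60, 6)]

total = sum(MENTAL + pointing_time(d, w) + KEYSTROKE for d, w in task)
print(f"predicted completion time: {total:.2f} s")
```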
Expectations during smartphone application use Stefan Lindner TU Berlin, Germany Expectations serve a multitude of purposes and play a large role in the adoption and use of new technological devices. I will briefly discuss a classification of expectations, ideas for their implementation in ACT-R, and their role during smartphone app use. In a general sense, expectations coordinate our goals and desires with the current and future state of the environment. They are necessary for any kind of intention, help in action preparation (Umbach et al. 2012), and play a prominent role in action-perception feedback loops (Friston, Kiebel 2009). Experience-based expectations are expectations that result from the individual learning history. Both the utility and activation mechanisms of ACT-R can be interpreted as reflecting experience-based expectations about our environment. One possible way to model the formation of experience-based expectations from past experiences using the partial matching and blending algorithms of ACT-R is described in Kurup et al. (2012). Other implementations are possible (Lindner, Russwinkel 2013). Universal expectations are expectations that result from the universally inherited pre-structuring of the environment. In ACT-R, universal expectations are in part already reflected in the modeler's decisions regarding the content of the model environment, memory items, and production elements. Both types of expectations play a dynamic role during the adaptation and use of a technical device. When using a new smartphone app, users will first rely on general expectations derived from past use of other smartphone apps or computer programs. Universal expectations, especially in the form of assumed form-function contingencies, play an important role in this phase as well. With time, however, users will increasingly rely on expectations that are in line with specific knowledge acquired during use. References Friston K, Kiebel S (2009) Predictive coding under the free-energy principle. Philos Trans R Soc Biol Sci 364:1211–1221 Kurup U, Lebiere C, Stentz A, Hebert M (2012) Using expectations to drive cognitive behavior. In: Proceedings of the 26th AAAI conference on artificial intelligence Lindner S, Russwinkel N (2013) Modeling of expectations and surprise in ACT-R. In: Proceedings of the 12th international conference on cognitive modeling, pp 161–166. Available online: http://iccmconference.org/2013-proceedings/papers/0027/index.html Umbach VJ, Schwager S, Frensch PA, Gaschler R (2012) Does explicit expectation really affect preparation? Front Psychol 3:378. doi:10.3389/fpsyg.2012.00378
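Blending, mentioned above as one route to experience-based expectations, retrieves an activation-weighted consensus over matching chunks. A minimal sketch under simplifying assumptions (numeric slot values, for which the blended value reduces to the activation-weighted mean; the activations are made up); this illustrates the general mechanism, not the Kurup et al. implementation:

```python
# Sketch of ACT-R-style blended retrieval over numeric slot values.
import math

chunks = [  # (activation, remembered completion time in s) - assumed
    (1.2, 3.0),
    (0.4, 4.5),
    (-0.3, 6.0),
]
t = 0.5  # temperature of the Boltzmann retrieval distribution

weights = [math.exp(a / t) for a, _ in chunks]
probs = [w / sum(weights) for w in weights]

# Blended value: expectation of the slot value under retrieval
# probabilities, i.e. the model's learned expectation.
blended = sum(p * v for p, (_, v) in zip(probs, chunks))
print(f"blended expectation: {blended:.2f} s")
```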
Evaluating the usability of a smartphone application with ACT-R Sabine Prezenski TU Berlin, Germany The potential of using ACT-R-based (Anderson 2007) cognitive models for evaluating different aspects of usability is demonstrated using a shopping list application for Android. Smartphone applications are part of our everyday life. A successful application should meet the standard of usability as defined in EN ISO 9241-110 (2008) and EN ISO 9241-11 (1999). In general, usability testing is laborious and requires vast resources. In this work, we demonstrate how cognitive models can answer important questions concerning efficiency, learnability, and experience in a less demanding and rather effective way. Further, we outline how cognitive models provide explanations about the underlying cognitive mechanisms which affect usability. Two different versions of a shopping list application (Russwinkel, Prezenski 2014) are evaluated. The versions have a similar appearance but differ in menu depth. User tests were conducted, and an ACT-R model able to interact with the application was designed. The task of the user, and of the model, consists of selecting products for a shopping list. In order to discover potential learning effects, repetition of the task was required. User data show that for both versions time on task decreases as user experience increases. The version with more menu depth is less efficient for novice users. The influence of menu depth decreases as user experience increases. Learning transfer between different versions is also found. Time on task for the different conditions is approximately the same for real users and the model. Furthermore, our model is able to explain the effects displayed in the data. The learning effect is explained through the building of application-specific knowledge chunks in the model's declarative memory. These application-specific knowledge chunks further explain why expertise is more important than menu depth.
References Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, New York EN ISO 9241-110 (2008) Ergonomics of human-system interaction. Part 110: Dialogue principles (ISO 9241-110:2006). International Organization for Standardization, Geneva EN ISO 9241-11 (1999) Ergonomic requirements for office work with visual display terminals (VDTs). Part 11: Guidance on usability. International Organization for Standardization, Geneva Russwinkel N, Prezenski S (2014) ACT-R meets usability. In: Proceedings of the sixth international conference on advanced cognitive technologies and applications. COGNITIVE
Simulating interaction effects of incongruous mental models
Matthias Schulz TU Berlin, Germany
Traditional usability evaluations involving older adults are difficult to conduct (Dickinson et al. 2007) and the results may also be misleading, as often only the cognitively and physically fittest seniors participate (Hawthorn 2000). In addition, older adults often lack experience in using modern devices (Hanson 2011). Furthermore, it is reasonable to assume that older adults often have problems operating new devices if they inappropriately transfer prior experience with other devices (Arning, Ziefle 2007). Such an inappropriate transfer would result in an increase of wrong or redundant interaction steps, which in turn may lead to unintended actions being recognized by the system (Bradley et al. 2011). To simulate the effects of incongruous mental models, or of the inappropriate transfer of prior experience with other devices, an existing tool for automatic usability evaluation, the MeMo workbench, was extended. The goal of the enhancement was to simulate the interaction of users with a smartphone, including mistakes and slips. According to Reason (1990, p 12 ff.), mistakes, lapses, and slips are the primary error types which can be used to classify errors in human-computer interaction. To simulate mistakes (errors which result from incongruous mental models or inappropriately transferred prior experience) a new processing module was added. This processing module uses four generalized linear models (GLMs) to compute what kind of interaction the user model intends to apply to the touchscreen. To simulate slips, we added a new execution module which computes the probability that the user model's interaction is not executed as intended (e.g. missing a button when trying to hit it). Our results show that it is possible to simulate interaction errors (slips and mistakes) and to describe interaction parameters for younger and older adults operating a touchscreen using the improved MeMo workbench.
References Arning K, Ziefle M (2007) Understanding age differences in PDA acceptance and performance. Comput Human Behav 23(6):2904–2927 Bradley M, Langdon P, Clarkson P (2011) Older user errors in handheld touchscreen devices: to what extent is prediction possible? In: Stephanidis C (ed) Universal access in human-computer interaction. Users diversity, vol 6766 of Lecture Notes in Computer Science. Springer, Berlin, pp 131–139 Dickinson A, Arnott JL, Prior S (2007) Methods for human-computer interaction research with older people. Behav Inf Technol 26(4):343–352 Hanson VL (2011) Technology skill and age: what will be the same 20 years from now? Univ Access Inf Soc 10:443–452 Hawthorn D (2000) Possible implications of aging for interface designers. Interact Comput 12(5):507–528 Reason JT (1990) Human error. Cambridge University Press, Cambridge
''Special offer! Wanna buy a trout?''—Modeling user interruption and resumption strategies with ACT-R
Maria Wirzberger TU Berlin, Germany
Interruption is a frequently occurring phenomenon that users have to deal with when interacting with technical systems. Especially when using mobile applications on smartphones, users are confronted with a variety of distractors, induced by the system itself (e.g., product advertisement, system crash) or resulting from the mobile context (e.g., motion, road traffic). Such interruptions might be critical especially in periods of already enhanced demands on working memory, resulting in increased experienced workload. Based on a time-course model of interruption of and resumption to a main task, developed by Altmann and colleagues (e.g., Altmann, Trafton 2004), this research explores an interruption scenario due to product advertisement while using a simple shopping app. Product advertisement is an omnipresent and at the same time cognitively demanding kind of interruption, as it forces a decision for or against the offered product. We developed an ACT-R model able to perform an interrupted product selection task under alternating workload conditions, resuming by either cognitively or visually tying in with the product selection. In brief, the task consists of searching for and selecting a set of predefined products in several runs while being interrupted by product advertisement at certain times. Different levels of workload are induced by shopping for one vs. three people. Model validation is performed experimentally with a sample of human participants, assessing workload by collecting pupil dilation data. Our main focus of analysis is how execution and resumption performance differ with workload, and what strategies users apply to react to interruptions. In detail, we expect impaired task performance and extended resumption times with increasing workload. Moreover, strategies for resuming the product selection might differ across workload levels. Results concerning the assumed effects will be addressed in this talk.
References Altmann EM, Trafton JG (2004) Task interruption: resumption lag and the role of cues. In: Proceedings of the 26th annual conference of the Cognitive Science Society, Chicago, Illinois
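The MeMo extension described in the Schulz abstract above computes slip probabilities with generalized linear models. A minimal sketch of that idea, assuming a logistic model over target width and age group; the predictors and coefficients are made up for illustration and are not the fitted GLMs from the workbench:

```python
# Sketch: probability of a slip (missing a touch target) from a
# logistic GLM. All coefficients are invented for illustration.
import math

def miss_probability(width_mm, age_group):
    """P(slip) as a logistic function of target width and age group."""
    b0, b_width, b_age = 1.0, -0.5, 0.8  # assumed coefficients
    z = b0 + b_width * width_mm + b_age * (1 if age_group == "older" else 0)
    return 1 / (1 + math.exp(-z))

# Wider targets reduce the predicted slip rate for both groups.
for width in (4, 8, 12):
    print(width, round(miss_probability(width, "older"), 3),
          round(miss_probability(width, "younger"), 3))
```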
Tutorials Introduction to probabilistic modeling and rational analysis Organizer: Frank Jäkel University of Osnabrück, Germany The first part of the course is a basic introduction to probability theory from a Bayesian perspective, covering conditional probability, independence, Bayes' rule, coherence, calibration, expectation, and decision-making. We will also discuss how Bayesian inference differs from frequentist inference. In the second part of the course we will discuss why Bayesian decision theory provides a good starting point for probabilistic models of perception and cognition. The focus here will be on rational analysis and ideal observer models, which provide an analysis of the task, the environment, the background assumptions, and the limitations of the cognitive system under study. We will go through several examples, from signal detection to categorization, to illustrate the approach.
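As a flavor of the material: a worked application of Bayes' rule to a signal detection setting of the kind the tutorial covers. All probabilities below are illustrative assumptions:

```python
# Worked Bayes' rule example: how likely is a signal, given that the
# observer responded "yes"? Numbers are made up for illustration.
prior_signal = 0.1          # P(signal)
p_yes_given_signal = 0.8    # P("yes" | signal): hit rate
p_yes_given_noise = 0.15    # P("yes" | noise): false-alarm rate

# P("yes") by the law of total probability.
p_yes = (p_yes_given_signal * prior_signal
         + p_yes_given_noise * (1 - prior_signal))

# Bayes' rule: posterior probability of a signal given a "yes".
posterior = p_yes_given_signal * prior_signal / p_yes
print(f"P(signal | yes) = {posterior:.3f}")  # 0.080 / 0.215 = 0.372
```

Note how the low prior keeps the posterior well below the hit rate: even a sensitive observer should often doubt a "yes" when signals are rare.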
Modeling vision Organizer: Heiko Neumann University of Ulm, Germany Models of neural mechanisms underlying perception can provide links between experimental data from different modalities, such as psychophysics, neurophysiology, and brain imaging. Here we focus on visual perception. The tutorial is structured into three parts. In the first part, the role of models in vision science is motivated. Models can be used to formulate hypotheses and knowledge about the visual system that can subsequently be tested in experiments, which, in turn, may lead to model improvements. Modeling vision can be done at various levels of abstraction and using different approaches (first-principles approaches, phenomenological models, dynamical systems). In the second part, specific models of early and mid-level vision are reviewed, addressing topics such as contrast and motion detection, perceptual grouping, motion integration, figure-ground segregation, surface perception, and optical flow. The third part focuses on higher-level form and motion processing and on building learning-based representations. In particular, object recognition, biological/articulated motion perception, and attention selection are considered.
Visualization of eye tracking data Organizer: Michael Raschke Contributors: Tanja Blascheck, Michael Burch, Kuno Kurzhals, Hermann Pflüger University of Stuttgart, Germany Apart from measuring completion times and recording the accuracy of answers given during visual tasks, eye tracking experiments provide an additional technique for analyzing how the attention of an observer changes over a presented stimulus. Besides using statistical algorithms to compare eye tracking metrics, visualization techniques allow us to visually analyze different aspects of the recorded data. However, in most cases only standard visualization techniques are used, such as scan path or heat map visualizations.
In this tutorial we will present an overview of further existing visualization techniques for eye tracking data and demonstrate their application in different user experiments and use cases. The tutorial will cover three topics of eye tracking visualization: 1.) visualization for supporting the general analysis process of a user experiment; 2.) visualization for static and dynamic stimuli; 3.) visualization for understanding cognitive and perceptual processes and refining parameters for cognition and perception simulations. This tutorial is designed for researchers who are interested in eye tracking in general or in applying eye tracking techniques in user experiments. Additionally, the tutorial could be of interest for psychologists and cognitive scientists who would like to evaluate and refine cognition and perception simulations. It is suitable for PhD students as well as for experienced researchers. The tutorial requires a minimal level of prerequisites. Fundamental concepts of eye tracking and visualization will be explained during the tutorial.
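For illustration, a minimal sketch of the classical scan path visualization mentioned above, assuming fixations are available as (x, y, duration) tuples; the data and styling are made up and the sketch is not from the tutorial materials:

```python
# Sketch: scan path plot with fixation order and duration-scaled
# circles. Fixation data are invented for illustration.
import matplotlib.pyplot as plt

fixations = [(120, 340, 0.25), (300, 310, 0.40),
             (420, 180, 0.18), (520, 260, 0.55)]  # (x, y, seconds)

xs, ys, durs = zip(*fixations)
plt.plot(xs, ys, "-", color="gray")                 # saccades as lines
plt.scatter(xs, ys, s=[d * 1000 for d in durs],     # circle size ~ duration
            alpha=0.5)
for i, (x, y, _) in enumerate(fixations, 1):
    plt.annotate(str(i), (x, y))                    # fixation order
plt.gca().invert_yaxis()                            # screen coordinates
plt.title("Scan path (order numbered, duration as circle size)")
plt.show()
```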
Introduction to cognitive modelling with ACT-R Organizers: Nele Russwinkel, Sabine Prezenski, Fabian Joeres, Stefan Lindner, Marc Halbrügge Contributors: Fabian Joeres, Maria Wirzberger; Technische Universität Berlin, Germany ACT-R is the implementation of a theory of human cognition. It has a very active and diverse community that uses the architecture in laboratory tasks as well as in applied research. ACT-R is oriented on the organization of the brain and is called a hybrid architecture because it contains symbolic and subsymbolic components. The aim of building cognitive models within a cognitive architecture is to understand how humans produce intelligent behavior. In this tutorial the cognitive architecture ACT-R is introduced (Anderson 2007). We will begin with a short introduction to the background, structure, and scope of ACT-R. We will then turn to hands-on examples of how cognitive models are written in ACT-R. At the end of the tutorial we will give a short overview of recent work and its benefits for applied cognitive science. References Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, New York
Dynamic Field Theory: from sensorimotor behaviors to grounded spatial language Organizers: Yulia Sandamirskaya, Sebastian Schneegans Ruhr University Bochum, Germany Dynamic Field Theory (DFT) is a conceptual and mathematical framework in which cognitive processes are grounded in sensorimotor behavior through the dynamics, continuous in time and in space, of Dynamic Neural Fields (DNFs). DFT originates in Dynamical Systems thinking, which postulates that the moment-to-moment behavior of an embodied agent is generated by attractor dynamics, driven by sensory inputs and interactions between dynamic variables. Dynamic Neural Fields add representational power to the Dynamical Systems framework: DNFs formalize the dynamics of neuronal populations in terms of activation functions defined over behaviorally relevant parameter spaces. DFT has been successfully used to account
for the development of visual and spatial working memory, executive control, scene representation, spatial language, and word learning, as well as to guide the behavior of autonomous cognitive robots. In the tutorial, we will cover the basic concepts of Dynamic Field Theory in several short lectures. The topics will be: the attractors and instabilities that model elementary cognitive functions; the couplings between DNFs and multidimensional DNFs; coordinate transformations and coupling DNFs to sensory and motor systems; and autonomy
within DFT. Using an exemplary architecture for the generation of flexible spatial language behaviors, we will show how DNF architectures may be linked to sensors and motors and generate real-world behavior autonomously. The same architecture may be used to account for behavioral findings on spatial language. The tutorial will include a hands-on session to familiarize participants with COSIVINA, a MATLAB software framework which allows building complex DNF architectures with little programming overhead.
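The core object of DFT is the field dynamics itself. Below is a minimal numerical sketch of a one-dimensional Dynamic Neural Field with local excitation and broader inhibition, where a localized input drives a stabilized activation peak; all parameters are illustrative assumptions, and the COSIVINA framework is not used here:

```python
# Sketch: Euler integration of a 1D neural field
#   tau * du/dt = -u + h + s(x) + integral w(x - x') f(u(x')) dx'
# with a Gaussian excitation kernel and a broader inhibition kernel.
import numpy as np

n, dt, tau, h = 101, 1.0, 10.0, -5.0   # field size, step, time constant, resting level
x = np.arange(n)
d = np.subtract.outer(x, x).astype(float)

def gauss(sigma):
    g = np.exp(-d**2 / (2 * sigma**2))
    return g / g.sum(axis=1, keepdims=True)  # normalize each row

w = 12.0 * gauss(4.0) - 10.0 * gauss(12.0)   # local excitation, broad inhibition

u = np.full(n, h)                                 # field activation
s = 6.0 * np.exp(-(x - 50.0)**2 / (2 * 3.0**2))   # localized input at x = 50

for _ in range(300):
    f = 1.0 / (1.0 + np.exp(-4.0 * u))            # sigmoid output function
    u += dt / tau * (-u + h + s + w @ f)

# A localized activation peak forms over the stimulated location.
print("peak at", int(u.argmax()), "activation", round(float(u.max()), 2))
```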
Poster presentations The effect of language on spatial asymmetry in image perception Zaeinab Afsari, José Ossandón, Peter König Osnabrück University, Germany Image viewing studies have recently revealed that healthy participants show a leftward spatial bias during free viewing. This leftward gaze bias has been attributed to lateralization in the cortical attention system, but it might alternatively be explained, or at least influenced, by reading direction. Four eye-tracking experiments were conducted using different bilingual groups and text primes with different reading directions. Participants first read a text and subsequently freely viewed nine images while their eye movements were recorded. Experiment 1 investigates the effect of reading direction in bilingual participants with right-to-left (RTL) and left-to-right (LTR) text primes. These participants were native Arabic/Urdu speakers. In concordance with previous studies, after reading an LTR prime, a leftward shift in the first second of image exploration was observed. In contrast, after reading RTL text primes, participants displayed a rightward spatial bias. This result demonstrates that the reading direction of text primes influences the later exploration of complex stimuli. In Experiment 2, we investigated whether this effect was due to a systematic influence of native vs. secondary language, independently of reading direction. For this purpose, we measured German/English bilinguals with German/English LTR text stimuli. Here, participants showed a leftward spatial bias after reading LTR texts in either case. This demonstrates that, for the present purpose, the difference between primary and secondary language is not important. In Experiment 3, we investigated the relative influence of scanning direction and actual reading direction. LTR bilingual participants were presented with normal (LTR) and mirrored left-to-right (mLTR) texts. While reading the primes, scanning direction differed markedly between the mirrored and non-mirrored conditions. However, we did not observe significant differences in the leftward bias; the bias was even slightly stronger after reading mLTR. This experiment demonstrates that the actual scanning direction did not influence the asymmetry on later complex image stimuli. In Experiment 4, we studied the effect of reading direction in bilingual participants with LTR as primary language and RTL as secondary language. These participants were native Germans and Arabic Germans who learned Arabic mainly later in life. They showed a leftward bias after reading both LTR and RTL text primes. In conclusion, although reading direction appears to be the main factor modulating the perceptual bias, there could be another explanation: the innate laterality systems in our brain (left-lateralized language and right-lateralized attention) may play a role in increasing or decreasing the bias.
Towards formally founded ACT-R simulation and analysis Rebecca Albrecht1, Michael Gießwein2, Bernd Westphal2 1 Center for Cognitive Science, University of Freiburg, Germany; 2 Software Engineering, University of Freiburg, Germany Abstract The semantics of the ACT-R cognitive architecture is today defined by the ACT-R interpreter. As a result, re-implementations of ACT-R
which, e.g., intend to provide a more concise syntax cannot be proven correct. We present a re-implementation of ACT-R which is based on a formal abstract semantics of ACT-R. Keywords ACT-R Implementation, Formal Semantics Introduction ACT-R (Anderson 1983, 2007) is a widely used cognitive architecture. It provides an agent programming language to create a cognitive model and an interpreter to execute the model. A model consists of a set of chunk types, a set of production rules, and the definition of an initial cognitive state. An execution of a model is a sequence of time-stamped cognitive states where one cognitive state is obtained by the execution of a production rule on its predecessor in the sequence. Over the past thirty years the ACT-R interpreter has been extended and changed immensely based on findings in psychological research. Unfortunately, the relation between concepts of the ACT-R theory and the implementation of the ACT-R interpreter is not always clear. So today, strictly speaking, only the Lisp source code of the ACT-R interpreter defines the exact semantics of an ACT-R model, so 'it is often felt that modelers merely write computer code that mimics the human data' (Stewart, West 2007). Due to this situation, it is unnecessarily hard to compare different ACT-R models for similar tasks, and ACT-R modelling is often perceived in the literature to be rather inefficient and error prone (Morgan, Haynes, Ritter, Cohen 2005). To overcome these problems, we propose a formal abstract syntax and semantics for the ACT-R cognitive architecture (Albrecht 2013; Albrecht, Westphal 2014b). The semantics of an ACT-R model is the transition system which describes all possible computations of the model. In this work, we report on a proof-of-concept implementation of the formal semantics given in Albrecht (2013) which demonstrates a formally founded approach to ACT-R model execution and provides a basis for new, orthogonal analyses of (partial) ACT-R models, e.g., for the feasibility of certain sequences of rule executions (Albrecht, Westphal 2014a). Related Work Closest to our work is the deconstruction and reconstruction of ACT-R by Stewart and West (2007). Their work aims to ease the evaluation of variations in the structure of computational models of cognition. To this end, they analyzed the Lisp implementation of the ACT-R 6 interpreter and re-engineered it, striving to clarify fundamental concepts of ACT-R. To describe these fundamental concepts they use the Python programming language and obtain another working ACT-R interpreter called Python ACT-R. To validate Python ACT-R, they statistically compare predictions of both implementations on a set of ACT-R models. In our opinion, there should firstly be an abstract, formal definition of ACT-R syntax and semantics to describe fundamental concepts; only secondly should another interpreter be implemented based on this formal foundation, which may, as Python ACT-R does, also offer a more convenient concrete syntax for ACT-R models. This two-step approach in particular allows one not only to test but to formally verify that each implementation realizes the formal semantics. The ACT-UP (Reitter, Lebiere 2010) toolbox for rapid prototyping of complex models is likewise not built on a formal basis. ACT-UP offers higher-level means of accessing fundamental concepts of the ACT-R theory for more efficient modelling, but its aim is not to clarify these fundamental concepts.
Re-implementations of ACT-R in the Java programming language (jACT-R 2010; ACT-R: The Java Simulation and Development Environment 2013) have the main purpose of making ACT-R accessible to other applications written in Java. They do not contribute to a more detailed understanding of the basic concepts of the ACT-R theory.
Implementation We implemented the formal ACT-R semantics provided by Albrecht (2013) and Albrecht and Westphal (2014b) in the Lisp dialect Clojure, which targets the Java Virtual Machine (JVM). As a Lisp dialect, Clojure makes it possible to establish a close relation between the formalization and the implementation. By targeting the JVM, our approach subsumes the work of Büttner (2010) without the need for TCP/IP-based inter-process communication. In the formal semantics, the signature for the abstract syntax is described using relation symbols, function symbols, and variables. Chunk types are given as functions and production rules as tuples over the signature. An ACT-R architecture is defined as a set of interpretation functions for the symbols used in the signature. These components can be directly translated into a Clojure implementation. The current implementation supports the ACT-R tutorial examples for base-level learning and spreading activation using a custom declarative module (Gießwein 2014). The results of the ACT-R 6 interpreter are reproduced up to small rounding errors. Conclusion Our implementation of an ACT-R interpreter based on a formal semantics of ACT-R demonstrates the feasibility of the two-step approach of separating the clarification of fundamental concepts from a re-implementation. In future work, we plan to extend our implementation to support further models. Technically, our choice of Clojure allows us to conveniently interface Java code and cognitive models. Conceptually, we plan to use our implementation as a basis for more convenient modelling languages and as an intermediate format for new, exhaustive analyses of cognitive models based on model-checking techniques and constraint solvers. References ACT-R: The Java Simulation and Development Environment (2013) Retrieved from http://cog.cs.drexel.edu/act-r/about.html, 16 May 2014 Albrecht R (2013) Towards a formal description of the ACT-R unified theory of cognition. Unpublished master's thesis, Albert-Ludwigs-Universität Freiburg Albrecht R, Westphal B (2014a) Analyzing psychological theories with F-ACT-R. In: Proceedings of KogWis 2014, to appear Albrecht R, Westphal B (2014b) F-ACT-R: defining the architectural space. In: Proceedings of KogWis 2014, to appear Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press, Oxford Büttner P (2010) Hello Java! Linking ACT-R 6 with a Java simulation. In: Proceedings of the 10th international conference on cognitive modeling, pp 289–290 Gießwein M (2014) Formalisierung und Implementierung des deklarativen Moduls der kognitiven Architektur ACT-R. Bachelor's thesis, Albert-Ludwigs-Universität Freiburg jACT-R (2010) Retrieved from http://jactr.org, 16 May 2014 Morgan GP, Haynes SR, Ritter FE, Cohen MA (2005) Increasing efficiency of the development of user models. In: SIEDS, pp 82–89 Reitter D, Lebiere C (2010) Accountable modeling in ACT-UP, a scalable, rapid-prototyping ACT-R implementation. In: Proceedings of the 10th international conference on cognitive modeling (ICCM), pp 199–204 Stewart TC, West RL (2007) Deconstructing and reconstructing ACT-R: exploring the architectural space. Cogn Syst Res 8(3):227–236
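To make the transition-system view concrete: below is a minimal Python sketch of executing a model as a sequence of time-stamped states rewritten by production rules. It illustrates only the general idea; it is not the F-ACT-R semantics or its Clojure implementation, and the counting task, names, and timing are invented:

```python
# Sketch: a model execution as a transition system. States carry a
# goal chunk and a clock; a production is a partial function on states.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    ctype: str
    slots: tuple  # ((slot, value), ...)

@dataclass
class State:
    goal: Chunk
    time: float = 0.0

def count_step(state):
    """Production: increment 'current' until it reaches 'target'."""
    slots = dict(state.goal.slots)
    if state.goal.ctype == "count" and slots["current"] < slots["target"]:
        new_goal = Chunk("count", (("current", slots["current"] + 1),
                                   ("target", slots["target"])))
        return State(new_goal, state.time + 0.05)  # assumed 50 ms per firing
    return None  # production does not match: no transition

state = State(Chunk("count", (("current", 1), ("target", 4))))
while (nxt := count_step(state)) is not None:
    state = nxt
print(state)  # final state: goal reached, time reflects rule firings
```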
Identifying inter-individual planning strategies Rebecca Albrecht, Marco Ragni, Felix Steffenhagen Center for Cognitive Science, University of Freiburg, Germany Abstract Finding solutions to planning problems can be very complex, as they may comprise hundreds of problem states to be searched by an agent. In order to analyze human planning strategies, cognitive models can be used. Usually the quality of a cognitive model is evaluated w.r.t. quantitative criteria such as overall planning time. In complex planning tasks, however, this may not be enough, as different strategies may require the same amount of time. We present an integration of different AI methods from knowledge representation and planning to qualitatively evaluate a cognitive model with respect to inter-individual factors. Keywords Qualitative Analysis, Model Evaluation, Strategies, Graph-based Representations, Planning Introduction In cognitive modeling, a computer model based on psychological assumptions is used to describe human behavior in a certain task. In order to evaluate the quality of a cognitive model, average results from behavioral experiments, e.g. response times, are compared to average results predicted by the cognitive model. However, this method does not accommodate qualitative and inter-individual differences. We present a method for analyzing qualitative differences in user strategies w.r.t. psychological factors which differ between individuals, e.g. working memory capacity. A qualitative representation of a user strategy is given by a path, i.e. a sequence of states, in the problem space of a task. Individual factors are represented by numerical parameters controlling user strategies. In order to evaluate a cognitive model, the strategies used by participants in a behavioral experiment are compared to the strategies predicted by the cognitive model under different parameter values. The cognitive model is evaluated by identifying, for each participant, a set of parameter values such that the execution of the model best predicts the participant's strategies. Method Sketch Firstly, we represent the strategies of participants and of the cognitive model under different parameter settings in so-called strategy graphs. Formally, a strategy graph for a problem instance p is a directed, labelled multigraph which includes a set of vertices Vp representing all states traversed by any participant or the cognitive model, a set of edges Ep representing the application of legal actions in the task, a set of initial states Sp ⊆ Vp, and a set of goal states Gp ⊆ Vp. Note that the strategy graph may include multiple edges (for different agents) between two states. An example of a partial strategy graph with a planning depth of three in a task from the Rush Hour planning domain (Flake, Baum 2002) is shown in Fig. 1. Secondly, the parameter values for which the cognitive model best replicates human participants' strategies are identified based on similarity measures calculated for each pair of parameter values and human participants. The similarity of two strategies is restricted to values between 0 and 1 and is calculated based on the strategies given in the strategy graph. In the evaluation, each participant is assigned the set of parameter values for which the cognitive model's strategy is maximally similar to the participant's strategy. The parameter values assigned to a participant are identified as the planning profile of that participant. In this step, several similarity measures are possible, e.g.
the Waterman-Smith algorithm (Smith, Waterman 1981).
Cogn Process (2014) 15 (Suppl 1):S1–S158
S29 Flake GW, Baum EB (2002) Rush hour is PSPACE-complete, or ‘‘Why you should generously tip parking lot attendants’’. Theor Comput Sci 270:895–911 Mcdermott D (1996) A heuristic estimator for means-ends analysis in planning. In: Proceedings of the 3rd international conference on AI planning systems, pp 142–149 Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197 Steffenhagen F, Albrecht R, Ragni M (2014) Automatic identification of human strategies by cognitive agents. In: Proceedings of the 37th German conference on artificial intelligence, to appear
Fig. 1 Example of a partial strategy graph for a planning depth of three in the Rush Hour problem domain. States are possible Rush Hour board configurations. Dashed edges indicate moves on optimal solution paths. Solid edges represent moves of participants in a behavioral experiment or moves of cognitive agents. The circle around the state in the center of the figure indicates a so-called decision point where several moves can be considered optimal. The dashed game objects in problem states on the bottom of the figure indicate the game object which was moved
With respect to the presented method, the quality of the cognitive model is given by the mean similarity between the strategies used by participants and the strategies produced by the cognitive model under the best-replicating parameter settings.

Preliminary Evaluation Results
We evaluated the proposed method preliminarily in the Rush Hour planning domain (Steffenhagen, Albrecht, Ragni 2014). Human data were collected in a psychological experiment with 20 participants solving 22 different Rush Hour tasks. The cognitive model was programmed to use means-ends analysis (Faltings, Pu 1992; McDermott 1996) with different parameters controlling local planning behavior with respect to assumed individual factors. Similarity was calculated with the Smith-Waterman algorithm for local sequence alignments (Smith, Waterman 1981). For each of the 20 participants, a set of parameter values controlling the cognitive model was identified (1) held constant over all tasks and (2) for each task separately. The evaluation reveals that this cognitive model can predict 44 % of human strategies for (1) and 76 % of human strategies for (2).

Conclusion
We present a method to qualitatively evaluate cognitive models by analyzing user strategies, i.e. sequences of states traversed in the solution of a task. The state space of a planning problem, e.g. the Rush Hour problem space, can be very large. As a result, user strategies and, therefore, the underlying cognitive processes cannot be analyzed by hand. With the presented method, human strategies are analyzed automatically by identifying cognitive models which traverse the same problem states as human participants. In cognitive architectures, numerical parameters are often used to control the concrete behavior of a cognitive model, e.g. the decay rate in ACT-R. Often, these parameters also influence the planning strategies of a model. Although parameter values may differ between individuals, they are usually held constant over all executions of the model. With the outlined similarity measure it is possible to analyze which parameter values induce strategies similar to those of an individual.

Acknowledgment
This work has been supported by a grant to Marco Ragni within the project R8-[CSpace] within the SFB/TR 8 "Spatial Cognition".

References
Faltings B, Pu P (1992) Applying means-ends analysis to spatial planning. In: Proceedings of the 1991 IEEE/RSJ international workshop on intelligent robots and systems, pp 80–85
Flake GW, Baum EB (2002) Rush hour is PSPACE-complete, or "Why you should generously tip parking lot attendants". Theor Comput Sci 270:895–911
McDermott D (1996) A heuristic estimator for means-ends analysis in planning. In: Proceedings of the 3rd international conference on AI planning systems, pp 142–149
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
Steffenhagen F, Albrecht R, Ragni M (2014) Automatic identification of human strategies by cognitive agents. In: Proceedings of the 37th German conference on artificial intelligence, to appear
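The similarity computation is easy to reproduce. Below is a minimal sketch of a Smith-Waterman-style local alignment over state sequences, normalized to values between 0 and 1 as required above; the scoring constants, function names, and the normalization by the shorter sequence length are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): local alignment of two
# strategies, each given as a sequence of problem-state identifiers,
# via Smith-Waterman-style dynamic programming.

def local_alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0):
    """Best local alignment score between state sequences a and b."""
    m, n = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0,                    # start a new alignment
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # skip a state in a
                          H[i][j - 1] + gap)      # skip a state in b
            best = max(best, H[i][j])
    return best

def strategy_similarity(a, b):
    """Normalize to [0, 1]: 1.0 means one strategy locally contains the other."""
    if not a or not b:
        return 0.0
    return local_alignment_score(a, b) / min(len(a), len(b))

# Example: a participant's path vs. one model run, as state identifiers.
print(strategy_similarity(["s0", "s1", "s3", "s4"], ["s0", "s2", "s3", "s4"]))  # 0.5
```

In the evaluation described above, this score would be computed for each participant against the model run under every candidate parameter setting, and the participant's planning profile would be the setting with the maximal score.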
Simulating events. The empirical side of the event-state distinction
Simone Alex-Ruf
University of Tübingen, Germany

Since Vendler (1957), an overwhelming amount of theoretical work on the categorization of situations with respect to their lexical aspect has emerged within linguistics. Telicity, change of state, and punctuality vs. durativity are the main features used to distinguish between events and states. Thus, the VPs (verb phrases) in (1) describe atelic stative situations, the VPs in (2) telic events: (1) to love somebody, to be small (2) to run a mile, to reach the top. Although there are many theories about what constitutes an event or a state, the empirical studies concerning this question can be counted on one hand. This is quite surprising, since the notion of lexical aspect is a central issue within verb semantics. Even more surprising is the fact that these few studies provide results pointing in completely opposite directions: the studies in Stockall et al. (2010) and Coll-Florit and Gennari (2011) report shorter RTs (reaction times) to events than to states and therefore suggest that the processing of events is easier. In contrast, Gennari and Poeppel (2003) found shorter RTs after reading states than after reading events. They explain this result by the higher level of complexity in the semantics of verbs describing events, which requires longer processing times. A closer look at these studies, however, reveals that in nearly all of them different verbs or VPs were compared: Gennari and Poeppel (2003), for example, used eventive VPs like to interrupt my father and stative VPs like to resemble my mother. One could argue that these two VPs not only differ in their lexical aspect, but perhaps also in their emotional valence and in the way the referent described by the direct object is affected by the whole situation, and that these features therefore acted as confounding variables, influencing the results in an undesirable way. To avoid this problem, the present study used ambiguous German verbs: depending on the context, verbs like füllen (to fill), schmücken (to decorate) and bedecken (to cover) lead to an eventive or a stative reading. With these verbs, sentence pairs were created consisting of an eventive (3) and a stative sentence (4) (= target items). The two sentences of one pair differed only in their grammatical subject, but contained the same verb and direct object: Target items: (3) Der Konditor/füllt/die Form/[…]. The confectioner/fills/the pan/[…]. (4) Der Teig/füllt/die Form/[…]. The dough/fills/the pan/[…]. In a self-paced reading study, participants read these sentences phrase-by-phrase and in 50 % of all trials answered a comprehension question concerning the content of the sentence.
Note that in the event sentences all referents described by the grammatical subjects were animate, whereas in the state sentences all subjects were inanimate. Many empirical studies investigating animacy suggest that animate objects are remembered better than inanimate objects (see, for example, Bonin et al. 2014). Therefore, shorter RTs on the subject position of event sentences than of state sentences were expected, resulting in a main effect of animacy. Since this effect could influence the potential event-state effect measured on the verb position as a spillover effect, control items containing the same subjects but different, non-ambiguous verbs like stehen (to stand) were added: Control items: (5) Der Konditor/steht/hinter der Theke/[…]. The confectioner/stands/behind the counter/[…]. (6) Der Teig/steht/hinter der Theke/[…]. The dough/stands/behind the counter/[…]. The results confirmed this assumption: mean RT measured on the subject position was significantly shorter for animate than for inanimate referents, F(1, 56) = 9.65, p = .003 (587 vs. 602 ms). Within the control items, this animacy effect influenced the RTs on the verb position: after animate subjects, RTs on the (non-ambiguous) verb were shorter than after inanimate subjects (502 vs. 515 ms), revealing the expected spillover effect. However, within the target items, mean RT measured on the position of the (ambiguous) verb showed the opposite pattern: after animate subjects it was significantly longer than after inanimate subjects, F(1, 56) = 4.12, p = .047 (534 vs. 520 ms). Here no spillover effect emerged, but a main effect which can be attributed to the different lexical aspect of the two situation types. If processing times are indeed longer for events than for states, how can this effect be explained? The simulation account, proposed, for example, by Glenberg and Kaschak (2002) and Zwaan (2004), provides an elegant solution. A strong simulation view of comprehension holds that the mental representation of a described situation comes about in exactly the same way as when this situation is perceived in real time. This means that "language is made meaningful by cognitively simulating the actions implied by sentences" (Glenberg and Kaschak 2002:595). Imagine what is simulated during the processing of a state like the dough fills the pan: the simulation contains a pan and some dough in this pan, but nothing more. In contrast, the simulation of an event like the confectioner fills the pan not only requires additional participants like the confectioner and perhaps a spatula, but also action (of the confectioner), movement (of the confectioner, the dough, and the spatula), a situation change (from an empty to a full pan), and a relevant time course. The simulation of a state can be envisioned as a picture; imagining an event's simulation requires a film. In short, the simulation evoked by an event is more complex than that evoked by a state. Under the assumption that a simulation constitutes at least part of the mental representation of a situation, it seems plausible that the complexity of such a simulation influences processing and that the higher degree of complexity in the simulation of events leads to longer RTs.

References
Bonin P, Gelin M, Bugaiska A (2014) Animates are better remembered than inanimates: further evidence from word and picture stimuli. Mem Cognit 42:370–382. doi:10.3758/s13421-013-0368-8
Coll-Florit M, Gennari SP (2011) Time in language: event duration in language comprehension. Cogn Psychol 62:41–79
Gennari SP, Poeppel D (2003) Processing correlates of lexical semantic complexity. Cognition 89:B27–B41
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9:558–565
Stockall L, Husband EM, Beretta A (2010) The online composition of events. Queen Mary's Occasional Papers Advancing Linguistics 19
Vendler Z (1957) Verbs and times. Philos Rev 66:143–160
Zwaan RA (2004) The immersed experiencer: toward an embodied theory of language comprehension. In: Ross BH (ed) The psychology of learning and motivation, vol 44. Academic Press, New York, pp 35–62
On the use of computational analogy-engines in modeling examples from teaching and education
Tarek R. Besold
Institute of Cognitive Science, University of Osnabrück, Germany

Abstract
The importance of analogy for human cognition and learning has been widely recognized, and analogy-based methods are being explicitly integrated into the canon of approved education and teaching techniques. Still, the actual level of knowledge about analogy as an instructional means and device is, as of today, rather low. In this summary report on preliminary results from an ongoing project, I propose the use of computational analogy-engines as methodological tools in this domain of research, additionally motivating this attempt at connecting AI and the learning sciences with two worked application case studies.

Keywords
Computational Analogy-making, Artificial Intelligence, Education, Cognitive Modeling, Computational Modeling

Introduction: Analogy in Education and Cognitive Modeling
Analogical reasoning (i.e., the ability to perceive and operate on dissimilar domains as similar with respect to certain aspects, based on shared commonalities in relational structure or appearance) is considered essential for learning abstract concepts (Gentner et al. 2001) and, in general, for children's process of learning about the world (Goswami 2001). In an educational context, analogies facilitate learners' construction of new ideas and conceptions on the grounds of already available concepts (Duit 1991), and can be used to facilitate the understanding of concepts and procedures in abstract and formal domains such as mathematics, physics or science (Guerra-Ramos 2011). Still, analogy is not a cure-all, as unsuccessful analogies may produce misunderstandings and can result in harmful misconceptions (Clement 1993). Analogy has also been actively investigated in artificial intelligence (AI), bringing forth numerous computational frameworks and systems for automated analogy-making and analogical reasoning. And indeed, computational analogy frameworks have found their way into the context of education and teaching: for instance, in Thagard et al. (1989) the authors present a theory and implementation of analogical mapping that applies to explanations of unfamiliar phenomena as used, e.g., by chemistry teachers, and Forbus et al. (1997) show how an information-level model of analogical inference can be incorporated in a case-based coach added to an intelligent learning environment. Siegler (1989) conjectures how the Structure-Mapping Engine (Falkenhainer et al. 1989) could be used to gain insights into developmental aspects of analogy use.

Analogy Engines in the Classroom: Worked Examples
Building on the outcome of these and similar research efforts, in Besold (2013) I first advocated expanding research that applies analogy-engines to problems from teaching and education into a proper program in its own right, opening up a new application domain for computational analogy-making.
In order to provide factual grounding and initial worked examples for the possible applications of computational analogy-engines, Besold (2013) and Besold et al. (2013) feature two case studies. In both cases, the Heuristic-Driven Theory Projection (HDTP) analogy-making framework (Schmidt et al. 2014) was applied to modeling real-world examples taken from a classroom context. Besold (2013) provides an HDTP model of the string circuit analogy for gaining a basic understanding of electric current (Guerra-Ramos 2011), used in science classes for 8- to 9-year-old children. Besold et al. (2013) give a detailed and fairly complex formal model of the analogy-based Calculation Circular Staircase (Schwank et al. 2005), applied in teaching basic arithmetic and the conception of the naturals as ordinal numbers to children attending their initial mathematics classes in primary school. The Calculation Circular Staircase (i.e., a teaching tool shaped like a circular staircase whose steps are made up of incrementally increasing stacks of balls, grouped in expanding circles of ten stacks per circle corresponding to the decimal ordering over the naturals) offers children a means of developing an understanding of the interpretation of numbers as results of transformation operations, by enabling a mental, functional, motor skill-based way of accessing the foundational construction principles of the number space and the corresponding basic arithmetic operations. The HDTP model gives a precise account of how the structure of the staircase and the declarative counting procedure memorized by children in school interact in bringing forth the targeted conception of the natural number space. Summarizing, in both case studies the respective formal model proves highly useful in uncovering the underlying structure of the method or teaching tool, together with the consecutive steps of reasoning, on the level of computational theory.

Conclusion
By providing a detailed formal description of the involved domains and their relation in terms of their joint generalization and the corresponding possibility for knowledge transfer, our models try to explicate the structural relations and governing laws underlying the respective teaching tools. We also point out how the identified constructive and transformation-based conceptualizations can then provide support and a deeper-rooted model for children's initially very flat and sparse conceptions of the corresponding domains. In general, modeling educational analogies sheds new light on a particular analogy: in terms of which information is transferred, what the limitations of the analogy are, whether it makes unhelpful mappings, and what potential extensions might be needed. On this basis, we hope to acquire a deeper understanding of the basic principles and mechanisms underlying analogy-based learning in fairly high-level and abstract domains.

References
Besold TR (2013) Analogy engines in classroom teaching: modeling the string circuit analogy. In: Proceedings of the AAAI Spring 2013 symposium on creativity and (early) cognitive development
Besold TR, Pease A, Schmidt M (2013) Analogy and arithmetics: an HDTP-based model of the calculation circular staircase. In: Proceedings of the 35th annual meeting of the cognitive science society, Cognitive Science Society, Austin, TX
Clement J (1993) Using bridging analogies and anchoring intuitions to deal with students' preconceptions in physics.
J Res Sci Teach 30:1241–1257
Duit R (1991) The role of analogies and metaphors in learning science. Sci Educ 75(6):649–672
Falkenhainer B, Forbus K, Gentner D (1989) The structure-mapping engine: algorithm and examples. Artif Intell 41:1–63
Forbus K, Gentner D, Everett J, Wu M (1997) Towards a computational model of evaluating and using analogical inferences. In: Proceedings of the 19th annual conference of the cognitive science society, pp 229–234
Gentner D, Holyoak K, Kokinov B (eds) (2001) The analogical mind: perspectives from cognitive science. MIT Press
Goswami U (2001) The analogical mind: perspectives from cognitive science, MIT Press, chap Analogical reasoning in children, pp 437–470
Guerra-Ramos M (2011) Analogies as tools for meaning making in elementary science education: how do they work in classroom settings? Eurasia J Math Sci Technol Educ 7(1):29–39
Schmidt M, Krumnack U, Gust H, Kühnberger KU (2014) Heuristic-driven theory projection: an overview. In: Prade H, Richard G (eds) Computational approaches to analogical reasoning: current trends. Springer, Berlin, pp 163–194
Schwank I, Aring A, Blocksdorf K (2005) Beiträge zum Mathematikunterricht, Franzbecker, Hildesheim, chap Betreten erwünscht—die Rechenwendeltreppe
Siegler R (1989) Mechanisms of cognitive development. Annu Rev Psychol 40:353–379
Thagard P, Cohen D, Holyoak K (1989) Chemical analogies: two kinds of explanation. In: Proceedings of the 11th international joint conference on artificial intelligence, pp 819–824
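As a toy illustration of the joint generalization underlying HDTP, the sketch below implements plain first-order anti-unification over ground terms: shared structure is kept, disagreements are replaced by variables, and repeated disagreements reuse the same variable. HDTP itself employs a restricted higher-order variant over full domain theories (Schmidt et al. 2014), so the term representation and the example facts here are simplifying assumptions.

```python
# Illustrative sketch (not HDTP itself): first-order anti-unification.
# Terms are atoms (strings) or tuples (functor, arg1, arg2, ...).

def anti_unify(s, t, subst=None, counter=None):
    """Least general generalization of terms s and t."""
    if subst is None:
        subst, counter = {}, [0]
    if s == t:                       # identical sub-terms stay as they are
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same functor and arity: generalize argument by argument
        return (s[0],) + tuple(anti_unify(a, b, subst, counter)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in subst:          # disagreement: reuse or mint a variable
        subst[(s, t)] = "X%d" % counter[0]
        counter[0] += 1
    return subst[(s, t)]

# Hypothetical facts in the spirit of the string circuit analogy:
source = ("causes", ("push", "battery", "current"), ("flow", "current"))
target = ("causes", ("push", "hand", "string"), ("flow", "string"))
print(anti_unify(source, target))
# ('causes', ('push', 'X0', 'X1'), ('flow', 'X1'))
```

The reused variable X1 is what makes the generalization informative: it records that the same entity plays a role in both sub-structures, which is exactly the kind of structural commonality the case studies exploit.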
Brain network states affect the processing and perception of tactile near-threshold stimuli
Christoph Braun1,2,3,4, Anja Wühle1,5, Gianpaolo Demarchi3, Gianpiero Monittola3, Tzvetan Popov6, Julia Frey3, Nathan Weisz3
1 MEG-Center, University of Tübingen, Germany; 2 CIN, Werner Reichardt Centre for Integrative Neuroscience, University of Tübingen, Germany; 3 CIMeC, Center for Mind/Brain Sciences, University of Trento, Italy; 4 Department of Psychology and Cognitive Science, University of Trento, Italy; 5 CEA, DSV/I2BM, NeuroSpin Center, F-91191 Gif-sur-Yvette, France; INSERM, U992, Cognitive Neuroimaging Unit, F-91191 Gif-sur-Yvette, France; Univ Paris-Sud, Cognitive Neuroimaging Unit, F-91191 Gif-sur-Yvette, France; 6 Radboud University Nijmegen, Donders Institute for Brain, Cognition, and Behavior, 6500 HE Nijmegen, The Netherlands

Introduction
Driving a biological or technical system to its limits reveals more detailed information about its functional principles than testing it in its standard range of operation. We applied this idea to gain a better understanding of how tactile information is processed along the somatosensory pathway. To get insight into what makes a stimulus become conscious, i.e. reportable, we studied the cortical processing of near-threshold touch stimuli that are either perceived (hits) or not (misses). Following the win2con concept proposed by Weisz et al. (2014), we tested the hypothesis that the state of functional connectivity within the sensory network determines up to which level in the sensory processing hierarchy misses are processed as compared to hits. The level of sensory processing was inferred by studying somatosensory evoked responses elicited by the near-threshold stimuli. Since the amplitudes of near-threshold somatosensory stimuli are low, a paired-pulse paradigm was used in which the inhibitory effect of the near-threshold stimulus onto a subsequently applied supra-threshold stimulus was assessed. Results show that the state of a widespread cortical network prior to the application of the tactile stimulus is crucial for a tactile stimulus to elicit activation of SII and to be finally perceived.

Subjects and Methods
Twelve healthy subjects participated in the study. Using a piezoelectric stimulator (Quaerosys, Schotten, Germany), we applied tactile stimuli to the tip of the index finger of the left hand. Intensities of the near-threshold stimuli were adjusted to each subject's personal sensory threshold using a staircase procedure. The near-threshold stimulus
was followed by a supra-threshold stimulus to probe the cortical activation elicited by the first stimulus. Control conditions in which the first stimulus was omitted, and in which the first stimulus was delivered at supra-threshold intensity, were also added. Subjects reported in all trials how many stimuli they had perceived. Pre-stimulus network states and post-stimulus cortical processing of the sensory input were studied by means of magnetoencephalography. To assess the cortical network prior to stimulation, source activity was estimated for the nodes of an equally spaced grid and all-to-all imaginary coherence was calculated. Alterations in power and in graph-theoretical network parameters were estimated. Since secondary somatosensory cortex (SII) appears to play a crucial role in the processing of consciously perceived tactile stimuli, we used it as a seed region for identifying the related brain network. In order to assess post-stimulus processing and its dependency on the pre-stimulus network state, evoked responses were recorded. Since evoked responses to near-threshold stimulation are rather weak, the activation induced by the near-threshold stimulus was probed by subsequently applying a supra-threshold test stimulus. To determine the source activity, a spatio-temporal dipole model with one source for primary somatosensory cortex (SI) contralateral to the stimulation site and two dipoles for ipsi- and contralateral SII was used. The model was applied both to the direct evoked responses of the near-threshold stimuli and to the activation evoked by the probe stimulus. Since the duration of activation differs across sensory brain areas, varying ISIs of 30, 60, and 150 ms between the near-threshold and the test stimulus in the paired-pulse approach allowed for probing the sensory processing of the near-threshold stimulus at different levels (Wühle et al. 2010).

Results
Network analysis of the pre-stimulus period yielded increased alpha power in trials in which the near-threshold stimulus was not detected. On a global level, brain networks appeared more strongly clustered for misses than for hits. In contrast, on a local level, clustering coefficients were stronger for hits than for misses, in particular for contralateral SII. A detailed analysis of the connectedness of SII revealed that, except for connections to the precuneus, SII was more strongly connected for hits than for misses to other brain areas such as ipsilateral inferior frontal/anterior temporal cortex and middle frontal gyrus. The results suggest that the state of the pre-stimulus somatosensory network, involving particularly middle frontal gyrus, cingulate cortex and fronto-temporal regions, determines whether near-threshold tactile stimuli elicit activation of SII and are subsequently perceived and reported. Studying post-stimulus activation, no significant difference between hits and misses was found at the level of SI, neither for the direct evoked response to the near-threshold stimulus nor for its effect on the subsequent probe stimulus. In contrast, at the level of SII a significant difference between hits and misses was found in response to the near-threshold stimuli. Moreover, the SII response to the probe stimulus was inhibited by the previously applied near-threshold stimulus exclusively at an ISI of 150 ms, but not at shorter ISIs (Wühle et al. 2011).
Discussion
The study reported here emphasizes the importance of the pre-stimulus state of brain networks for the subsequent activation of brain regions involved in higher-level stimulus processing and for the conscious perception of sensory input. In tactile stimulus processing, secondary somatosensory cortex appears to be the critical region that is embedded in a wide brain network and that is relevant for the gating of sensory input to higher-level analysis. This finding corresponds with the established view that the processing of sensory information in SII is strongly modulated by top-down control. Network analyses indicated that the sensory network involving SII, middle frontal gyrus, cingulate cortex and fronto-temporal brain regions has to be distinguished
from the global brain network. For stimuli to be perceived consciously, it seems that the sensory network has to show increased coupling in a local (clustering) as well as a long-range (efficiency) sense. Combining a sensory task at the limit of sensory performance with elaborate techniques for brain network analysis and the study of brain activation, the current study provided insight into the interaction between brain network states, brain activation and conscious stimulus perception.

References
Weisz N, Wühle A, Monittola G, Demarchi G, Frey J, Popov T, Braun C (2014) Prestimulus oscillatory power and connectivity patterns predispose conscious somatosensory perception. Proc Natl Acad Sci USA 111(4):E417–E425
Wühle A, Mertiens L, Rüter J, Ostwald D, Braun C (2010) Cortical processing of near-threshold tactile stimuli: an MEG study. Psychophysiology 47(3):523–534
Wühle A, Preissl H, Braun C (2011) Cortical processing of near-threshold tactile stimuli in a paired-stimulus paradigm—an MEG study. Eur J Neurosci 34(4):641–651
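To make the connectivity measure concrete, the sketch below computes trial-averaged imaginary coherence between two source time series; the all-to-all analysis repeats this over every pair of grid nodes. Taking only the imaginary part of coherency discards zero-lag coupling and is therefore robust against field spread. The array shapes and the synthetic example are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative sketch: imaginary coherence between two signals,
# x and y of shape (n_trials, n_samples).
import numpy as np

def imaginary_coherence(x, y):
    X = np.fft.rfft(x, axis=1)           # per-trial spectra
    Y = np.fft.rfft(y, axis=1)
    Sxy = (X * np.conj(Y)).mean(axis=0)  # trial-averaged cross-spectrum
    Sxx = (X * np.conj(X)).mean(axis=0).real
    Syy = (Y * np.conj(Y)).mean(axis=0).real
    coherency = Sxy / np.sqrt(Sxx * Syy)
    return np.abs(coherency.imag)        # one value per frequency bin

# Synthetic example: 50 trials, 600 samples; y is a lagged, noisy copy of x.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 600))
y = np.roll(x, 5, axis=1) + rng.standard_normal((50, 600))
print(imaginary_coherence(x, y).shape)   # (301,)
```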
A model for dynamic minimal mentalizing in dialogue
Hendrik Buschmeier, Stefan Kopp
Social Cognitive Systems Group, CITEC and Faculty of Technology, Bielefeld University, Germany

Spontaneous dialogue is a highly interactive endeavor in which interlocutors constantly influence each other's actions. As addressees they provide feedback on perception, understanding, acceptance, and attitude (Allwood et al. 1992). As speakers they adapt their speech to the perceived needs of the addressee, propose new terms and names, make creative references, draw upon established and known-to-be-shared knowledge, etc. This makes dialogue a 'joint activity' (Clark 1996) whose outcome is not determined up front but is shaped by the interlocutors while the interaction unfolds over time. One of the tasks interlocutors need to carry out while engaged in a dialogue is keeping track of the dialogue information state. This is usually considered to be a rich representation of the dialogue context, most importantly including which information is grounded and which is still pending to be grounded (and potentially much more information; see, e.g., Ginzburg 2012). Whether such a detailed representation of the information state is necessary for participating in dialogue—and whether it is a cognitively plausible assumption—is a topic of ongoing debate. On the one hand, Brennan and Clark (Brennan and Clark 1996; Clark 1996) state that speakers maintain a detailed model of common ground and design their utterances for the exact needs of their communication partners—even to the extent that approximate versions of mutual knowledge may be necessary to explain certain dialogue phenomena (Clark and Marshall 1981). On the other hand, Pickering and Garrod (2004) argue that—for reasons of efficiency—dialogue cannot involve heavy inference on common ground, but is an automatic process that relies on priming and activation of linguistic representations and uses interactive repair upon miscommunication. A position that falls between these poles is Galati and Brennan's (2010) lightweight one-bit partner model (e.g., has the addressee heard this before or not), which can be used instead of full common ground when producing an utterance.
We propose that interlocutors in dialogue engage in dynamic minimal mentalizing, a process that goes beyond the single properties in the focus of Galati and Brennan's (2010) 'one-bit' model, but is comparable in computational efficiency. We assume that speakers maintain a probabilistic, multidimensional (consisting of a fixed number of state variables), and dynamic 'attributed listener state' (Buschmeier and Kopp 2012). We model this as a dynamic Bayesian network representation (see Fig. 1) that is continuously updated by the addressee's communicative feedback (i.e., short verbal-vocal expressions such as 'uh-huh,' 'yeah,' 'huh?'; head gestures; facial expressions), seen as evidence of understanding in response to ongoing utterances. The proposed model is multidimensional because it represents the listener's mental state of listening in terms of the various communicative functions that can be expressed in feedback (Allwood et al. 1992): is the listener in contact? Is he or she willing and able to perceive and understand what is said? And does he or she accept the message and agree to it? Instead of making a decision conditioned on the question whether the interlocutor has heard something before, this model makes it possible to use the still computationally feasible but richer knowledge of whether he or she has likely perceived, understood, etc. a previously made utterance. Further, the model is fully probabilistic, since the attributed mental states are modelled in a Bayesian network. Each dimension is represented as a random variable, and the probabilities over the states of each variable (e.g., low, medium, high understanding) are interpreted as the speaker's degree of belief in the addressee being in a specific state. This is a graded form of common ground (Brown-Schmidt 2012) and presupposition (e.g., this knowledge is most likely in the common ground; see variables GR and GR′ in Fig. 1), which can be accommodated by, e.g., interactively leaving information out or adding redundant information, or by making information pragmatically implicit or explicit. Finally, since the model is based on a dynamic Bayesian network, the interpretation of incoming feedback signals from the addressee is influenced by the current belief state, and changes of the attributed listener state are tracked over time. Representing these dynamics provides speakers with a broader basis for production choices as well as enabling strategic placement of feedback elicitation cues based on informational needs. It also allows for a prediction of the addressee's likely future mental state, thus enabling anticipatory adaptation of upcoming utterances. In current work, the model of dynamic minimal mentalizing is being applied and evaluated in a virtual conversational agent that is able to interpret its user's communicative feedback and adapt its own language accordingly (Buschmeier and Kopp 2011, 2014).

Fig. 1 The dynamic Bayesian network model for dynamic minimal mentalizing. The network consists of the mental state variables for contact (C), perception (P), understanding (U), acceptance (AC), agreement (AG), and groundedness (GR) attributed to the listener
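To make the filtering idea concrete, here is a minimal sketch of one update step for a single attributed-listener-state variable (understanding), driven by an observed feedback signal. The actual model couples several such variables (C, P, U, AC, AG, GR) in one dynamic Bayesian network (Buschmeier and Kopp 2012); all probability tables below are invented placeholders, not the authors' parameters.

```python
# Illustrative sketch: predict-correct step for understanding U
# in {low, med, high}, given a feedback observation.
STATES = ("low", "med", "high")

# P(U_t | U_{t-1}): the attributed state drifts slowly between steps.
TRANSITION = {
    "low":  {"low": 0.7, "med": 0.2, "high": 0.1},
    "med":  {"low": 0.2, "med": 0.6, "high": 0.2},
    "high": {"low": 0.1, "med": 0.2, "high": 0.7},
}

# P(feedback | U_t): 'uh-huh' is likelier under high understanding,
# 'huh?' under low understanding (placeholder numbers).
OBSERVATION = {
    "uh-huh": {"low": 0.1, "med": 0.3, "high": 0.6},
    "huh?":   {"low": 0.7, "med": 0.2, "high": 0.1},
}

def filter_step(belief, feedback):
    """belief: dict state -> probability; returns the updated belief."""
    # predict: push the current belief through the transition model
    predicted = {s: sum(belief[p] * TRANSITION[p][s] for p in STATES)
                 for s in STATES}
    # correct: weight by the likelihood of the observed feedback
    unnorm = {s: predicted[s] * OBSERVATION[feedback][s] for s in STATES}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

belief = {"low": 1 / 3, "med": 1 / 3, "high": 1 / 3}
for fb in ("uh-huh", "uh-huh", "huh?"):
    belief = filter_step(belief, fb)
print(belief)  # the speaker's graded degree of belief about the listener
```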
Acknowledgments
This research is supported by the Deutsche Forschungsgemeinschaft (DFG) through the Center of Excellence EXC 277 'Cognitive Interaction Technology.'

References
Allwood J, Nivre J, Ahlsén E (1992) On the semantics and pragmatics of linguistic feedback. J Semant 9:1–26. doi:10.1093/jos/9.1.1
Brennan SE, Clark HH (1996) Conceptual pacts and lexical choice in conversation. J Exp Psychol Learn Memory Cogn 22:1482–1493. doi:10.1037/0278-7393.22.6.1482
Brown-Schmidt S (2012) Beyond common and privileged: gradient representations of common ground in real-time language use. Lang Cogn Process 62–89. doi:10.1080/01690965.2010.543363
Buschmeier H, Kopp S (2011) Towards conversational agents that attend to and adapt to communicative user feedback. In: Proceedings of the 11th international conference on intelligent virtual agents, Reykjavík, Iceland, pp 169–182. doi:10.1007/978-3-642-23974-8_19
Buschmeier H, Kopp S (2012) Using a Bayesian model of the listener to unveil the dialogue information state. In: SemDial 2012: proceedings of the 16th workshop on the semantics and pragmatics of dialogue, Paris, France, pp 12–20
Buschmeier H, Kopp S (2014) When to elicit feedback in dialogue: towards a model based on the information needs of speakers. In: Proceedings of the 14th international conference on intelligent virtual agents, Boston, MA, USA, pp 71–80
Clark HH (1996) Using language. Cambridge University Press, Cambridge. doi:10.1017/CBO9780511620539
Clark HH, Marshall CR (1981) Definite reference and mutual knowledge. In: Joshi AK, Webber BL, Sag IA (eds) Elements of discourse understanding. Cambridge University Press, Cambridge, pp 10–63
Galati A, Brennan SE (2010) Attenuating information in spoken communication: for the speaker, or for the addressee? J Memory Lang 62:35–51. doi:10.1016/j.jml.2009.09.002
Ginzburg J (2012) The interactive stance. Oxford University Press, Oxford
Pickering MJ, Garrod S (2004) Toward a mechanistic psychology of dialogue. Behav Brain Sci 27:169–226. doi:10.1017/S0140525X04000056
Actions revealing cooperation: predicting cooperativeness in social dilemmas from the observation of everyday actions
Dong-Seon Chang, Heinrich H. Bülthoff, Stephan de la Rosa
Max Planck Institute for Biological Cybernetics, Dept. of Human Perception, Cognition and Action, Tübingen, Germany
Introduction
Human actions contain an extensive array of socially relevant information. Previous studies have shown that even brief exposure to visually observed human actions can lead to accurate predictions of the goals or intentions accompanying them. For example, motion kinematics can enable predicting the success of a basketball shot, or whether a hand movement is carried out with cooperative or competitive intentions. It has also been reported that gestures accompanying a conversation can serve as a rich source of information for judging the trustworthiness of another person. Based on these previous findings, we wondered whether humans could actually predict the cooperativeness of another individual by identifying visible social cues. Would it be possible to predict the cooperativeness of a person by just observing everyday actions such as walking or running? We hypothesized that even brief excerpts of human actions depicted and presented as biological motion cues (i.e. point-light figures) would
provide sufficient information to predict cooperativeness. Using motion capture and a game-theoretical interaction setup, we explored whether prediction of cooperation was possible merely by observing biological motion cues of everyday actions, and which actions enabled these predictions.

Methods
We recorded six different human actions—walking, running, greeting, table tennis playing, choreographed dancing (Macarena) and spontaneous dancing—in normal participants using an inertia-based motion capture system (MVN Motion Capture Suit from Xsens, Netherlands). A total of 12 participants (6 male, 6 female) took part in the motion recording. All actions were then post-processed into short movies (ca. 5 s) showing point-light stimuli. These actions were then evaluated by 24 other participants in terms of personality traits such as cooperativeness and trustworthiness, on a Likert scale ranging from 1 to 7. The original participants who had provided the recorded actions returned a few months later to be tested for their actual cooperativeness. They were given standard social dilemmas used in game theory, such as the give-some game, the stag hunt game, and the public goods game. In these interaction games they were asked to exchange or give tokens to another player, and depending on their choices they could win or lose an additional amount of money. The choice of behavior for each participant was then recorded and coded for cooperativeness. This cooperativeness performance was then compared with the perceived cooperativeness based on the ratings of their actions given by the other participants.

Results and Discussion
Preliminary results showed a significant correlation between cooperativeness ratings and actual cooperativeness performance. The actions showing a consistent correlation were walking, running and choreographed dancing (Macarena). No significant correlation was observed for greeting, table tennis playing or spontaneous dancing. A similar tendency was consistently observed across all actions, although significant correlations were not found for all social dilemmas. The ratings of different actors and actions were highly consistent across raters, and high inter-rater reliability was achieved. It seems possible that natural and constrained actions carry more social cues enabling the prediction of cooperation than actions showing more variance across participants. Further studies with a higher number of actors and raters are planned to confirm whether accurate prediction of cooperation is really possible.
The use of creative analogies in a complex problem situation
Melanie Damaskinos1, Alexander Lutsevich1, Dietrich Dörner1, Ute Schmid1, C. Dominik Güss1,2
1 Otto-Friedrich Universität Bamberg, Germany; 2 University of North Florida, USA

Keywords
Analogy, creativity, dynamic decision making, complex problem solving, strategies

Analogical reasoning is one key element of creative thinking and one of the key domain-general cognitive mechanisms (Keane 1988). A person takes the structure and elements of one domain and tries to apply them to a new, problematic domain. Experimental research has demonstrated the transfer of knowledge from one domain to another (e.g., Wiese et al. 2008). Analogical reasoning has often been studied in classrooms and in relation to mathematical problems, but the use of analogies in complex and uncertain domains has rarely been studied in the laboratory. Yet the study of creative analogy use in complex problem solving would be
highly relevant considering the demands of most real-life problems. The goal of the current study is to examine how helpful or hindering certain analogies can be for solving a complex and dynamic problem such as improving the living conditions of a fictional tribe in the MORO simulation (Dörner 1996). We expected that an analogy story highlighting a dynamic system (blood sugar) would prime participants and facilitate problem solving more than an analogy story highlighting linear processing (visual perception) or no analogy story at all (control). The facilitating analogy story should make participants more sensitive to the interconnectedness of the system variables in the complex problem and therefore lead to more reflection time at the beginning of the simulation, more in-depth information collection, and fewer actions.

Method
Participants were 29 psychology students from Otto-Friedrich University Bamberg, Germany. (More data will be collected.) We used three different analogy stories (facilitating systems analogy story—blood sugar, distracting linear analogy story—visual perception, control—no story). Participants received either the blood-sugar story, the visual-perception story, or no story prior to working with the MORO simulation. The stories were 1.5 pages long, including two figures each. The blood-sugar story described the changes in blood sugar dependent on food intake. It also showed the long-term consequences of high sugar consumption, presenting the body as a dynamic system. The visual-perception story described the linear process of perception from stimulus to processing in the cortex. The blood-sugar story was expected to prime systemic thinking, considering side effects and long-term effects of actions and the balance of the system; the visual-perception story was expected to prime linear, one-dimensional thinking. The control group received no story and was not primed. MORO is a computer simulation of a tribe of semi-nomads in the Sahel zone (Lutsevich, Dörner 2013). MORO is especially suited to studying complex problem solving due to the huge number of variables involved and the demand to come up with novel solutions and to coordinate decisions. Participants take the role of developmental aid assistants and try to help improve the living conditions of the MORO tribe. Participants sit in front of a computer screen and can select information and make decisions using the mouse. A file documenting all of a participant's decisions is automatically saved to the hard drive. For the current study we focused only on the first 12 min of played time, because we postulated that especially this initial time would be influenced most by the analogy story presented; later, the demands of the problem situation would become more influential. A demographic questionnaire was administered to control for potential confounding variables, assessing age, sex, major, and student status.

Results
We are still continuing data collection, but preliminary results refer to three dependent variables: A—number of actions, IS—number of information searches, and RT—reflection time periods greater than 15 s in which no action and no information search took place. These variables were assessed for the first 12 min, in intervals of 4 min each, from the participants' MORO log files and then combined. The results did not confirm our expectations.
Initial data analysis showed that, compared to the two other groups, the group primed with the 'system' blood-sugar analogy story did not show more reflection periods and more information searches in the first 12 min. For the first 12 min, participants given the systems analogy story followed a more "balanced" strategy compared to the two other groups. The control group followed "blind" actionism, engaging in the most actions and information searches but the fewest reflection times. The group primed with the linear analogy spent the most time reflecting, but made the fewest actions and information searches. The means for actions, information searches, and reflection times of the systems analogy group lay between the means of the linear prime group and the
control group (see Fig. 1). Mean differences among the three groups were not significant for actions, F(2, 26) = 1.81, p = .18, but were significant for information searches, F(2, 26) = 5.63, p = .009, and for reflection times, F(2, 26) = 4.81, p = .02. An alternative explanation for the strategic differences among the three groups could be individual difference variables. We assessed the need to reduce cognitive uncertainty, final high school leaving examination grade, age, and gender. None of the four variables correlated significantly with actions, information searches, or reflection time periods (see Table 1). Yet the three decision-making measures correlated significantly with each other: the more time spent on reflection, the fewer actions and the fewer information searches took place, and vice versa.

Conclusion
Creativity has rarely been studied in relation to complex microworlds. Thus, a process analysis of creative analogical reasoning in a complex, uncertain, and dynamic microworld is a novel research topic, and other researchers have expressed the need to experimentally assess creativity in complex and novel problem situations and to focus on idea evaluation and implementation (Funke 2000). Further data analysis will also include the correlation of strategy and performance in MORO. Preliminary results of the current study showed that the presented analogy stories primed decision making and problem solving, but not in the expected direction. Participants primed with the systems story followed a balanced approach in which actions, information searches, and reflection times were similarly frequent. Participants primed with the linear story spent the most time reflecting, perhaps because they were primed that a decision leads to a linear consequence. Participants who did not receive any story showed the most actions and the fewest reflection times. It is possible that receiving no story provided no 'helpful' cues and led to the most uncertainty and to actionism (see Dörner 1996). These findings could have implications for training programs and education that focus on teaching children, students, and experts to be sensitive to the characteristics of complex, uncertain, and dynamic problems.
Fig. 1 Means of actions, information searches, and reflection time periods of 15 s or longer for the first 12 min of participants working on the MORO simulation
Table 1 Correlations of individual difference variables and behavioral complex-problem solving measures

                      Cognitive    Final high    Age    Gender   Actions   Information
                      uncertainty  school grade                            searches
Actions               -.05         -.11          -.20   -.14
Information searches  -.24          .17          -.18    .09      .27
Reflection times       .24          .04           .30    .07     -.70***   -.53**

*** p < .001; ** p < .005; * p < .05

Acknowledgments
This research was supported through a Marie-Curie IIF Fellowship to the last author.

References
Dörner D (1996) The logic of failure. Metropolitan Books, New York
Funke J (2000) Psychologie der Kreativität [Psychology of creativity]. In: Holm-Hadulla RM (ed) Kreativität. Springer, Heidelberg, pp 283–300
Keane MT (1988) Analogical problem solving. Ellis Horwood, Chichester
Lutsevich A, Dörner D (2013) MORO 2 (completely revised new version). Program documentation. Otto-Friedrich Universität Bamberg
Wiese E, Konerding U, Schmid U (2008) Mapping and inference in analogical problem solving—as much as needed or as much as possible? In: Love BC, McRae K, Sloutsky VM (eds) Proceedings of the 30th annual conference of the cognitive science society. Lawrence Erlbaum, Mahwah, pp 927–932
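To illustrate how the three behavioral measures can be read off a log file, the sketch below scores a hypothetical list of timestamped events; the event format and the simplification of scoring the combined 12-min window (rather than three 4-min intervals) are assumptions for illustration, not the authors' scoring code.

```python
# Illustrative sketch: count actions, information searches, and
# reflection periods (> 15 s with no event) in the first 12 min.

def score_log(events, window_s=12 * 60, gap_s=15):
    """events: list of (t_seconds, kind) with kind in {"action", "info"},
    sorted by time; returns (actions, searches, reflections)."""
    inside = [(t, kind) for t, kind in events if t <= window_s]
    actions = sum(1 for _, kind in inside if kind == "action")
    searches = sum(1 for _, kind in inside if kind == "info")
    # reflection periods: gaps between consecutive events (including the
    # window edges) during which nothing happened for more than gap_s
    times = [0.0] + [t for t, _ in inside] + [float(window_s)]
    reflections = sum(1 for a, b in zip(times, times[1:]) if b - a > gap_s)
    return actions, searches, reflections

log = [(20, "info"), (90, "action"), (95, "action"), (400, "info")]
print(score_log(log))  # (2, 2, 3)
```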
Yes, that’s right? Processing yes and no and attention to the right vs. left Irmgard de la Vega, Carolin Dudschig, Barbara Kaup University of Tu¨bingen, Germany Recent studies suggest that positive valence is associated with the dominant hand’s side of the body and negative valence with the nondominant hand’s side of the body (Casasanto 2009). This association is also reflected in response times, with right- and left-handers responding faster with their dominant hand to positive stimuli (e.g., love), and with their non-dominant hand to negative stimuli (e.g., hate; de la Vega et al. 2012; see also de la Vega et al. 2013). Interestingly, a similar finding emerges for yes- and no-responses: righthanded participants respond faster with their dominant hand to yes, and with their non-dominant hand to no (de la Vega et al. in prep). The present study tested whether the association between yes/no and (non-)dominant hand is reflected in a visual attention shift. Spatial attention has been shown to be influenced by various categories. For example, the association between numbers and horizontal space (SNARC effect; Dehaene et al. 1993) is also reflected in visual attention: in a target detection task, participants responded faster to a target presented on the left after a low digit, and to a target on the right after a high digit (Fischer et al. 2003; see also Dudschig et al. 2012). We adapted the target detection task from Fischer et al. (2003) to investigate visuospatial attention shifts after yes or no. In line with the results obtained by Fischer et al. (2003), we expected faster detections of a target located on the right after yes, and of a target on the left after no. Twenty-two volunteers (1 male; MAge = 23.0, SDAge = 5.3) participated in the study. The word yes (in German: Ja) or no (in German: Nein) appeared centrally on the computer screen for 300 ms, followed by a target on the right or on the left. Participants’ task was
to press a key as soon as they had detected the target in the left or right box. Responses under 100 ms were excluded from the analysis (1.1 %). The remaining RTs were submitted to a 2 (word: yes vs. no) × 2 (target location: left vs. right) ANOVA. Visuospatial attention was influenced by the words yes and no, as indicated by an interaction between word and target location. However, contrary to our hypothesis, an interference effect emerged (see Fig. 1): target detection was faster on the left after yes, and faster on the right after no, F(1,21) = 6.80, p = .016. One explanation for this unexpected pattern might be inhibition of return (see Posner, Cohen 1984): upon perceiving the word yes or no, attention might move immediately to the right or to the left, but after it is withdrawn, participants might be slower to detect a stimulus displayed in this location. Using variable delays between word and target presentation should clarify this issue. Another possibility is that the observed pattern does not result from an association between yes/no and right/left stemming from handedness, but rather corresponds to the order in which the words yes and no are usually encountered: when used together in a phrase, yes usually precedes no (e.g., "What's your answer—yes or no?"); as a result, in left-to-right writing cultures, yes might become associated with the left side, and no with the right side. We are planning to investigate this possibility, as well as the question under which conditions an association between yes and the left side vs. yes and the right hand becomes activated, in future studies.

Fig. 1 Mean response times in the target detection task. Error bars represent confidence intervals (95 %) for within-subject designs and were computed as recommended by Masson and Loftus (2003)

References
Casasanto D (2009) Embodiment of abstract concepts: good and bad in right- and left-handers. J Exp Psychol Gen 138:351–367
Dehaene S, Bossini S, Giraux P (1993) The mental representation of parity and number magnitude. J Exp Psychol Gen 122:371–396
de la Vega I, De Filippis M, Lachmair M, Dudschig C, Kaup B (2012) Emotional valence and physical space: limits of interaction. J Exp Psychol Hum Percept Perform 38:375–385
de la Vega I, Dudschig C, De Filippis M, Lachmair M, Kaup B (2013) Keep your hands crossed: the valence-by-left/right interaction is related to hand, not side, in an incongruent hand-response key assignment. Acta Psychol 142:273–277
de la Vega I, Dudschig C, Kaup B (in prep) Faster responses to yes with the dominant hand and to no with the non-dominant hand: a compatibility effect
Dudschig C, Lachmair M, de la Vega I, De Filippis M, Kaup B (2012) From top to bottom: spatial shifts of attention caused by linguistic stimuli. Cogn Process 13:S151–S154
Fischer MH, Castel AD, Dodd DD, Pratt J (2003) Perceiving numbers causes spatial shifts of attention. Nat Neurosci 6:555–556
Posner MI, Cohen Y (1984) Components of visual orienting. In: Bouma H, Bouwhuis D (eds) Attention and performance, vol X. Erlbaum, pp 531–556
Perception of background color in head mounted displays: applying the source monitoring paradigm
Nele M. Fischer, Robert R. Brauer, Michael Unger
University of Applied Sciences Leipzig, Germany
Monocular look-around head-mounted displays (HMDs), for instance the Vuzix M100 Smart Glasses, are wearable devices that enrich visual perception with additional information by placing a small monitor (e.g., an LCD) in front of one eye. While having access to various kinds of information, users can engage in other tasks, such as reading assembly instructions on the HMD while performing a manual assembly task. To reduce distraction from the main task, the information should be presented in a way that is perceived as comfortable and requires as little effort as possible. It is likely that display polarity has an impact on information perception, since positive polarity (i.e. black font on white background) is widely recognized as providing better text readability. However, under specific viewing conditions the bright background illumination of positive polarity was found to reduce word recognition and induce discomfort compared to negative polarity (white font on black background) (Tai, Yan, Larson, Sheedy 2013). Since perception in HMDs might differ to some extent from stationary displays (e.g., Naceri, Chellali, Dionnet, Toma 2010) and color has an impact on information perception (e.g., Dzulkifli, Mustafar 2013), we investigated the impact of polarity on perception in a monocular look-around HMD. If one type of polarity (positive or negative) is less distracting from the presented content, we would expect enhanced recognition due to deeper processing of the material (Craik, Lockhart 1972). Meanwhile, memory for the polarity itself should decrease when it is less distracting (source monitoring: Johnson, Hashtroudi, Lindsay 1993). Furthermore, subjective preference ratings should match the less distracting polarity (Tai et al. 2013). To test this, we conducted a recognition test within the source monitoring paradigm (Johnson et al. 1993) and asked participants for their polarity preference. In our experimental setting, 32 single-item words were presented in sequence with either positive or negative polarity on the LCD screen of the monocular look-around HMD Vuzix M100. Directly afterwards, participants rated their preferred polarity. Following a short distraction, the recognition and source memory test was conducted. All previously presented (old) words were mixed with the same number of new distracter words. For each item, participants decided whether the item had been presented before or was new and, if they judged it old, they had to determine the item's polarity (positive or negative). The results of our study on polarity for the monocular look-around display Vuzix M100 indicated that negative polarity increased word recognition and was preferred by participants. Contrary to our assumptions, source memory for negative polarity increased as well, which might be an effect of the higher recognition rate for items with negative polarity. These results not only support a design decision, they also connect the subjective preference ratings of participants with data from memory research. Thus, preference ratings
appear to be a good indicator for issues of user perception. Based on these results, we recommend the use of negative polarity to display short text information, e.g. assembly instructions, in monocular look-around HMDs with near-to-eye LCD displays (e.g., approximately 4 cm distance to the eye in the Vuzix M100), since it appears to be less distracting and more comfortable than positive polarity. Due to the small sample size, further examination of this topic is needed.

References
Craik FIM, Lockhart RS (1972) Levels of processing: a framework for memory research. J Verbal Learn Verbal Behav 11:671–684
Dzulkifli MA, Mustafar MF (2013) The influence of colour on memory performance: a review. Malaysian J Med Sci 20:3–9
Johnson MK, Hashtroudi S, Lindsay DS (1993) Source monitoring. Psychol Bull 114:3–28
Naceri A, Chellali R, Dionnet F, Toma S (2010) Depth perception within virtual environments: comparison between two display technologies. Int J Adv Intell Syst 3:51–64
Tai YC, Yan SN, Larson K, Sheedy J (2013) Interaction of ambient lighting and LCD display polarity on text processing and viewing comfort. J Vis 13(9), article 1157
Continuous goal dynamics: insights from mouse-tracking and computational modeling
Simon Frisch, Maja Dshemuchadse, Thomas Goschke, Stefan Scherbaum
Technische Universität Dresden, Germany

Goal-directedness is a core feature of human behavior. Therefore, it is mandatory to understand how goals are represented in the cognitive system and how these representations shape our actions. Here, we focus on the time-dependence of goal representations (Scherbaum, Dshemuchadse, Ruge, Goschke 2012). This feature of goal representations is highlighted by numerous task-switching studies which demonstrate that setting a new goal is associated with behavioral costs (Monsell 2003; Vandierendonck, Liefooghe, Verbruggen 2010). Moreover, participants have difficulty ignoring previously relevant goals (perseveration, PS) or attending to previously irrelevant goals (learned irrelevance, LI; cf. Dreisbach, Goschke 2004). Thus, goals are not "switched on" or "off" instantaneously but take time to build up and decay. This is also assumed by connectionist models of task switching (e.g. Gilbert, Shallice 2002), in which goal units need time to shift between different activation patterns. While both empirical evidence and theory underline the dynamic nature of goals, models and empirical findings have mostly been linked by comparing modelled and behavioral outcomes (e.g. response times). However, these discrete values provide only loose constraints for theorizing about the processes underlying these measures. Here, we aim at a deeper understanding of continuous goal dynamics by comparing the continuous performance of a dynamic neural field (DNF) model with a continuous measure of goal-switching performance, namely mouse movements. Originally, the two phenomena of PS and LI were studied by Dreisbach and Goschke (2004) in a set-switching task: participants categorized a number presented in a cued color (target) while ignoring a number in another color (distracter). After several repetitions, the cue indicated to attend to a new color. Two kinds of switches occurred: in the PS-condition, the target was presented in a new color while distracters were presented in the previous target color (e.g. red). In the LI-condition, the target was presented in the previous distracter color while distracters were presented in a new color (e.g. green). While the results indicated typical switch patterns in response times for both conditions, the processes underlying the observed switch costs
remained unclear. For example, Dreisbach and Goschke (2004) could only speculate whether the LI effect was driven by difficulties in activating a goal that had been ignored beforehand or by a novelty boost that draws attention towards the distracting color. Addressing these open questions, we created a DNF model of the task. Instead of including additional mechanisms to incorporate processes like attentional capture or goal-specific inhibition, we built the most parsimonious model, one that relies exclusively on continuously changing levels of goal activation. In this respect, DNFs are exceptionally well suited to model dynamic goal-directed behavior, as they embrace cognition as a deeply continuous phenomenon that is tightly coupled to our sensorimotor systems (Sandamirskaya, Zibner, Schneegans, Schöner 2013). Our model consists of three layers, similar to previous models of goal-driven behavior and goal-switching (cf. Gilbert, Shallice 2002; Scherbaum et al. 2012). A goal layer represents the cued target color by forming a peak of activation at a specific site. When activation reaches a threshold, it feeds into an association layer representing the colors and magnitudes of the current stimuli. The emerging pattern of activation is then projected into a response layer, resulting in a tendency to move to the left or right. Notably, as is typical for DNF models, all layers are continuous in representational space. This allowed us to study the model's behavior continuously over time instead of obtaining discrete threshold responses. Crucially, the inert activation dynamics inherent to DNFs provide a simple mechanism for the time-consuming processes of goal-setting and shifting observed in behavioral data.

A simulation study of the original paradigm indicated similar costs in response times for PS- and LI-switches as observed by Dreisbach and Goschke (2004). However, continuous response trajectories showed differential patterns for PS and LI trials: PS-switches yielded response trajectories that were deflected towards the previously relevant information, while LI-switches yielded a tendency to keep the response neutral for a longer time before deciding for one alternative. We validated these predictions in a set-switching experiment similar to the one conducted by Dreisbach and Goschke (2004). However, instead of responding with left or right key presses, participants moved a computer mouse into the upper left or right corner of the screen. As expected, goal switches induced switch costs in response times. More intriguingly, mouse movements replicated the model's dynamic predictions: PS-switches yielded movements strongly deflected towards the alternative response, whereas LI-switches yielded indifferent movements for a longer time than in repetition trials. In summary, our DNF model and mouse-tracking data suggest that continuously changing levels of goal activation constitute the core mechanism underlying goal-setting and goal-shifting. We therefore advocate the combination of continuous modelling with continuous behavioral measures, as this approach offers new and deeper insights into the dynamics of goals and goal-directed action.

References

Dreisbach G, Goschke T (2004) How positive affect modulates cognitive control: reduced perseveration at the cost of increased distractibility. J Exp Psychol Learn Mem Cogn 30(2):343–353. doi:10.1037/0278-7393.30.2.343
Gilbert SJ, Shallice T (2002) Task switching: a PDP model. Cogn Psychol 44(3):297–337. doi:10.1006/cogp.2001.0770
Monsell S (2003) Task switching. Trends Cogn Sci 7(3):134–140. doi:10.1016/S1364-6613(03)00028-7
Sandamirskaya Y, Zibner SKU, Schneegans S, Schöner G (2013) Using dynamic field theory to extend the embodiment stance toward higher cognition. New Ideas Psychol 31(3):322–339. doi:10.1016/j.newideapsych.2013.01.002
Scherbaum S, Dshemuchadse M, Ruge H, Goschke T (2012) Dynamic goal states: adjusting cognitive control without conflict monitoring. NeuroImage 63(1):126–136. doi:10.1016/j.neuroimage.2012.06.021
Vandierendonck A, Liefooghe B, Verbruggen F (2010) Task switching: interplay of reconfiguration and interference control. Psychol Bull 136(4):601–626. doi:10.1037/a0019791
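As a concrete illustration of the inert activation dynamics described in this abstract, the following is a minimal one-layer dynamic neural field simulation in Python/NumPy, showing how a goal peak decays gradually while a newly cued peak builds up rather than switching instantaneously. This is a hedged sketch, not the authors' three-layer model; the field size, time constant, and kernel parameters are illustrative assumptions.

import numpy as np

# Minimal one-layer dynamic neural field (Amari-style). All parameter
# values are illustrative assumptions, not the authors' settings.
N, dt, tau, h = 101, 1.0, 20.0, -3.0           # field size, step (ms), time constant, resting level
x = np.arange(N)
sigma, c_exc, c_inh = 5.0, 15.0, 0.05          # local excitation width/strength, global inhibition

def f(u):                                       # sigmoidal output nonlinearity
    return 1.0 / (1.0 + np.exp(-u))

kernel = c_exc * np.exp(-0.5 * ((x - N // 2) / sigma) ** 2)

def step(u, stim):
    """One Euler step of tau*du/dt = -u + h + stim + w*f(u) - global inhibition."""
    fu = f(u)
    interaction = np.convolve(fu, kernel, mode="same") - c_inh * fu.sum()
    return u + dt / tau * (-u + h + stim + interaction)

u = np.full(N, h, dtype=float)
stim_old = 6.0 * np.exp(-0.5 * ((x - 30) / 4.0) ** 2)   # previously cued goal site
stim_new = 6.0 * np.exp(-0.5 * ((x - 70) / 4.0) ** 2)   # newly cued goal site

trace = []
for t in range(600):                            # goal input switches at t = 300
    u = step(u, stim_old if t < 300 else stim_new)
    trace.append(u.copy())
# The peak at site 30 decays gradually while the peak at site 70 builds up,
# illustrating the time-consuming goal shift described above.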
Looming auditory warnings initiate earlier event-related potentials in a manual steering task

Christiane Glatz, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Automated collision avoidance systems promise to reduce accidents and relieve the driver from the demands of constant vigilance. Such systems direct the operator's attention to potentially critical regions of the environment without compromising steering performance. This raises the question: what is an effective warning cue? Sounds with rising intensities are claimed to be especially salient. By evoking the percept of an approaching object, they engage a neural network that supports auditory space perception and attention (Bach et al. 2008). Indeed, we are aroused by and faster to respond to "looming" auditory tones, which increase heart rate and skin conductance activity (Bach et al. 2009). Looming sounds can differ in terms of their rising intensity profiles. While a looming sound can be approximated by one whose amplitude increases linearly with time, an approaching object that emits a constant tone is better described as having an amplitude that increases exponentially with time. In a driving simulator study, warning cues with a veridical looming profile induced earlier braking responses than ramped profiles with linearly increasing loudness (Gray 2011). In the current work, we investigated how looming sounds might serve, during a primary steering task, to alert participants to the appearance of visual targets. Nine volunteers performed a primary steering task whilst occasionally discriminating visual targets. Their primary task was to minimize the vertical distance between an erratically moving cursor and the horizontal mid-line by steering a joystick towards the latter. Occasionally, diagonally oriented Gabor patches (10° tilt; 1° diameter; 3.1 cycles/deg; 70 ms duration) would appear to either the left or right of the cursor. Participants were instructed to respond with a button-press whenever a pre-defined target appeared. Seventy percent of the time, these visual stimuli were preceded by a 1,500 ms warning tone, presented 1,000 ms before they appeared. Overall, warning cues resulted in significantly faster and more sensitive detections of the visual target stimuli (F1,8 = 7.72, p < 0.05; F1,8 = 9.63, p < 0.05). Each trial presented one of three possible warning cues: a 2,000 Hz tone with either a constant intensity of 65 dB, a ramped intensity increasing linearly from 60 dB to approximately 75 dB, or a comparable looming intensity increasing exponentially over the same range. The different warning cues did not differ in their influence on response times to the visual targets or on recognition sensitivity (F2,16 = 3.32, p = 0.06; F2,16 = 0.10, p = 0.90). However, this might be due to our small sample size. It is noteworthy that the different warning tones did not adversely affect steering performance (F2,16 = 1.65, p = 0.22). Nonetheless, electroencephalographic potentials to the offset of the warning cues were significantly earlier for the looming tone than for both the constant and the ramped tone. More specifically, the positive component of the event-related potential was earlier for the looming tone by about 200 ms, relative to the constant and ramped tones, and was sustained for a longer duration (see Fig. 1). The current findings highlight the behavioral benefits of auditory warning cues.
More importantly, we find that a veridical looming tone induces earlier event-related potentials than one with a linearly increasing intensity. Future work will investigate how this benefit might diminish with increasing time between the warning tone and the event that is cued.

Fig. 1 The topographical plot shows the 500 ms after sound offset, with scalp maps plotted every 50 ms, for the constant (row 1), the ramped (row 2), and the looming tone (row 3). The looming cues evoked a strong positive deflection about 200 ms earlier than the other sounds. The black bar at the bottom of the figure indicates where the significance level of 0.01 was exceeded using a parametric test on the combined Fz, FCz, Cz, and Pz activity

References

Bach DR, Schächinger H, Neuhoff JG, Esposito F, Salle FD, Lehmann C, Herdener M, Scheffler K, Seifritz E (2008) Rising sound intensity: an intrinsic warning cue activating the amygdala. Cereb Cortex 18(1):145–150
Bach DR, Neuhoff JG, Perrig W, Seifritz E (2009) Looming sounds as warning signals: the function of motion cues. Int J Psychophysiol 74(1):28–33
Gray R (2011) Looming auditory collision warnings for driving. Hum Factors 53(1):63–74
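To make the distinction between the three cue profiles concrete, here is a hedged sketch in Python/NumPy of how such 1,500 ms, 2,000 Hz cues could be generated. The dB-to-amplitude mapping uses an arbitrary 0 dB reference, and the calibration of the original stimuli is not reported in the abstract, so all numerical choices beyond the levels named above are assumptions.

import numpy as np

# Three warning-cue profiles as described above: constant 65 dB, a ramp whose
# amplitude grows linearly (60 -> ~75 dB endpoints), and a "looming" cue whose
# amplitude grows exponentially, like a constant-level source approaching.
fs, dur = 44100, 1.5                     # sampling rate (assumption) and 1,500 ms duration
t = np.linspace(0.0, dur, int(fs * dur), endpoint=False)

def db_to_amp(db):                       # amplitude relative to an arbitrary 0 dB reference
    return 10.0 ** (db / 20.0)

amp_lo, amp_hi = db_to_amp(60.0), db_to_amp(75.0)
constant_amp = np.full_like(t, db_to_amp(65.0))
ramped_amp   = amp_lo + (amp_hi - amp_lo) * (t / dur)     # linear amplitude growth
looming_amp  = amp_lo * (amp_hi / amp_lo) ** (t / dur)    # exponential amplitude growth

carrier = np.sin(2.0 * np.pi * 2000.0 * t)                # 2,000 Hz warning tone
constant_cue, ramped_cue, looming_cue = (a * carrier for a in
                                         (constant_amp, ramped_amp, looming_amp))
# Normalize before playback; the absolute scale here is arbitrary.

Note that exponential amplitude growth is equivalent to a linear increase in dB, which is why the looming cue sounds like a steadily approaching source while the linear-amplitude ramp does not.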
The creative process across cultures

Noemi Göltenboth1, C. Dominik Güss1,2, Ma. Teresa Tuason2
1 Otto-Friedrich Universität Bamberg, Germany; 2 University of North Florida, USA

Keywords Creativity, Culture, Artists, Cross-cultural comparison

Creativity is the driving force of innovation in societies across the world, in many domains such as science, business, or art. Creativity means coming up with new and useful ideas (e.g., Funke 2008). Past research has focused on the individual, the creative process and its product, and the role of the social environment in evaluating creative products. According to previous research, individual difference variables such as intelligence and extraversion can partially predict creativity (e.g., Batey and Furnham 2006). Researchers have also shown the importance of the social environment in labeling products as creative or not (e.g., Csikszentmihalyi 1988). Although creativity may be influenced by and differ among cultures, the influence of culture on creativity has rarely been studied.

Creativity and Culture

Culture can be defined as the knowledge base used to cope with the world and each other, shared by a group of people and transmitted from generation to generation (e.g., Güss et al. 2010). This knowledge encompasses, for example, declarative world knowledge, values, and behaviors (e.g., norms, rituals, problem-solving strategies). Following this definition, different cultures could value different aspects of creativity (e.g., Lubart 1990).
The current study is based on two recommendations of creativity researchers. First, it is important to study creativity across cultures, as Westwood and Low (2003, p 253) summarized: "Clearly personality and cognitive factors impact creativity and account for individual differences, but when it comes to differences across cultures the picture is far from clear." Second, researchers recommend ethnographic or socio-historical analyses and case studies of creativity in different countries to study emic conceptions and the interaction of societal, family, and other factors in creativity (e.g., Simonton 1975). The current study addresses these recommendations by investigating creativity across cultures, focusing on experts from Cuba, Germany, and Russia.

Method

Going beyond traditional student samples, we conducted semi-structured interviews with experts, i.e., 10 Cuban, 6 Russian, and 9 German artists. Informed consent was obtained. All of the artists have received awards and fellowships for their creative work (i.e., compositions, books, poems, paintings). The interviews focused on (a) their personal history, (b) the creative process, and (c) the role of culture during the creative process. The interviews lasted between 30 min and 1 h 43 min. They were transcribed verbatim, and domains and themes were derived from the transcripts using consensual qualitative research methodology (Hill et al. 2005). This means that at least three raters independently read and coded each transcribed interview. The raters then met and discussed the codings until they reached consensus.

Results

Several categories were mentioned by more than three quarters of all 25 participants. These categories refer to the following domains: (1) How I became an artist, (2) What being an artist means to me, (3) Creating as a cognitive process, (4) Creating as a motivational process, and (5) The role of culture in creating. Table 1 shows that German artists generally talk about financial problems and the problem of selling their work, a topic rarely mentioned by Cuban and Russian artists. Russian and German artists generally recognize persistence and hard work in creativity, and how a daily routine is helpful. A daily routine is rarely mentioned by Cuban artists. All artists, regardless of culture, recognize the universality of creativity, but acknowledge culture-specific expressions.

Discussion

The current study is innovative in that it investigates cultural differences among renowned artists from Cuba, Russia, and Germany, including different groups of artists. The semi-structured interviews reveal a wealth of different domains and categories related to creativity, and highlight the need for a holistic, action-oriented, and system-oriented
approach when studying creativity. The findings also broaden a narrow cognitive view on creativity, highlighting the role of motivational and socio-cultural factors during the creative process (for the role of societal context in creativity see also Nouri et al. 2014). Whereas most artists experience similar creative processes, we also found themes highlighting the influence of the artists' cultural background. The results are beneficial for further developing a comprehensive theory of the creative process that takes cultural differences into consideration and perhaps integrates them into computational creativity models (e.g., Colton and Wiggins 2012).

Acknowledgments

This research was supported through a Marie Curie IIF Fellowship to the second author and through a Fellowship of the Studienstiftung des deutschen Volkes to the first author. We would like to thank the artists for participating and allowing us a glimpse into their world, that we may learn from their experiences.

References

Batey M, Furnham A (2006) Creativity, intelligence, and personality: a critical review of the scattered literature. Genet Soc Gen Psychol Monogr 132:355–429
Colton S, Wiggins GA (2012) Computational creativity: the final frontier? In: Proceedings of the 20th European conference on artificial intelligence (ECAI). Montpellier, France, pp 21–26
Csikszentmihalyi M (1988) Society, culture and person: a systems view of creativity. In: Sternberg RJ (ed) The nature of creativity: contemporary psychological perspectives. Cambridge University Press, New York, pp 325–339
Funke J (2008) Zur Psychologie der Kreativität [On the psychology of creativity]. In: Dresler M, Baudson TG (eds) Kreativität. Beiträge aus den Natur- und Geisteswissenschaften [Creativity: contributions from the natural sciences and humanities]. Hirzel, Stuttgart, pp 31–36
Güss CD, Tuason MT, Gerhard C (2010) Cross-national comparisons of complex problem-solving strategies in two microworlds. Cogn Sci 34:489–520
Hill CE, Knox S, Thompson BJ, Williams EN, Hess SA, Ladany N (2005) Consensual qualitative research. J Couns Psychol 52:196–205. doi:10.1037/0022-0167.52.2.196
Lubart TI (1990) Creativity and cross-cultural variation. Int J Psychol 25:39–59
Nouri R, Erez M, Lee C, Liang J, Bannister BD, Chiu W (2014) Social context: key to understanding culture's effects on creativity. J Organ Behav. doi:10.1002/job.1923
Simonton DK (1975) Sociocultural context of individual creativity: a transhistorical time-series analysis. J Pers Soc Psychol 32:1119–1133
Westwood R, Low DR (2003) The multicultural muse: culture, creativity and innovation. Int J Cross Cult Manag 3:235–259
Table 1 Some cultural differences in category frequencies (cells give the frequency label for Cuba / Russia / Germany)

Being an artist means being financially uncertain: Variant / Typical / General
Being an artist means to deal with the necessary evil of marketing and selling the work: Rare / Rare / Typical
Being creative is natural to human beings: Variant / Typical / Variant
Creativity is persistence and hard work: Variant / General / General
It helps me to have a daily regular routine: Rare / Variant / Typical
Creativity is universal, but culture provides specific expressions (forms and circumstances) for creativity: Variant / Typical / Typical

Frequency labels, as the proportion of artists per country (Cuba n = 10, Russia n = 6, Germany n = 9): General >90 % (9–10 / 6 / 8–9); Typical 50–89 % (5–8 / 4–5 / 5–7); Variant 11–49 % (3–4 / 2–3 / 3–4); Rare <10 % (1–2 / 1 / 1–2)
How do human interlocutors talk to virtual assistants? A speech act analysis of dialogues of cognitively impaired people and elderly people with a virtual assistant

Irina Grishkova1, Ramin Yaghoubzadeh2, Stefan Kopp2, Constanze Vorwerg1
1 University of Bern, Switzerland; 2 Bielefeld University, Germany

An artificial daily calendar assistant was developed to provide valuable support for people with special needs (Yaghoubzadeh et al. 2013). Users may interact differently when they communicate with an artificial system. They normally tend to adapt their linguistic behavior (Branigan et al. 2010), but different users may have different interaction styles (Wolters et al. 2009). In this study, we investigated how people with cognitive impairments and elderly people talk to their virtual assistant, focusing on pragmatic aspects: the speech acts performed, and the linguistic means used to perform them. A starting point of our analysis is the observation that the patterns in which linguistic actions occur, and which provide socially shaped potentials for achieving goals (Ehlich, Rehbein 1979), are not necessarily linear, but often manifest characteristic recursivity, decision points, supportive accessory patterns, and omissions of pattern elements (Grießhaber 2001). In addition, the linguistic means used to perform linguistic action units may vary considerably. We addressed two questions: (1) What communication patterns between a human and an artificial assistant occur in each of three groups of users (elderly people, people with cognitive impairments, control group) when making a request to enter an appointment? (2) What linguistic forms are typically used by the three user groups for making those requests? To answer these questions, we carried out a pragmatic analysis of conversations between participants of these three groups and the artificial assistant, based on Searle's speech act theory (Searle 1969, 1976) and techniques of functional-pragmatic discourse analysis (Grießhaber 2001). Three user groups participated in the study: cognitively impaired people (A), where all participants had light to medium mental retardation (approximately F70–F71 on the APA DSM scale [American Psychiatric Association 2000]); elderly people (B); and a control group (C) (Yaghoubzadeh et al. 2013). The participants were handed cards with appointments and asked to plan the appointments for the following week by speaking to the virtual assistant as if it were a human being. The assistant was shown on a TV screen and presented as being able to understand the user and speak to him or her, using a Wizard-of-Oz technique. All interactions between the participants and the assistant were recorded and transcribed. We split all dialogues into dialogue phases and annotated the speech acts performed by both the human interlocutor and the artificial assistant within a conversation. To this end, each dialogue phase was split into minimal communicative units—speech acts (Searle 1969)—using a pattern-oriented description (Hindelang 1994). For each speech act, we provided a definition in terms of illocutionary force and rules for performance (Searle 1969), as well as the complete list of linguistic forms used in the conversations. We modeled the structures of the pertinent dialogue phases (greeting, making an appointment, farewell) for each of the three
groups, as sequence patterns in the form of network structures (with speech acts as nodes and possible reactions as linking arrows). The smallest units in these structures were the speech acts determined by the definitions provided. Based on this, sequences of speech acts were analyzed. We also investigated the range and frequency of reactions to a particular speech act found in the dialogues. The relative frequencies of speech act sequences were determined for the greeting and farewell phases, as well as for particular speech acts, such as expressives and assertives, for each of the user groups. The politeness of discourse was indexed by the number of expressive speech acts, and the complexity of speech by the number of assertive speech acts (used to specify a request or explain an appointment) following a directive speech act. Results show that the elderly interlocutors produce a more complex dialogue structure when communicating with an artificial assistant. They use more assertive utterances, such as explanations, repetitions, and specifications. Furthermore, we found that some of the elderly speakers use a wider range of expressive speech acts than cognitively impaired people, demonstrating more politeness towards an artificial assistant. The analysis of linguistic means yielded a number of different forms used when requesting the virtual assistant to enter an appointment in the virtual calendar. The linguistic forms used in the dialogues were classified as I-form, we-form, third-person form, or neutral form. The most frequently used forms were the I-form and the neutral form. Participants from group A use the neutral form twice as often as the I-form. In contrast, group C users use the I-form twice as often as the neutral form. Participants from group B also use the I-form most frequently but, in contrast to A or C, they also use the we-form and the third-person form. Altogether, the results show that there are no fundamental differences in dialogue patterns between the groups; however, there is larger heterogeneity in group A, and especially in group B, compared with group C. Group B also seems to display greater diversity in linguistic means. (A minimal sketch of the transition analysis follows the references.)

References

American Psychiatric Association (2000) Diagnostic and statistical manual of mental disorders DSM-IV-TR, 4th edn. American Psychiatric Publ, Arlington, VA
Branigan HP, Pickering MJ, Pearson J, McLean JF (2010) Linguistic alignment between people and computers. J Pragmat 42:2355–2368
Ehlich K, Rehbein J (1979) Sprachliche Handlungsmuster. In: Soeffner HG (ed) Interpretative Verfahren in den Sozial- und Textwissenschaften. Metzler, Stuttgart, pp 243–274
Grießhaber W (2001) Verfahren und Tendenzen der funktional-pragmatischen Diskursanalyse. In: Ivanyi Z, Kertesz A (eds) Gesprächsforschung. Tendenzen und Perspektiven. Peter Lang, Frankfurt am Main, pp 75–95
Hindelang G (1994) Sprechakttheoretische Dialoganalyse. In: Fritz G, Hundsnurscher F (eds) Handbuch der Dialoganalyse. Niemeyer, Tübingen, pp 95–112
Searle J (1969) Sprechakte. Ein sprachphilosophischer Essay. Übersetzt von R. und R. Wiggershaus. Suhrkamp Taschenbuch Wissenschaft, Frankfurt am Main
Searle J (1976) A classification of illocutionary acts. Lang Soc 5(1):1–23
Wolters M, Georgila K, Moore JD, MacPherson SE (2009) Being old doesn't mean acting old: how older users interact with spoken dialog systems. ACM Trans Access Comput 2(1):2
Yaghoubzadeh R, Kramer M, Pitsch K, Kopp S (2013) Virtual agents as daily assistants for elderly or cognitively impaired people. In: Intelligent virtual agents. Springer, Berlin, pp 79–91
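As an illustration of the sequence-pattern analysis described above (speech acts as nodes, observed reactions as weighted arrows, with relative transition frequencies), here is a minimal Python sketch. The annotated dialogues and act labels are invented placeholders, not the study's data or coding scheme.

from collections import Counter, defaultdict

# Toy annotated dialogues: each is a sequence of speech-act labels.
# The labels and sequences are invented for illustration only.
dialogues = [
    ["greeting", "greeting", "directive", "assertive", "accept", "expressive", "farewell"],
    ["greeting", "greeting", "directive", "accept", "farewell"],
]

transitions = defaultdict(Counter)
for acts in dialogues:
    for a, b in zip(acts, acts[1:]):
        transitions[a][b] += 1            # one more observed arrow a -> b

# Relative frequency of each reaction b to a given speech act a
for a, reactions in sorted(transitions.items()):
    total = sum(reactions.values())
    for b, n in sorted(reactions.items()):
        print(f"{a} -> {b}: {n / total:.2f}")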
Effects of aging on shifts of attention in perihand space

Marc Grosjean1, Nathalie Le Bigot2
1 Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany; 2 University of Bretagne Occidentale & CNRS (Lab-STICC—UMR 6285), Brest, France

It is well established that visual processing is altered for stimuli that appear near the hands, that is, in perihand space (for a recent review, see Brockmole et al. 2013). For example, placing one's hands near a display has been shown to increase visual sensitivity (Dufour and Touzalin 2008) and enhance attentional engagement, such as the ability to detect changes in dynamic displays (Tseng and Bridgeman 2011), but also to slow down attentional disengagement, as evidenced by longer search times when trying to find a target stimulus in a cluttered display (Abrams et al. 2008). A number of studies suggest that these hand-proximity effects, as they are known, are modulated by the functionality of the hands and that visual processing is altered at locations where action is more likely to occur (e.g., Le Bigot and Grosjean 2012; Reed et al. 2010). Although it is well documented that cognitive processing generally becomes slower and less accurate over the lifespan (e.g., Verhaegen and Salthouse 1997), hand-proximity effects have rarely been studied with regard to aging. Of particular relevance for the present study, sensorimotor abilities are also known to deteriorate with age, especially for hand movements (Ranganath et al. 2001). These age-related changes presumably reduce the overall functionality of the hands, which in turn could influence how visual processing changes in perihand space. To test this notion, we sought to examine whether visual processing in general, and shifts of attention in particular, are affected by hand proximity in the same way for younger and older individuals. In a covert-orienting task (Posner 1980), younger (mean age < 25 years) and older (mean age > 65 years) right-handed adults were asked to discriminate between a target (letter) and a distractor stimulus that could appear at a peripheral left or right location. The stimulus was preceded by an uninformative peripheral cue (stimulus-onset asynchrony = 100 ms) that was presented either at the upcoming stimulus location (valid trial) or at the opposite location (invalid trial). Participants performed the task under four hand-position configurations: left hand only, right hand only, both hands, or no hands (control condition) near the display. As expected, older adults were overall slower to respond than younger adults, and both age groups showed a reliable cueing effect: responses were faster on valid than on invalid trials. Interestingly, younger adults also revealed an interaction between cue validity and hand position, which reflected that the cueing effects were larger when their dominant hand was near the display. The latter finding is in line with those of Lloyd et al. (2010), who also observed that involuntary shifts of attention are affected by hand proximity (for younger adults) and that this effect seems to be limited to the right (dominant) hand. More generally, these findings suggest that hand proximity affects visual processing in different ways for younger and older adults. This may reflect how the functionality of the hands and people's representation of peripersonal space change when cognitive and motor skills become slower and less accurate over the lifespan. Consistent with this notion, it has been shown that older individuals tend to have a more compressed representation of peripersonal space than younger adults (Ghafouri and Lestienne 2000) and tend to allocate their attention spatially more around the trunk of their body than around their hands (Bloesch et al. 2013). Both age groups also showed evidence of a right hemifield advantage (i.e., faster responses to stimuli presented to the right than to the left of fixation), which is most likely due to a left-hemisphere (right-hemifield) advantage in processing linguistic stimuli (Geffen et al. 1971). However, the latter effect was modulated by hand position for older adults only. In particular, the advantage was larger when their dominant hand was near the display. These results further suggest that visual processing is differentially affected by hand proximity in younger and older adults. In contrast to younger adults, who showed an effect of hand proximity on the involuntary shifting of attention, hand position seems to affect only the attentional prioritization of space in older adults (Reed et al. 2006).

References

Abrams RA, Davoli CC, Du F et al (2008) Altered vision near the hands. Cognition 107:1035–1047
Bloesch EK, Davoli CC, Abrams RA (2013) Age-related changes in attentional reference frames for peripersonal space. Psychol Sci 24:557–561
Brockmole JR, Davoli CC, Abrams RA, Witt JK (2013) The world within reach: effects of hand posture and tool-use on visual cognition. Curr Dir Psychol Sci 22:38–44
Dufour A, Touzalin P (2008) Improved visual sensitivity in the perihand space. Exp Brain Res 190:91–98
Geffen G, Bradshaw JL, Wallace G (1971) Interhemispheric effects on reaction time to verbal and nonverbal visual stimuli. J Exp Psychol 87:415–422
Ghafouri M, Lestienne FG (2000) Altered representation of peripersonal space in the elderly human subject: a sensorimotor approach. Neurosci Lett 289:193–196
Le Bigot N, Grosjean M (2012) Effects of handedness on visual sensitivity in perihand space. PLoS ONE 7(8):e43150
Lloyd DM, Azañón E, Poliakoff E (2010) Right hand presence modulates shifts of exogenous visuospatial attention in near perihand space. Brain Cogn 73:102–109
Posner MI (1980) Orienting of attention. Q J Exp Psychol 32:3–25
Ranganath VK, Siemionow V, Sahgal VS, Yue GH (2001) Effects of aging on hand function. J Am Geriatr Soc 49:1478–1484
Reed CL, Betz R, Garza JP, Roberts RJ Jr (2010) Grab it! Biased attention in functional hand and tool space. Atten Percept Psychophys 72:236–245
Reed CL, Grubb JD, Steele C (2006) Hands up: attentional prioritization of space near the hand. J Exp Psychol Hum Percept Perform 32:166–177
Tseng P, Bridgeman B (2011) Improved change detection with nearby hands. Exp Brain Res 209:257–269
Verhaegen P, Salthouse TA (1997) Meta-analyses of age–cognition relations in adulthood: estimates of linear and nonlinear age effects and structural models. Psychol Bull 122:231–249
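The cueing-effect measure underlying these results (invalid minus valid response times, computed per hand-position condition) can be made explicit in a few lines of Python. The trial data and column names below are invented for illustration, not the study's data.

import pandas as pd

# Toy trial-level data: hand-position condition, cue validity, response time.
trials = pd.DataFrame({
    "hand":     ["none", "none", "right", "right", "right", "none"],
    "validity": ["valid", "invalid", "valid", "invalid", "valid", "invalid"],
    "rt_ms":    [412, 448, 395, 452, 401, 445],
})

# Cueing effect per hand condition: mean RT(invalid) - mean RT(valid).
mean_rt = trials.groupby(["hand", "validity"])["rt_ms"].mean().unstack()
cueing_effect = mean_rt["invalid"] - mean_rt["valid"]
print(cueing_effect)   # a validity-by-hand interaction shows up as unequal effects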
The fate of previously focused working memory content: decay and/or inhibition?

Johannes Großer, Markus Janczyk
Department of Psychology III, University of Würzburg, Germany

Working memory is thought to allow short-term storage of information in a state in which this information can be manipulated by ongoing cognitive processes. Evidence from various paradigms suggests that at any time only one item held in working memory is selected for possible manipulation. Oberauer (2002) has thus suggested a one-item focus of attention within his model of working memory. Conceivably, this focus of attention needs to shift between several items during task performance, and the following question remains unresolved: what happens to a formerly selected, but now de-selected, item? Several studies have addressed this question, with opposing results. Bao, Li, Chen, and Zhang (2006) investigated verbal
working memory with an updating task in which participants count the number of occurrences of (three) different sequentially presented geometric objects (e.g., Garavan 1998; see also Janczyk, Grabowski 2011). In particular, they employed the logic typically used to show n − 2 repetition costs in task-switching experiments and found slower updating in ABA than in CBA sequences, i.e., evidence for an active inhibition of de-selected items (but see Janczyk, Wienrich, Kunde 2008, for no signs of inhibition with a different paradigm). Rerko and Oberauer (2013) investigated visual working memory with the retro-cue paradigm. Participants first learned an array of briefly presented colored items. Long after encoding, one, two, or three retro-cues (arrows) were presented one after another, with the last one always pointing to the particular location that is subsequently tested with a change detection task. (The retro-cue effect refers to the finding of improved performance after valid compared with neutral cues.) In the critical condition, Rerko and Oberauer presented three retro-cues to employ the n − 2 repetition logic and found evidence for passive decay of de-selected items. These diverging results obviously come with many differences between the experiments: verbal vs. visual working memory, three vs. six memory items, two different groups of participants, and so on. Here we present ongoing work aimed at identifying the critical factor(s). As a first step, we attempted to replicate the results of Bao et al. (2006) and Rerko and Oberauer (2013) within one sample of participants. A group of n = 24 students took part in two experiments (we excluded participants with less than 65 % correct trials: 10 in Experiment 1 and 3 in Experiment 2). In Experiment 1, participants performed a three-object updating task and we compared performance in ABA and CBA trials. ABA trials yielded longer RTs (see Fig. 1, left panel), thus pointing to inhibitory mechanisms, just as Bao et al. (2006) reported. In Experiment 2, participants performed a retro-cue task with 1, 2, or 3 retro-cues presented one after another. Most importantly, in the 3-retro-cue condition the cues either pointed to three different locations (CBA) or the first and the third cue pointed to the same location (ABA). We did not observe a difference in accuracy in this case, but RTs were longer in CBA than in ABA trials (see Fig. 1, right panel), thus pointing to passive decay but not to inhibitory mechanisms. In sum, with one single sample of participants we were able to largely replicate the diverging results from two tasks that were designed to answer the same research question. Given this, it appears worthwhile to continue this work and to isolate the critical factors. This work is currently in progress.
Fig. 1 Response times (RT) in milliseconds (ms) of Experiments 1 and 2 as a function of trial sequence (CBA [control] vs. ABA [inhibition])
References

Bao M, Li ZH, Chen XC, Zhang DR (2006) Backward inhibition in a task of switching attention within verbal working memory. Brain Res Bull 69:214–221
Garavan H (1998) Serial attention within working memory. Mem Cogn 26:263–276
Janczyk M, Grabowski J (2011) The focus of attention in working memory: evidence from a word updating task. Memory 19:211–225
Janczyk M, Wienrich C, Kunde W (2008) On the costs of refocusing items in working memory: a matter of inhibition or decay? Memory 16:374–385
Oberauer K (2002) Access to information in working memory: exploring the focus of attention. J Exp Psychol Learn 28:411–421
Rerko L, Oberauer K (2013) Focused, unfocused, and defocused information in working memory. J Exp Psychol Learn 39:1075–1096
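The n − 2 repetition logic common to both experiments can be stated in a few lines of Python: a trial counts as "ABA" (n − 2 repetition) if the currently selected item equals the item selected two steps earlier, and as "CBA" (control) otherwise. The selection sequence and response times below are invented for illustration.

# Toy selection sequence (which item was focused on each trial) and RTs.
selections = ["A", "B", "A", "C", "B", "A", "B", "C"]
rts_ms     = [640,  655, 702, 631, 648, 698, 645, 636]

aba_rts, cba_rts = [], []
for i in range(2, len(selections)):
    if selections[i] == selections[i - 1]:
        continue                      # skip immediate repetitions
    bucket = aba_rts if selections[i] == selections[i - 2] else cba_rts
    bucket.append(rts_ms[i])

# Positive cost (ABA slower than CBA) is the signature of inhibition;
# a negative cost would instead be consistent with passive decay.
cost = sum(aba_rts) / len(aba_rts) - sum(cba_rts) / len(cba_rts)
print(f"n-2 repetition cost: {cost:.1f} ms")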
How global visual landmarks influence the recognition of a city

Kai Hamburger, Cate Marie Trillmich, Franziska Baier, Christian Wolf, Florian Röser
University of Giessen, Giessen, Germany

Abstract

What happens if characteristic landmarks are taken out of a city scene or interchanged? Are we still able to recognize the city scene itself, or are we fooled by the missing or misleading information? What information is then represented in our mind, and how? Findings are discussed with respect to attentional capture and decision making.

Keywords Spatial cognition, Visual landmarks, Recognition, Attention, Decision making

Introduction

Famous cities are hard to recognize if the characteristic global landmark is taken out of the city scene. In this context we define a global landmark as a (famous) building that may be used for orientation purposes from multiple viewpoints (however, other objects such as trees, mountains, rivers, etc. may also serve as landmarks). Here, we focus on visual information processing and show that a global landmark in the form of a famous building does not by itself necessarily lead to successful recognition of major city scenes. Thus, we assume that the landmark (object) alone is very helpful for spatial representations and spatial orientation, but that the context/surrounding (city scene) is often required for a full and correct mental representation. The isolated objects therefore sometimes lead to inappropriate mental representations and may even lead us totally astray, especially when they are interchanged. Evans et al. (1984) stated that landmarks and the pathway's grid configuration facilitate geographic knowledge and that visual landmarks in particular improve comprehension of place locations. But the authors also noted that manipulations of the grid configuration and landmark placement in a simulated environment cause changes in environmental knowledge. According to Clerici and Mironowicz (2009), it is important to distinguish between landmarks acting as markers, which could therefore be replaced by direction signs and indicators, and landmarks acting as marks and brands of a specific city, which can be considered a key factor for the quality of urban life (e.g., Big Ben in London or the Golden Gate Bridge in San Francisco). So, what is the relevant visual information characterizing a city scene?

Methods

The experiment examining the influence of a famous landmark on city recognition was conducted on a standard PC presenting the different
Fig. 1 Original (left): city scenes of Berlin with the TV Tower ("Alex") and Paris with the Eiffel Tower; modified (center and right): without the TV Tower and Eiffel Tower, and (right) Berlin with the Eiffel Tower of Paris and vice versa
combinations of (isolated/interchanged) landmarks and their corresponding cities. Each city scene/landmark occurred only once (between-subject factor). Participants were assigned to the different combinations randomly. An example is given in Fig. 1, while Table 1 presents the questions asked, together with all further experimental details and results.

Results

To summarize the results: (1) in general, many city scenes (46 %) could be identified correctly if landmark and surrounding were a match (original city scene); (2) participants had severe difficulties recognizing some of the given cities when the characteristic landmark was missing (e.g., Berlin without the TV Tower, Paris without the Eiffel Tower, Sydney without the opera house); (3) some cities could still be recognized very well without the characteristic landmark (London, Venice); and (4) most participants were totally fooled when other (deceptive) landmarks were shown instead of the original ones.

Discussion

We demonstrate that a city scene without a characteristic global landmark may be recognized correctly in some cases and wrongly in others, while an object presented in a new context may lead to incorrect or inappropriate information retrieval from memory (semantic network). Presented in a different context, the most prominent landmark is more important (e.g., dominates the decision/judgment) than its immediate surroundings (including other potential landmarks and landscape features, e.g., mountains). But sometimes the city scene seems to contain more important information than just one characteristic landmark and can still be recognized successfully without it (e.g., London, Venice). In our experiment, the object pops out from the city scene and captures our attention (bottom-up). This attentional capture might prevent information from the visual scene/surrounding city from being considered for recognition. The recognition process is then based only on information about the deceptive landmark (top-down). In this case, the attentional capture might be caused by the high contextual salience of the landmark (Caduff, Timpf 2008), as it is clearly distinguishable from the rest of the scenery. This phenomenon could also be explained within a semantic network with two contradicting associations: one based on the deceptive landmark, the other on the surroundings. The attentional capture by the deceptive landmark inhibits consideration of any further information from the city scene during recognition. Another possible interpretation comes from the research field of decision making: according to dual-process theories (type 1 versus type 2
Table 1 Research questions and results for the 31 observers

Each condition was probed with three questions: (1) Do you know this city? (affirmations [%]); (2) What is the name of the city? (correct labeling [%]); (3) How confident are you with your answer? (scale from 1 = very confident to 7 = very insecure/unsecure)

Cities and landmarks (example: Paris with Eiffel Tower; Fig. 1 bottom, left): (1) 64 %, 2,386 ms; (2) 46 %, 1,887 ms; (3) 2.10, 2,024 ms
Cities without landmarks (example: Paris without Eiffel Tower; Fig. 1 bottom, middle): (1) 35 %, 2,801 ms; (2) 19 %, 2,037 ms; (3) 2.83, 2,744 ms
Cities with deceptive landmarks (example: Paris with TV Tower of Berlin; Fig. 1 bottom, right): (1) 50 %, 3,268 ms; (2) 8 %, 1,982 ms; (3) 3.05, 2,864 ms; correct labeling of the city the landmark is really located in: 31 %

Participants answered the three questions in all three conditions. N = 31 (students of the University of Giessen), 18 female, 13 male, mean age 25 years (SD = 4.4)
processing), decisions (here: which city is represented?) can be made consciously and unconsciously (e.g., Markic 2009). One key aspect of the unconscious, automatic process is associative learning (Evans 2003), which might explain why a single landmark stores all of the relevant information for the context (object = city = explicit knowledge). This experiment shows some important connections between the perception and recognition of spatial information on one side and theories of attention and decision making on the other. It could serve as a valuable basis for future research on visuo-spatial information processing.

References

Caduff D, Timpf S (2008) On the assessment of landmark salience for human wayfinding. Cogn Process 9(4):249–267
Clerici A, Mironowicz I (2009) Are landmarks essential to the city—its development? In: Schrenk M, Popovich VV, Engelke D, Elisei P (eds) REAL CORP 2009: Cities 3.0—smart, sustainable, integrative: strategies, concepts and technologies for planning the urban future. Eigenverlag des Vereins CORP—Competence Center of Urban and Regional Planning, pp 23–32
Evans J St BT (2003) In two minds: dual-process accounts of reasoning. Trends Cogn Sci 7(10):454–458
Evans GW, Skorpanich MA, Gärling T, Bryant KJ, Bresolin B (1984) The effects of pathway configuration, landmarks and stress on environmental cognition. J Environ Psychol 4:323–335
Markic O (2009) Rationality and emotions in decision making. Interdiscip Descr Complex Syst 7(2):54–64
Explicit place-labeling supports spatial knowledge in survey, but not in route navigation

Gregor Hardiess, Marc Halfmann, Hanspeter Mallot
Cognitive Neuroscience, University of Tübingen, Germany

Knowledge about navigational space develops with landmark and route knowledge as the precursors of survey (map-like) knowledge (Siegel, White 1975)—a scheme that is widely accepted as the dominant framework. Route knowledge is typically based on an egocentric reference frame, and learning a route is simply forming place-action associations between locations (places) and the actions to take in the sequence of the route. In survey knowledge, on the other hand, places need to be represented independently of viewing direction and position. Furthermore, survey representations include configural knowledge about the relations (topological, action-based, or graph-like) between the places in the environment. In wayfinding, it seems that navigators can draw upon different memory representations and formats of spatial knowledge, depending on the task at hand and the time available for learning. The hierarchy of spatial representation comprises different levels of granularity. At the finest level, the recognition of landmarks (i.e., salient and permanent patterns or objects available in the environment) has to be considered. Grouping spatially related landmarks together leads to the concept of a place, the fundamental unit of routes and maps. Building a route involves connecting places with the corresponding spatial behavior. At this intermediate level, several routes can exist in parallel, even with spatial overlap, but without interacting with each other (Mallot, Basten 2009). Route combination first occurs at the level of survey representations. Here, the embedding of places as well as routes in a so-called 'cognitive map', a configural representation of the environment, enables the creation of novel routes and shortcuts to find the goal (Tolman 1948). At the top of the hierarchy, the coarsest level of granularity is provided by the formation of regions (Wiener, Mallot 2003), where spatially related
parts of the map cluster together. Depending on task demands and the time available for spatial learning, the coding of space can be supported at each of these levels of granularity or at several in combination. The interaction of language and space has been studied with respect to a wide variety of aspects, including the acquisition of spatial knowledge from verbal descriptions, verbal direction giving, influences of the spatial reference frames employed in specific languages on judgments of the similarity of spatial configurations, and retrospective reports of spatial thinking. Little is known, however, about possible functions of language-based or language-supported representations in actual navigational, or wayfinding, behavior. In a dual-task study, Meilinger et al. (2008) showed that verbal distractor tasks are more detrimental to route navigation than distractor tasks involving visual imagery or spatial hearing. In an ongoing study, Meilinger et al. (2009) investigate the advantages of different types of verbal place codes, i.e., names describing local landmarks vs. arbitrary names; there, descriptive naming leads to better navigational results than arbitrary naming. In the present study, the role of language-supported representations of space was assessed in two wayfinding experiments (using virtual reality) with labeling of places, using a route and a survey knowledge task, respectively. In the association phase of both tasks, subjects were requested to label the places either with semantically meaningful names (word condition) or icons (icon condition) to build up a link between sensorimotor and language representations. In a control condition, no labeling was required. In the route task, subjects simply learned to repeat a route (containing 10 places) from a given starting point to a goal location in a stereotyped way (route phase). In the survey task, subjects first had to learn a set of four intersecting routes (containing 4–5 places each) and were then asked to infer four novel routes by recombining sections of the learned routes (survey phase). Wayfinding performance was assessed by the distance subjects travelled to find the goal, expressed as percentage above optimum (PAO). Overall, we found no differences between word-based and icon-based labeling. Labeling supported wayfinding not in the route task (no effect of label condition on distance) but in the survey knowledge task: there, subjects performed the survey phase in both the word and the icon condition with reduced walking compared to the control condition. Furthermore, this supporting effect was more pronounced in subjects with good wayfinding scores. We conclude that the associated place-labels supported the formation of abstract place concepts and, further, the inference of novel routes from known route segments, which is useful in the more complex (higher hierarchical and representational level) survey task, but not in the simple route task, where just stereotyped stimulus–response associations without planning are needed.

References

Mallot HA, Basten K (2009) Embodied spatial cognition: biological and artificial systems. Image Vision Comput 27(11):1658–1670
Meilinger T, Knauff M, Bülthoff HH (2008) Working memory in wayfinding—a dual task experiment in a virtual city. Cogn Sci 32(4):755–770
Meilinger T, Schulte-Pelkum J, Frankenstein J, Laharnar N, Hardiess G, Mallot HA, Bülthoff HH (2009) Place naming—examining the influence of language on wayfinding. In: Taatgen N, van Rijn H (eds) Proceedings of the thirty-first annual conference of the cognitive science society. Cognitive Science Society
Siegel AW, White SH (1975) The development of spatial representations of large-scale environments. Adv Child Dev Behav 10:9–55
Tolman EC (1948) Cognitive maps in rats and man. Psychol Rev 55:189–208
Wiener JM, Mallot HA (2003) 'Fine-to-coarse' route planning and navigation in regionalized environments. Spatial Cogn Comput 3(4):331–358
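The abstract does not define PAO beyond its name; under the standard reading, it is the excess of the travelled distance over the optimal (shortest) path length, relative to the optimum:

\mathrm{PAO} = \frac{d_{\mathrm{travelled}} - d_{\mathrm{optimal}}}{d_{\mathrm{optimal}}} \times 100\,\%

so 0 % corresponds to a perfectly efficient route and larger values to increasing detours.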
How important is having emotions for understanding others' emotions accurately?

Larissa Heege, Albert Newen
Ruhr-University Bochum, Germany

Mirror neuron theory for understanding others' emotions

According to the research group that discovered mirror neurons in Parma, emotions can be understood through cognitive elaborations of visual emotional expressions, without a major involvement of mirror neuron mechanisms. They assume, though, that this provides only a 'pale' and 'detached' account of others' emotions (Rizzolatti et al. 2004): 'It is likely that the direct viscero-motor mechanism scaffolds the cognitive description, and when the former mechanism is not present or malfunctioning, the latter provides only a pale, detached account of the emotions of others.' (Rizzolatti et al. 2004). Mirror neurons with reference to emotions are neurons that fire both when we have an emotion and when we observe somebody else having the same emotion. It is assumed that mirror neuron mechanisms evoke in the observer an understanding of others' emotions that is based on resonances of the observer's own emotions. In this way an automatic first-person understanding of others' emotions originates (Rizzolatti and Sinigaglia 2008; Rizzolatti et al. 2004): 'Side by side with the sensory description of the observed social stimuli, internal representations of the state associated with these […] emotions are evoked in the observer, "as if" they […] were experiencing a similar emotion.' (Rizzolatti et al. 2004). Thus somebody who is not able to have a specific emotion would also not be able to have a first-person 'as if' understanding of this emotion in others: resonances of their own emotions could not be produced, and the mirror neuron mechanism would not be present or could not work appropriately. If this person instead used primarily cognitive elaborations to understand this emotion in others, their emotion understanding should be 'pale' and 'detached', according to mirror neuron theory.

Psychopaths and having the emotion of fear

Primary (low-anxious) psychopaths demonstrated in the PANAS ('positive affect, negative affect scales') a significant negative correlation (−.297) with having the emotion of fear (Del Gaizo and Falkenbach 2008). Furthermore, an experiment showed that psychopaths, in contrast to non-psychopaths, do not get anxious when they breathe in the stress sweat of other people (Dutton 2013). Psychopaths also have reduced amygdala activity (Gordon et al. 2004) and a reduced startle response (Herpertz et al. 2001).

Psychopaths and understanding fear in others

In one study, 24 photographs showing different facial expressions (happy, sad, fearful, angry, disgusted, and neutral) were presented to psychopathic inmates and non-psychopaths. The psychopathic inmates demonstrated a greater skill in recognizing fearful faces than the non-psychopaths (Book et al. 2007): '[A] general tendency for psychopathy [is] to be positively associated with increased accuracy in judging emotional intensity for facial expressions in general and, more specifically, for fearful faces.' (Book et al. 2007). Psychopaths also identify bodily expressions based on fear/anxiety significantly better than non-psychopaths. Ted Bundy, a psychopathic serial killer, stated that he could identify a 'good victim' by her gait. Relating to this statement, in one study twelve videos of people walking through a corridor were shown to psychopaths and non-psychopaths; six of the walking people had been victims in their past. The psychopaths and non-psychopaths had to decide how likely the persons in the videos were to get mugged. The study found a 'robust, positive correlation' between primary (low-anxious) psychopathic traits and accuracy in naming the persons who had been victims in their past as the ones most likely to get mugged. Secondary (high-anxious) psychopaths did not demonstrate such a skill (Wheeler et al. 2009). In a similar study, five students had to walk through a lecture hall in front of other students with many and few psychopathic traits. One of the walking students carried a hidden handkerchief. The students with many and few psychopathic traits had to guess who hid the handkerchief. Seventy percent of the students with many psychopathic traits named the right student; of the students with few psychopathic traits, just thirty percent named the student with the handkerchief (Dutton 2013). In another study, people with many psychopathic traits showed decreased amygdala activity during emotion-recognition tasks. The people with primary psychopathic traits also showed increased activity in the visual and the dorsolateral prefrontal cortex. Thus primary psychopaths rely much more on brain areas associated with cognition and perception when they solve emotion-recognition tasks (Gordon et al. 2004).

Conclusions

Primary psychopaths use primarily cognitive elaborations to understand others' emotions and (almost) do not have the emotion of fear. Thus, according to mirror neuron theorists, psychopaths should have a 'pale, detached' account of fear in others (see the end of the first paragraph). Psychopaths are surely not able to have a first-person 'as if' understanding of others' fear: they cannot feel fear with others. In this sense it is possible to say that psychopaths have a 'pale, detached' account of others' emotions. However, it cannot be said that the outcome of their understanding of others' fear is 'pale' and 'detached'. In fact, they often recognize others' fear more accurately than people who are able to feel fear. We can therefore conclude that (at least for psychopaths) having an emotion is not necessary for understanding this emotion in others accurately.

References

Book AS, Quinsey VL, Langford D (2007) Psychopathy and the perception of affect and vulnerability. Crim Justice Behav 34(4):531–544. doi:10.1177/0093854806293554
Del Gaizo AL, Falkenbach DM (2008) Primary and secondary psychopathic-traits and their relationship to perception and experience of emotion. Pers Indiv Differ 45:206–212. doi:10.1016/j.paid.2008.03.019
Dutton K (2013) The wisdom of psychopaths. Arrow, London
Gordon HL, Baird AA, End A (2004) Functional differences among those high and low on a trait measure of psychopathy. Biol Psychiatry 56(7):516–521. doi:10.1016/j.biopsych.2004.06.030
Herpertz SC, Werth U et al (2001) Emotion in criminal offenders with psychopathy and borderline personality disorder. Arch Gen Psychiatry 58:737–745. doi:10.1001/archpsyc.58.8.737
Rizzolatti G, Gallese V, Keysers C (2004) A unifying view of the basis of social cognition. Trends Cogn Sci 8(9):396–403. doi:10.1016/j.tics.2004.07.002
Rizzolatti G, Sinigaglia C (2008) Mirrors in the brain. Oxford University Press, Oxford
Wheeler S, Book AS, Costello K (2009) Psychopathic traits and perceptions of victim vulnerability. Crim Justice Behav 36:635–648. doi:10.1177/0093854809333958
Prosody conveys speakers’ intentions: acoustic cues for speech act perception Nele Hellbernd, Daniela Sammler Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany Recent years have seen a major change in views on language and language use. During the last decades, language use has been more and more recognized as an intentional action (Grice 1957). In the form of speech acts (Austin 1962; Searle 1969), language expresses the speaker’s attitudes and communicative intents to shape the listener’s reaction. Notably, the speaker’s intention is often not directly coded in the lexical meaning of a sentence, but rather conveyed implicitly, for example via nonverbal cues such as mimics, body posture, and speech prosody. The theoretical work of intonational phonologists seeking to define the meaning of specific vocal intonation profiles (Bolinger 1986; Kohler 1991) demonstrates the role of prosody in conveying the speaker’s conversational goal. However, to date only little is known about the neurocognitive architecture underlying the comprehension of communicative intents in general (Holtgraves 2005; Egorova, Shtyrov, Pulvermu¨ller 2013), and the distinctive role of prosody in particular. The present study aimed, therefore, to investigate this interpersonal role of prosody in conveying the speaker’s intents and its underlying acoustic properties. Taking speech act theory as a framework for intention in language (Austin 1962; Searle 1969), we created a novel set of short (non-)word utterances intoned to express different speech acts. Adopting an approach from emotional prosody research (Banse, Scherer 1996; Sauter, Eisner, Calder, Scott 2010), this stimulus set was employed in a combination of behavioral ratings and acoustic analyzes to test the following hypotheses: If prosody codes for the communicative intention of the speaker, we expect 1) above-chance behavioral recognition of different intentions that are merely expressed via prosody, 2) acoustic markers in the prosody that identify these intentions, and 3) independence of acoustics and behavior from the overt lexical meaning of the utterance. The German words ‘‘Bier’’ (beer) and ‘‘Bar’’ (bar) and the nonwords ‘‘Diem’’ and ‘‘Dahm’’ were recorded from four (two female) speakers expressing six different speech acts in their prosody—criticism, wish (expressives), warning, suggestion (directives), doubt, and naming (assertives). Acoustic features for pitch, duration, intensity, and spectral features were extracted with PRAAT. These measures were subjected to discriminant analyzes—separately for words and non-words—in order to test whether the acoustic features have enough discriminant power to classify the stimuli to their corresponding speech act category. Furthermore, 20 participants were tested for the behavioral recognition of the speech act categories with a 6 alternative-forced-choice task. Finally, a new group of 40 participants performed subjective ratings of the different speech acts (e.g. ‘‘How much does the stimulus sound like criticism?’’) to obtain more detailed information on the perception of different intentions and allow, as quantitative variable, further analyzes in combination with the acoustic measures. The discriminant analyzes of the acoustic features yielded high above chance predictions for each speech act category, with an overall classification accuracy of about 90 % for both words and nonwords (chance level: 17 %). 
Likewise, participants were well able to classify the stimuli into the correct category, with a slightly lower accuracy for non-words (73 %) than for words (81 %). Multiple regression analyses of participants' ratings of the different speech acts on the acoustic measures further identified distinct patterns of physical features that predicted the behavioral perception. These findings indicate that prosodic cues convey sufficient detail to classify short (non-)word utterances according to their underlying intention, at the acoustic as well as the perceptual level.
Lexical meaning seems to be supportive but not necessary for the comprehension of different intentions, given that participants showed high performance for the non-words but scored higher for the words. In total, our results show that prosodic cues are powerful indicators of the speaker's intentions in interpersonal communication. The present, carefully constructed stimulus set will serve as a useful tool to study the neural correlates of intentional prosody in the future.
References
Austin JL (1962) How to do things with words. Oxford University Press, Oxford
Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614–636
Bolinger D (1986) Intonation and its parts: melody in spoken English. Stanford University Press, Stanford
Egorova N, Shtyrov Y, Pulvermüller F (2013) Early parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence. Front Hum Neurosci 7
Grice HP (1957) Meaning. Philos Rev 66(3):377–388
Holtgraves T (2005) The production and perception of implicit performatives. J Pragm 37(12):2024–2034
Kohler KJ (ed) (1991) Studies in German intonation (No. 25). Institut für Phonetik und digitale Sprachverarbeitung, Universität Kiel
Sauter D, Eisner F, Calder A, Scott S (2010) Perceptual cues in nonverbal vocal expressions of emotion. Quart J Exp Psychol 63(11):2251–2272
Searle JR (1969) Speech acts: an essay in the philosophy of language. Cambridge University Press, Cambridge
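As an illustration of the classification step described above, the following sketch shows how a matrix of acoustic features could be submitted to a linear discriminant analysis and scored against the 17 % chance level. The feature matrix, labels, and all sizes are placeholders, not the authors' data or pipeline (their feature extraction was done in PRAAT).

```python
# Hypothetical sketch: linear discriminant analysis over prosodic features
# (pitch, duration, intensity, spectral measures), evaluated against the
# 1/6 chance level of the six speech act categories. All data are simulated.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli, n_features = 192, 12                       # assumed stimulus-set size
features = rng.normal(size=(n_stimuli, n_features))   # placeholder for PRAAT measures
speech_act = rng.integers(0, 6, size=n_stimuli)       # labels: six speech acts

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, features, speech_act, cv=8)
print(f"classification accuracy: {scores.mean():.2f} (chance = {1/6:.2f})")
```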
On the perception and processing of social actions
Matthias R. Hohmann, Stephan de La Rosa, Heinrich H. Bülthoff
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Action recognition research has mainly focused on the perceptual processes involved in recognizing isolated actions from biological motion patterns. Surprisingly little is known about the cognitive representations underlying action recognition. A fundamental question is whether actions are represented independently or interdependently. Here we examined whether the cognitive representations of static (action image) and dynamic (action movie) actions depend on each other and whether the representations for static and dynamic actions overlap. Adaptation paradigms are an elegant way to probe the relationship between different cognitive representations. In an adaptation experiment, participants view a stimulus, the adaptor, for a prolonged amount of time and afterwards report their perception of a second, ambiguous test stimulus. Typically, the perception of the second stimulus is biased away from the adaptor stimulus. The presence of such an antagonistic perceptual bias (adaptation effect) is often taken as evidence for the interdependency of the cognitive representations of test and adaptor stimulus. We manipulated the dynamic content (dynamic vs. static) of the test and adaptor stimulus independently. The ambiguous test stimulus was created by a weighted linear morph between the spatial positions of the two adapting actions (hand shake, high five). Thirty participants categorized the ambiguous dynamic or static action stimuli after being adapted to dynamic or static actions. Afterwards, we calculated the perceptual bias for each participant by fitting a psychometric function to the data. We found an action-adaptation after-effect in some but not all experimental conditions. Specifically, the effect was only present if the presentation of the adaptor and the test stimulus was congruent, i.e. if both were presented in either a dynamic or a static manner (p < 0.001). This action-adaptation after-effect indicates a
dependency between cognitive representations when adaptor and test stimuli have the same dynamic content (i.e. both static or dynamic). Future studies are needed to relate these results to other findings in the field of action recognition and to incorporate a neurophysiological perspective.
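A minimal sketch of the step "fitting a psychometric function to the data", assuming a cumulative-Gaussian response function over the hand shake/high five morph continuum; morph values, trial counts, and the simulated bias are illustrative only, and the adaptation effect would be read off as the shift of the 50 % point (PSE) between adaptor conditions.

```python
# Illustrative psychometric fit for one adaptor condition; data are simulated.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(x, pse, sd):
    # probability of categorizing the morph as "high five"
    return norm.cdf(x, loc=pse, scale=sd)

morph = np.linspace(0.0, 1.0, 9)        # 0 = hand shake, 1 = high five
n_trials = 20
rng = np.random.default_rng(1)
true_pse = 0.58                          # assumed bias away from the adaptor
p_highfive = psychometric(morph, true_pse, 0.12)
responses = rng.binomial(n_trials, p_highfive) / n_trials

(pse, sd), _ = curve_fit(psychometric, morph, responses, p0=[0.5, 0.1])
print(f"fitted PSE = {pse:.2f} (bias relative to 0.5: {pse - 0.5:+.2f})")
```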
Stage-level and individual-level interpretation of multiple adnominal adjectives as an epiphenomenon—theoretical and empirical evidence
Sven Kotowski, Holden Härtl
Institut für Anglistik und Amerikanistik, Universität Kassel, Germany
As observed by various authors (among others Bolinger 1967; Cinque 2010; Larson 1998), certain adjectives in several languages are semantically ambiguous in different adnominal positions. These ambiguities concern semantic oppositions such as intersective vs. non-intersective, restrictive vs. non-restrictive, or individual-level vs. stage-level. Thus, the time-honored examples in (1a/b) are argued to have two distinct interpretations:
(1) a. the visible stars
    b. the stars visible
In (1a), visible can have either an occasion/stage-level (SL) or a characterizing/individual-level (IL) reading. The postnominal adjective in (1b), however, is non-ambiguous and allows for the SL-reading only (cf. Kratzer 1995 for test environments). Furthermore, when the "same" adjective occurs twice prenominally (2), the two interpretations are linked to rigidly ordered positions (cf. Cinque 2010; Larson 1998):
(2) the visible[SL] visible[IL] stars
In this paper, we argue that the order of multiple prenominal adjectives in German (and possibly cross-linguistically) cannot be derived on the basis of an inherent dichotomy between SL- and IL-predicates, but requires a more general analysis of adnominal adjective order. SL and IL are not intrinsically ordered along the lines of (2), i.e. SL > IL. Rather, they are found in this very order due to different adjectival functions in a layered structure around the NP's head. Crucially, in such adjective doublets, the second adjective always receives a generic reading, i.e. the [A2 N] in such [A1 [A2 N]] expressions functions as a complex common name that denotes a subkind of the kind denoted by the head noun (Gunkel, Zifonun 2009): in (1)/(2) above, if star denotes the kind STAR, (1a) is ambiguous between a subkind and a qualifying reading, while in (2) the cluster visible2 stars is interpreted as a subkind VISIBLE STAR and is thus disambiguated. Accordingly, we assume that doublets increase in general acceptability if the A2 Ns fulfil establishedness conditions and pass tests with kind-selecting predicates (like INVENT etc.; see e.g. Krifka et al. 1995). For example, a taxonomic subkind reading is triggered by the indefinite NP in (3a), while no such downward-projecting taxonomic inference occurs for non-established complex expressions (3b):
(3) a. John invented a potato peeler. → a kind of POTATO PEELER
    b. John invented a cauliflower steamer. ↛ the kind CAULIFLOWER STEAMER
As regards general restrictions on adnominal adjective order, we assert a lack of descriptive adequacy for purely formal/syntactic (in particular cartographic) as well as for purely semantic and/or communicative-functional NP models. Instead, we argue that prenominal adjective sequences include at least three distinct semantic-syntactic layers: a classifying (CLAS; e.g. relational adjectives like musical), an absolute-qualifying (QA; e.g. basic color terms), and a relative-qualifying (QR; e.g. dimensional adjectives) one. The former two share certain semantic and morphosyntactic features (−gradable),
yet are set apart with respect to possible occurrence in predicative position. The latter two's relation shows the reverse characteristics (both +predicative use, yet they differ in gradability). Adjective order at the right prenominal edge of Germanic NPs tends to follow the sequence QR > QA > CLAS > N. At the same time, classifying adjectives (either inherent classifiers such as relational adjectives or other adjectives functioning as classifiers in established names) typically function as modifiers of complex names—just as in, e.g., NN-compounds, where the modifying N is non-referential, CLAS-adjectives do not locate the NP-referent spatio-temporally but classify it as a member of a certain kind. Therefore, the IL-interpretation of A2 in e.g. (2) is an epiphenomenon of more global constraints on modifier order—in doublets they are interpreted as CLAS, a layer for which SL-interpretations are not available. To test our hypothesis, we conducted two questionnaire studies on German adjective order. Study 1 was a 100-split task designed to test subjects' order preferences when confronted with polysemous adjectives in combination with another adjective (i.e. either a time-stable (e.g. wild & [not domesticated]) or a temporary reading (wild & [furious]) in combination with a time-stable adjective, e.g. big). Introductory context paragraphs promoted both adjectives' readings within an item. Subjects had to distribute 100 points over two possible follow-up sentences, with alternating A1-A2 orders, according to which sentence was more natural given the context. Crucially, the time-stable AN-syntagms did not denote established kinds, i.e. the task tried to elicit order preferences based on a potential IL/SL-distinction only. While control-group items following clear-cut order regularities described in the literature irrespective of temporality, e.g. small French car, scored significantly better than either of the above test categories, the differences between IL- and SL-categories were clearly insignificant. In a follow-up study currently being conducted, subjects are presented with introductory sentences containing AAN-clusters not further specified as regards interpretation. Again, alternating adjectival senses are utilized. In each test sentence one A is a deverbal adjective ending in -bar (the rough German equivalent of English -ible/-able; e.g. ausziehbar 'extendable'), which displays a systematic ambiguity between an occasion and a habitual reading (Motsch 2004). Combined with respective nouns, these adjectives in one reading encode established kinds (e.g. ausziehbarer Tisch 'pull-out table'; CLAS modification), while the respective second adjective encodes a time-stable property that does not exhibit a kind reading in an AN-syntagm (e.g. blauer ausziehbarer Tisch 'blue pull-out table'). Subjects are then asked to rate follow-up sentences according to their naturalness as discourse continuations—these systematically violate the occasion reading, and we hypothesize that continuations will score higher for [A≠KIND AKIND N] than for [AOCCASION(POTENTIAL KIND) A≠KIND N] expressions. Should this hypothesis be confirmed, we take the results—together with the findings from study 1—as support for the above reasoning that observed adjective interpretations as in (2) do not derive primarily from a grammatical distinction between IL- and SL-predicates, but need to be understood as an epiphenomenon of more general constraints on adjective order and kind reference.
References
Bolinger D (1967) Adjectives in English: attribution and predication. Lingua 18:1–34
Cinque G (2010) The syntax of adjectives. A comparative study. MIT Press, Cambridge, MA
Fernald T (2000) Predicates and temporal arguments. Oxford University Press, Oxford
Kratzer A (1995) Stage-level and individual-level predicates. In: Carlson GN, Pelletier FJ (eds) The generic book. The University of Chicago Press, Chicago, pp 125–175
Krifka M, Pelletier FJ, Carlson GN, ter Meulen A, Chierchia G, Link G (1995) Genericity: an introduction. In: Carlson GN, Pelletier FJ (eds) The generic book. The University of Chicago Press, Chicago, pp 1–124
Larson R (1998) Events and modification in nominals. In: Strolovitch D, Lawson A (eds) Proceedings from semantics and linguistic theory (SALT) VIII. Cornell University Press, Ithaca, pp 145–168
Motsch W (2004) Deutsche Wortbildung in Grundzügen. Walter de Gruyter, Berlin
What happened to the crying bird? – Differential roles of embedding depth and topicalization modulating syntactic complexity in sentence processing
Carina Krause1, Bernhard Sehm1, Anja Fengler1, Angela D. Friederici1, Hellmuth Obrig1,2
1 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; 2 University Hospital Leipzig, Clinic for Cognitive Neurology, Germany
"The rat the cat the dog bit chased escaped." Previous studies provide evidence that the processing of such hierarchical syntactic structures involves a network with the inferior frontal gyrus and temporo-parietal regions as two key players (Friederici 2009; Fengler et al. in press). While most studies locate the processing of syntactically complex sentences in Broca's area (BA44/45), some studies also report the involvement of BA47 and BA6 (Friederici 2011) and of temporo-parietal areas (Shetreet et al. 2009). Why is there so much variation in localizing the syntactic complexity effect? The interpretation of multiply embedded sentence structures poses a particular challenge to language processing, requiring syntactic hierarchy building and verbal working memory. Syntactic operations may thereby differentially tax general verbal working memory capacities, preferentially relying on temporo-parietal regions (Meyer et al. 2012), and more syntax-specific working memory domains, preferentially relying on IFG structures (Makuuchi et al. 2009). To disentangle the specific contribution of each subsystem, we developed stimulus material that contrasts syntactic complexity and the working memory aspects. The goal of our project is to use this material in facilitation (tDCS study) and impairment (lesion study) settings, to allow ascribing causal roles of the above brain areas to these aspects of syntax processing.
Methods
Twenty healthy participants (mean age: 24) performed an auditory sentence–picture matching task. Both reaction times and error rates were recorded.
Paradigm
In a number of pilot studies (10–15 participants each), task complexity was varied (number of choice options, distractors, presentation order). Our stimulus set is based on material used in previous studies (Antonenko et al. 2013; Fengler et al. in press) and consists of 132 German transitive sentences. It has a 2×3-factorial design tapping argument order (A: subject- vs. B: object-first) and depth of syntactic embedding (0: no, 1: single, 2: double embedding):
A0: Der Vogel ist braun, er wäscht den Frosch, und er weint. ('The bird is brown, it washes the frog, and it cries.')
B0: Der Vogel ist braun, ihn wäscht der Frosch, und er weint. ('The bird is brown, the frog washes it, and it cries.')
A1: Der Vogel, der braun ist, und der den Frosch wäscht, weint.
B1: Der Vogel, der braun ist, und den der Frosch wäscht, weint.
A2: Der Vogel, der den Frosch, der braun ist, wäscht, weint.
B2: Der Vogel, den der Frosch, der braun ist, wäscht, weint. ('The bird that the frog, which is brown, washes, cries.')
Results and Conclusion
In healthy subjects, only successive presentation of the auditorily presented sentences and the ensuing pictures (three distractors) yields robust behavioral differences. As a function of both (i) level of embedding and (ii) topicalization, we find highly significant effects in terms of increasing reaction times (embedding: F(2,32) = 46.610, p < .001; topicalization: F(1,16) = 25.003, p < .001) as well as decreased accuracy (embedding depth: F(2,32) = 20.826, p < .001; topicalization: F(1,16) = 10.559, p = .005). Interestingly, the factors do not interact, suggesting partially independent factorial influences on syntactic processing. Currently the paradigm is used in a study with facilitatory transcranial direct current stimulation (tDCS) of each key area (IFG vs. temporo-parietal region). Additionally, patients with circumscribed acquired brain lesions are tested on different versions of the paradigm, adapted to the requirements of language-compromised patients.
References
Antonenko D, Brauer J, Meinzer M, Fengler A, Kerti L, Friederici A, Flöel A (2013) Functional and structural syntax networks in aging. Neuroimage 83:513–523
Friederici A (2009) Pathways to language: fiber tracts in the human brain. Trends Cogn Sci 13(4):175–181
Friederici A (2011) The brain basis of language processing: from structure to function. Physiol Rev 91(4):1357–1392
Makuuchi M, Bahlmann J, Anwander A, Friederici A (2009) Segregating the core computational faculty of human language from working memory. PNAS 106(20):8362–8367
Meyer L, Obleser J, Anwander A, Friederici A (2012) Linking ordering in Broca's area to storage in left temporo-parietal regions: the case of sentence processing. Neuroimage 62(3):1987–1998
Shetreet E, Friedmann N, Hadar U (2009) An fMRI study of syntactic layers: sentential and lexical aspects of embedding. Neuroimage 48(4):707–716
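The reported effects come from a 2×3 repeated-measures design (argument order × embedding depth). A minimal sketch of such an analysis on simulated long-format reaction times, using statsmodels' AnovaRM; all column names, effect sizes, and noise levels are assumptions, not the authors' data.

```python
# Repeated-measures ANOVA sketch for a 2 x 3 within-subject design.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(2)
rows = []
for subj in range(20):
    for order in ("subject_first", "object_first"):
        for depth in (0, 1, 2):
            # assumed main effects: slower with depth and with object-first order
            rt = 1.2 + 0.3 * depth + (0.2 if order == "object_first" else 0.0)
            rows.append({"subject": subj, "order": order, "depth": depth,
                         "rt": rt + rng.normal(0, 0.1)})
data = pd.DataFrame(rows)

res = AnovaRM(data, depvar="rt", subject="subject",
              within=["order", "depth"]).fit()
print(res)
```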
fMRI evidence for a top-down grouping mechanism establishing object correspondence in the Ternus display
Katrin Kutscheidt1, Elisabeth Hein2, Manuel Jan Roth1, Axel Lindner1
1 Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, Tübingen, Germany; 2 Department of Psychology, University of Tübingen, Germany
Our visual system is constantly confronted with ambiguous sensory input. However, this input is rarely perceived as ambiguous. It is, for instance, possible to keep track of multiple moving objects in parallel, even if occlusions or eye blinks prevent the unique assignment of object identity based on sensory input alone. Hence, neural mechanisms—bottom-up or top-down—must disambiguate conflicting sensory information. The aim of this study was to shed light on the neural mechanisms establishing object correspondence across space and time despite such ambiguity. To this end, we performed a functional magnetic resonance imaging (fMRI) study using a variant of the Ternus display (Ternus 1926). The Ternus display is an ambiguous apparent motion stimulus in which two sets of three equidistant disks are presented in the following way: while two disks are always presented at the same position, a third disk alternates between a position to the left and to the right of these two central disks. This display either leads to the
percept of "group motion" (GM), where the observer has the impression that all three disks move coherently as one group, or, alternatively, to the percept of "element motion" (EM), in which the outermost disk is seen as jumping back and forth over stationary central disks. How the Ternus display is perceptually interpreted thereby depends on both low-level features (e.g. the inter-frame interval [IFI]; Petersik, Pantle 1979) and higher-level factors (e.g. context information; He, Ooi 1999). Our Ternus display consisted of three white disks presented on a grey background. The disks were shown for 200 ms in alternating frames. Each stimulus block lasted five minutes, during which participants (n = 10) had to fixate a central fixation cross and to manually indicate their respective motion percept, GM or EM, using a button box. Due to the ambiguous nature of the stimulus, participants' perceptual interpretation changed constantly during the course of the experiment. The average percept duration across individuals was ~11 s for GM and ~8 s for EM. To guarantee comparable percept durations also within participants, we individually estimated the IFI at which EM and GM were perceived equally often in a pre-experiment. The IFI in the MRI experiment was then adjusted accordingly. The experiment comprised six blocks, each preceded by a 30 s baseline period without stimulus presentation. Functional (TR = 2 s) and anatomical MRI images were acquired on a 3 T Siemens TRIO scanner and processed using SPM8. In participant-specific first-level analyses, we specified general linear models including three regressors: (i) onset of the GM percept; (ii) onset of the EM percept; (iii) stimulus presentation. All regressors were convolved with the canonical haemodynamic response function. The initial fixation period was not explicitly modelled and served as a baseline. In each participant, we individually identified task-related regions of interest (ROIs) by contrasting stimulus presentation (iii) vs. baseline. Only those areas were considered ROIs that also surfaced in a second-level group analysis of the same contrast. Task-related bilateral ROIs were the lingual gyrus (LG), V3a, V5, and the intraparietal sulcus (IPS). For each individual and each ROI, we then extracted the time course of fMRI activity in order to perform time-resolved group analyses of activity differences between EM and GM percepts. Analyses of the simultaneously recorded eye data helped to exclude influences of eye blinks, saccades, eye position, and eye velocity on the motion percepts, as no difference between conditions was revealed. In all ROIs a perceptual switch was accompanied by a significant peak in fMRI activity around the time of the indicated switch (p < .05). While the amplitude of these peaks did not differ between perceived GM and EM across all ROIs (p > .05, n.s.), we observed significant differences in the temporal onset of the switch-related fMRI response between GM and EM (p < .01). Specifically, there was a particularly early rise in switch-related fMRI activity in IPS for GM, which occurred about three seconds before the participant finally switched from EM to GM. In the case of EM, on the other hand, this switch-related increase in fMRI activity in IPS seemed to occur rather after the perceptual switch. Area V5 exhibited comparable results but showed less of a temporal difference between GM and EM (p < .05).
In contrast, in areas LG and V3a the rise in fMRI activity was rather time-locked to the perceptual switch per se, being indistinguishable between GM and EM (p > .05, n.s.). Our results revealed significant peaks of fMRI activity that were correlated with a switch between two perceptual interpretations (GM or EM) of a physically identical stimulus in LG, V3a, V5, and IPS, brain regions that are also involved in visual motion processing (e.g. Sunaert, Van Hecke, Marchal, Orban 1999). Importantly, the time course of switch-related activity in IPS additionally suggests a potential top-down influence on other areas (cf. Sterzer, Russ, Preibisch, Kleinschmidt 2009), here mediating the perception of GM. The specific role of IPS could thereby relate to the spatial binding of individual objects into a group (cf. Zaretskaya, Anstis, Bartels 2013). This idea is consistent with the theory of Kramer and Yantis (1997), suggesting that
object correspondence in the Ternus display could be determined by top-down spatial binding of the disks within particular frames.
Acknowledgments
This work was supported by a grant from the BMBF (FKZ 01GQ1002 to A.L.).
References
He ZJ, Ooi TL (1999) Perceptual organization of apparent motion in the Ternus display. Perception 28:877–892
Kramer P, Yantis S (1997) Perceptual grouping in space and time: evidence from the Ternus display. Percept Psychophys 59:87–99
Petersik JT, Pantle A (1979) Factors controlling the competing sensations produced by a bistable stroboscopic motion display. Vision Res 19(2):143–154
Sterzer P, Kleinschmidt A, Rees G (2009) The neural bases of multistable perception. Trends Cogn Sci 13:310–318
Sunaert S, Van Hecke P, Marchal G, Orban GA (1999) Motion-responsive regions of the human brain. Exp Brain Res 127:355–370
Ternus J (1926) Experimentelle Untersuchungen über phänomenale Identität [Experimental investigations of phenomenal identity]. Psychologische Forschung 7:81–136
Zaretskaya N, Anstis S, Bartels A (2013) Parietal cortex mediates conscious perception of illusory gestalt. J Neurosci 33:523–531
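The GLM construction described above (percept-onset regressors convolved with the canonical haemodynamic response function) can be sketched as follows. The study used SPM8, so this NumPy version with assumed onset times is only an illustration of the principle, using the standard double-gamma HRF shape.

```python
# Sketch: build one GLM regressor from percept-switch onsets.
import numpy as np
from scipy.stats import gamma

TR, n_scans = 2.0, 150                            # TR (s); 5 min block = 150 scans
t = np.arange(0, 32, TR)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6      # canonical double-gamma shape
hrf /= hrf.sum()

gm_onsets = [12.0, 55.0, 120.0, 188.0]            # hypothetical GM-percept onsets (s)
sticks = np.zeros(n_scans)
for onset in gm_onsets:
    sticks[int(round(onset / TR))] = 1.0          # stick function at each onset

regressor = np.convolve(sticks, hrf)[:n_scans]    # one column of the design matrix
print(regressor[:10])
```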
Event-related potentials in the recognition of scene sequences
Stephan Lancier, Julian Hofmeister, Hanspeter Mallot
Cognitive Neuroscience Unit, Department of Biology, University of Tübingen, Germany
Many studies have investigated event-related potentials (ERPs) associated with the recognition of objects and words. Friedmann (1990) showed in an old/new task that correctly recognized new pictures of objects evoked a larger frontal-central N300 amplitude than familiar pictures of objects. This indicates that participants are able to discriminate between old and new pictures 300 ms after stimulus onset. Rugg et al. (1998) found different neural correlates for the recognition of implicitly and explicitly learned words. In the so-called mid-frontal old/new effect, recognized, implicitly learned words were characterized by a lower N400 amplitude in contrast to recognized new words. The explicitly learned words could be dissociated from implicitly learned words by their larger P600 amplitude, which was called the left parietal old/new effect. Rugg et al. concluded that recognition memory can be divided into two distinct processes, a familiarity process for implicit learning and a recollection process for explicit learning. These neural correlates were also shown for the recognition of pictures of objects (Duarte et al. 2004). In fast recognition tasks, pictures of scenes are identified as fast as pictures of isolated objects. Schyns and Oliva (1994) suggest that a coarse-to-fine process extracts a coarse description for scene recognition before finer information is processed. In this case the workload for recognizing a scene would not differ substantially from the workload required in object recognition. In the present study, we investigate the recognition of target scenes from scene sequences and compare the elicited neural correlates to those of former studies. We hypothesize that the recognition of scene identity and of scene position in the sequence evoke dissociable neural correlates. At the current stage of this study, five students of the University of Tübingen have participated. Each of them completed two sessions on different days. The experiment consisted of 100 trials. Each trial was divided into a learning phase and a test phase (see Fig. 1). During the learning phase, eight hallways, each with two doors, were shown.
Fig. 1 Schematic illustration of the learning phase and the test phase. After the learning phase the lettering "test phase" was presented on the display for three seconds. ERPs were triggered with the onset of the test scene
In each hallway the participants had to choose one door which they wanted to pass through. This decision had no impact on the further presentation but was included to focus attention on the subsequent scene. After this decision, two pictures of indoor scenes were presented, each for 600 ms. The first was the target scene, which the participants had to detect in the test phase. This picture was marked with a green frame. The second picture showed the distractor scene and was marked with a red frame. The test phase followed immediately after all eight hallways had been presented. During the test phase, all hallways were tested. The number of the current hallway was presented as a cue, followed by a test scene. In a yes/no task, participants were asked to hit the corresponding mouse button if the presented scene was the target scene they had encountered in the corresponding hallway during the learning phase. Fifty percent of the presented test scenes were sequence-matching target scenes known from the learning phase (correct identity and position); the other 50 percent were homogeneously distributed over distractor scenes of the same hallway (false identity, correct position), new scenes which were not presented in the learning phase (false identity and position), and target scenes which did not match the corresponding hallway (correct identity, false position). In addition to the psychophysical measurements, ERPs were measured by EEG and were triggered with the test scene presentation. Behaviorally, the hit rate (correct recognition of scene identity and position) was about 80 %. Overall correct rejection (either identity or position incorrect) was about 85 %. Correct target scenes appearing at incorrect positions were rejected at a rate of about 60 %. Target scenes appearing at a false position were more likely to be rejected as the distance between their presentation in the learning and test sequences increased. The ERPs depended on the combination of decision and task condition. The hit condition differed from all other task/response combinations in a relatively weak N300. Especially at the frontal sites, the non-hit combinations lacked a P300 wave, except for the false alarms of non-sequence-matching target scenes, where the ERP approached the level of the P300 of the hit condition abruptly after the peak of the N300. Note that in both conditions scene identity was correctly judged whereas position was ignored. These conditions also differ from the other task/response combinations in a relatively weak N400. The parietal P600 wave of the hits differed only from the correct rejections of the distractor scenes, the novel scenes, and the missed target scenes. Between 650 and 800 ms, the parietal electrodes recorded a positive voltage shift for the correct rejections of the non-sequence-matching scenes and a negative voltage shift for the false alarms of the non-sequence-matching scenes. No such potentials were found for the other task/response combinations. The mid-frontal old/new effect of Rugg et al. (1998) seems to be comparable to the N400 effect in our preliminary data. In addition, our results also showed a parietal old/new effect, but without a left lateralization. The results of our experiment cannot be assigned conclusively to one of the postulated memory processes. Furthermore, in the tasks involving non-sequence-matching scenes, the time course of the ERP was reversed after 650 ms. We assume that this effect is a neural correlate of sequence recognition processing.
References
Duarte A, Ranganath C, Winward L, Hayward D, Knight RT (2004) Dissociable neural correlates for familiarity and recollection during the encoding and retrieval of pictures. Cogn Brain Res 18:255–272
Friedmann D (1990) Cognitive event-related potential components during continuous recognition memory for pictures. Psychophysiology 27(2):136–148
Rugg MD, Mark RE, Walla P, Schloerscheidt AM, Birch CS, Allan K (1998) Dissociation of the neural correlates of implicit and explicit memory. Nature 392:595–598
Schyns PG, Oliva A (1994) From blobs to boundary edges for time- and spatial-scale-dependent scene recognition. Psychol Sci 5(4):195–200
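A minimal sketch of the ERP computation underlying such old/new comparisons: epoch the continuous EEG around the test-scene triggers, baseline-correct, and average separately per response category. All array sizes, labels, and the hit/other split below are simulated placeholders, not the study's recordings.

```python
# Epoch-and-average ERP sketch on simulated EEG.
import numpy as np

fs = 500                                     # assumed sampling rate (Hz)
rng = np.random.default_rng(3)
eeg = rng.normal(size=(64, 5 * 60 * fs))     # channels x samples, simulated
onsets = rng.integers(fs, eeg.shape[1] - fs, size=100)   # test-scene triggers
is_hit = rng.random(100) < 0.8               # behavioral label per trial

pre, post = int(0.2 * fs), int(0.8 * fs)     # -200 .. +800 ms epoch window
epochs = np.stack([eeg[:, o - pre:o + post] for o in onsets])
epochs -= epochs[:, :, :pre].mean(axis=2, keepdims=True)  # baseline correction

erp_hit = epochs[is_hit].mean(axis=0)        # channels x time, hit trials
erp_other = epochs[~is_hit].mean(axis=0)     # all remaining trials
print(erp_hit.shape, erp_other.shape)
```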
Sensorimotor interactions as signaling games
Felix Leibfried1,2,3, Jordi Grau-Moya1,2,3, Daniel A. Braun1,2
1 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 Graduate Training Centre of Neuroscience, Tübingen, Germany
In our everyday lives, humans not only signal their intentions through verbal communication, but also through body movements (Sebanz et al. 2006; Obhi and Sebanz 2011; Pezzulo et al. 2013), for instance when doing sports to inform teammates about one's own intended actions or to feint members of an opposing team. We study such sensorimotor signaling in order to investigate how communication emerges and on which variables it depends. In our setup, two players with different aims have partial control in a joint motor task, and one of the two players possesses private information the other player would like to know about. The question then is under what conditions this private information is shared through a signaling process. We manipulated the critical variables given by the costs of signaling and the uncertainty of the ignorant player. We found that the dependency of both players' strategies on these variables can be modeled successfully by a game-theoretic analysis. Signaling games are typically investigated within the context of non-cooperative game theory, where each player tries to maximize their own benefit given the other player's strategy (Cho and Kreps 1987). This allows defining equilibrium strategies where no player can improve their performance by changing their strategy unilaterally. These equilibria are called Bayesian Nash equilibria, a generalization of the Nash equilibrium concept to the presence of private information (Harsanyi 1968). In general, signaling games allow both for pooling equilibria, where no information is shared, and for separating equilibria with reliable signaling. In our study we translated the job market signaling game into a sensorimotor task. In the job market signaling game (Spence 1973), there is an applicant—the sender—who has private information about his true working skill, called the type. The future employer—the receiver—cannot directly know about the working skill, but only through a signal—for example, educational certificates—that is the more costly to acquire the less working skill the applicant has. The sender can choose a costly signal that may or may not transmit information about the type to the receiver. The receiver uses this
signal to make a decision by trying to match the payment—the action—to the presumed type (working skill) that she infers from the signal. The sender's decision about the signal trades off the expected benefits from the receiver's action against the signaling costs. To translate this game into a sensorimotor task, we designed a dyadic reaching task that implemented a signaling game with continuous signal, type, and action spaces. Two players sat next to each other in front of a bimanual manipulandum, such that they could not see each other's faces. In this task, each player controlled one dimension of a two-dimensional cursor position. No communication other than the joint cursor position was allowed. The sender's dimension encoded the signal that could be used to convey information about a target position (the type) that the receiver wanted to hit but did not know. The receiver's dimension encoded her action, which determined the sender's payoff. The sender's aim was to maximize a point score that was displayed as a two-dimensional color map. The point score increased with the reach distance of the receiver—so there was an incentive to make the receiver believe that the target is far away. However, the point score also decreased with the magnitude of the signal—so there was an incentive to signal as little as possible due to the implied signaling costs. The receiver's payoff was determined by the difference between her action and the true target position, which was revealed after each trial. Each player was instructed about the setup, their aim, and the possibility of signaling. The question was whether players' behavior converged to Bayesian Nash equilibria under different conditions in which we manipulated the signaling cost and the variability of the target position. By fitting the variance of participants' signaling, we could quantitatively predict the influence of signaling costs and target variability on the amount of signaling. In line with our game-theoretic predictions, we found that increasing signaling costs and decreasing target variability led in most dyads to less signaling. We conclude that the theory of signaling games provides an appropriate framework to study sensorimotor interactions in the presence of private information.
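To make the equilibrium logic concrete, here is a deliberately simplified toy version of such a signaling game (not the authors' model): with a type-dependent quadratic signaling cost and a linear receiver strategy, iterating mutual best responses converges to a separating equilibrium in which higher signaling costs produce smaller signals, mirroring the behavioral result.

```python
# Toy separating equilibrium for a sender-receiver game (an illustrative
# simplification). A sender of type theta picks signal s to maximize
# beta*s - k*s**2/theta (benefit from the receiver's action minus a
# type-dependent signaling cost); the receiver inverts the sender's rule.
import numpy as np

def equilibrium_receiver_slope(k, n_iter=50):
    beta = 1.0
    for _ in range(n_iter):
        # sender best response: s*(theta) = beta*theta/(2k)
        # receiver best response: infer theta = 2k*s/beta, so act with slope 2k/beta
        # damped update (a Newton step) converges to the fixed point beta = sqrt(2k)
        beta = 0.5 * (beta + 2 * k / beta)
    return beta

for k in (0.5, 1.0, 2.0):
    beta = equilibrium_receiver_slope(k)
    signal = beta * 1.0 / (2 * k)        # equilibrium signal of a theta = 1 sender
    print(f"cost weight k={k}: receiver slope {beta:.2f}, signal {signal:.2f}")
```

In this toy model, raising the cost weight k shrinks the equilibrium signal, which is the qualitative pattern the study reports for increased signaling costs.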
References
Cho I, Kreps D (1987) Signaling games and stable equilibria. Quart J Econ 102(2):179–222
Harsanyi J (1968) Games with incomplete information played by "Bayesian" players, I–III. Part II. Bayesian equilibrium points. Manag Sci 14(5):320–334
Obhi SS, Sebanz N (2011) Moving together: toward understanding the mechanisms of joint action. Exp Brain Res 211(3–4):329–336
Pezzulo G, Donnarumma F, Dindo H (2013) Human sensorimotor communication: a theory of signaling in online social interactions. PLoS ONE 8(11):e79876
Sebanz N, Bekkering H, Knoblich G (2006) Joint action: bodies and minds moving together. Trends Cogn Sci 10(2):70–76
Spence M (1973) Job market signaling. Quart J Econ 87(3):355–374

Subjective time perception of verbal action and the sense of agency
Hannah Limerick1, David Coyle1, James Moore2
1 University of Bristol, UK; 2 Goldsmiths, University of London, UK
The sense of agency (SoA) is the experience of initiating actions to influence the external environment. Traditionally, SoA has been investigated using experimental paradigms in which a limb movement is required to initiate an action. However, less is known about the SoA for verbal commands, which are a prevalent mode of controlling our external environment, for example when interacting with other agents in our environment or when controlling technology via voice interfaces. Here we investigate SoA during verbal control of the external environment using the intentional binding paradigm. Intentional binding is a phenomenon whereby the perceived action–outcome interval for voluntary actions is shorter than for equivalent passive movements (Haggard, Clark, Kalogeras 2002). In this experimental paradigm, participants report the perceived time of voluntary action initiation and of the consequent effects using the so-called Libet clock. Haggard, Clark, Kalogeras (2002) found that when participants caused an action, the perceived time of initiation and the perceived time of the outcome were brought closer together, i.e. the perceived interval between voluntary actions and outcomes was smaller than the actual interval. In the case of involuntary actions, the perceived interval was found to be longer than the actual interval. Importantly, intentional binding is thought to offer a reliable implicit measure of SoA (Moore, Obhi 2012). In this study we developed a novel adaptation of the intentional binding paradigm in which participants performed both verbal commands (saying the word "go") and limb movements (key-presses) that were followed by an outcome (an auditory tone) after a fixed 500 ms interval. Participants sat at a desk in front of a 24'' monitor, which displayed the Libet clock. The experiment used a within-subjects design with one independent variable: input modality—speech input or keyboard input. A keyboard and a microphone were used to register the actions. The trials were separated into individual blocks: operant blocks required the participant to act (either via button press or verbal command) to cause a beep. During the operant trials, participants reported the time of the critical event (either the action or the outcome). Baseline trials had either an action from the participant (with no outcome) or the beep occurring in isolation. During baseline conditions, the participant was likewise required to report the time of the critical event—action or outcome. We investigated:
1) the subjective time of action perception for verbal commands;
2) the SoA for verbal commands.
Firstly, we found that the average perceived time of action corresponded to the beginning of the utterance. This offers an intriguing insight concerning the cognitive processes underlying action perception for speech. One possible explanation for the action being perceived as occurring at the beginning of the utterance is that perception of the action arises once people receive sensory information about their verbal command. Theoretically, this explanation is in line with the cue integration theory of agency. Cue integration holds that both internal motor cues and external situational information contribute to the SoA (Wegner, Sparrow 2004; Moore et al. 2009; Moore, Fletcher 2012). It has been suggested that the influence of these cues upon our SoA depends on their reliability (Moore, Fletcher 2012). According to the cue integration concept, multiple agency cues are weighted by their relative reliability and then optimally integrated to reduce the variability of the estimated origins of an action. For speech, it may be the case that hearing one's own voice is a highly reliable agency cue and enough to label the action as initiated. Of course, further investigation is required; a larger sample size and other measurements of action perception (such as EEG) will be vital in determining the perception of action for verbal commands. These insights will be valuable, particularly for designers of speech recognition software and voice-based interfaces. To address question 2) above, we tested whether binding occurred within each input modality. We conducted a 2×2 repeated-measures analysis of variance comparing event type (action/outcome) and context (operant/baseline). The key-press condition resulted in a significant interaction between context and event. Follow-up t-tests comparing operant and baseline actions showed a significant difference, t(13) = −5.103, p < .001. This shows that operant actions
were perceived later than the baseline. A t-test comparing the perceived times of the operant tone condition and the baseline tone condition showed a significant difference, t(13) = 2.374, p < .05; operant tones were therefore perceived earlier than the baseline. The same analysis was repeated for the speech condition, which resulted in a trend towards significance between context and event (F(1,13) = 3.112, p = .101). Because this was a preliminary investigation, we performed exploratory analysis with follow-up paired t-tests: comparing operant and baseline actions, we found a significant difference, t(12) = −2.257, p < .05, indicating that operant actions were perceived later than the baseline and thus that action binding occurred. A t-test comparing operant and baseline outcomes gave a non-significant difference, t(13) = .532, p = .604. The operant outcome condition was therefore not perceived as significantly earlier than the baseline, and the outcome binding component of intentional binding was not present. Although intentional binding was present for limb movements (consistent with the existing literature), it was absent for verbal commands. There are several possible explanations for this. One possibility is that intentional binding is a phenomenon that does not occur for verbal commands. It is also possible that intentional binding is present at different scales across different sensorimotor modalities. Another explanation, in line with the cue integration approach to SoA (described above), is that there are differences in the amount of sensory cues provided to the participant to confirm that the action has occurred. Key-presses involve proprioceptive, visual, haptic, and auditory cues, which are all integrated to influence the SoA for an action. For verbal commands, there are fewer sensory cues—proprioceptive and auditory. Fewer agency cues involved in verbal commands may result in no intentional binding effect. Further investigation should therefore determine whether different factors within the experimental setup have an impact on intentional binding for verbal commands. Alterations such as longer or shorter timescales, different forms of outcome (e.g. non-auditory), or additional agency cues may alter intentional binding. There may also be experimental factors that lead to no intentional binding being present for the verbal condition. Typically a speech recognizer needs to process the entire utterance and perform recognition before deeming it an action. However, as discussed above, the user typically considered their utterance an action roughly at the beginning of the utterance, thus giving a variable delay between action and outcome. Intentional binding studies have found that the binding phenomenon breaks down beyond 650 ms (Haggard, Clark, Kalogeras 2002). This may also explain the lack of tone binding found here. Interestingly, further exploratory analyses of the speech data suggest that the action component of intentional binding was present but the outcome component was absent (hence the apparent lack of overall binding). This suggests that an element of binding is occurring here.
References
Haggard P, Clark S, Kalogeras J (2002) Voluntary action and conscious awareness. Nature Neurosci 5(4):382–385. doi:10.1038/nn827
Moore J, Fletcher P (2012) Sense of agency in health and disease: a review of cue integration approaches. Conscious Cogn 21(1):59–68. doi:10.1016/j.concog.2011.08.010
Moore J, Obhi S (2012) Intentional binding and the sense of agency: a review. Conscious Cogn 21(1):546–561. doi:10.1016/j.concog.2011.12.002
Moore JW, Wegner DM, Haggard P (2009) Modulating the sense of agency with external cues. Conscious Cogn 18(4):1056–1064. doi:10.1016/j.concog.2009.05.004
Wegner DM, Sparrow B (2004) Authorship processing. In: Gazzaniga M (ed) The cognitive neurosciences III. MIT Press, Cambridge, MA, pp 1201–1209
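A sketch of the binding analysis reported above, on simulated temporal judgment errors (in ms): action binding appears as operant actions being perceived later than baseline, outcome binding as operant tones being perceived earlier, each tested with a paired t-test. The numbers and sample size below are placeholders, not the study's data.

```python
# Intentional binding sketch on simulated judgment errors (ms).
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(4)
n = 14
action_operant  = rng.normal(15, 20, n)    # perceived action shifted later
action_baseline = rng.normal(-5, 20, n)
tone_operant    = rng.normal(-30, 25, n)   # perceived tone shifted earlier
tone_baseline   = rng.normal(0, 25, n)

t_act, p_act = ttest_rel(action_operant, action_baseline)
t_out, p_out = ttest_rel(tone_operant, tone_baseline)
action_binding = (action_operant - action_baseline).mean()
outcome_binding = (tone_baseline - tone_operant).mean()
print(f"action binding {action_binding:+.1f} ms (t={t_act:.2f}, p={p_act:.3f})")
print(f"outcome binding {outcome_binding:+.1f} ms (t={t_out:.2f}, p={p_out:.3f})")
```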
Memory disclosed by motion: predicting visual working memory performance from movement patterns
Johannes Lohmann, Martin V. Butz
Cognitive Modeling, Department of Computer Science, University of Tübingen, Germany
Abstract
Embodied cognition proposes a close link between cognitive and motor processes. Empirical support for this notion comes from research applying hand-tracking in decision-making tasks. Here we investigate whether similar systematics can be revealed in a visual working memory (VWM) task. We trained recurrent neural networks (RNNs) to predict memory performance from the velocity patterns of mouse trajectories. In contrast to previous studies, the responses were not speeded. The results presented here reflect work in progress and more detailed analyses are pending; especially the low generalization performance on unknown data requires a more thorough investigation. So far, the results indicate that even small RNNs can predict a participant's working memory state from raw mouse-tracking data.
Keywords
Mouse tracking, recurrent neural networks, visual working memory
Introduction
With the embodied turn in cognitive science, and due to the reconsideration of cognition in terms of a dynamic system (Spivey and Dale 2006), the dynamic coupling between real-time cognition and motor responses has become a prominent topic in cognitive psychology. Freeman et al. (2011) provided a first review of this body of research, concluding that movement trajectories convey rich and detailed information about ongoing cognitive processes. Most studies investigating this coupling applied speeded responses, where participants were instructed to respond as accurately and as fast as possible. Here we investigate whether movement characteristics are also predictive of higher cognitive functions in the case of non-speeded responses. More precisely, we analyze mouse trajectories obtained in a visual working memory (VWM) experiment and try to predict recall performance (how well an item was remembered) from the movement characteristics.
Experimental Setup
Mouse trajectories were obtained during a VWM experiment, applying a delayed cued-recall paradigm with continuous stimulus spaces (see Zhang and Luck 2009, for a detailed description of the paradigm). In each trial, participants had to remember three or six stimuli. After a variable interstimulus interval (ISI), they had to report the identity of one of them. The stimuli consisted of either colored squares or Fourier descriptors. Memory performance in terms of precision was quantified as the angular distance between the reported and the target stimulus. At the end of the ISI, one of the previous stimulus locations was highlighted and the mouse cursor appeared at the center of the screen. Around the center, either a color or a shape wheel, depending on the initial stimuli, was presented, and participants had to click at the location that matched the stimulus at the cued location. The responses were not speeded and participants were instructed to take as much time as they wanted for the decision. The trajectory of the mouse cursor was continuously tracked at a rate of 50 Hz. We obtained 4,000 trajectories per participant.
Network Training
We used the trajectory data to train Long Short-Term Memory (LSTM, Gers et al. 2003) networks to predict memory performance based on the velocity pattern of the first twenty samples of a mouse trajectory.
We chose LSTMs over other possible classifiers since LSTMs are well suited to precisely identifying predictive temporal dependencies in time series, which is difficult for other algorithms, such as Hidden Markov Models. We used the raw velocity
vectors of the trajectories as inputs, without applying any normalization. We did not require the network to learn a direct mapping between movement trajectories and reported angular distances (referred to as D in the plots). Rather, we labeled each trajectory, based on the data obtained in the respective condition, as either low distance or high distance and trained the network as a binary classifier. Trajectories that led to an angular distance below the 33 % quantile (Q(33) in Fig. 1) were labeled as low distance; trajectories that led to angular distances above the 66 % quantile (Q(66) in Fig. 1) were labeled as high distance. The intermediate range between the 33 and 66 % quantiles was not used for training. Labels were assigned based on the response distribution of the respective experimental condition. Hence, the same angular distance did not always lead to the same label assignment. Thus, a suitable network had to learn to perform a relative quality judgment instead of a mere median split. Half of the 4,000 trajectories of a participant were used for the training of a single network. From these 2,000 trajectories, the 33 % labeled as low distance and the 33 % labeled as high distance were used in the training. We compared the performance of networks consisting of either 5, 10, or 20 LSTM blocks. For each network size, ten networks were trained.
Results
The presented results were obtained with the 4,000 trajectories of one participant. Depending on the network size, classification performance increased from 60 % after the first learning epochs up to 80 % at the end of the training. Fig. 1 provides an overview of the results.
Fig. 1 Aggregated evaluation of the 30 networks after 50 learning epochs. Black bars indicate correct classification performance for the two trained categories. Error bars indicate the standard error of the mean. Significant differences between classifications within one category are marked with an asterisk
One-sample t-tests revealed that both the proportion of correct classifications and of misclassifications differed significantly from chance level for the two trained categories. Paired t-tests indicated that the proportions of correct and misclassifications differed significantly within the trained categories. Despite the apparent ability of the networks to acquire criteria to distinguish trajectories associated with either low or high angular distance, cross-validation results were rather poor, yet still significantly above chance level.
Discussion
In this study we investigated whether motion patterns are still predictive of cognitive performance in the case of non-speeded responses. We trained comparatively small recurrent neural networks to predict the precision of memory recall from mouse movement trajectories. Even if the generalization performance obtained so far is rather low, our preliminary results show that characteristics of non-speeded movements
can be predictive of the performance of higher cognitive functions such as VWM state and retrieval.
References
Freeman JB, Dale R, Farmer TA (2011) Hand in motion reveals mind in motion. Front Psychol 2:59
Gers FA, Schraudolph NN, Schmidhuber J (2003) Learning precise timing with LSTM recurrent networks. JMLR 3:115–143
Spivey MJ, Dale R (2006) Continuous dynamics in real-time cognition. Curr Dir Psychol 15(5):207–211
Zhang W, Luck SJ (2009) Sudden death and gradual decay in visual working memory. Psychol Sci 20(4):423–428
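For illustration, a minimal PyTorch counterpart of the classifier described above: an LSTM reads the first twenty raw velocity samples of a trajectory and outputs a low- vs. high-distance label. The study trained networks of 5-20 LSTM blocks; the sizes, optimizer, and random data here are stand-ins, not the authors' implementation.

```python
# LSTM binary classifier over mouse-velocity sequences (sketch).
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    def __init__(self, hidden_size=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, velocities):              # (batch, 20, 2) velocity vectors
        _, (h_n, _) = self.lstm(velocities)
        return self.head(h_n[-1]).squeeze(-1)   # logit for "high distance"

model = TrajectoryClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(32, 20, 2)                     # placeholder trajectories
y = torch.randint(0, 2, (32,)).float()         # low (0) vs. high (1) distance
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```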
Role and processing of translation in biological motion perception
Jana Masselink, Markus Lappe
University of Münster, Germany
Keywords
Human walking, biological motion processing, translation
Visual perception of human movement is often investigated using point-light figures walking on a treadmill. However, real human walking does not only consist of a change in the position of the limbs relative to each other (referred to as articulation), but also of a change in body localization in space over time (referred to as translation). In point-light displays this means that the motion vector of each dot is composed of both an articulatory and a translatory component. We have examined the influence and processing mechanisms of this translation component in the perception of point-light walkers. In three experiments, each with a two-alternative forced-choice task, observers judged the apparent facing orientation or articulation—in terms of walking direction or forward/backward discrimination—of a point-light walker viewed from the side. Translation could be either consistent or inconsistent with facing/articulation, or absent altogether (treadmill walking). Additionally, stimuli differed in point lifetime to manipulate the presence of local image motion. Stimuli were presented for 200 ms to prevent eye movements to the translating stimulus. Although participants were explicitly instructed to judge facing orientation and articulation regardless of translation, the results revealed an effect of translation in terms of a response bias in the direction of translation in all three tasks. As translation had an effect even on walkers without a local motion signal in the facing orientation and walking direction tasks, we conclude that the global motion of the center of mass of the dot pattern is relevant to the processing of translation. Overall, translation direction seems to influence both the perception of form and the perception of motion of a walker. This supports the idea that translation interacts with both the posture-based analysis of form and the posture-time-based analysis of articulation in the perception of human body motion.
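The articulation/translation decomposition described above can be made explicit in a few lines: the translation component is the frame-to-frame displacement of the dot pattern's center of mass, and articulation is the residual motion after subtracting it. The walker coordinates below are simulated placeholders, not actual motion-capture data.

```python
# Decompose point-light walker motion into translation and articulation.
import numpy as np

rng = np.random.default_rng(5)
n_frames, n_dots = 24, 12
dots = rng.normal(size=(n_frames, n_dots, 2))             # x/y positions per frame
dots[:, :, 0] += np.linspace(0, 1.5, n_frames)[:, None]   # add rightward translation

centroid = dots.mean(axis=1, keepdims=True)               # center of mass per frame
translation = np.diff(centroid, axis=0)                   # global motion component
articulation = np.diff(dots - centroid, axis=0)           # relative limb motion

# each dot's frame-to-frame motion vector is the sum of both components
assert np.allclose(np.diff(dots, axis=0), translation + articulation)
print(translation.mean(axis=(0, 1)), np.abs(articulation).mean())
```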
How to remember Tübingen? Reference frames in route and survey knowledge of one's city of residency
Tobias Meilinger1, Julia Frankenstein2, Betty J. Mohler1, Heinrich H. Bülthoff1
1 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 2 Cognitive Science, ETH Zürich, Switzerland
Knowledge underlying everyday navigation is commonly divided into route and survey knowledge (Golledge 1999). Route knowledge allows re-combining and navigating familiar routes. Survey
knowledge is used for pointing to distant locations or finding novel shortcuts. We show that, within one's city of residency, route and survey knowledge are rooted in separate memories of the same environment and are represented within different reference frames. Twenty-six Tübingen residents, who had lived there for seven years on average, faced a photorealistic virtual model of Tübingen and completed a survey task in which they pointed to familiar target locations from various locations and orientations. Each participant's performance was most accurate when facing north, and errors increased as the deviation from a north-facing orientation increased. This suggests that participants' survey knowledge was organized within a single, north-oriented reference frame. One week later, 23 of the same participants conducted route knowledge tasks comprising the very same start and goal locations used in the survey task. Now participants did not point to a goal location, but used the arrow keys of a keyboard to enter route decisions along an imagined route leading to the goal. Deviations from the correct number of left, straight, etc. decisions as well as response latencies were completely uncorrelated with errors and latencies in pointing. This suggests that participants employed different and independent representations for the matched route and survey tasks. Furthermore, participants made fewer route errors when asked to respond from an imagined horizontal walking perspective rather than from an imagined constant aerial perspective, which replaced left, straight, and right decisions by up, left, right, and down as on a map (with the order of tasks balanced). This performance advantage suggests that participants did not rely on the single, north-up reference frame used for pointing. Route and survey knowledge were thus organized along different reference frames. We conclude that our participants' route knowledge employed multiple local reference frames acquired from navigation, whereas their survey knowledge relied on a single north-oriented reference frame learned from maps. Within their everyday environment, people seem to use map- or navigation-based knowledge according to which best suits the task.
Reference
Golledge RG (1999) Wayfinding behavior. The Johns Hopkins University Press, Baltimore
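As a sketch of the orientation analysis described above, absolute pointing error can be related to the absolute deviation of the participant's facing direction from north. The angles here are simulated, and the wrap-around helper keeps angular differences within ±180°; none of this reflects the authors' actual data or pipeline.

```python
# Pointing error as a function of deviation from a north-facing orientation.
import numpy as np
from scipy.stats import pearsonr

def angular_diff(a, b):
    return (a - b + 180.0) % 360.0 - 180.0     # signed difference in degrees

rng = np.random.default_rng(6)
facing = rng.uniform(-180, 180, 200)           # facing orientation (0 = north)
true_bearing = rng.uniform(-180, 180, 200)
noise = np.abs(angular_diff(facing, 0)) * 0.2  # assumed: error grows away from north
pointed = true_bearing + rng.normal(0, 5 + noise)

error = np.abs(angular_diff(pointed, true_bearing))
r, p = pearsonr(np.abs(angular_diff(facing, 0)), error)
print(f"r = {r:.2f}, p = {p:.4f}")
```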
The effects of observing other people's gaze: faster intuitive judgments of semantic coherence
Romy Müller
Technische Universität Dresden, Germany
Introduction Our actions are modulated by observing others' behavior, especially when we represent others as intentional agents. However, inferring intentions can even be accomplished on the basis of seeing someone's gaze. Do eye movements also exert a stronger influence on an observer when they are ascribed to a human rather than a machine? Indeed, reflexive shifts of attention in response to gaze shifts are modulated by subjects' beliefs (Wiese et al. 2012): a human face elicited stronger gaze cueing effects than a robot face, but this difference disappeared when the instruction stated that both stimuli were of the same origin (i.e. either produced by a human or a machine). This suggests that beliefs about someone else's visual attention can exert a direct influence on our own processing of the same stimuli. A possible way in which the interpretation of gaze as human can affect our processing is that we try to infer the meaning of the things another person is attending to. In this case, observers of object-
directed gaze should be more likely to perceive a coherent relation in the objects that they see being looked at. To test this, the present study used the Remote Associates Test (Mednick 1962), in which subjects decide whether word triads are coherent in the sense of allowing meaningful combinations with a fourth word. Before each decision, a dot moved across the words and subjects were either told that it represented the eye movements of a human trying to find word associations, or a computer-generated control. It was hypothesized that interpreting the dot as someone's gaze would increase the frequency and reduce the time of "intuitive judgments", namely those for which subjects assume a coherent relation but cannot name a solution. Methods Sixteen subjects participated in the experiment and their eye movements were tracked with an SR EyeLink 1000. Each trial comprised a preview video with cursor overlay and a word triad. Videos showed a 5 × 4 grid of rectangles containing 20 words, three of which had to be rated for coherence later. A purple dot cursor (15 px) moved across the grid, either resting on the three words that formed the later triad, or on three other words. Contrary to what subjects were told, the cursor was always a real eye-movement recording. Each subject saw 100 triads, one after each video. All triads were composed of words from the respective video, but only in half of the trials had these words been cued by the cursor. Subjects were instructed that the cursor depicted eye movements (gaze) or a computer-generated control (dot). No strategy for using the cursor was instructed. Each trial started with a video, which was followed by a triad that remained on the screen until subjects pressed a key to indicate whether it was coherent or not. If they negated, the response was counted as incoherent. After a positive response, they were asked to submit the solution word. If they gave no solution or a wrong solution, this was counted as a yes + unsolved response, whereas trials with correct solution words were classified as yes + solved. Subjects worked through two blocks of 50 trials, with each block corresponding to one of the cursor conditions. Results The frequency distributions of the three response types (yes + solved, yes + unsolved, incoherent) were compared between both cursors and there was no difference, χ²(2) = 1.546, p = .462. Specifically, the proportion of yes + unsolved (intuitive) responses was similar for gaze and dot (24.6 and 22.5 %), and this also did not depend on whether the triad had been cued by the cursor during the video or not, both Fs < 1, both ps > .3. Mean response times did not differ between gaze and dot overall (9.5 vs. 9.4 s), F < 1, p > .8, but cursor interacted with response type, F(2,28) = 6.052, p = .007, indicating that only the yes + unsolved responses were faster for gaze than for dot (8.9 vs. 11.3 s), p = .02. In contrast, there was no difference for yes + solved and incoherent responses, both ps > .6. There was no main effect of or interaction with cueing, both Fs < 1, both ps > .6, suggesting that the speed advantage for yes + unsolved responses with gaze was unspecific, i.e. it also occurred for triads that had not been cued by the gaze cursor (Fig. 1). To investigate the impact of the two cursors on subjects' visual attention, subjects' eye movements were analyzed in terms of the time spent on the three cued areas within a grid. This time did not differ between gaze and dot (39.0 vs.
41.9 %), t(15) = 1.36, p = .193. Thus, although there was considerable interindividual variation in subjects' strategies of using the cursor, most subjects looked at the gaze and dot cursor in a similar manner. Discussion The present results indicate that observing another person's eye movements can affect the coherence we assume in the things being looked at. When subjects believed that they saw a depiction of gaze on word triads, their intuitive classifications as coherent were no more frequent (perhaps due to a lack of sensitivity) but faster than when
they interpreted the exact same cursor as non-human. Thus, it appears that seeing someone else looking at objects makes people assume that there must be "something in it", especially when they cannot name it. Interestingly, the effect was not specific to cued triads, suggesting that with gaze transfer the overall readiness for assuming coherence was higher. In the light of this result, it is possible that gaze increased subjects' openness to uncertain judgments more than it affected their actual processing of the objects. This question will have to remain for future research. In contrast to what could be predicted on the basis of previous work (Wiese et al. 2012), subjects' visual attention allocation did not differ between gaze and dot. First, this rules out the possibility that differences between both cursors only occurred because subjects had ignored the presumably irrelevant dot. Moreover, it raises the question to what degree and on what level of processing more abstract depictions of intentional behavior (such as cursors) can exert an influence. This has implications for basic research on social attention and joint action as well as for applied topics such as the visualization of eye movements or computer-mediated cooperation with real and virtual agents.
Fig. 1 Percentage of responses (A) and response times (B) depending on cursor and response type. The percentage of time spent on the cued areas for every single subject (C) was similar for both cursors
References Mednick SA (1962) The associative basis of creativity. Psychol Rev 69(3):220–232 Wiese E, Wykowska A, Zwickel J, Müller HJ (2012) I see what you mean: how attentional selection is shaped by ascribing intentions to others. PLoS ONE 7(9):e45391
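A minimal sketch of the reported frequency comparison follows, assuming hypothetical response counts (the abstract reports only percentages, so the counts below are invented placeholders):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of the three response types per cursor condition
# (yes+solved, yes+unsolved, incoherent) out of 50 trials per block:
counts = np.array([
    [20, 12, 18],   # gaze cursor
    [21, 11, 18],   # dot cursor
])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.3f}")  # a non-significant result would mirror the report
```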
Towards a predictive processing account of mental agency Iuliia Pliushch, Wanja Wiese Johannes Gutenberg University Mainz, Mainz, Germany The aim of this paper is to sketch conceptual foundations for a predictive processing account of mental agency. Predictive processing accounts define agency as active inference (as opposed to perceptual inference, cf. Hohwy 2013, Friston 2009, Friston et al. 2012, Friston et al. 2013). Roughly speaking, perceptual inference is about modeling the causal structure of the world internally; active inference is about making the world more similar to the internal model. Existing accounts, however, so far mainly deal with bodily movements, but not with mental actions (cf. Proust 2013, Wu 2013; the only conceptual connection between active inference and mental action we know of is made in Hohwy 2013, pp 197–199). Mental actions are important because they do not just determine what we do, they determine who we are. The paper is structured as follows. (I) First, we will briefly explain the notion of active inference. (II) After that, we will review purely
philosophical accounts of mental agency. (III) Finally, we will highlight aspects of mental agency that need to be explained by predictive processing accounts and, more specifically, suggest possible conceptual connections between mental actions and active inference. (I) What is active inference? Two aspects of agency explanations have been emphasized in the predictive processing literature: 1. The initiation of action: According to the framework provided by Karl Friston's free-energy principle, agency emerges from active inference. In active inference, changes in the external world that can be brought about by action are predicted. In order to cause these changes (instead of adjusting the internal model to the sensory input), the organism has to move. In beings like us, this involves changing the states of our muscles. Therefore, changes in proprioceptive sensors are predicted. These evoke proprioceptive prediction errors (PPE). If these errors are just used to adjust proprioceptive predictions, no action occurs. Therefore, PPEs that are sent up the processing hierarchy have to be attenuated by top-down modulation; in other words, their expected precision must be lowered (Brown et al. 2013). Overly precise PPEs just lead to a change of the hypothesis, while imprecise PPEs lead to action. The initiation of action therefore crucially depends on precision optimization at the lower end of the processing hierarchy (the expected precision of bottom-up sensory signals has to be low relative to the precision of top-down proprioceptive predictions). 2. The choice and execution of action: Agency (a goal-directed kind of behavior) has been explained as active inference (e.g., Friston et al. 2013; Moutoussis et al. 2014). Agents possess a representation of a policy, which is a sequence of control states (where control states are beliefs about future action, cf. Friston et al. 2013, p 3): "[A]ction is selected from posterior beliefs about control states. […] these posterior beliefs depend crucially upon prior beliefs about states that will be occupied in the future." (Friston et al. 2013, p 4). In this process, precision is argued to play a dual biasing role: biasing perception toward goal states and enhancing confidence in action choices (cf. Friston et al. 2013, p 11). The latter fact may influence the phenomenology of the agent (cf. Mathys et al. 2011, p 17). From the point of view of predictive processing, two aspects are central to the explanation of agency: precision and the fact that possible, attainable counterfactual states are represented. Determining which counterfactual states minimize conditional uncertainty about hidden states corresponds to action selection (cf. Friston et al. 2012, p 4). Optimizing precision expectations enables action, which is ultimately realized by attenuating proprioceptive prediction error through classical reflex arcs (Brown et al. 2013, p 415). Anil Seth (2014) also emphasizes the importance of counterfactually rich generative models: models that "[…] encode not only the likely causes of current sensory inputs, but also the likely causes of those sensory inputs predicted to occur given a large repertoire of possible (but not necessarily executed) actions […]" (p 2). Seth (2014) argues that counterfactually rich generative models lead to the experience of perceptual presence (subjective veridicality). This suggests that counterfactual richness could also play a role in explaining the phenomenal sense of mental agency. (II) What is a mental action?
Here, we briefly review accounts proposed by Joëlle Proust (2013) and Wayne Wu (2013), respectively. According to Proust, mental actions depend on two factors: an informational need and a specific epistemic norm (cf. 2013, p 161). As an example of an informational need, Proust gives remembering the name of a play. Crucially, the agent should not be satisfied with any possible name that may pop into her mind. Rather, the agent must be motivated by epistemic norms like accuracy or coherence. Agents who are motivated by epistemic norms have epistemic feelings reflecting the extent to which fulfilling the informational need is a feasible task: "These feelings predict the probability for a presently activated disposition to fulfill the constraints associated with a given norm […]" (2013, p 162). Wu (2013) defines mental action as
selecting a path in behavioral space with multiple inputs (memory contents) and outputs (possible kinds of behavior). The space of possible paths is constrained by intentions (cf. 2013, p 257). This is why it constitutes a kind of agency, according to Wu. (III) In what follows, we provide a list of possible conceptual connections between active inference and mental agency, as well as targets for future research. 1. Mental and bodily actions have similar causal enabling conditions. The initiation of mental as well as bodily action depends on the right kind of precision expectations. This renders mental and bodily actions structurally similar. Mental actions can be initiated at every level of the processing hierarchy. At each level, the magnitude of expected precisions may vary. As in bodily active inference, the precisions of prior beliefs about desired events must be high enough; otherwise, the informational need will simply be ignored. Furthermore, allocating attention may be a factor contributing to the success of mental actions (e.g., attending away from one's surroundings if one wants to remember something). 2. The contents of a mental action cannot typically be determined at will prior to performing the action (cf. Proust 2013, p 151). Example: I cannot try to remember the name John Wayne. (But I can try to remember the name of the famous American western movie actor.) Similarly, in active inference, actions themselves need not be represented, only hidden states that are affected by action (cf. Friston et al. 2012, p 4). The system possesses counterfactual representations whose content is "[…] what we would infer about the world, if we sample it in particular way." (Friston et al. 2012, p 2) In the case of perception, it could be the "[…] visual consequences of looking at a bird." (p 4) In the case of remembering, it could be the consequences that the remembered content would produce in the generative model. A central question that remains to be answered here is to what extent this would call for an extension of the predictive processing framework, in the sense that counterfactuals about internal consequences would also have to be modeled. Interestingly, performing mental actions may often be facilitated by refraining from certain bodily actions. Imagining, for instance, may be easier with one's eyes closed. In terms of predictive processing, this means that visual input is predicted to be absent, and bodily action ensues in order to make the world conform to this prediction (i.e., one closes one's eyes). 3. Both bodily and mental actions can be accompanied by a phenomenal sense of agency. For the sense of bodily agency, comparator models have been proposed (cf. Frith 2012; Seth 2013). For the sense of mental agency (cf. Metzinger 2013), at least the following questions need to be answered: (1) Is it possible to explain the sense of mental agency with reference to a comparison process? (2) If yes, what kinds of content are compared in this process? A possible mechanism could compare the predicted internal consequences with the actual changes in the generative model after the mental action has been performed. 4. Proust (2013) argues that mental agency is preceded and followed by epistemic feelings. The latter reflect the uncertainty that the right criteria for the execution of a mental action have been chosen and that it has been performed in accordance with the chosen criteria.
We speculate that the phenomenal certainty that a mental action will be successful depends both on the prior probability of future states, and on the conditional probabilities of those states given (internal) control states (thereby, it indirectly depends on counterfactual richness: the more possibilities to realize a future state, the higher the probability that the state will be obtained). 5. A possible problem for predictive processing accounts of mental agency arises from the role of attention. Predictive processing accounts define attention as the optimization of precision estimates
(cf. Feldman & Friston 2010; Hohwy 2012). Precision estimates, and therefore attention, play a crucial role both in active inference and in mental agency. However, some attentional processes, like volitional attention, have also been described as a kind of mental action (cf. Metzinger 2013, p 2; Hohwy 2013, pp 197–199). It is thus an open challenge to show how attentional processes that are constitutive aspects of mental action differ from those that are a kind of mental action themselves.
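The initiation story in (I), where precise proprioceptive prediction errors revise the hypothesis while attenuated ones leave a residue that movement must cancel, can be caricatured in a few lines. This is a toy illustration of precision weighting only, not an implementation of active inference; all names and numbers are invented:

```python
def proprioceptive_update(predicted, sensed, pi_top_down, pi_bottom_up):
    """Toy precision-weighted arbitration between perception and action.

    predicted/sensed: scalar proprioceptive states (e.g., a joint angle).
    pi_*: precisions (inverse variances) of top-down predictions and
          bottom-up sensory prediction errors.
    Returns how much of the error revises the hypothesis (perception)
    vs. how much is left to be cancelled by moving (action).
    """
    error = sensed - predicted
    weight = pi_bottom_up / (pi_bottom_up + pi_top_down)
    hypothesis_revision = weight * error        # precise PPE -> change the model
    residual_for_action = (1 - weight) * error  # attenuated PPE -> move to fulfil the prediction
    return hypothesis_revision, residual_for_action

# Attenuating sensory precision (low pi_bottom_up) leaves the error to action:
print(proprioceptive_update(0.0, 1.0, pi_top_down=4.0, pi_bottom_up=0.25))
```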
Acknowledgment The authors are funded by the Barbara Wengeler foundation. References Brown H, Adams RA, Parees I, Edwards M, Friston K (2013) Active inference, sensory attenuation and illusions. Cogn Process 14(4):411–427 Feldman H, Friston KJ (2010) Attention, uncertainty, and free-energy. Front Human Neurosci 4:215 Friston K (2009) The free-energy principle: a rough guide to the brain? Trends Cogn Sci 13(7):293–301 Friston K, Adams RA, Perrinet L, Breakspear M (2012) Perceptions as hypotheses: saccades as experiments. Front Psychol 3:151 Friston K, Schwartenbeck P, FitzGerald T, Moutoussis M, Behrens T, Dolan RJ (2013) The anatomy of choice: active inference and agency. Front Human Neurosci 7 Frith C (2012) Explaining delusions of control: the comparator model 20 years on. Conscious Cogn 21(1):52–54 Hohwy J (2012) Attention and conscious perception in the hypothesis testing brain. Front Psychol 3:96 Hohwy J (2013) The predictive mind. Oxford University Press, Oxford Mathys C, Daunizeau J, Friston KJ, Stephan KE (2011) A Bayesian foundation for individual learning under uncertainty. Front Human Neurosci 5 Metzinger T (2013) The myth of cognitive agency: subpersonal thinking as a cyclically recurring loss of mental autonomy. Front Psychol 4:931 Moutoussis M, Fearon P, El-Deredy W, Dolan RJ, Friston KJ (2014) Bayesian inferences about the self (and others): a review. Conscious Cogn 25:67–76 Proust J (2013) Philosophy of metacognition: mental agency and selfawareness. Oxford University Press, Oxford Seth AK (2013) Interoceptive inference, emotion, and the embodied self. Trends Cogn Sci 17(11):565–573 Seth AK (2014) A predictive processing theory of sensorimotor contingencies: explaining the puzzle of perceptual presence and its absence in synesthesia. Cogn Neurosci 1–22 Wu W (2013) Mental action and the threat of automaticity. In Clark A, Kiverstein J, Vierkant T (eds) Decomposing the will. Oxford University Press, Oxford, pp 244–261
The N400 ERP component reflects implicit prediction error in the semantic system: further support from a connectionist model of word meaning Milena Rabovsky1, Daniel Schad2, Ken McRae3 1 Department of Psychology, Humboldt University at Berlin, Germany; 2 Charité, Universitätsmedizin Berlin, Germany; 3 University of Western Ontario, London, Ontario, Canada Even though the N400 component of the event-related brain potential (ERP) is widely used to investigate language and semantic
processing, the specific mechanisms underlying this component are still under active debate (Kutas, Federmeier 2011). To address this issue, Rabovsky and McRae (2014) recently used a feature-based connectionist attractor model of word meaning to simulate seven N400 effects. We observed a close correspondence between N400 amplitudes and semantic network error, that is, the difference between the activation pattern produced by the model over time and the activation pattern that would have been correct. Here, we present additional simulations further corroborating this relationship, using the same network as in our previous work, with 30 input units representing word form that directly map onto 2,526 semantic feature units representing word meaning, according to empirically derived semantic feature production norms (McRae et al. 2005). The present simulations focus on influences of orthographic neighbors, which are words that can be derived from a target by exchanging a single letter while preserving letter positions. Specifically, empirical ERP research has shown that words with many orthographic neighbors elicit larger N400 amplitudes. We found that a model analogue of this measure (i.e., the number of word-form representations differing in a single input unit from the target) increases network error. Furthermore, the frequency of a word's orthographic neighbors has been shown to play an important role, with orthographic neighbors that occur more frequently in language producing larger N400 amplitudes than orthographic neighbors that occur less frequently. Again, our simulations showed a similar influence on network error. In psychological terms, network error has been conceptualized as implicit prediction error, and we interpret our results as yielding further support for the notion that N400 amplitudes reflect implicit prediction error in the semantic system (McClelland 1994; Rabovsky, McRae 2014). References Kutas M, Federmeier KD (2011) Thirty years and counting: finding meaning in the N400 component of the event-related brain potential (ERP). Annu Rev Psychol 62:621–647 McClelland JL (1994) The interaction of nature and nurture in development: a parallel distributed processing perspective. In Bertelson P, Eelen P, d'Ydewalle G (eds) International perspectives on psychological science, vol 1. Erlbaum, UK McRae K, Cree GS, Seidenberg MS, McNorgan C (2005) Semantic feature production norms for a large set of living and nonliving things. Behav Res Methods 37(4):547–559 Rabovsky M, McRae K (2014) Simulating the N400 ERP component as semantic network error: insights from a feature-based connectionist attractor model of word meaning. Cognition 132:68–89
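The orthographic-neighbor measure described above (Coltheart's N) is straightforward to compute; a small sketch with a toy lexicon follows. In the model analogue, the same logic applies to word-form vectors differing in a single input unit:

```python
def orthographic_neighbors(target, lexicon):
    """Words differing from `target` in exactly one letter, positions preserved."""
    return [w for w in lexicon
            if len(w) == len(target)
            and sum(a != b for a, b in zip(w, target)) == 1]

lexicon = {"cat", "cot", "cap", "bat", "cart", "mat"}  # toy lexicon
print(orthographic_neighbors("cat", lexicon))  # cot, cap, bat, mat (order may vary)
```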
Similar and differing processes underlying carry and borrowing effects in addition and subtraction: evidence from eye-tracking Patricia Angela Radler1, Korbinian Moeller2,3, Stefan Huber2, Silvia Pixner1 1 Institute for Psychology, UMIT–Health and Life Sciences University, Hall, Tyrol, Austria; 2 Knowledge Media Research Center, Tübingen, Germany; 3 Department of Psychology, Eberhard-Karls University, Tübingen, Germany Keywords Eye fixation behavior, Addition, Subtraction, Carry-over, Borrowing Recent research indicated that investigating participants' eye fixation behavior (Rayner 1998; Rakoczi 2012) can be informative to evaluate processes underlying numerical cognition (Geary et al. 1993;
Green et al. 2007; Moeller et al. 2011a; Moeller et al. 2011b). However, so far there are only few studies using this methodology to better understand the processes involved in mental arithmetic, with a specific focus on addition (Green et al. 2007; Moeller et al. 2011a; Moeller et al. 2011b). In this context, Moeller and colleagues (2011b) suggested that successful application of the carry-over procedure in addition (e.g., 23 + 41 = 64 vs. 28 + 36 = 64) involves at least three underlying processes. First, the sum of the unit digits is computed already during first-pass encoding (i.e., 3 + 1 = 4 vs. 8 + 6 = 14 in the above examples). Second, based on this unit sum the need for a carry-over procedure is evaluated (with the need for a carry-over indicated by a unit sum ≥ 10). Third, the carry-over procedure has to be executed by adding the decade digit of the unit sum to the sum of the tens digits of the summands to derive the correct result (i.e., 2 + 4 + 0 = 6 vs. 2 + 3 + 1 = 6). Interestingly, the authors found that the first two processes were specifically associated with the processing of the unit digits of the summands, reflecting increased processing demands when the sum of the unit digits becomes ≥ 10 and it is recognized that a carry is needed. In particular, it was found that already during the initial encoding of the problem, first fixation durations (FFD) on the second summand increased continuously with the sum of the unit digits, indicating that the unit sum indeed provides the basis for the decision whether a carry is needed or not. Additionally, after the need for a carry procedure was detected, carry addition problems were associated with particular processing of the unit digits of both summands, as indicated by an increase in refixations. In the current study, we aimed at evaluating how far these results on the specific processing of unit digits associated with the carry-over procedure in addition generalize to the borrowing procedure in subtraction. Similar to the case of the carry-over procedure, the necessity of a borrowing procedure can also be evaluated when processing the unit digit of the subtrahend during first-pass encoding (i.e., by checking whether the difference between the unit digits of minuend and subtrahend is < 0) (Geary et al. 1993; Imbo et al. 2007). Furthermore, after the need for a borrowing procedure was detected, later processing stages may well involve particular processing of the unit digits of minuend and subtrahend. Therefore, we expected the influence of the necessity of a borrowing procedure in subtraction problems on participants' eye fixation behavior to mirror the influence of the carry-over procedure in addition. Forty-five students (9 male; mean age: 23.9 years; SD = 7.2 years) solved 48 addition and 48 subtraction problems in a choice reaction time paradigm. Their fixation behavior was recorded using an EyeLink 1000 eye-tracking device (SR Research, Kanata, Canada) providing a spatial resolution of less than 0.5 degrees of visual angle at a sampling rate of 500 Hz. In a 2 × 2 design, arithmetic procedure (addition vs. subtraction) and the necessity of a carry-over or borrowing procedure were manipulated orthogonally, with problem size matched. Problems were displayed in white against a black background in the non-proportional font Courier New (bold, size 50). Each problem was presented together with two solution probes, of which participants had to indicate the correct one by pressing a corresponding button.
The order in which participants completed the addition and subtraction tasks was counterbalanced across participants. For the analysis of the eye-tracking data, areas of interest were centered around each digit (height: 200 pixels, width: 59 pixels). All fixations falling within a respective area of interest were considered fixations upon the corresponding digit. Generally, additions were solved faster than subtractions (3,766 ms vs. 4,581 ms) and carry/borrow problems were associated with longer reaction times (4,783 ms vs. 3,564 ms). Importantly, however, effects of carry-over and borrowing were also observed in participants' eye fixation behavior. Replicating previous results, the necessity of a carry-over led to a specific increase of FFD on the unit digit of the second summand (323 ms vs. 265 ms) during first-pass encoding. Interestingly, this was also observed for a required
borrowing procedure: FFD were specifically elevated on the unit digit of the subtrahend (415 ms vs. 268 ms). However, in contrast to our hypothesis, we did not observe such congruity between the influences of carry in addition and borrowing in subtraction on later processing stages. While the need for a carry procedure led to a specific increase in the processing of the unit digits of both summands (as indicated by an increase of fixations on these digits, 2.04 vs. 1.55 fixations), this specificity was not found for borrowing subtraction problems, for which the number of fixations increased evenly on tens (2.15 vs. 1.74 fixations) and units (2.33 vs. 1.84 fixations) due to the need for a borrowing procedure. Taken together, these partly consistent but also differing results for the carry-over procedure in addition and the borrowing procedure in subtraction indicate that evaluating the need for either is associated with specific processing of the unit digit of the second operand (i.e., the second summand or the subtrahend). This is plausible, as in both addition and subtraction the sum of or the difference between the unit digits indicates whether a carry-over or borrowing procedure is necessary. Importantly, both the sum of the unit digits and their difference can only be evaluated after having considered the unit digit of the second operand. However, later processes underlying the carry-over and borrowing procedures seem to differ: while the need for a carry procedure is associated with specific reprocessing of the unit digits of both summands, this was not the case for a required borrowing procedure. These data thus provide first direct evidence that similar cognitive processes underlie recognizing whether a carry-over or borrowing procedure is needed to solve the problem at hand, whereas further processing steps may differ between addition and subtraction. Future studies are needed to investigate the processes underlying the execution of the borrowing procedure in subtraction more closely.
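The decision criteria discussed above reduce to a comparison on the unit digits alone, which is why they can already be evaluated during first-pass encoding of the second operand. A minimal sketch using the example problems from the text:

```python
def needs_carry(a, b):
    """Two-digit addition requires a carry when the unit digits sum to 10 or more."""
    return (a % 10) + (b % 10) >= 10

def needs_borrow(minuend, subtrahend):
    """Two-digit subtraction requires borrowing when the unit digit of the
    minuend is smaller than that of the subtrahend."""
    return (minuend % 10) - (subtrahend % 10) < 0

print(needs_carry(23, 41), needs_carry(28, 36))    # False True
print(needs_borrow(64, 23), needs_borrow(64, 28))  # False True
```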
References Geary DC, Frensch PA, Wiley JG (1993) Simple and complex mental subtraction: strategy choice and speed-of-processing differences in younger and older adults. Psychol Aging 8(2):242–256 Green HJ, Lemaire P, Dufau S (2007) Eye movements correlates of younger and older adults' strategies for complex addition. Acta Psychol 125:257–278. doi:10.1016/j.actpsy.2006.08.001 Imbo I, Vandierendonck A, Vergauwe E (2007) The role of working memory in carrying and borrowing. Psychol Res 71:467–483. doi:10.1007/s00426-006-0044-8 Moeller K, Klein E, Nuerk H-C (2011a) (No) Small adults: children's processing of carry addition problems. Dev Neuropsychol 36(6):702–720 Moeller K, Klein E, Nuerk H-C (2011b) Three processes underlying the carry effect in addition—evidence from eye tracking. Br J Psychol 102:623–645. doi:10.1111/j.2044-8295.2011.02034.x Rakoczi G (2012) Eye Tracking in Forschung und Lehre. Möglichkeiten und Grenzen eines vielversprechenden Erkenntnismittels. In Gottfried C, Reichl F, Steiner A (eds) Digitale Medien: Werkzeuge für exzellente Forschung und Lehre. Waxmann, Münster, pp 87–98 Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124:372–422
Simultaneous acquisition of words and syntax: contrasting implicit and explicit learning Patrick Rebuschat, Simon Ruiz Lancaster University, UK The topic of implicit learning plays a central role in cognitive psychology, and recent years have witnessed an increasing amount of
research dedicated to this issue. However, comparatively little research has focused on the implicit learning of vocabulary and, to our knowledge, no study has examined whether syntax and vocabulary can be acquired simultaneously. This is an important question, given that in language acquisition outside the experimental lab, subjects are exposed to (and learn) many linguistic features at the same time. This paper reports the results of an experiment that investigated the implicit learning of second language (L2) syntax and vocabulary by adult learners. The linguistic focus was on verb placement in simple and complex sentences (Rebuschat, Williams 2009, 2012; Tagarelli, Borges, Rebuschat 2011, in press). The novel vocabulary items were ten pseudowords, taken from Hamrick and Rebuschat (2012, 2013). Sixty native speakers of English were exposed to an artificial language consisting of German syntax and English words, including ten pseudowords that followed English phonotactics. Subjects in the incidental group (n = 30) did not know they were going to be tested, nor that they were supposed to learn the grammar or vocabulary of a novel language. The exposure task required subjects to judge the semantic plausibility of 120 different sentences, e.g. "Chris placed today the boxes on the dobez" (plausible) and "Sarah covered usually the fields with dobez" (implausible). The task thus required subjects to process the sentences for meaning. Subjects were provided with a picture that matched the meaning of the pseudowords, in the examples above a black-and-white drawing of a table underneath the sentence. Subjects in the intentional group (n = 30) read the same 120 sentences but were asked to discover the word-order rules and to memorize the meaning of the pseudowords. In the testing phase, all subjects completed two tests: a grammaticality judgment task to assess whether they had learned the novel syntax, and a forced-choice task to assess their knowledge of the pseudowords. In both tasks, subjects were also asked to report how confident they were and to indicate what the basis of their judgment was. Confidence ratings and source attributions were employed to determine whether exposure had resulted in implicit or explicit knowledge (see Rebuschat 2013, for a review). Data collection has recently concluded but the data have not yet been fully analyzed. Given our previous research (e.g. Tagarelli et al. 2011, in press; Grey, Williams, Rebuschat 2014; Rogers, Revesz, Rebuschat, in press; Rebuschat, Hamrick, Sachs, Riestenberg, Ziegler 2013), we predict that subjects will be able to acquire both the syntax and the vocabulary of the artificial language simultaneously and that the amount of implicit and explicit knowledge will vary depending on the learning context, with subjects in the incidental group acquiring primarily implicit knowledge and also some explicit knowledge, and vice versa in the intentional group. The paper concludes with implications for future research. References Grey S, Williams JN, Rebuschat P (2014) Incidental exposure and L3 learning of morphosyntax. Stud Second Lang Acquis 36:1–34 Hamrick P, Rebuschat P (2012) How implicit is statistical learning? In Rebuschat P, Williams JN (eds) Statistical learning and language acquisition. Mouton de Gruyter, Berlin, pp 365–382 Hamrick P, Rebuschat P (2013) Frequency effects, learning conditions, and the development of implicit and explicit lexical knowledge.
In Connor-Linton J, Amoroso L (eds) Measured language: quantitative approaches to acquisition, assessment, processing and variation. Georgetown University Press, Washington Rebuschat P, Williams JN (2009) Implicit learning of word order. In Taatgen NA, van Rijn H (eds) Proceedings of the 31st annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 425–430 Rebuschat P (2013) Measuring implicit and explicit knowledge in second language research. Lang Learn 63(3):595–626
Rebuschat P, Hamrick P, Sachs R, Riestenberg K, Ziegler N (2013) Implicit and explicit knowledge of form-meaning connections: evidence from subjective measures of awareness. In Bergsleithner J, Frota S, Yoshioka JK (eds) Noticing: L2 studies and essays in honor of Dick Schmidt. University of Hawaii Press, Honolulu, pp 255–275 Rogers J, Revesz A, Rebuschat P (in press) Implicit and explicit knowledge of L2 inflectional morphology: an incidental learning study Tagarelli KM, Borges Mota M, Rebuschat P (in press) Working memory, learning context, and the acquisition of L2 syntax. In Zhisheng W, Borges Mota M, McNeill A (eds) Working memory in second language acquisition and processing: theory, research and commentary. Multilingual Matters, Bristol Tagarelli K, Borges Mota M, Rebuschat P (2011) The role of working memory in the implicit and explicit learning of languages. In Carlson L, Hölscher C, Shipley T (eds) Proceedings of the 33rd annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 2061–2066
Towards a model for anticipating human gestures in human-robot interactions in shared space
Patrick Renner1, Thies Pfeiffer2, Sven Wachsmuth2
1 Artificial Intelligence Group, Bielefeld University, Germany; 2 CITEC, Bielefeld University, Germany
Abstract Human-robot interaction in shared spaces might benefit from human skills of anticipating movements. We observed human-human interactions in a route planning scenario to identify relevant communication strategies, with a focus on hand-eye coordination. Keywords Shared-space interaction, Hand-eye coordination, 3D eye tracking Introduction A current challenge in human-robot interaction is to advance from using robots as tools to solving tasks cooperatively with them in close interaction. When humans and robots interact in shared space, the overlap of the interaction partners' peripersonal spaces forms an interaction space (Nguyen and Wachsmuth 2011). Here, the actions of both partners have to be coordinated carefully in order to ensure safe cooperation as well as flawless, successful task completion. This requires capabilities beyond collision avoidance, because the robot needs to signal a mutual understanding of situations where both interaction partners interfere. With a dynamic representation of its peripersonal space (Holthaus and Wachsmuth 2012), a robot can be aware of its immediate surroundings and in this way, e.g., avoid collisions before they are actually perceived as a potentially harmful situation. However, shared-space interactions of humans and robots are still far from being as efficient as those between humans. Modeling human skills for anticipating movements could help robots to increase the robustness and smoothness of shared-space interactions. Our eyes often rest on objects we want to use or refer to. In a specific pointing task, Prablanc et al. (1979) found that the first saccade to the target occurs around 100 ms before the hand movement is initiated. If the robot were able to follow human eye gaze and to predict upcoming human gestures, several levels of interaction could be improved: First, anticipated gesture trajectories could be considered during action planning to avoid potentially occupied areas. Second, action executions could be stopped if the robot estimates a human movement conflicting with its current target. Third, the robot could turn its sensors towards the estimated target to facilitate communication robustness and increase the human's confidence in the grounding of the current target (Breazeal et al. 2005).
Shared-space interaction study Identifying corresponding human communication strategies requires studying humans in free interaction. Therefore, we investigate face-to-face, goal-oriented interactions in a natural setting which comprises spatial references with gaze and pointing gestures. In a route planning scenario, participants are to plan paths to rooms on three floors of a university building. The three corresponding floor plans are located on a table between them. The scale of the 32 × 32 cm plans is approximately 1:180, each floor having about 60 rooms. The floor plans are printed on a DIN A0 poster. This way, each participant has one floor plan directly in front of him or her, one is shared with the interaction partner, and one plan is not reachable. The difficulty of the task is increased by introducing blocked areas in the hallways: detours have to be planned (forcing participants to repeatedly change floors), which leads to more complex interactions, ensuring a lively interaction rather than a rigid experimental design with artificial stimuli. Recorded data In the experiments, multimodal data were recorded: two video cameras observed the participants during the interactions. One participant was equipped with mobile eye-tracking glasses. Pointing directions and head positions of both participants were recorded by an external tracking system. As analyzing eye-tracking data usually requires time-consuming manual annotation, an automatic approach was developed combining fiducial marker tracking and 3D modeling of stimuli in virtual reality as proxies for intersection testing between the calculated line of sight and the real objects (Pfeiffer and Renner 2014). The occurrence of pointing gestures to rooms, stairs, elevators and markers for blocked areas was annotated semi-automatically. Results The results of our experiments show that at the time of a pointing gesture's onset, it is indeed possible to predict its target area when taking into consideration fixations which occurred in the last 200 ms before the onset. When allowing a maximum deviation of 20 cm, the target area was predicted in 75 % of the cases, and with a maximum deviation of 10 cm in 50 % of the cases. Figure 1 shows an example of a fixation on the target of a pointing gesture, preceding the hand movement. In the same study, we also analyzed body movements, in
Fig. 1 An example of a fixation (highlighted by the ring) anticipating the pointing target. The three floor plans can be seen. The black tokens serve to mark blocked areas
particular leaning forward: participants almost exclusively used leaning forward to point to targets more distant than 65 cm (from the edge of the table). Conclusion Altogether, our findings provide quantitative data to develop a prediction model considering both eye-hand coordination and leaning forward. This could enable the robot to have a detailed concept of an upcoming human pointing movement. For example, based on current gaze information of its interlocutor, a robot could predict that a starting pointing gesture would end within a 20 cm radius around the currently fixated point (with a 75 % chance). This will allow the robot to decide whether the predicted target space is in conflict with its own planned actions, and it might react accordingly, e.g. by avoiding the area or pausing. Acknowledgments This work has been partly funded by the DFG in the SFB 673 Alignment in Communication. References Breazeal C, Kidd CD, Thomaz AL, Hoffman G, Berlin M (2005) Effects of nonverbal communication on efficiency and robustness in human-robot teamwork. In: 2005 IEEE/RSJ international conference on intelligent robots and systems (IROS 2005), IEEE, pp 708–713 Holthaus P, Wachsmuth S (2012) Active peripersonal space for more intuitive HRI. In: International conference on humanoid robots, IEEE RAS, Osaka, Japan, pp 508–513 Nguyen N, Wachsmuth I (2011) From body space to interaction space: modeling spatial cooperation for virtual humans. In: 10th international conference on autonomous agents and multiagent systems, International Foundation for Autonomous Agents and Multiagent Systems, Taipei, Taiwan, pp 1047–1054 Pfeiffer T, Renner P (2014) EyeSee3D: a low-cost approach for analyzing mobile 3D eye tracking data using computer vision and augmented reality technology. In: Proceedings of the symposium on eye tracking research and applications, ACM, pp 195–202 Prablanc C, Echallier J, Komilis E, Jeannerod M (1979) Optimal response of eye and hand motor systems in pointing at a visual target. Biol Cybern 35:113–124
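As a rough sketch of the prediction rule suggested by these results (average the fixations in the 200 ms before gesture onset, then check the 20 cm tolerance), consider the following; the data format and function names are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def predict_pointing_target(fixations, gesture_onset, window_ms=200):
    """Predict the target of an incipient pointing gesture from recent gaze.

    fixations: list of (timestamp_ms, x_cm, y_cm) on the table plane.
    Returns the mean position of fixations in the window before onset,
    or None if no fixation fell into that window.
    """
    recent = [(x, y) for t, x, y in fixations
              if gesture_onset - window_ms <= t <= gesture_onset]
    return np.mean(recent, axis=0) if recent else None

def within_tolerance(predicted, actual, radius_cm=20.0):
    """Was the prediction close enough (cf. the 20 cm criterion above)?"""
    return np.linalg.norm(np.asarray(predicted) - np.asarray(actual)) <= radius_cm

fix = [(900, 10.0, 12.0), (1050, 11.0, 13.0)]  # toy fixations (ms, cm, cm)
pred = predict_pointing_target(fix, gesture_onset=1100)
print(pred, within_tolerance(pred, (14.0, 15.0)))
```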
Preserved expert object recognition in a case of unilateral visual agnosia
Johannes Rennig, Hans-Otto Karnath, Marc Himmelbach
Center of Neurology, Hertie-Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
We examined a stroke patient (HWS) with a unilateral lesion of the right medial ventral visual stream. A high-resolution MR scan showed severe involvement of the fusiform and parahippocampal gyri, sparing large parts of the lingual gyrus. In a number of object recognition tests with lateralized presentations of target stimuli, HWS showed remarkable deficits for contralesional presentations only; his performance on the ipsilesional side was unaffected. We further explored his residual capabilities in object recognition by confronting him with objects for which he was an expert: items he knew from his job as a trained car mechanic that were occupationally and personally relevant for him. Surprisingly, HWS was able to identify these complex and specific objects on the contralesional side while he failed to recognize even highly familiar everyday objects. This observation of preserved expert object recognition in visual agnosia allows several explanations. First, these results may be caused by enhanced information processing of the ventral system in the intact hemisphere that is exclusively available for expert objects.
On the other hand, expert knowledge could also trigger top-down mechanisms supporting object recognition despite impaired basic functions of object processing. Finally, a more efficient stimulus processing for expert objects might simply not require the complete resources of an intact ventral stream.
Visual salience in human landmark selection
Florian Röser, Kai Hamburger
University of Giessen, Germany
Abstract Visual aspects of landmarks are a main component of almost every theory of landmark preference, selection, and definition. But could this aspect be moderated by other factors, for example the object's position? Keywords Spatial cognition, Landmarks, Visual salience Introduction Visual aspects of objects play an elementary role in the landmark selection process during a wayfinding task (Sorrows, Hirtle 1999; for an overview see Caduff, Timpf 2008). The contrast between an object and its surroundings, namely to other objects, is elementary here. Our assumption is that the visual aspect of an object in a wayfinding context is only necessary to recognize this object; the decision in favor of or against an object will be based on other, more cognitive aspects, for example the object's position. Two contrary assumptions exist. On the one hand, preliminary experiments showed that the ideal position at an intersection (allocentric perspective) is the position in front of the intersection in the direction of turn (Röser, Hamburger, Krumnack, Knauff 2012). On the other hand, it has been shown that in an arrangement of different objects the "pop-out" object ("single one" or singleton) is preferred (Röser, Krumnack, Hamburger 2013). Here we want to discuss to what extent these two contrasting assumptions go together and what influence different tasks or instructions can have. Experiment Method A total of 32 students (21 female; mean age: 27 years; range: 19–56) participated. All participants provided informed written consent. All had normal or corrected-to-normal visual acuity and color vision (tested with Velhagen and Broschmann 2003). They received course credit or money for participation. Materials and procedure The material consisted of four grey (apartment) blocks, each with one white square at the medially oriented corner representing the facades of the buildings (Röser et al. 2012) at an intersection. Within these blocks the different objects were placed; we call them landmarks because they could help one orient in such an environment. All landmark objects consist of a cross and five thin lines in different arrangements so that they are generally distinct (Fig. 1). Three of these landmarks had the same color, one was different (singleton). The color differences ranged from 0° to 180°; the color gradient is visible in Fig. 1 (left, top and bottom). The singleton was presented once at each position at the intersection and, for the second and third experimental conditions, once at each position for a left and a right turn. This resulted in 64 different pictures/intersections, which were presented in randomized order. Each participant was assigned to one of three experimental conditions. In the first condition (intersection) the intersections were presented without a route direction arrow and the task was to choose the object that pops out most. In the second one (intersection and arrow) an arrow indicated the direction in which a change of route
was about to happen (Fig. 1); the task still remained the same as in the first condition. In the third condition the intersections looked the same as in the second one, but now the task was to choose the object which the participant would use to give a route description. All experiments were run on the same computer (19 inches). Results Figure 1 (bottom right) depicts the frequency of choosing the single object. In the condition "intersection" the single object was clearly identified from a color difference of 11° upwards (≈100 %); 0° and 3° were at chance level, and 6° lay in between. A similar result is observable for the condition "intersection and arrow". The remaining condition, in which participants had to decide which object they would prefer for giving a route description, shows a different curve: it increases more slowly and reaches its maximum at around 60 %. On the other hand, participants chose the ideal position in 60 % of the cases, which differs significantly from chance level (t(9) = 4.576, p = .001). Discussion Participants are capable of identifying the single object if the color difference exceeds 6°. The instruction "which one would you choose to give a route description" led to different landmark selections; here the position seems to play a major role. Thus, we may conclude that the perception of the color distribution at the intersection is moderated by the task at hand. One interpretation could be that the contrast between landmarks and their surroundings at an intersection is strongly moderated by the participant's task. This will be examined in more detail in further experiments. Acknowledgments We thank Anna Bosch and Sarah Jane Abbott for help with data recording.
Fig. 1 Left (top and bottom): the colors used for the objects and the color gradient. Top (middle and right): examples of the intersections with and without an arrow. Bottom (right): results; the x-axis represents the experimental variation (low color difference on the left, high on the right), the y-axis the percentage of trials in which participants selected the single object
References Caduff D, Timpf S (2008) On the assessment of landmark salience for human navigation. Cogn Process 9:249–267 Klippel A, Winter S (2005) Structural salience of landmarks for route directions. In: Cohn AG, Mark DM (eds) Spatial information theory, international conference COSIT. Springer, Berlin, pp 346–362 Röser F, Hamburger K, Krumnack A, Knauff M (2012) The structural salience of landmarks: results from an online study and a virtual environment experiment. J Spatial Sci 5:37–50 Röser F, Krumnack A, Hamburger K (2013) The influence of perceptual and structural salience. In: Knauff M, Pauen M, Sebanz N, Wachsmuth I (eds) Proceedings of the 35th annual conference of the cognitive science society. Cognitive Science Society, Austin, TX, pp 3315–3320 Sorrows ME, Hirtle SC (1999) The nature of landmarks for real and electronic spaces. In: Freksa C, Mark DM (eds) Spatial information theory: cognitive and computational foundations of geographic information science, international conference COSIT. Springer, Stade, pp 37–50 Velhagen K, Broschmann D (2003) Tafeln zur Prüfung des Farbsinns, 33rd edn. Georg Thieme Verlag, Stuttgart
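For illustration, the comparison of ideal-position choices against chance (with four possible positions, chance = 25 %) could be run as a one-sample t-test; the per-participant rates below are fabricated placeholders, not the study's data:

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical per-participant proportions of choosing the ideal position:
choice_rates = np.array([0.55, 0.62, 0.58, 0.65, 0.60, 0.57, 0.63, 0.59, 0.61, 0.56])
t, p = ttest_1samp(choice_rates, popmean=0.25)  # chance level with four positions
print(f"t({len(choice_rates) - 1}) = {t:.3f}, p = {p:.4f}")
```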
Left to right or back to front? The spatial flexibility of time
Susana Ruiz Fernández1, Juan José Rahona2, Martin Lachmair1
1 Leibniz Knowledge Media Research Center (KMRC), Tübingen, Germany; 2 Complutense University, Madrid, Spain
How is time represented in space? Strong evidence has been found for a spatial representation of time that goes from left to right, with the past represented on the left and the future on the right side (Santiago et al. 2007). There is also evidence for a back-to-front timeline, with the past represented behind and the future ahead (Ulrich et al. 2012). Based on the notion of a flexible mapping of time onto space (Torralbo et al. 2006), the present study compared both time representations directly. Embodied theories suggest that internal representations of abstract concepts include multimodal perceptual and motor experiences. Assuming richer back-to-front spatial experiences through our senses, we expected faster responses for the back-to-front than for the left-to-right response mapping. Method After the presentation of a future- or past-related word (e.g., yesterday, tomorrow), forty-four participants (all right-handed) had to classify the time word by moving the slider of a response device along one of two axes (left-to-right or back-to-front) according to the temporal content of the word. In the congruent condition, participants performed a right or forward movement in response to a future-related word and a left or backward movement if a past-related word was presented. In the incongruent condition, a backward or left movement was performed in response to a future-related word and a forward or right movement in response to a past-related word. For the performance of the movement, a response device was used that recorded continuous movements of the manual response in the left-to-right and the back-to-front plane (see Ulrich et al. 2012). Touch-sensitive devices registered the onset of the response and the time when the slider of the device reached one of the two endpoints. Reaction time (RT), from the onset of the presentation of the word to the onset of the response (leaving the start position of the slider), was measured. Additionally, the movement time (MT) from response onset to one of the two endpoints was measured. Depending on the required response axis, the response device was rotated by 90° or 180°. The experiment consisted of four experimental blocks, combining the factors response axis (back-to-front vs. left-to-right) and response congruency (congruent: forward or right movement to future-related words and backward or left movement to past-related words vs. incongruent: forward or right movement to past-related words and backward or left movement to future-related words). Each combination resulted in one block that included 120 trials (including 20 practice trials). Separate repeated measures analyses of variance (ANOVA) were conducted on mean RT and mean MT taking participants (F1) as well as items (F2) as random factors. When necessary, p-values were adjusted for violations of the sphericity assumption using the Greenhouse-Geisser correction.
Fig. 1 Mean RT depending on response congruency and response axis
Results RT results are shown in Fig. 1, which depicts mean RT as a function of response congruency and response axis. An ANOVA on RT showed shorter RT for the congruent (768.84 ms) compared to the incongruent condition (812.15 ms), F1(1, 43) = 29.42, p < .001; F2(1, 19) = 104.37, p < .001. Participants were faster initiating a right or forward movement for future-related words and a left or backward movement for past-related words than initiating a left or backward movement for future-related words and a right or forward movement for past-related words. RT were marginally shorter for the left-to-right axis (783.07 ms) compared to the back-to-front axis (797.92 ms), F1(1, 43) = 3.73, p = .060; F2(1, 19) = 14.06, p = .001. The interaction between response congruency and response axis failed to reach significance, F1 and F2 < 1. An ANOVA on MT did not reveal significant effects of response congruency [F1(1, 43) = 0.52, p = .476; F2(1, 19) = 2.59, p = .124], response axis [F1(1, 43) = 1.58, p = .216], or their interaction [F1 and F2 < 1]. Only the F2-analysis revealed an effect of response axis, F2(1, 19) = 23.04, p < .001. Accordingly, response congruency and response axis affected movement initiation but not movement execution. Discussion The results support a flexible projection of time onto space. Unexpectedly, a trend toward faster responses for the left-to-right mapping was found, suggesting an influence of reading direction on response axis. A possible explanation is that reading temporal words activates the left-to-right response axis; this activation needs to be inhibited when a back-to-front response is performed. This explanation is supported by recent experiments showing higher activation of the time-space congruency when visual (instead of auditory) stimuli were used (Rolke et al. 2013). Acknowledgments We thank R. Bahlinger, V. Engel, N. Feldmann, P. Huber, S. Kaiser, J. Kinzel, H. Kriening, S. Riedel, K. Wessolowski, E. Wiedemann and K. Zeeb for their assistance.
References Rolke B, Ruiz Fernández S, Schmid M, Walker M, Lachmair M, Rahona López JJ, Hervás G, Vázquez C (2013) Priming the mental time-line: effects of modality and processing mode. Cogn Process 14:231–244 Santiago J, Lupiáñez J, Pérez E, Funes MJ (2007) Time (also) flies from left to right. Psychon Bull Rev 14:512–516 Torralbo A, Santiago J, Lupiáñez J (2006) Flexible conceptual projection of time onto spatial frames of reference. Cogn Sci 30:745–757 Ulrich R, Eikmeier V, de la Vega I, Ruiz Fernández S, Alex-Ruf S, Maienborn C (2012) With the past behind and the future ahead: back-to-front representation of past and future sentences. Mem Cognit 40:483–495
Smart goals, slow habits? Individual differences in processing speed and working memory capacity moderate the balance between habitual and goal-directed choice behavior
Smart goals, slow habits? Individual differences in processing speed and working memory capacity moderate the balance between habitual and goal-directed choice behavior
Daniel Schad1, Elisabeth Jünger2, Miriam Sebold1, Maria Garbusow2, Nadine Bernhart2, Amir Homayoun Javadi3, Ulrich S. Zimmermann2, Michael Smolka2, Andreas Heinz1, Michael A. Rapp4, Quentin Huys5
1 Charité, Universitätsmedizin Berlin, Germany; 2 Technische Universität Dresden, Germany; 3 University College London (UCL), UK; 4 Universität Potsdam, Germany; 5 TNU, ETH and Universität Zürich, Switzerland
Choice behavior is shaped by cognitively demanding goal-directed processes and by more automatic habitual processes. External cognitive load manipulations alter the balance of these systems. However, it is unclear how individual differences in specific cognitive abilities contribute to the arbitration between habitual and goal-directed decision-making. 29 adults performed a two-step decision task explicitly designed to capture the two systems' computational characteristics. We also collected measures of fluid and crystallized intelligence. There was an inverted U-shaped relationship between processing speed and habitual choice, together with a linear relationship between processing speed and goal-directed behavior. Working memory capacity affected this balance only amongst those subjects with high processing speed. Different aspects of intelligence thus make specific contributions to complex human decision-making, and individual differences in such cognitive abilities moderate the balance between habitual and goal-directed choice behavior.
Tracing the time course of n − 2 repetition costs
Juliane Scheil, Thomas Kleinsorge
Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
Introduction In order to flexibly adapt to a permanently changing environment, it is necessary to inhibit previously activated but now irrelevant processing pathways. Empirically, this inhibition manifests itself only indirectly, in terms of a cost of re-engaging a previously inhibited pathway: the so-called n − 2 repetition costs. When switching among three tasks A, B, and C, higher reaction times and error rates occur when the task in the current trial equals the task in trial n − 2 (i.e., sequences of type ABA) compared to two consecutive switches to another task (sequences of type CBA). Although n − 2 repetition costs have been reported in many studies, it remains an open question when and how inhibition is triggered and how it develops over time. One possibility for capturing the time course of inhibition lies in varying different time intervals in the cued task-switching paradigm. The cue-stimulus interval (CSI) allows participants to prepare
for the next task. In contrast, no specific preparation for the next task is possible during the response-cue interval (RCI), in which usually a fixation mark is presented that contains no information about the next task. Effects of the RCI therefore reflect passive processes, such as decaying inhibition or activation. The present study aimed at investigating the time course of inhibition in a fine-grained manner. For this purpose, the length of the RCI (the time between the response in trial n − 1 and the cue in trial n) was manipulated in five steps separated by 125 ms each. This also allowed us to capture non-linear trends in the size of n − 2 repetition costs that could be overlooked in designs using only two distinct RCIs.
Method In two experiments, subjects (Exp. I: 10 men, 21 women, mean age 23.8 years; Exp. II: 6 men, 15 women, mean age 22.7 years) switched between three tasks in an explicitly cued task-switching experiment. In Exp. I, participants had to judge via keypress whether the digit serving as imperative stimulus was smaller or larger than five, odd or even, or central or peripheral regarding its position along the number line relative to five. In Exp. II, participants had to judge shapes regarding their size (big or small), color (yellow or blue), or shape (x or +). Stimuli were presented centrally on a 17″ monitor on a light-grey background; viewing distance was approximately 60 cm. The experimental design resulted from a factorial combination of the within-subjects factors RCI, varied in five steps (50, 175, 300, 425, and 550 ms), and Task Sequence (ABA vs. CBA).
Results Both experiments revealed significant n − 2 repetition costs that were modulated by the RCI. Costs were highest for RCIs of 300 ms and differed significantly from those for RCIs of 50 and 175 ms (Experiments I and II), 425 ms (Experiment I), and 550 ms (Experiment II; cf. Fig. 1).
Fig. 1 Mean n − 2 repetition cost [ms] as a function of RCI [ms] for Experiments I and II (*p < .05; **p < .01). Error bars represent SEM
Discussion In both experiments, the size of n − 2 repetition costs was modulated by the length of the RCI. The highest n − 2 repetition costs were observed for the RCI of 300 ms, while they were smaller for shorter RCIs (50 and 175 ms). Furthermore, the size of n − 2 repetition costs declined again when the RCI exceeded 300 ms, that is, for RCIs of 425 and 550 ms. This pattern can be interpreted in terms of an overlap of two different time courses involved in inhibition. On the one hand, inhibition seems to need about 200–300 ms to reach its full extent, reflecting a process of building up a sufficient amount of inhibition to cope with interference from recently established task sets. Importantly, while there have been
investigations focusing on how and when inhibitory processes decline, the present study is the first to attempt to identify the time needed for inhibition to build up. On the other hand, our results suggest that n − 2 repetition costs, after reaching their maximum at about 300 ms, start to decay. The results are therefore in line with the assumption that inhibition, once exerted, decays during the RCI.
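Since the n − 2 repetition cost is defined purely over the task sequence (mean RT in ABA triplets minus mean RT in CBA triplets), the measure itself is easy to state in code. The following is a minimal sketch with made-up data, not the authors' analysis script:

```python
import numpy as np

def n2_repetition_cost(tasks, rts):
    """Mean RT(ABA) minus mean RT(CBA), over trials with two predecessors.

    tasks: sequence of task labels (e.g., 'A', 'B', 'C'), one per trial
    rts:   reaction times for the same trials
    """
    aba, cba = [], []
    for n in range(2, len(tasks)):
        # Only genuine double switches are classified: the task changes
        # from trial n-2 to n-1 and again from n-1 to n.
        if tasks[n] == tasks[n - 1] or tasks[n - 1] == tasks[n - 2]:
            continue
        if tasks[n] == tasks[n - 2]:      # ABA: return to the inhibited task
            aba.append(rts[n])
        else:                             # CBA: switch to a third task
            cba.append(rts[n])
    return np.mean(aba) - np.mean(cba)

# Hypothetical example sequence
tasks = ['A', 'B', 'A', 'C', 'B', 'A', 'B', 'C']
rts   = [650,  700, 780, 690, 710, 720, 705, 680]
print(n2_repetition_cost(tasks, rts))  # positive = n-2 repetition cost
```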
Language cues in the formation of hierarchical representations of space
Wiebke Schick, Marc Halfmann, Gregor Hardiess, Hanspeter A. Mallot
Cognitive Neuroscience, Dept. of Biology, University of Tübingen, Germany
Keywords Region effect, Linguistic categories, Whole-part relations, Interaction of language and spatial knowledge
The formation of a hierarchical representation of space can be induced by the spatial adjacency of landmark objects belonging to the same semantic category, as was demonstrated in a route planning experiment (Wiener and Mallot 2003). Using the same paradigm, we tested the efficiency of linguistic cues with various hierarchical categorization principles in regional structuring. In different conditions, the experimental environment was parceled (i) with landmarks of different semantic categories, (ii) with superordinate fictive proper names, (iii) with superordinate prototypical names, (iv) with names from different linguistic semantic categories, and (v) with holonym-meronym relations (semantic whole-part relations). A region effect comparable to the landmark-object condition was found only for the holonym-meronym condition, which combined spatial proximity with a shared context. Wiener and Mallot (2003) investigated the influence of regions on human route planning behavior in a hexagonal, iterated Y-maze in a virtual environment. All of the 12 decision places were marked by a landmark belonging to one of three different semantic categories (vehicles, animals, and paintings), thus defining three regions composed of four adjacent places. When asked to navigate routes which allowed for two equidistant alternatives, subjects consistently preferred the one that crossed fewer regional borders (61.6 % against chance level). These routes also passed through more places of the target region. In the present investigation, we repeated the experiment and also modified it to test whether such a region perception can be evoked linguistically as well.
Procedure The test phase consisted of 18 navigation trials, including 12 with equidistant but region-sensitive route alternatives and six distractors. Subjects were asked to choose the shortest route passing three places and had access to the place names on a second screen.
Participants Only the data of those who performed at least 50 % of the test routes correctly were included in the analysis. This applied to 65 subjects (37 female, 28 male, all 19–43 years of age).
Variables of interest Test trials allowed for two equidistant route alternatives to the goal, differing in the number of region boundaries that had to be crossed. We call the route choices with the smaller number of region crossings "region-consistent" and count the total number of region-consistent routes for each subject, expecting a chance level of 50 % if route choice were based solely on distance. A significant preference for one route type is regarded as evidence for a regionalized representation of
the experimental environment. We also measured navigational errors.
Results The results of the landmark condition confirmed the findings of Wiener and Mallot (2003). For the linguistic conditions, higher error rates as well as strong differences in the prevalence of region-consistent route choices were found. A significant preference was found only for the holonym-meronym condition. We therefore suggest that language-based induction of hierarchies must itself be of a spatial nature to induce a regionalized representation of space.
Reference
Wiener JM, Mallot HA (2003) 'Fine-to-coarse' route planning and navigation in regionalized environments. Spatial Cogn Comput 3(4):331–358
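The route-choice measure described under "Variables of interest" amounts to testing region-consistent choices against a 50 % chance level; one straightforward per-subject test is a binomial test. A minimal sketch with hypothetical counts (the abstract does not specify the exact test used):

```python
from scipy.stats import binomtest

# Hypothetical subject: 9 region-consistent choices out of 12 test trials.
consistent, trials = 9, 12
result = binomtest(consistent, trials, p=0.5, alternative='two-sided')
print(result.pvalue)  # small p-values indicate a route-choice bias

# A group-level check could instead pool choices across all 65 subjects.
```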
Processing of co-articulated place information in lexical access
Ulrike Schild1, Claudia Teickner2, Claudia K. Friedrich1
1 University of Tübingen, Germany; 2 University of Hamburg, Germany
Listeners have no trouble identifying assimilated word forms such as the spoken string "gardem bench" as an instance of "garden bench". Assimilation of place of articulation, such as the coronal place of articulation of the final speech sound of "garden" to the dorsal place of articulation of the initial speech sound of "bench", is common in continuous speech. It is a matter of debate how the recognition system handles the systematic variation resulting from assimilation. Here we test the processing of place variation as soon as it appears in the signal. We used co-articulated information in speech sounds; for example, the /o/ in "jog" already encodes the dorsal place of articulation of the following /g/. It is still a matter of debate whether subphonemic information is normalized at a pre-lexical level of representation or is maintained and used for lexical access. On the one hand, many traditional models of spoken word recognition, such as Cohort (Marslen-Wilson 1987) or TRACE (McClelland and Elman 1986), favor abstract pre-lexical representations; here, sub-phonemic variation is resolved at a pre-lexical level. On the other hand, full-listing exemplar approaches (Goldinger 1998) assume that phonetic detail is fully represented in lexical access, with no need for pre-lexical representations that normalize for variation. Variation in co-articulation information should be less disruptive in the former than in the latter account. Somewhere in between both types of models, the featurally underspecified lexicon (FUL) model (Lahiri and Reetz 2002) avoids pre-lexical representations by means of sparse abstract lexical representations that store only those features that do not frequently undergo variation in the signal. According to FUL, non-coronal features like the labial or dorsal place of articulation are stored in the lexicon. If the input contains another place of articulation, the respective candidate is not further considered; for example, "foan" would not be able to activate "foam". By contrast, coronal place features are not stored in the lexicon. Thus, words containing a coronal feature at a certain position should be activated by any input containing a non-coronal feature at that position; for example, "gardem" can activate "garden". Here we investigate the processing of co-articulatory place information in cross-modal word onset priming. We presented 41 German target words with coronal place of the word-medial consonant (e.g., "Rinne", Engl. chute) and 41 German target words with non-coronal place of the word-medial consonant (e.g., "Dogge", Engl. mastiff). In addition we presented 41 pseudowords that diverged from
Cogn Process (2014) 15 (Suppl 1):S1–S158 the coronal target words in medial place (e.g., ,,Rimme‘‘), and 41 pseudowords that diverged from the non-coronal targets in medial place (e.g., ,,Dodde‘‘). Spoken prime fragments were preceding the visual target words and pseudowords, which were presented in capitals. In Experiment 1, the spoken primes were the onsets of the target words and of the pseudowords up to the first nucleus. Those cvprimes differed only in the place feature co-articulated in the vowel, such as ,,ri[n]‘‘and ,,ri[m]‘‘. In Experiment 2, the spoken primes were the onsets of the target words and of the pseudowords up to the consonant following the first nucleus. Those cvc-primes differed in a complete phoneme, such as ,,rin‘‘and ,,rim‘‘. In a Match condition, the primes were followed by their carrier words (e.g., ,,rin‘‘-RINNE) or carrier pseudowords (e.g., ,,rim‘‘-*RIMME), in a Variation condition, the primes were followed by their respective pseudoword pair member (e.g., ,,rin‘‘-*RIMME) or their respective word pair member (e.g., ,,rim‘‘-RINNE). Unrelated prime-target pairs were taken as controls (,,dog‘‘-RINNE). Taking together, we manipulated Condition (Match vs. Variation vs. Control), Lexicality (words vs. pseudowords) and word medial place of the target (coronal vs. non-coronal) as within-subject factors; and Prime Length (cv-primes in Experiment 1 vs. cvc-prime in Experiment 2) as between-subject factor. Parallel to classical psycholinguistic research, we analyzed only the first presentation of the target. Presentation order was counterbalanced across participants. With respect to the role of features in lexical access, we tested whether word recognition cascades from features to the lexicon. If so, we should not find different results for cv-primes vs. cvc-primes. With respect to a pre-lexical level of representation, we tested whether subphonemic variation is maintained up to the lexical level. If so, the Match condition and the Variation condition should differ for words, but not for pseudowords in Experiment 1. With respect to the assumptions of the FUL model, we tested whether lexical representations are sparse for coronal place. If so, responses to the Match condition and to the Variation should only differ for non-coronal targets, but not for coronal targets. Results of four-way ANOVA with the factors Prime Length (Experiment 1 vs. Experiment 2), Lexicality (Word Targets vs. Pseudoword Targets), Place (Targets with Coronal Segment vs. Targets with Non-coronal Segment) and Condition (Match vs. Variation vs. Control) are informative for our hypotheses. First, there was no significant interaction with the factor Prime Length. That is, behavioral results were comparable across both experiments. This is support for cascaded activation of lexical representations from features to word forms. Second, there was an interaction of the factors Condition and Lexicality. For word targets and for pseudoword targets, responses were slowest for the Control condition. For pseudowords, the Match condition and the Variation condition did not differ. However, for words, responses for the Match condition were faster than responses for the Variation condition (Fig. 1, left panel). This is support for the assumption that the lexicon is involved in processing sub-phonemic variation. Third, there was an interaction of the factors Condition and Place. 
Responses to coronal targets in the Match condition and in the Variation condition did not differ from each other, but both were faster than responses in the Control condition. Responses to non-coronal targets were fastest in the Match condition, intermediate in the Variation condition, and slowest in the Control condition (Fig. 1, right panel). This is evidence for the assumption of unspecified coronal place. However, this effect does not appear to be mediated by the lexicon, because it was not modulated by the factor Lexicality. The results suggest that information from anticipatory co-articulation is maintained and used in lexical access. Completely matching information activates the target word's lexical representation more effectively than partially mismatching information. Even subtle subphonemic variation reduces lexical activation. Thus, subphonemic detail appears to be used for lexical access in a similar way as phonemic information. Furthermore, our results are evidence for the FUL model.
Fig. 1 Mean lexical decision latencies collapsed across both experiments. The left panel (Lexicality × Condition) shows responses to words (black) and pseudowords (white); the right panel (Place × Condition) shows responses to coronal targets (black) and non-coronal targets (white) in the Match, Variation, and Control conditions, respectively. Error bars indicate standard errors
References
Goldinger SD (1998) Echoes of echoes? An episodic theory of lexical access. Psychol Rev 105(2):251–279
Lahiri A, Reetz H (2002) Underspecified recognition. In: Gussenhoven C, Warner N (eds) Laboratory phonology 7. Mouton de Gruyter, Berlin, pp 638–675
Marslen-Wilson WD (1987) Functional parallelism in spoken word-recognition. Cognition 25(1–2):71–102
McClelland JL, Elman JL (1986) The TRACE model of speech perception. Cogn Psychol 18(1):1–86
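The FUL matching logic summarized above is essentially a ternary decision rule over place features. The following is our own toy sketch of that rule as described here, not code from the FUL authors:

```python
# Toy FUL-style place matching: coronal place is not stored in the
# lexicon (underspecified), so only a stored non-coronal feature can
# be contradicted by the input.
def ful_place_match(input_place: str, stored_place: str) -> str:
    """Return 'match', 'no-mismatch', or 'mismatch' for one segment."""
    if stored_place == "coronal":
        # Underspecified entry: any input place is tolerated.
        return "match" if input_place == "coronal" else "no-mismatch"
    # Specified (labial/dorsal) entry: a different input place rules
    # the candidate out.
    return "match" if input_place == stored_place else "mismatch"

# "gardem" (labial [m]) can still activate "garden" (coronal /n/):
print(ful_place_match("labial", "coronal"))   # no-mismatch
# "foan" (coronal [n]) cannot activate "foam" (labial /m/):
print(ful_place_match("coronal", "labial"))   # mismatch
```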
Disentangling the role of inhibition and emotional coding in spatial stimulus devaluation
Christine Scholtes, Kerstin Dittrich, Karl Christoph Klauer
Universität Freiburg, Abteilung Sozialpsychologie und Methodenlehre, Germany
Keywords Spatial position, Stimulus devaluation, Emotional coding, Edge aversion, Eye tracking
In a study investigating the influence of visual selective attention on affective evaluation, Raymond, Fenske, and Tavassoli (2003) observed a distractor devaluation effect: previously to-be-ignored stimuli were emotionally devaluated compared to to-be-selected stimuli and to neutral stimuli not previously presented. According to Raymond et al. (2003), this stimulus devaluation can be explained by assuming that cognitive inhibition is applied to the to-be-ignored stimulus. This inhibition is assumed to be stored with the mental representation of the stimulus and carried over to the evaluation task, where the stimulus is presented again. An alternative account is provided by Dittrich and Klauer (2012): the act of ignoring attaches a negative emotional code to the to-be-ignored stimulus. This negative code is assumed to be stored with the mental representation of the to-be-ignored stimulus, leading to devaluation when the stimulus is encountered again.
Aside from ignoring, the spatial position of a stimulus has also been shown to influence the evaluation of (e.g., Valenzuela and Raghubir 2009, 2010) and the preference for (e.g., Christenfeld 1995; Rodway, Schepman and Lambert 2012) certain stimuli. Meier and Robinson (2004) showed that upper positions are associated with positive affect and lower positions with negative affect. In another study, products in a supermarket context were evaluated as more expensive when presented on upper shelves compared to products positioned lower (Valenzuela and Raghubir 2010). In horizontal arrangements, though, there is evidence for an advantage of the central stimulus position: in several studies, participants preferred the central stimulus to laterally presented stimuli (e.g., Christenfeld 1995; Rodway et al. 2012; Valenzuela and Raghubir 2009). This pattern was called the center-stage effect by Valenzuela and Raghubir (2009). Attali and Bar-Hillel (2003) suggested that this pattern is not based on a preference for the central position but might rather be explained by an aversion against the edges of the stimulus configuration. The present research combines affective stimulus devaluation with the concept of spatial position effects and measures their influence on later stimulus evaluation. It is assumed that lateral stimuli will be devaluated, compared to central products and comparable baseline stimuli, either due to a negative code applied to them (via edge aversion; extending the emotional coding account to other possible stimulus-connoting factors such as spatial position) or due to (passive) inhibition applied to lateral positions. Moreover, the present research aims at disentangling these possible explanations of the lateral devaluation effect by combining stimulus evaluation patterns with eye tracking measurements. Experiment 1 (N = 20) was conducted to investigate the affective evaluations of centrally and laterally presented products compared to neutral baseline stimuli. In a presentation task, three cosmetics were presented simultaneously in a row. The subsequent evaluation task revealed a devaluation of lateral stimuli compared to central and, more importantly, compared to baseline stimuli. This lateral devaluation below baseline level is a new finding, which points to a bias against the edges and not to a center-stage effect when comparing central and lateral stimuli. However, the underlying mechanisms that might have led to this lateral devaluation are not yet resolved. A devaluation of lateral products might be based either on affective coding (a positively connoted center position contrasted with a negatively connoted lateral position; see Attali and Bar-Hillel 2003; Dittrich and Klauer 2012; Valenzuela and Raghubir 2009) or on an attentional focus on the center product (e.g., Tatler 2007) and a possible consequent neglect of the lateral stimuli. In Experiment 2 (planned N = 80), we are currently trying to disentangle these possible mechanisms. Again, three cosmetics are presented simultaneously, this time either in a horizontal row (Condition 1, replicating Experiment 1) or in a vertical column (Condition 2). Subsequently, a single product, either previously presented or not, has to be emotionally evaluated by the participants. During the experiment, the participants' eye gaze is tracked. Of interest is the dwell time in three previously defined areas of interest containing the three cosmetic products.
We expect that products in the vertical arrangement will be evaluated more positively the higher they are placed in the column (see Meier and Robinson 2004; Valenzuela and Raghubir 2010); they are also expected to be evaluated more positively than novel products. Products in the horizontal arrangement will be devaluated when presented laterally compared to central or novel baseline products (see the results of Experiment 1). However, the participants' attentional focus is assumed to rest on the central product in both arrangements (Tatler 2007). Such a result pattern would indicate emotional coding as the underlying mechanism,
as the attentional focus on the central product would imply, following the inhibition account, that in both conditions the lateral products would be inhibited and thus devaluated. Preliminary analyses of the eye tracking data of 40 participants revealed the expected gaze pattern: participants in both conditions focused on the central product. Implications for the two competing explanatory accounts as well as for the transfer of the lateral devaluation effect to consumer psychology will be discussed.
References
Attali Y, Bar-Hillel M (2003) Guess where: the position of correct answers in multiple-choice test items as a psychometric variable. J Educ Meas 40(2):109–128
Christenfeld N (1995) Choices from identical options. Psychol Sci 6(1):50–55
Dittrich K, Klauer KC (2012) Does ignoring lead to worse evaluations? A new explanation of the stimulus devaluation effect. Cogn Emot 26:193–208
Meier B, Robinson M (2004) Why the sunny side is up. Psychol Sci 15:243–247
Raymond JE, Fenske MJ, Tavassoli NT (2003) Selective attention determines emotional responses to novel visual stimuli. Psychol Sci 14(6):537–542
Rodway P, Schepman A, Lambert J (2012) Preferring the one in the middle: further evidence for the centre-stage effect. Appl Cogn Psychol 26:215–222
Tatler B (2007) The central fixation bias in scene viewing: selecting an optimal viewing position independently of motor biases and image feature distributions. J Vis 7(14):1–17
Valenzuela A, Raghubir P (2009) Position-based beliefs: the center-stage effect. J Consum Psychol 19(2):185–196
Valenzuela A, Raghubir P (2010) Are consumers aware of top-bottom but not of left-right inferences? Implications for shelf space positions (working paper). Baruch College, City University of New York, Marketing Department
The role of working memory in prospective and retrospective motor planning
Christian Seegelke1,2, Dirk Koester1,2, Bettina Blaesing1,2, Marnie Ann Spiegel1,2, Thomas Schack1,2,3
1 Neurocognition and Action Research Group, Bielefeld University, Germany; 2 Center of Excellence Cognitive Interaction Technology, Bielefeld University, Germany; 3 CoR-Lab, Bielefeld University, Germany
A large corpus of work demonstrates that humans plan and represent actions in advance, taking into account future task demands (i.e., prospective planning). There is also empirical evidence that action plans are not always generated from scratch for each movement; rather, features of previously generated plans are recalled, modified appropriately, and then used for subsequent actions (e.g., van der Wel et al. 2007). This retrospective planning likely serves to reduce the cognitive costs associated with motor planning. In general, these findings support the notion that action planning is contingent on both future and past events (cf. Rosenbaum et al. 2012). In addition, there is considerable evidence to suggest that motor planning and working memory (WM) share common cognitive resources (e.g., Weigelt et al. 2009; Spiegel et al. 2012, 2013). In two experiments, we further explored the role of WM in prospective and retrospective motor planning using different dual-task paradigms.
Experiment 1 examined the mutual influence of reduced attentional resources on the implementation of a new action plan and of movement planning on the transfer of information into visuospatial WM. To approach these two questions, we used a dual-task design in which participants grasped a sphere and planned a placing movement toward a left or right target, according to a directional arrow. (Previous research using a single memory task suggested that visuospatial WM is more affected by a grasp-to-place task than verbal WM; Spiegel et al. 2012.) Subsequently, participants encoded visuospatial information, i.e., a centrally presented memory stimulus (4 × 4 symbol matrix). While they maintained the information in WM, a visual stay/change cue (presented on the left, center or right) either confirmed or reversed the direction of the planned movement (indicated by its color). That is, participants had to execute either the prepared or a re-planned movement before they reported the symbols of the matrix without time pressure. The results show that both movement re-planning and shifting spatial attention to the location of incongruent stay/change cues constitute processing bottlenecks, presumably because both draw on visuospatial WM. Importantly, the spatial attention shifts and movement re-planning appeared to be independent of each other. Further, we found that the initial preparation of the placing movement influenced the report of the memorized items: preparing a leftward movement resulted in better memory performance for the left half of the matrix, while preparing a rightward movement resulted in better memory performance for the right half. Hence, movement planning influenced the transfer of information into WM. Experiment 1 therefore suggests that movement planning, spatial attention, and visuospatial WM are functionally related but not linked in a mandatory fashion. Experiment 2 examined the role of WM in action plan modification processes (retrospective motor planning) using a hand path priming paradigm. Participants performed a sequential manual tapping task comprising nine movements in time with a metronome. In a defined part of the trials, the tapping movement had to cross an obstacle between the two center targets. Participants executed this task alone (motor-only condition) or while concurrently performing a WM task of varied difficulty (counting backwards in steps of one or three; motor-WM −1 and motor-WM −3 conditions, respectively). In addition, participants performed the WM tasks without simultaneously executing the motor task (WM −1 and WM −3 conditions, respectively). As generating a new motor plan from scratch is thought to require more WM resources than recalling a previously generated plan, we expected the retrospective effect on motor planning (measured as peak movement height after clearing the obstacle) to increase with task difficulty (i.e., motor-WM −3 > motor-WM −1 > motor only). Corroborating findings from earlier studies (van der Wel et al. 2007), we found that after clearing an obstacle, peak heights of the manual tapping movements were only gradually reduced. This hand path priming effect has been interpreted as indicating that participants recalled the previously generated motor plan and only slightly modified it for subsequent movements, thereby saving cognitive processing resources.
Contrary to our expectation, the results showed that the magnitude of the hand path priming effect was similar regardless of whether participants performed the motor task alone or together with a WM task. This finding suggests that WM has no moderating influence on retrospective motor planning. However, peak heights of the tapping movements were, on average, higher in the dual-task conditions than in the single-task condition, suggesting an influence of WM on movement execution in general. In addition, WM performance was not influenced by task condition (i.e., single vs. dual-task). Together, the two experiments point toward a tight functional interaction between action control, (spatial) WM processes, and attentional load. However, retrospective and prospective planning may draw differentially on WM and attentional resources.
References
Rosenbaum DA, Chapman KM, Weigelt M, Weiss DJ, van der Wel R (2012) Cognition, action, and object manipulation. Psychol Bull 138:924–946
Spiegel MA, Koester D, Schack T (2013) The functional role of working memory in the (re-)planning and execution of grasping movements. J Exp Psychol Hum Percept Perform 39:1326–1339
Spiegel MA, Koester D, Weigelt M, Schack T (2012) The costs of changing an intended action: movement planning, but not execution, interferes with verbal working memory. Neurosci Lett 509:82–86
van der Wel R, Fleckenstein RM, Jax SA, Rosenbaum DA (2007) Hand path priming in manual obstacle avoidance: evidence for abstract spatiotemporal forms in human motor control. J Exp Psychol Hum Percept Perform 33:1117–1126
Weigelt M, Rosenbaum DA, Huelshorst S, Schack T (2009) Moving and memorizing: motor planning modulates the recency effect in serial and free recall. Acta Psychol 132:68–79
Temporal preparation increases response conflict by advancing direct response activation
Verena C. Seibold, Freya Festl, Bettina Rolke
Evolutionary Cognition, Department of Psychology, University of Tübingen, Germany
Temporal preparation refers to processes of selectively attending and preparing for specific moments in time. Various studies have shown that these preparatory processes allow for faster and more efficient stimulus processing, as reflected in shorter reaction time (RT) and higher accuracy in a variety of tasks (e.g., Rolke and Ulrich 2010). Recently, however, Correa et al. (2010) showed that temporal preparation impairs performance in tasks with conflicting response information. Specifically, these authors observed that temporal preparation magnified compatibility effects in a flanker task. The flanker compatibility effect refers to an increase in RT to a target that is flanked by response-incompatible stimuli. According to dual-route models (e.g., Eimer et al. 1995), this effect arises because stimuli activate responses at a cortical level along two parallel routes: a slower controlled route, which activates responses according to task instructions, and a fast direct route, which activates responses via direct response priming. In the case of incompatible flankers, the direct route thus activates the incorrect response, leading to conflict. Within this framework, temporal preparation may increase conflict effects by giving direct-route processing a head start. We investigated this idea by measuring the stimulus-locked lateralized readiness potential (LRP) of the event-related potential (ERP) in a flanker task. We chose the LRP because it reflects response-hand-specific ERP lateralization in motor areas and thus enabled us to separate controlled from direct response activation in incompatible flanker trials: whereas controlled (correct) response-hand activation shows up as a negative-going LRP, direct activation of the incorrect response hand emerges as an early positive LRP dip. Accordingly, if temporal preparation advances direct-route response activation, we expected an earlier positive LRP in incompatible trials. In addition, this latency shift might also affect response activation in the controlled route, as indexed by the negative LRP.
Method Twelve participants performed an arrowhead version of the flanker task. In each trial, participants had to indicate the orientation of a central arrowhead with either a left- or right-hand response. This target was flanked by two vertically aligned stimuli that were either response-compatible (arrowheads pointing in the same direction), incompatible (arrowheads pointing in the opposite direction), or neutral (rectangles). To maximize compatibility effects and disentangle the time course of
incorrect and correct response activation, we included a constant flanker-to-target delay of 100 ms (see Kopp et al. 1996). A blocked foreperiod (FP) paradigm (FPs of 800 and 2,400 ms) served as the manipulation of temporal preparation, whereby the short FP leads to good temporal preparation. The LRP was derived at electrode sites C4/C3 in the digitally filtered (0.05–10 Hz), artifact-free (horizontal EOG < ±30 µV; all other electrodes < ±80 µV) ERP as the average of contralateral minus ipsilateral activity for left- and right-hand responses. The 100 ms pre-flanker interval served as baseline. Jackknife-based onset latency (50 % relative amplitude criterion) was calculated for positive and negative LRPs (time windows: 140–240 ms and 150–400 ms).
Results Statistical analysis was performed via repeated-measures analysis of variance (rmANOVA) and pairwise t-tests for post hoc comparisons (with Bonferroni-corrected p-values). Mean RT in correct trials, mean percentage error (PE), and negative LRP onsets were submitted to separate rmANOVAs with the factors foreperiod (short, long) and compatibility (compatible, neutral, incompatible). Positive LRP onset was analyzed via an rmANOVA with the factor foreperiod (short, long). Analysis of mean RT revealed a compatibility main effect, F(2,22) = 75.8, p < .001 (compatible < neutral < incompatible; both ps < .001; Fig. 1). Furthermore, FP had a main effect on RT, F(1,11) = 37.1, p < .001, which was further qualified by a FP × Compatibility interaction, F(2,22) = 4.6, p = .02: RTs were shorter after the short FP, but only in compatible and neutral trials (both ps < .001), not in incompatible trials (p = .16). PE was affected by compatibility, F(2,22) = 9.1, p = .01, and FP, F(1,11) = 10.6, p = .008, as well as by their interaction, F(2,22) = 10.4, p = .006: PE was selectively higher in incompatible trials, t(11) = 3.0, p = .02, specifically after the short FP, t(11) = 3.4, p = .02. Negative LRP onset (Fig. 2a) was affected by compatibility, Fc(2,22) = 28.0, p < .001, with latency increasing from compatible to neutral to incompatible trials (both ps < .001). Neither the FP main effect, Fc(1,11) = 2.64, p = .13, nor the FP × Compatibility interaction (Fc < 1) was significant. The positive LRP in incompatible trials was clearly affected by FP, Fc(1,11) = 18.8, p = .001, with shorter latency after the short FP (Fig. 2b).
Discussion By means of ERPs, we examined how temporal preparation affects response activation in conflict tasks. Replicating previous studies (Kopp et al. 1996), we observed clear compatibility effects, as RT and (negative) LRP latency increased from compatible to incompatible trials. Furthermore, temporal preparation increased the size of the behavioral response conflict. Most importantly, temporal preparation reduced the latency of the positive LRP in incompatible trials, indexing direct response activation, but it did not affect negative LRP latency, indexing controlled response activation. This finding suggests that temporal preparation modulates response activation along the direct route and thereby increases response conflict.
Fig. 1 Mean RT (correct responses) and PE as a function of compatibility and FP
Fig. 2 a Negative LRP as a function of compatibility. b Positive LRP in incompatible trials as a function of FP. Flanker (F) and target (T) onset are marked on the x-axis
References
Correa A, Cappucci P, Nobre AC, Lupiáñez J (2010) The two sides of temporal orienting: facilitating perceptual selection, disrupting response selection. Exp Psychol 57:142–148. doi:10.1027/1618-3169/a000018
Eimer M, Hommel B, Prinz W (1995) S-R compatibility and response selection. Acta Psychol 90:301–313. doi:10.1016/0001-6918(95)00022-M
Kopp B, Rist F, Mattler U (1996) N200 in the flanker task as a neurobehavioral tool for investigating executive control. Psychophysiology 33:282–294. doi:10.1111/j.1469-8986.1996.tb00425.x
Rolke B, Ulrich R (2010) On the locus of temporal preparation: enhancement of pre-motor processes. In: Nobre AC, Coull JT (eds) Attention and time. Oxford University Press, Oxford, pp 228–241
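The LRP derivation and onset scoring described in the Method follow standard formulas: contralateral-minus-ipsilateral averaging over left- and right-hand response trials, and leave-one-out (jackknife) onset estimates at a 50 % relative amplitude criterion (cf. Miller, Patterson, and Ulrich 1998). The following is a minimal sketch with hypothetical arrays, not the authors' pipeline:

```python
import numpy as np

def lrp(c3, c4, response_hand):
    """Stimulus-locked LRP: average of contralateral minus ipsilateral
    activity over left- and right-hand response trials.

    c3, c4: (n_trials, n_samples) ERP epochs at C3 and C4
    response_hand: array of 'left'/'right' labels, one per trial
    """
    left = response_hand == "left"
    right = response_hand == "right"
    # Left hand: contralateral = C4; right hand: contralateral = C3.
    return 0.5 * ((c4[left] - c3[left]).mean(axis=0)
                  + (c3[right] - c4[right]).mean(axis=0))

def onset_50pct(waveform, times):
    """Onset latency at 50 % of peak amplitude (relative criterion).

    In practice this would be applied within the predefined time window
    and polarity (positive dip vs. negative LRP); omitted here.
    """
    peak = np.max(np.abs(waveform))
    crossed = np.abs(waveform) >= 0.5 * peak
    return times[np.argmax(crossed)]

def jackknife_onsets(subject_lrps, times):
    """Onset on each leave-one-subject-out grand average.

    subject_lrps: (n_subjects, n_samples) array. The variance of these
    estimates must be rescaled before computing Fc-values.
    """
    n = len(subject_lrps)
    return np.array([
        onset_50pct(np.mean(np.delete(subject_lrps, i, axis=0), axis=0),
                    times)
        for i in range(n)
    ])
```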
The flexibility of finger-based magnitude representations
Elena Sixtus, Oliver Lindemann, Martin H. Fischer
Cognitive Science Division, University of Potsdam, Germany
Finger counting is a crucial step towards accomplishing counting and understanding number. Consistent with the theoretical stance of embodied cognition (see e.g., Glenberg, Witt, and Metcalfe 2013), recent studies have reported evidence that adults show an influence of finger counting on cognitive number processing in
various tasks (e.g., Domahs, Moeller, Huber, Willmes, and Nuerk 2010; Fischer 2008). Di Luca and Pesenti (2007) demonstrated in adults that pictures of finger counting postures prime numerical size in an Arabic number classification task. This suggests that finger representations become automatically activated during number processing. The present study reports further interactions between the execution of finger counting postures and the processing of numbers; it provides evidence for an activation of number representations through finger postures. In Experiment 1, 25 right-handed adult participants were instructed to compare two successively presented digits while performing finger postures. Each trial comprised a reference number ranging from 2 to 4, followed by a target number that was either smaller or larger by 1 and thus ranged from 1 to 5. Responses were given verbally (i.e., saying "ta" for bigger and "to" for smaller). The postures were executed behind a visual occluder with the dominant hand, with 2 to 4 fingers stretched out in either a canonical (finger counting, starting from the thumb) or a non-canonical way. Crucially, the number of extended fingers sometimes corresponded to the presented target number (congruent trials). The current posture was instructed by the experimenter before each block of 15 trials. Each trial started with a button press before the finger posture was re-adopted to refresh participants' proprioceptive experience. Results showed a significant comparison time advantage for congruent trials, but only when canonical finger postures were adopted (RT advantage of 13 ms, SD = 27 ms, for congruent compared to incongruent trials; t(24) = 2.39, p < .03). These data suggest that, although most participants reported not being aware that they were occasionally adopting finger counting postures, these finger movements pre-activated the representation of specific numbers, which facilitated number processing. In Experiment 1, almost all participants were right-starters in finger counting. It is possible that congruency effects only emerge for the hand that is usually used to represent the specific numbers. It also remains unclear whether the coding of numbers larger than 5 benefits from adopting finger postures. We therefore conducted a second experiment in which both hands were used and numbers between 2 and 9 served as stimuli. In Experiment 2, 26 right-handed participants verbally classified numbers (2, 3, 4, 7, 8, 9) as odd or even while again executing canonical or non-canonical finger postures with one hand. In contrast to Experiment 1, participants performed two blocks, adopting finger postures with the left and with the right hand. Responses were again given verbally, by saying "odd" or "even" (German: "ungerade" and "gerade", respectively). We subtracted from each vocal RT the subject's individual mean RT per response. In this design, at least four different congruencies can be distinguished.
Again, the number of extended fingers could coincide with the classified number (exact congruency), the numerical size of the stimulus could correspond to the respective hand in finger counting (counting-hand congruency), both the number of fingers and the digit could be odd or even (parity congruency), and both the finger posture and the digit could be relatively small or large (with a range of 2–4 for finger postures and a range of 2–9 for presented digits; relative size congruency). While no significant exact or counting-hand congruency effects were found, and only a trend toward an RT advantage for parity-congruent trials (4 ms, SD = 12 ms; t(25) = 1.89, p = .07), there was a significant relative size congruency effect for canonical (but not for non-canonical) postures 2 and 4 (12 ms, SD = 23 ms, for congruent compared to incongruent trials; t(25) = 2.56, p < .02): executing a relatively small counting posture led to faster parity decisions for small than for large digits, and vice versa for a relatively big counting posture, while a medium counting posture had no such effect. Together, these results clarify our understanding of embodied number processing. First, the presence of the exact congruency effect was limited to a situation in which the numbers did not exceed the
counting range of one hand, suggesting that finger counting postures only activate the corresponding mental number representations when embedded in an appropriate task. Second, the absence of a counting-hand congruency effect shows that using the non-starting hand does not necessarily activate the respective mental representation for larger numbers. Third, the finding that finger postures and numbers interact based on their respective relative sizes demonstrates a more flexible size activation through finger postures than previously assumed. This is in line with the idea of a generalized magnitude system, which is assumed to "encode information about the magnitudes in the external world that are used in action" (Walsh 2003, p 486). Specifically, showing almost all fingers of one hand is associated with large magnitudes and showing very few fingers with small magnitudes. The present study shows that only under certain task demands do subjects activate a one-to-one correspondence between fingers and numbers. In other situations, magnitudes might not have to be exactly the same, but rather proportional, to be associated.
Acknowledgments This research is supported by DFG grant FI 1915/2-1 "Manumerical cognition".
References
Di Luca S, Pesenti M (2007) Masked priming effect with canonical finger numeral configurations. Exp Brain Res 185(1):27–39. doi:10.1007/s00221-007-1132-8
Domahs F, Moeller K, Huber S, Willmes K, Nuerk H-C (2010) Embodied numerosity: implicit hand-based representations influence symbolic number processing across cultures. Cognition 116(2):251–266. doi:10.1016/j.cognition.2010.05.007
Fischer MH (2008) Finger counting habits modulate spatial-numerical associations. Cortex 44(4):386–392. doi:10.1016/j.cortex.2007.08.004
Glenberg AM, Witt JK, Metcalfe J (2013) From the revolution to embodiment: 25 years of cognitive psychology. Perspect Psychol Sci 8(5):573–585. doi:10.1177/1745691613498098
Walsh V (2003) A theory of magnitude: common cortical metrics of time, space and quantity. Trends Cogn Sci 7(11):483–488. doi:10.1016/j.tics.2003.09.002
Object names correspond to convex entities
Rahel Sutterlütti, Simon Christoph Stein, Minija Tamosiunaite, Florentin Wörgötter
Faculty of Physics: Biophysics and Bernstein Center for Computational Neuroscience, Göttingen, Germany
Commonly one assumes that object identification (and recognition) requires complex, innate as well as acquired, cognitive processes (Carey 2011); however, it remains unclear how objects can be individuated, segregated into parts, and identified (named), given the high degree of variability of the sensory features that arise even from similar objects (Geisler 2008). Gestalt laws relying on shape parameters and their relations (for example edge relations, compactness, or others) seem to play a role in this process (Spelke et al. 1993). Specifically, several results from psychophysics (Hoffman and Richards 1984; Biederman 1987; Bertamini and Wagemans 2013) and machine vision (Siddiqi and Kimia 1995; Richtsfeld et al. 2012) demonstrate that convex-concave surface transitions can be used for object partitioning. Here we try to discern to what degree such a partitioning corresponds to our language-expressible object "understanding". To this end, a total of 10 real scenes, consisting of
3D point cloud data and the corresponding RGB image, were analyzed. Scenes were recorded by RGB-D sensors (Kinect), which provide 3D point cloud data and matched 2D RGB images. Scenes were taken from openly available machine vision databases (Richtsfeld et al. 2012; Silberman et al. 2012). We segmented the scenes into 3D entities using convex-concave transitions in the point cloud with a model-free machine vision algorithm, the details of which are described elsewhere (LCCP algorithm; Stein et al. 2014). This is a purely data-driven segmentation algorithm, which does not use any additional features for segmentation and works reliably for indoor RGB-D scenes with a depth range of approximately 0.5 to 5 meters, using only 2 parameters to set the resolution. Note that, due to the limited spatial resolution of the RGB-D sensors, small objects cannot be consistently labeled; segments smaller than 3 % of the image size were therefore manually blackened out, as they most often represent sensor noise. We obtained a total of 247 segments (i.e., about 20–30 per image). Segments were labeled on the 2D RGB image with different colors to make them distinguishable for the observer. To control for errors introduced by image acquisition and/or by the computer vision algorithm, we used the known distance error function of the Kinect sensor (Smisek et al. 2011) to calculate a reliability score for every segment. We asked 20 subjects to compare the 247 color-labeled segments with the corresponding original RGB image, asking: "How precisely can you name it?", and recorded their utterances, obtaining 4,940 data points. Subsequently we analyzed the utterances and divided them into three groups: (1) precise naming of a segment (e.g., "table leg"), where it did not matter whether subjects used unique names ("table leg", "leg", and "table support" are equally valid); (2) definite failure or impossibility to name a segment; and (3) unclear cases, where subjects stated that they were not sure about the identification. One example scene is shown in Fig. 1a. Using color-based segmentation (Ben Salah et al. 2011), the resulting image segments rarely correspond to objects in the scene (Fig. 1b), and this is also extremely dependent on illumination. Unwanted merging or splitting of objects will, regardless of the chosen segmentation parameters, generically happen (e.g., "throat + face", "fridge fragments", etc.; Fig. 1b). Instead of using 2D color information, here point clouds were 3D-segmented along concave/convex transitions. We observed (Fig. 1d) that subjects often used different names (e.g., "face" or "head") to identify a segment; these are equally valid, as both describe a valid conceptual entity (an object). There are, however, several cases where segments could not be identified. We find that on average 64 % of the segments could be identified, 30 % could not, and 6 % were unclear cases. Are these 30 % non-identified segments possibly (partially) due to machine vision errors? To assess this, we additionally considered the reliability of the individual segments. Due to the discretization error of the Kinect (stripy patterns in Fig. 1c), data at larger distances become quadratically more unreliable (Smisek et al. 2011), leading to merging of segments. When considering this error source, we find that subjects could more often identify reliable segments (Fig. 1e, red) and that unrecognized cases dropped accordingly (green). The red lettering in Fig.
1d marks less reliable segments and, indeed, identification was lower or more ambivalent for those segments compared to the more reliable ones. The segmentation performed here generically yields identifiable object parts (e.g., "head", "arm", "handle" of the fridge, etc.). Clearly, no purely data-driven method exists that would allow detecting complex, compound objects (e.g., "woman"), as this requires additional conceptual knowledge. Furthermore, we note that we are not concerned here with higher cognitive aspects relating to context analysis, hierarchization, categorization, and other complex processes. Our main observation is that a purely geometrical (low-level) breaking up of a 3D scene most often leads to entities for which we have an internal object or object-part concept, which may
Fig. 1 Humans can identify, with high reliability, image segments that result from splitting images along concave-convex surface transitions. a One example scene used for analysis. b Color-based segmentation of the scene. c Point cloud image of parts of the scene. d 3D-segmented scene and segment names used by our subjects to identify objects; missing percentages are the non-named cases. Red lettering indicates segments with reliability less than 50. e Fraction of identified (red), not-identified (green), and unclear (blue) segments for the complete data set plotted against their reliability. Fat dots represent averages across reliability intervals [0,10]; [10,20]; …; [150,160]. The ability to identify a segment increases with reliability. Grand averages (red 0.64, green 0.30, blue 0.06) for all data are shown as well
reflect the low-level perceptual grounding of the "bounded region" hypothesis formulated by Langacker (1990) as a possible foundation for grammatical entity construal. It is known that color, texture, and other such statistical image features vary widely (Geisler 2008); object individuation therefore cannot rely on them. By contrast, we find here that convex-concave transitions between 3D surfaces might represent the required prior to which a contiguous object concept can be unequivocally bound. These transitions render object boundaries and consequently lead to the situation that we can name the resulting entities. In addition, we note that this bottom-up segmentation can easily be combined with other image features (edges, color, etc.) and also, if desired, with object models, where one can then go beyond object individuation towards true object recognition.
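The convex-concave criterion at the heart of such partitioning can be stated compactly: two adjacent surface patches belong together when the surface bends outward (convex) across their shared boundary, and are cut apart when it bends inward (concave). The sketch below is our own toy illustration of this geometric test, under a common formulation of the criterion; it is not the actual LCCP implementation of Stein et al. (2014):

```python
import numpy as np

def is_convex(x1, n1, x2, n2, tol_deg=8.0):
    """Toy convex/concave test for two adjacent surface patches.

    x1, x2: patch centroids; n1, n2: unit surface normals.
    With d pointing from patch 2 to patch 1, the connection counts as
    convex when angle(n1, d) < angle(n2, d), i.e., the normals open
    away from each other across the boundary. A small tolerance keeps
    near-flat transitions connected so smooth surfaces stay whole.
    """
    d = (x1 - x2) / np.linalg.norm(x1 - x2)
    a1 = np.degrees(np.arccos(np.clip(np.dot(n1, d), -1.0, 1.0)))
    a2 = np.degrees(np.arccos(np.clip(np.dot(n2, d), -1.0, 1.0)))
    return a1 <= a2 + tol_deg

# Outer edge of a box (convex): top-face patch vs. side-face patch.
top  = (np.array([0.5, 0.5, 1.0]), np.array([0.0, 0.0, 1.0]))
side = (np.array([1.0, 0.5, 0.5]), np.array([1.0, 0.0, 0.0]))
print(is_convex(top[0], top[1], side[0], side[1]))      # True -> merge

# Inner corner of a room (concave): floor patch vs. wall patch.
floor = (np.array([0.5, 0.5, 0.0]), np.array([0.0, 0.0, 1.0]))
wall  = (np.array([1.0, 0.5, 0.5]), np.array([-1.0, 0.0, 0.0]))
print(is_convex(floor[0], floor[1], wall[0], wall[1]))  # False -> cut
```

Segments then emerge as connected components of the patch adjacency graph after all concave edges have been removed.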
References Ben Salah, M, Mitiche, A, Ayed, IB (2011) Multiregion image segmentation by parametric kernel graph cuts. IEEE Trans Image Proc. 20(2):545–557 Bertamini, M, Wagemans, J (2013) Processing convexity and concavity along a 2-D contour: figure-ground, structural shape, and attention. Psychon Bull Rev 20(2):191–207 Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94:115–147
Carey S (2011) Précis of 'The origin of concepts' (and commentaries). Behav Brain Sci 34(3):113–167
Geisler W (2008) Visual perception and the statistical properties of natural scenes. Annu Rev Psychol 59:167–192
Hoffman D, Richards W (1984) Parts of recognition. Cognition 18(1–3):65–96
Langacker RW (1990) Concept, image, and symbol: the cognitive basis of grammar. Mouton de Gruyter, Berlin
Richtsfeld A, Mörwald T, Prankl J, Zillich M, Vincze M (2012) Segmentation of unknown objects in indoor environments. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 4791–4796
Siddiqi K, Kimia BB (1995) Parts of visual form: computational aspects. IEEE Trans Pattern Anal Mach Intell 17:239–251
Silberman N, Hoiem D, Kohli P, Fergus R (2012) Indoor segmentation and support inference from RGB-D images. In: Proceedings of the European conference on computer vision (ECCV), pp 746–760
Smisek J, Jancosek M, Pajdla T (2011) 3D with Kinect. In: Proceedings of the international conference on computer vision (ICCV), pp 1154–1160
Spelke ES, Breinlinger K, Jacobson K, Phillips A (1993) Gestalt relations and object perception: a developmental study. Perception 22(12):1483–1501
Stein S, Papon J, Schoeler M, Wörgötter F (2014) Object partitioning using local convexity. In: Proceedings of the IEEE conference on
computer vision and pattern recognition (CVPR), 2014. http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Stein_Object_Partitioning_using_2014_CVPR_paper.pdf
The role of direct haptic feedback in a compensatory tracking task
Evangelia-Regkina Symeonidou, Mario Olivari, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
Haptic feedback systems can be designed to assist vehicular steering by sharing manual control with the human operator. For example, direct haptic feedback (DHF) forces applied over the control device can guide the operator towards an optimized trajectory, which he can augment, comply with, or resist according to his preferences. DHF has been shown to improve performance (Olivari et al. submitted) and to increase safety (Tsoi et al. 2010). Nonetheless, the human operator may not always benefit from the haptic support system: depending on the amount of haptic feedback, the operator might demonstrate over-reliance on, or opposition to, the haptic assistance (Forsyth and MacLean 2006). It is therefore worthwhile to investigate how different levels of haptic assistance influence shared control performance. The current study investigates how different gain levels of DHF influence performance in a compensatory tracking task. For this purpose, 6 participants were evenly divided into two groups according to their previous tracking experience. During the task, they had to compensate for externally induced disturbances that were visualized as the difference between a moving line and a horizontal reference standard. Briefly, participants observed how an unstable aircraft symbol, located in the middle of the screen, deviated in the roll axis from a stable artificial horizon. To compensate for the roll angle, participants were instructed to use the control joystick. Meanwhile, different DHF forces were presented over the control joystick at gain levels of 0, 12.5, 25, 50 and 100 %. The maximal DHF level was chosen according to the procedure described in Olivari et al. (2014) and represents the best stable performance of skilled human operators. Participants' performance was defined as the reciprocal of the median of the root mean square error (RMSE) in each condition. Figure 1a shows that performance improved with increasing DHF gain, regardless of experience level. To evaluate the operator's contribution relative to the DHF contribution, we calculated the ratio
Fig. 1 a Performance of the experienced and in experienced participants as well as the baseline of direct haptic feedback (DHF) assistance without human input for increasing haptic gain. b The ratio of overall system performance to DHF performance without human input for increasing haptic gain
S71 of overall performance to estimated DHF performance without human input. Figure 1b shows that the subject’s contribution in both groups de- creased with increasing DHF up to the 50 % condition. The contribution of experienced subjects plateaued between the 50 and 100 % DHF levels. Thus, the increase in performance for the 100 % condition can mainly be attributed to the higher DHF forces alone. In contrast, the inexperienced subjects seemed to completely rely on the DHF during the 50 % condition, since the operator’s contribution approximated 1. However, this changed for the 100 % DHF level. Here, the participants started to actively contribute to the task (operator’s contribution [1). This change in behavior resulted in performance values similar to those of the experienced group Our findings suggest that the increase of haptic support with our DHF system does not necessarily result in over-reliance and can improve performance for both experienced and inexperienced subjects. References Forsyth BAC, MacLean KE (2006) Predictive haptic guidance: intelligent user assistance for the control of dynamic tasks. IEEE Trans Visual Comput Graph 12(1):103–13 Olivari M, Nieuwenhuizen FM, Bu¨lthoff HH, Pollini L (2014) An experimental comparison of haptic and automated pilot support systems. In: AIAA modeling and simulation technologies conference, pp 1–11 Olivari M, Nieuwenhuizen F, Bu¨lthoff H, Pollini L (submitted) Pilot adaptation to different classes of haptic aids in tracking tasks. J Guidance Control Dyn Tsoi KK, Mulder M, Abbink DA (2010) Balancing safety and support: changing lanes with a haptic lane-keeping support system. In: 2010 IEEE international conference on systems, man and cybernetics, pp 1236–1243
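The performance measure and the operator-contribution ratio used above can be written down compactly; the following is a minimal sketch (function and variable names are ours, not the authors'), assuming the roll-angle error of each trial is available as a NumPy array.

```python
import numpy as np

def trial_rmse(roll_error: np.ndarray) -> float:
    """Root mean square of the roll-angle error within one tracking trial."""
    return float(np.sqrt(np.mean(roll_error ** 2)))

def condition_performance(trial_errors: list[np.ndarray]) -> float:
    """Performance in one DHF condition: the reciprocal of the median
    trial RMSE, as defined in the abstract."""
    return 1.0 / float(np.median([trial_rmse(e) for e in trial_errors]))

def operator_contribution(overall: float, dhf_baseline: float) -> float:
    """Ratio of overall shared-control performance to the DHF-without-human
    baseline; values above 1 indicate an active human contribution."""
    return overall / dhf_baseline
```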
Comprehending negated action(s): embodiment perspective
Nemanja Vaci1, Jelena Radanović2, Fernando Marmolejo-Ramos3, Petar Milin2,4
1 Alpen-Adria University Klagenfurt, Austria; 2 University of Novi Sad, Serbia; 3 University of Adelaide, Australia; 4 Eberhard Karls Universität Tübingen, Germany
Keywords Embodied cognition, Negation, Mental simulation, Sentence comprehension
According to the embodied cognition framework, comprehension of language involves activation of the same sensorimotor areas of the brain that are activated when the entities and events described by language structures (e.g., words, sentences) are actually experienced (Barsalou 1999). Previous work on the comprehension of sentences supports this proposal. For example, Glenberg and Kaschak (2002) observed that judgment about the sensibility of a sentence was facilitated when there was congruence between the direction of an action implied by the sentence and the direction of the movement required for making a response, while incongruence led to slower responses. It was also shown that linguistic markers (e.g., negation) can modulate the mental simulation of concepts (Kaup 2001). This finding was explained by two-step negation processing: (1) a reader simulates a sentence as if there were no negation; (2) she negates the simulated content to reach the full meaning. However, when a negated action was announced in preceding text, the negated clause was processed as fast as an affirmative one (Lüdtke and Kaup 2006). These results suggest that the mechanism of negation processing can be altered contextually. In this study, we aimed at further investigating the effects of linguistic markers, following the assumptions of the embodied
framework. To manipulate sentence context in a way that targets mental simulation, we made use of materials from De Vega et al. (2004). These researchers created sentences by manipulating whether or not two actions described in a sentence were competing for cognitive resources. They showed that sentences with two actions performed at the same time were easier to process when the actions engaged different sensorimotor systems (whistling and painting a fence) than when they involved the same sensorimotor system (chopping wood and painting a fence). We hypothesized that, given two competing actions, negation could suppress one of them and thus change the global simulation time course. Experiment 1 was a modified replication of the De Vega et al. (2004) study in Serbian. We constructed sentences by manipulating whether the two actions described in a sentence used the same or different sensorimotor systems. We also manipulated the temporal ratio of the two actions (simultaneous vs. successive). Finally, actions within a sentence could be physically executed or mentally planned (reading a book vs. imagining reading a book). This way we included descriptions of "real" actions as well as descriptions of mental states. The introduction of this factor aimed at testing whether the linguistic marker for mentally planned actions would induce a "second order" simulation, similar to two-step processing, or suppress the mental simulation, which would then match one-step processing. The participants' task in this experiment was to read the sentences and to press a button when they finished. To ensure comprehension, in 25 % of randomly chosen trials participants were instructed to repeat the meaning of the sentence to the experimenter. In the following two experiments, we focused on the mechanism of negation, using sentences similar to those in Experiment 1. Here, we manipulated the form of the two parts (affirmative vs. negative). The task used in Experiments 2 and 3 was a modified self-paced reading task, allowing by-clause reading rather than by-word or by-sentence reading. This way we obtained response times for each of the two parts (clauses). We were also interested in measuring the time (and accuracy) required for judging the sensibility of the whole sentence. Therefore, we included an equal number of nonsensible filler sentences. Linear mixed-effects modeling was applied to the response times and logistic mixed-effects modeling to the accuracy rates. We controlled for trial order, clause and sentence length, and sensibility of a sentence (the sensibility ratings were obtained in a separate study using different participants). Results from Experiment 1 confirmed the findings of De Vega et al. (2004): sentences with two actions from the same sensorimotor system were comprehended more slowly (t(489.90) = 4.21, p < .001). In addition, we observed a stronger inhibitory effect of length for sentences with simultaneously executed actions, which indicates an additional comprehension load for this type of sentence (t(499.40) = −2.00, p < .05). Finally, processing time was longer when sentences described mentally planned actions as opposed to "real" ones (t(489.70) = 3.21, p < .01). The analyses of Experiments 2 and 3 showed consistent results between the clause (local) and sentence (global) response times. The interaction between sensorimotor systems (same vs. different) and the form of a clause (affirmative vs.
negative) was significant (t(303.70) = 2.95, p < .01): different sensorimotor actions/systems combined with negation led to slower processing times and lower accuracy; when the sensorimotor system was the same, affirmative and negated markers did not induce significant differences. However, when the actions addressed the same sensorimotor system, the accuracy of the sensibility judgments was higher if the second action was negated (z(303.70) = 4.36, p < .001). Taken together, this pattern of results suggests that in the case of competing actions negation
might be processed in one step, as opposed to the two-stage processing in the case of non-competing actions. The present results support the claim that mental simulation is influenced by linguistic markers. We showed, however, that such an influence depends on more general contextual factors. The present results suggest that negation might have a regulatory purpose in sentence comprehension. The negated content is comprehended in a two-step simulation only if the actions do not compete for cognitive resources. Conversely, when the actions within a sentence are in sensorimotor competition, negation can suppress the second action to facilitate comprehension.
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577–660
De Vega M et al. (2004) On doing two things at once: temporal constraints on actions in language comprehension. Mem Cogn 33:1033–1043
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9:558–565
Kaup B (2001) Negation and its impact on the accessibility of text information. Mem Cogn 29:960–967
Lüdtke J, Kaup B (2006) Context effects when reading negative and affirmative sentences. In: Sun R (ed) Proceedings of the 28th annual conference of the cognitive science society. Lawrence Erlbaum Associates, Mahwah, pp 1735–1740
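The mixed-effects analyses described above could be specified roughly as follows (a sketch using statsmodels; the column names, file name, and by-participant random-intercept structure are our assumptions, since the abstract does not spell them out).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per clause reading time, with the
# predictors named in the abstract (sensorimotor system: same/different;
# clause form: affirmative/negative) and the covariates controlled for.
df = pd.read_csv("clause_reading_times.csv")  # assumed layout

# Linear mixed-effects model of response times with by-participant random
# intercepts; the accuracy data would go into a logistic mixed model instead.
rt_model = smf.mixedlm(
    "rt ~ system * form + trial_order + clause_length + sensibility",
    data=df,
    groups=df["participant"],
).fit()
print(rt_model.summary())
```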
Effects of action signaling on interpersonal coordination
Cordula Vesper1, Lou Safra2, Laura Schmitz1, Natalie Sebanz1, Günther Knoblich1
1 CEU, Budapest, Hungary; 2 Ecole Normale Superieure, Paris, France
How do people coordinate actions such as lifting heavy objects together, clapping in synchrony or passing a basketball from one person to another? In many joint action tasks (Knoblich et al. 2011), talking is not needed or is simply too slow to provide useful cues for coordination. Instead, two people who coordinate their actions towards a joint goal often adapt the way they perform their own actions to facilitate performance for a task partner. One way of supporting a co-actor is by providing relevant information about one's own action performance. This can be achieved non-verbally by exaggerating specific movement aspects so that another person can more easily understand and predict the action. This communicative modulation of one's own actions is often referred to as signaling (Pezzulo et al. 2013) and includes common action exaggerations such as making a distinct, disambiguating step towards the right to avoid a collision with another person on the street. The present study investigated signaling in a joint action task in which sixteen pairs of participants moved cursors on a computer screen towards a common target with the goal of reaching the target synchronously. Short feedback tones at target arrival indicated the coordination accuracy of their actions. To investigate whether actors modulate their signaling depending on what is perceptually available to their partners, we compared several movement parameters between two conditions: in the visible condition, co-actors could see each other's movements towards the target (i.e. both computer screens were visible to both co-actors); in the hidden condition an occluder between the co-actors prevented them from receiving visual feedback about each other. Analyses of participants' movements showed that signaling in the form of exaggerating the trajectory towards the target (by increasing the
curvature of the movement) was specifically used in the visible condition, whereas a temporal strategy of reducing the variability of target arrival times (Vesper et al. 2011) was used in the hidden condition. Furthermore, pairs who signaled more were overall better coordinated. Together these findings suggest that signaling is specifically employed in cases where a task partner is able to use the information (i.e. can actually see the action modulation) and that this can be beneficial for successful joint action performance. Thus, co-actors take into account what their partners can perceive in their attempts to coordinate their actions with them. Moreover, our study demonstrates how, depending on the type and amount of perceptual information available between co-actors, different mechanisms support interpersonal coordination.
References
Knoblich G, Butterfill S, Sebanz N (2011) Psychological research on joint action: theory and data. In: Ross B (ed) The psychology of learning and motivation 54. Academic Press, Burlington, pp 59–101
Pezzulo G, Donnarumma F, Dindo H (2013) Human sensorimotor communication: a theory of signaling in online social interactions. PLoS ONE 8:e79876
Vesper C, van der Wel RPRD, Knoblich G, Sebanz N (2011) Making oneself predictable: reduced temporal variability facilitates joint action coordination. Exp Brain Res 211:517–530
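The reported exaggeration of movement curvature can be quantified in several ways; one simple index (our choice for illustration, since the abstract does not fix a formula) is the maximum perpendicular deviation of the cursor path from the straight start-target line.

```python
import numpy as np

def curvature_index(xy: np.ndarray) -> float:
    """Maximum perpendicular distance of a 2-D cursor trajectory
    (an n x 2 array of samples) from the straight line connecting
    movement start and end; larger values mean a more exaggerated path."""
    start, end = xy[0], xy[-1]
    chord = end - start
    chord_len = np.linalg.norm(chord)
    # Perpendicular distance of each sample via the 2-D cross product.
    dev = np.abs(chord[0] * (xy[:, 1] - start[1])
                 - chord[1] * (xy[:, 0] - start[0])) / chord_len
    return float(dev.max())

# Example: a bowed trajectory yields a larger index than a straight one.
straight = np.column_stack([np.linspace(0, 1, 50), np.zeros(50)])
bowed = np.column_stack([np.linspace(0, 1, 50),
                         0.2 * np.sin(np.linspace(0, np.pi, 50))])
print(curvature_index(straight), curvature_index(bowed))
```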
Physiological changes through sensory augmentation in path integration: an fMRI study
Susan Wache1,*, Johannes Keyser1, Sabine U König1, Frank Schumann1, Thomas Wolbers2,3, Christian Büchel2, Peter König1,4
1 Institute of Cognitive Science, University Osnabrück; 2 Institute of Systems Neuroscience, University Medical Center Hamburg Eppendorf; 3 German Center for Neurodegenerative Diseases, Magdeburg; 4 Department of Neurophysiology and Pathophysiology, University Medical Center Hamburg Eppendorf
The theory of sensorimotor contingencies (SMCs) describes qualitative experience as based on the dependency between sensory input and its preceding motor actions. To investigate sensory processing and the learning of new SMCs, we used sensory augmentation in a virtual path integration task. Specifically, we built a belt that maps directional information of a compass to a set of vibrating elements such that the element pointing north is always activated. The belt changes its tactile signals only through motor actions of the belt-wearing participant, i.e. when turning around. Nine subjects wore the belt during all waking hours for seven weeks; 5 control subjects actively trained their navigation, but without a belt (age 19–32 years, seven female). Before and after the training period we presented in the fMRI scanner a virtual path integration (PI) task and a corresponding control task with identical visual stimuli. In half of the trials of both tasks the belt was switched on, vibrating coherently with the virtual movements of the subjects. We used ROI analysis to concentrate on regions relevant for spatial navigation and for sensory processing, and a mixed-effects ANOVA to decompose the four factors belt on/off, belt/control subjects, PI/control task, and before/after training. The main effect PI > control task shows large-scale differences in areas that have been found to be active in similar navigational tasks, such as medial superior temporal cortices (MST), posterior parietal cortex (PPC), ventral intraparietal areas, and caudate nucleus. Additionally we found sensorimotor regions such as supplementary motor areas (SMA), insula, primary sensory cortex, and precentral gyrus. The main effect belt on > off reveals processing of the tactile signals in expected sensory areas such as the primary sensory cortex, supramarginal gyri, and Rolandic opercula. In second-level analyses, significant 2-way interactions between the belt on/off and pre/post training conditions indicate an involvement of the Rolandic opercula, insula, MST and PPC. Inspection of the activation intensities shows a significant difference belt on > off only in the first measurement before the training period, but not after the training period. In summary, in fMRI we observe differential activations in areas expected for path integration tasks and tactile stimulation. Additionally, we also found activation differences for the belt signals well beyond the somatosensory system, indicating that processing is not limited to sensory areas but also includes higher-level and motor regions, as predicted by the theory of sensorimotor contingencies. It is demonstrated that the belt's signal is processed differently after the training period. Our fMRI results are also in line with subjective reports indicating a qualitative change in the perception of the belt signals.

Do you believe in Mozart? The influence of beliefs about composition on representing joint action outcomes in music
Thomas Wolf, Cordula Vesper, Natalie Sebanz, Günther Knoblich
CEU, Budapest, Hungary
Actors in joint action situations represent the outcomes of their joint actions and use these to guide their actions (Vesper, Butterfill, Knoblich, Sebanz 2010). However, it is not clear how conceptual and perceptual information affect the representations of joint action outcomes. In the present experiment, we investigated whether beliefs about the intended nature of joint action outcomes are sufficient to elicit changes in their representation. As recent studies provide evidence that participants represent joint action outcomes in musical paradigms (Loehr, Kourtis, Vesper, Sebanz, Knoblich 2013), we used a piano paradigm to investigate the hypothesis that beliefs about the composer's intentions can influence representations of jointly produced tones. In our paradigm, we used a within-subjects 2 × 2 design with the factors Belief (together, separate) and Key (same, different). Two adult piano novices played 24 melody-sets with the help of templates. In the Belief condition "together", the participants were told that the melodies they were going to play were intended to be played together as duets. In the condition "separate", participants were told that their melodies were not intended to be played together. With the Key manipulation, we varied the cognitive costs of joint action outcome representations as follows. All 24 melody-sets were generated by a Python script and followed the same simple chord progression (I-IV-V7-I). They differed only along the Key manipulation: in 12 melody-sets, the aforementioned chord progression was implemented in the same musical key. When the two melodies follow the same chord progression in the same key, the cognitive cost of representing the joint action outcome should be lower than in the other 12 melody-sets, where the same chord progression was implemented in different keys. Representing the joint action outcome of two melodies in different keys demands more resources, even though representing only one's own action outcome is equally costly in both key conditions. During the experiment, accuracy, tempo and synchrony were measured. Following our hypothesis that beliefs about the composition affect the representation of the joint action outcome, we predicted that the differences between the same-Key and different-Key melody-sets would be significantly higher when participants believed the melodies were meant to be played together, attesting that the participants' beliefs had led to an increase of joint action representations. In other words, we predicted that an ANOVA with the
independent variables Belief and Key would show a significant interaction.
References
Vesper C, Butterfill S, Knoblich G, Sebanz N (2010) A minimal architecture for joint action. Neural Netw 23:998–1003
Loehr JD, Kourtis D, Vesper C, Sebanz N, Knoblich G (2013) Monitoring individual and joint action outcomes in duet music performance. J Cogn Neurosci 25(7):1049–1061
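The abstract states that the melody-sets were generated by a Python script following a fixed I-IV-V7-I progression, realized either in the same or in different keys. The original script is not published, but a sketch in that spirit (all names illustrative) could look like this:

```python
# Semitone offsets of the major scale relative to the key's root.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]

def chord(key_root: int, degree: int, seventh: bool = False) -> list[int]:
    """Triad (optionally with a seventh) on a 1-based scale degree,
    as MIDI pitches stacked in thirds above the given key root."""
    steps = [0, 2, 4] + ([6] if seventh else [])
    return [key_root + MAJOR_SCALE[(degree - 1 + s) % 7]
            + 12 * ((degree - 1 + s) // 7) for s in steps]

def progression(key_root: int) -> list[list[int]]:
    """The I-IV-V7-I progression underlying every melody-set."""
    return [chord(key_root, 1), chord(key_root, 4),
            chord(key_root, 5, seventh=True), chord(key_root, 1)]

# Same-key melody-set: both parts drawn from progression(60) (C major);
# different-key melody-set: e.g. progression(60) versus progression(67).
print(progression(60))
```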
Processing sentences describing auditory events: only pianists show evidence for an automatic space-pitch association
Sibylla Wolter, Carolin Dudschig, Irmgard de La Vega, Barbara Kaup
Universität Tübingen, Germany
Embodied models of language understanding suggest that language comprehension is grounded in experience. It is assumed that during the reading of words and sentences these experiences become reactivated and serve as a mental simulation (Barsalou 1999; Zwaan, Madden 2005). Despite a growing body of evidence supporting the importance of sensory-motor representations during language understanding (e.g., Glenberg, Kaschak 2002), rather little is known regarding the representation of sound during language processing. In the current study, we aim to close this gap by investigating whether processing sentences describing auditory events results in action-compatibility effects similar to those reported for physical tone perception. With regard to physical tone perception, it is known that real tones of different pitch heights trigger specific spatial associations on a vertical as well as a horizontal axis. The vertical association is typically activated automatically for all participant groups (Lidji, Kolinsky, Lochy, Morais 2007; Rusconi, Kwan, Giordano, Umiltà, Butterworth 2006). In contrast, the horizontal axis seems to be mediated by musical expertise. Specifically, only pianists with a considerable amount of experience with the piano keyboard and other musicians show an automatic association between low tones and the left side and high tones and the right side (Lidji et al. 2007; Trimarchi, Luzatti 2011). This suggests that the experiences pianists acquire when playing the piano lead to a space-pitch association that is automatically elicited when processing high or low auditory sounds. The aim of the present study was to investigate whether experience-specific space-pitch associations in the horizontal dimension can also be observed during the processing of sentences referring to high or low auditory sounds. For pianists, we expected to find faster responses on the right compared to the left for sentences implying high pitch, and faster responses on the left compared to the right for sentences implying low pitch. For non-musicians, no such interaction was expected. Finding the respective differences between pianists and non-musicians would strongly support the idea that during language processing specific experiential associations are reactivated. 20 skilled pianists with an average training period of 14.85 years (Experiment 1) and 24 non-musicians with no musical training, or less than 2 years of training that took place at least 10 years ago (Experiment 2), were presented with sentences expressing high/low auditory events, such as "the bear growls deeply" vs. "the soprano singer sings a high aria". Half of the sentences contained the words high or low (explicit condition); the other half only implicitly expressed pitch height (implicit condition). Nonsensical sentences served as filler items. Participants judged whether the sentence was semantically correct or incorrect by pressing either a left or right response key. The
response position (sensible is right vs. left) was varied between blocks; the starting position was balanced between participants. Each sentence was presented only once to each participant. A by-participant (F1) and a by-item (F2) ANOVA were conducted, one treating participants and one treating items as the random factor. The results are displayed in Fig. 1.

Fig. 1 Mean response times for left/right responses to sentences implying high/low pitch for pianists (left panel) and non-musicians (right panel). The error bars represent the 95 % confidence interval and are calculated according to Masson and Loftus (2003)

The pianists (Exp 1) showed a significant interaction between implied pitch and response hand (F1(1,19) = 4.8, p < .05; F2(1,56) = 6.77, p < .05), with faster responses to sentences implying high pitch with a right compared to a left keypress response and faster responses to sentences implying low pitch with a left compared to a right keypress response. Sentence type (explicit vs. implicit) did not modify this interaction (Fs < 1). For the non-musicians, no interaction between implied pitch and response hand was found (Fs < 1). Additionally, the data showed significant main effects of implied pitch and sentence type in the by-participants analysis for both participant groups (pianists: F1(1,19) = 21.42, p < .001, F2(1,56) = 1.4, p = .24; F1(1,19) = 29.87, p < .001, F2(1,56) = 2.56, p = .12; non-musicians: F1(1,23) = 20.01, p < .001, F2(1,56) = 1.21, p = .28; F1(1,23) = 27.14, p < .001, F2(1,56) = 1.17, p = .28). Sentences implying high pitch yielded faster responses than sentences implying low pitch, and implicit sentences were responded to faster than explicit sentences. The results show that specific musical experiences can influence a linguistically implied space-pitch association. This is in line with the mental simulation view of language comprehension, suggesting that language understanding involves multimodal knowledge representations that are based on experiences acquired during interactions with the world.
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577–660
Glenberg AM, Kaschak MP (2002) Grounding language in action. Psychon Bull Rev 9(3):558–565
Lidji P, Kolinsky R, Lochy A, Morais J (2007) Spatial associations for musical stimuli: a piano in the head? J Exp Psychol 33(5):1189–1207
Masson MEJ, Loftus GR (2003) Using confidence intervals for graphically based data interpretation. Can J Exp Psychol 57(3):203–220
Rusconi E, Kwan B, Giordano BL, Umiltà C, Butterworth B (2006) Spatial representation of pitch height: the SMARC effect. Cognition 99:113–129
Trimarchi PD, Luzatti C (2011) Implicit chord processing and motor representation in pianists. Psychol Res 75:122–128
Zwaan RA, Madden CJ (eds) (2005) Embodied sentence comprehension. CUP, Cambridge
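The by-participant (F1) test of the implied pitch × response side interaction amounts to a repeated-measures ANOVA; a sketch of how it might be run (column names assumed; the by-item F2 analysis would use items as the "subject" factor instead):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one mean RT per participant and cell of
# the implied-pitch (high/low) x response-side (left/right) design.
df = pd.read_csv("pitch_rts.csv")

# By-participant (F1) repeated-measures ANOVA on response times.
f1 = AnovaRM(df, depvar="rt", subject="participant",
             within=["pitch", "side"]).fit()
print(f1)
```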
A free energy approach to template matching in visual attention: a connectionist model
Keyvan Yahya1, Pouyan R. Fard2, Karl J. Friston3
1 University of Birmingham, Edgbaston, Birmingham, UK; 2 Graduate School of Neural Information Processing, University of Tübingen, Germany; 3 The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK
Abstract In this work, we propose a free energy model for visual template matching (FR-SAIM) based on the selective visual attention and identification model (SAIM).
Keywords Selective visual attention, Template matching, Free energy principle
Introduction
Visual search is a perceptual task that has been extensively studied in the cognitive processing literature. It is widely known that this process rests on matching the input from the visual field against a top-down attentional set, namely a 'search template'. However, the way this attentional set is formed, and how it guides visual search, is still not clear. The free energy principle is an emerging neurocognitive framework which tries to account for how interactions within a self-organizing system, like the brain, lead it to represent, perceive and interpret sensory data, by minimizing a free energy that can be considered a 'prediction error' (Friston 2009). By extending SAIM (Heinke, Humphreys 2003), we demonstrate how connectionist models can shed light on how free energy minimization mediates template matching in a visual attention model.
Overview of the FR-SAIM Model
The architecture of the FR-SAIM model is illustrated in Fig. 1a. In brief, visual input sampling is carried out by the content network (CN), controlled by the selection network (SN), and mapped to the focus of attention (FOA). When multiple objects appear on the retina, a property called inhibition of return makes the model select one and only one object, to avoid objects overlapping in the FOA. At the same time, the content network rectifies the already selected objects. Every neuron in the CN (sigma-pi nodes) holds a correspondence between the retina and the FOA; the SN determines which one of them is instantiated. By using a top-down
control mechanism, the knowledge network (KN) identifies the content of the FOA by comparison with the template it entails. Moreover, the location map (LM) complements the matching task by imposing another top-down control that supervises the selection of the input image. In the FR-SAIM, every network is associated with a free energy function in a hierarchical fashion. Each lower-level network makes a prediction and sends it up to the level above; in turn, each higher-level network calculates the top-down prediction error signal and returns it to the level below.
The Generative Model: To model sensory information in a hierarchical structure, we define a nonlinear function f that expresses the sensory states (input data) in terms of the hidden states:

s_i = f(x^(i)) + w,  w ~ N(0, Σ(x, m, γ))   (1)

where the causal states m are mediated by the hidden states x; thereby the hierarchical states link together, give the model a memory, and establish the local dynamics of x^(i). The fluctuations w are random noise produced through observation. Given equation (1), the model dynamics can be written in a hierarchical fashion:

x^(0) = f(x^(1))   (2)
x^(1) = f(x^(2)) + U_i   (3)
x^(2) = f(x^(1))   (bottom-up)   (4)
x^(2) = f(x^(3))   (top-down prediction)   (5)

where U_i is the action the network takes to modify the selection of sensory data, given by U_i = max[x^(2), x^(3)].
The Energy Functions: The energy functions of the neural networks in the FR-SAIM are derived by combining the original SAIM network energy functions with the prediction errors computed under the free energy principle. Equations (6)-(8) define the resulting energies E_SCN, E_KN and E_LM of the selection/content, knowledge, and location-map networks, each comprising the corresponding SAIM energy term plus a quadratic prediction-error term; the details of the mathematical derivation of these functions are discussed in Yahya (2013).
Finally, at each time step t, gradient descent is applied to all of the network energy functions in order to minimize them:

x_i(t + 1) = x_i(t) − ∂E(x_i)/∂x_i   (9)

Fig. 1 a Architecture of the FR-SAIM model, b visual field input to the model, c activation patterns of the content network during simulation, d time course of activation of the content network
Simulation Results
Simulation results are shown in Fig. 1b–d. Here, the model starts processing the visual input and puts the result into the FOA. These results illustrate how the target template '2' won the competition over the distractor template '+' by dominating the activation of the
content network as time passes. Furthermore, the time plot of the content network shows how the obtained network energy functions are minimized in accordance with the free energy principle.
References
Friston KJ (2009) The free-energy principle: a rough guide to the brain? Trends Cogn Sci 13(7):293–301
Heinke D, Humphreys GW (2003) Attention, spatial representation, and visual neglect: simulating emergent attention and spatial
memory in the selective attention for identification model (SAIM). Psychol Rev 110:29–87
Yahya K (2013) A computational study of visual template identification in the SAIM: a free energy approach. MPhil thesis, University of Birmingham
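Equation (9) is plain gradient descent on each network's energy. The scheme can be illustrated on a toy quadratic energy (our example; the actual FR-SAIM energies of Eqs. (6)-(8) couple the selection, content, knowledge, and location-map networks):

```python
import numpy as np

def descend(grad, x0: np.ndarray, lr: float = 0.1, steps: int = 200) -> np.ndarray:
    """Gradient descent x(t+1) = x(t) - lr * dE/dx, as in Eq. (9);
    the abstract's update uses an implicit unit step size."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Toy energy E(x) = 0.5 * ||x - target||^2 with gradient (x - target);
# its minimum plays the role of the settled network activation.
target = np.array([1.0, -2.0])
print(descend(lambda x: x - target, x0=np.zeros(2)))  # approaches [1, -2]
```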
Oral Presentations

Analyzing psychological theories with F-ACT-R: an example F-ACT-R application
Rebecca Albrecht, Bernd Westphal
Informatik, Universität Freiburg, Germany
Abstract The problem of deciding whether an ACT-R model predicts experimental data is, today, solved by simulation. This, of course, needs a complete ACT-R model and fixed global parameter settings. Such an ACT-R model may include implementation details, e.g. the use of control variables as part of declarative knowledge, in order to yield the expected results in simulation. Some of these implementation details are not part of a psychological theory but may nevertheless change the model's behavior. On the other hand, the crucial parts of a psychological theory modelled in ACT-R may depend on only very few rules. Based on a formal semantics for the ACT-R architecture, we present preliminary results on a method to formally analyze whether a partial ACT-R model predicts experimental data, without the need for simulation.
Keywords ACT-R, Formal methods, Model analysis, SMT, Model checking
Introduction
In cognitive modelling, computer models are used to describe human cognitive processes wrt. psychological assumptions. Unified theories of cognition and their implementations (called cognitive architectures) provide means for cognitive modelling. A widely used unified theory of cognition and cognitive architecture is ACT-R (Anderson 1983, 2007). ACT-R is a so-called hybrid architecture which consists of a symbolic and a subsymbolic layer. As part of the symbolic layer, declarative knowledge (chunks) and procedural knowledge (production rules) are defined. The interface between the symbolic and the subsymbolic layer in ACT-R is given by so-called modules. Modules are requested by production rules to process declarative information and make it accessible through associated buffers. The subsymbolic layer is defined by the behavior of modules, i.e. the responses of modules to given requests. For some modules, these responses depend on numerical parameters, e.g. the decay rate in the implementation of base-level learning as part of the declarative module. The process of cognitive modelling in ACT-R can be described as defining a model which adequately predicts average human data collected in experiments. Today this process is typically performed as follows. There is a psychological theory, i.e., a hypothesis on how a given task is principally solved by humans. In order to validate the psychological theory, an ACT-R model which implements the theory is constructed and evaluated wrt. experimental data. Practically, figures like average error rates or response times are derived from several executions of the ACT-R model and compared to average human data collected in experiments. If the figures obtained from executions of the ACT-R model deviate too far from the experimental data, there are two possible ways to adjust the model's behavior: on the one hand, numerical parameters can be adjusted; on the other hand, a different implementation of the psychological theory can be provided. If there is no implementation and parameter setting with which the cognitive architecture yields adequate predictions, the psychological theory needs to be rejected. Today, the only available method for ACT-R model validation is simulation, i.e. repeated model execution. Using this method for the validation of psychological theories requires an ACT-R model which is suitable for simulation.
Creating such a sufficiently complete ACT-R model may take significant effort even if the issues of a theory depend on only a few production rules of the model. For example, a psychological theory may be invalid because, according to the theory, a
certain number of rules must be executed whose sequential execution takes more time than it takes humans to solve the corresponding (sub-)tasks in the experiments. In this work we propose a new method to investigate the validity of a psychological theory with ACT-R models. Based on a formal semantics of ACT-R (Albrecht 2013; Albrecht and Westphal 2014), we reduce the question whether global parameter settings exist such that, e.g., a timely execution of a set of ACT-R rules is possible, to a satisfiability problem, i.e. a formula in first-order logic. We then analyze the resulting satisfiability problem with a satisfiability modulo theories (SMT) solver (De Moura and Bjørner 2011). If the SMT solver proves the given formula unsatisfiable, we can conclude that there are no appropriate global parameter settings, and thus that there is an issue with the given implementation of the psychological theory. If the SMT solver proves the given formula satisfiable, we obtain valuable hints on global parameter settings and can check them for plausibility. As our approach is not based on actual executions of an ACT-R model, it applies in particular to partial ACT-R models, i.e., to small sets of rules essential for the psychological theory. This may save significant modelling effort.
Motivating Example
Experimental Setting. A typical task in the domain of relational spatial reasoning with mental models is the following. Objects are visually presented to participants either on the left or on the right of a computer screen (cf. Fig. 1). The positions of objects on two subsequently shown screens implicitly encode a relation between two objects. For example, the two leftmost screens in Fig. 1 together encode the relation "A is to the left of B". The psychological experiment consists of n different tasks, where task i consists of showing six screens at times t_0^i, …, t_5^i. The two relations encoded by the first four screens are called premises; the relation encoded by the last two screens, shown at t_4^i and t_5^i, is called the conclusion. After the sixth screen of a task has been shown, participants should state whether the two premises and the conclusion are contradictory. In the example shown in Fig. 1, they are not contradictory because objects A, B, and C can be arranged in an order which satisfies both premises and the conclusion.
The Theory of Preferred Mental Models. In the preferred mental model theory (Ragni, Knauff and Nebel 2005), it is assumed that participants construct a mental spatial array of dynamic size which integrates the information given by the premises. Whether a conclusion contradicts the given premises is checked by inspecting the spatial array. Furthermore, it is assumed that only one preferred mental model is constructed immediately when the premises are presented. Only if the given conclusion does not hold in the preferred mental model is an alternative mental model constructed. For example, a possible model of the premises shown in Fig. 1 is to order the objects as A, C, B. This model does not imply the conclusion.
Modelling the Theory of Preferred Mental Models. When modelling the theory of preferred mental models in ACT-R, a crucial aspect is the use of declarative memory. In the ACT-R theory, the time and probability of retrieving a chunk from declarative memory depend on the activation of chunks. Activation in turn depends on different
Fig. 1 Example relational reasoning task with id i. Premise 1 is "A is to the left of B", premise 2 is "A is to the left of C", and the conclusion is "B is to the left of C". The time when the j-th stimulus is presented is denoted by t_j^i
assumptions on human memory processing, e.g. spreading activation, where the content of the declarative memory is considered, and base-level learning, where the history is considered. In an ACT-R cognitive architecture where only base-level learning is considered, the activation is calculated based on two global parameters: the decay rate d, which determines how fast the activation of a chunk decays over time, and the threshold s, which defines a lower bound on activation values for successful chunk retrieval. A fundamental assumption of the theory of preferred mental models is that the preferred mental model for the two premises is constructed before the conclusion is presented. That is, the behavior of the environment imposes hard deadlines on the timing of the model: any valid ACT-R model for the theory of preferred mental models must complete the processing of all rules needed to construct the preferred mental model before the next stimulus is presented. Consider the top row of Fig. 2 for a more formal discussion. During a task, stimuli are presented to the participants at fixed points in time. For example, let E1 denote the third screen (i.e. the onset of the first element of premise 2) and E2 denote the fifth screen (i.e. the onset of the first element of the conclusion), shown at times t_2^i and t_4^i, respectively, in the i-th task. This is the interval in which the second premise has to be processed. Then, according to the assumption stated above, processing of premise 2 has to be completed within tb := t_4^i − t_2^i time units. An ACT-R model for this task in particular needs to model successful solutions of the task. That is, in an ACT-R model which is valid given the experimental data, the execution of all rules involved in constructing the mental model must complete in at most tb time units. In Fig. 2, we illustrate cognitive states by the circular nodes; arrows indicate the execution of one rule which transforms one cognitive state into another. In addition to rules which request the declarative module, an ACT-R model of the theory of preferred mental models may comprise rules with deterministic timing and outcome, e.g., when modifying buffers of the imaginal module. In Fig. 2, we assume that there is only one request to the declarative module, by rule r, i.e. a request for the already constructed mental model comprising premise 1, which has two qualitatively different outcomes: a correct reply, and a wrong reply or no reply at all. Now, given corresponding rules, if it is impossible to choose the decay rate d and the threshold s such that the request completes within tb, then the considered rules definitely do not constitute a valid (partial) ACT-R model for the preferred mental model theory.
A Valid ACT-R Model for the Theory of Preferred Mental Models. The preferred mental model theory has been implemented in ACT-R (Ragni, Fangmeier and Brüssow 2010). In this model, each premise is represented by a mental model chunk which is constructed in the imaginal buffer. A mental model chunk specifies a number of positions pos1, pos2, … and assigns the objects presented on the computer screen accordingly. When premise 2 is presented, the mental model chunk representing the first premise has to be retrieved from declarative memory in order to construct a new mental model chunk which integrates both premises. In the ACT-R model for the preferred mental model theory, only base-level learning is considered. In the following, we use a part of this ACT-R model to illustrate our approach.
As the ACT-R model predicts the experimental data appropriately for a certain choice of parameters, we expect our approach to confirm this result.
Formal Analysis of ACT-R Models
Formally, an ACT-R production rule is a pair r = (p, a) which comprises a precondition p and an action a. An ACT-R model is a set of production rules. A cognitive state c = (s, t) consists of a mapping s from buffers to chunks or to the symbol nil, and a time-stamp t ∈ R+0. The F-ACT-R formal semantics (Albrecht, Westphal 2014) explains how a set of production rules induces a timed transition system on cognitive states given a set of global parameters, including
decay d and threshold s. Two cognitive states c = (s, t) and c′ = (s′, t′) are in transition relation, denoted by c →r c′, if there is a rule r = (p, a) such that precondition p is satisfied in s, s′ is obtained by applying a to s, and t′ − t is the time needed to execute action a. Now the ACT-R model validity problem stated in Section 2 basically reduces to checking whether, given a start cognitive state (c, t) and a goal state (c′, t′), there exist values for d and s such that there is a sequence of transitions

(c0, t0) →r1 (c1, t1) →r2 … →rn (cn, tn)   (1)
with c0 = c and cn = c′. For an example, consider the phase of the preferred mental model theory shown in Fig. 2, as discussed in Section 2. More specifically, consider a rule r which requests from the declarative module a mental model representing premise 1 when the first screen of the second premise is presented at time t_2^i. In the following, we consider for simplicity a model where r is the only nondeterministic rule that is ever enabled between t_2^i and t_4^i, and where the sequence of rules executed before and after rule r is deterministic. Then the time to execute the model only varies wrt. the time for executing r. The model is definitely not valid if there are no choices for decay d and threshold s such that there is a transition c →r c′, where c = (s, t) is a cognitive state associated with a realistic history and c′ = (s′, t′) is a cognitive state in which the mental model chunk representing premise 1 has correctly been recalled. This can be encoded as a satisfiability problem as follows. A cognitive state can be characterized by a formula over variables V which model buffer contents, i.e. cognitive states. We assume a formula φs over variables V encoding cognitive state s, and a formula φs′ over variables V′ encoding cognitive state s′. The precondition p of rule r can be seen as a formula over V; the action a relates s and s′, so it is a formula over V and V′. Furthermore, we use A(c, t) to denote the activation of chunk c at time t. We use optimized base-level learning to calculate activation values: A(c, t) = ln(2) − ln(1 − d) − d(t − tc), where tc is the first time chunk c was presented. For our experiments, we consider two
Fig. 2 Example sequence of cognitive states (circles) between environment events E1 and E2 (rectangles). A cognitive state which leads to a correct reply is denoted by 'V', and a state which leads to a wrong reply or no reply at all by 'X'. Label r indicates a state where a retrieval request is posed to the declarative module
chunks c1, which correctly represents premise 1, and c2, which does not. The formula to be checked for satisfiability, then, is

∃d, s : φs ∧ p ∧ (A(c1, t) > s ∧ A(c2, t) < A(c1, t)) ∧ a ∧ φs′ ∧ (t′ − t) < tb.   (2)
As a proof of concept, we have used the SMT solver SMTInterpol (Christ, Hoenicke and Nutz 2012) to check an instance of (2). With an appropriate start cognitive state, SMTInterpol reports satisfiability of (2) and provides a satisfying valuation for d and s in about 1 s in total. If we choose an initial cognitive state where the activation of c1 is too low, SMTInterpol proves (2) unsatisfiable, as expected. By adding, e.g., constraints on s and d to (2), we can use the same procedure to check whether the model is valid for particular values of d and s which lie within a range accepted by the community. Note that our approach is not limited to the analysis of single rules. Given an upper bound n on the number of rules possibly executed between two points in time, a formula similar to (2) can be constructed.
Conclusion
We propose a new method to check whether and under which conditions a psychological theory implemented in ACT-R predicts experimental data. Our method is based on stating the modelling problem as a satisfiability problem which can be analyzed by an SMT solver. With this approach it is in particular no longer necessary to write a complete ACT-R model in order to evaluate a psychological theory. It is sufficient to provide those rules which are possibly enabled during the time considered for the analysis. For example, in Albrecht and Ragni (2014) we propose a cognitive model for the Tower of London task, where an upper bound on the time to complete a retrieval request for the target position of a disk is defined as the time it takes the visual module to encode the start position. We expect the evaluation of such mechanisms to become much more efficient using our approach as compared to simulation-based approaches. In general, we believe that our approach can bring the overall process of cognitive modelling to a much more efficient level by analyzing crucial aspects of psychological theories before entering the often tedious phase of complete ACT-R modelling.
References
Albrecht R (2013) Towards a formal description of the ACT-R unified theory of cognition. Unpublished master's thesis, Albert-Ludwigs-Universität Freiburg
Albrecht R, Ragni M (2014) Spatial planning: an ACT-R model for the Tower of London task. In: Proceedings of spatial cognition conference 2014, to appear
Albrecht R, Westphal B (2014) F-ACT-R: defining the architectural space. In: Proceedings of KogWis 2014, to appear
Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press
Christ J, Hoenicke J, Nutz A (2012) SMTInterpol: an interpolating SMT solver. In: Donaldson AF, Parker D (eds) SPIN, vol 7385. Springer, pp 248–254
De Moura L, Bjørner N (2011) Satisfiability modulo theories: introduction and applications. Commun ACM 54(9):69–77. doi:10.1145/1995376.1995394
Ragni M, Fangmeier T, Brüssow S (2010) Deductive spatial reasoning: from neurological evidence to a cognitive model. In: Proceedings of the 10th international conference on cognitive modeling, pp 193–198
Ragni M, Knauff M, Nebel B (2005) A computational model for spatial reasoning with mental models. In: Proceedings of the 27th annual cognitive science conference, pp 1064–1070
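Formula (2) asks whether a decay rate d and threshold s exist such that the correct chunk wins retrieval within the deadline. A brute-force numerical stand-in for the SMT query (our simplification: retrieval latency is approximated by the standard ACT-R mapping F·e^(−A) with latency factor F = 1) makes the shape of the check concrete:

```python
import math

def activation(d: float, t: float, t_c: float) -> float:
    """Optimized base-level learning as quoted above:
    A(c, t) = ln(2) - ln(1 - d) - d * (t - t_c)."""
    return math.log(2) - math.log(1 - d) - d * (t - t_c)

def parameters_exist(t: float, tc1: float, tc2: float, tb: float) -> bool:
    """Grid search over (d, s) for the conjuncts of formula (2):
    A(c1, t) > s, A(c2, t) < A(c1, t), and retrieval finishing before tb."""
    for i in range(1, 100):
        d = i / 100.0                      # decay rate in (0, 1)
        a1 = activation(d, t, tc1)
        a2 = activation(d, t, tc2)
        latency = math.exp(-a1)            # simplified retrieval time
        for j in range(-40, 40):
            s = j / 10.0                   # retrieval threshold
            if a1 > s and a2 < a1 and latency < tb:
                return True
    return False

# Example: c1 (the premise-1 model) presented more recently than c2.
print(parameters_exist(t=4.0, tc1=2.0, tc2=1.0, tb=2.0))
```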
F-ACT-R: defining the ACT-R architectural space
Rebecca Albrecht, Bernd Westphal
Informatik, Universität Freiburg, Germany
Abstract ACT-R is a unified theory of cognition and a cognitive architecture which is widely used in cognitive modeling. However, the semantics of ACT-R is only given by the ACT-R interpreter. Therefore, an application of formal methods from computer science in order to, e.g., analyze or compare cognitive models wrt. different global parameter settings is not possible. We present a formal abstract syntax and semantics for the ACT-R cognitive architecture as a cornerstone for applying formal methods to symbolic cognitive modeling.
Keywords ACT-R, Cognitive architectures, Formal methods, Abstract syntax, Formal semantics
Introduction
In cognitive science, researchers describe human cognitive processes in order to explain human behavioral patterns found in experiments. One approach is to use cognitive architectures which implement a set of basic assumptions about human cognitive processes and to create cognitive models with respect to these assumptions. ACT-R (Anderson 1983, 2007) is one such cognitive architecture; it provides a programming language to create a cognitive model and an interpreter to execute the model. The ACT-R architecture is a hybrid architecture which includes symbolic and subsymbolic mechanisms. The symbolic mechanisms consist of three concepts: assuming a modular structure of the human brain, using chunks as basic declarative information units, and using production rules to describe processing steps. Subsymbolic processes are associated with the modules' behavior. The modules' behavior is controlled by so-called global parameters, which enable the execution of a cognitive model with respect to different assumptions about human cognition. In this work we introduce a formal abstract syntax and semantics for the ACT-R cognitive architecture. An ACT-R architecture is defined as a structure which interprets syntactic components of an ACT-R model with respect to psychological assumptions, e.g. global parameter settings. As a result, we construct a complete transition system which describes all possible computations of an ACT-R model with respect to one ACT-R architecture. The architectural space of ACT-R is defined as the set of all possible ACT-R architectures.
State of the Art
ACT-R. The functionality of ACT-R is based on three concepts. Firstly, there are basic information units (chunks) which describe objects and their relationships to each other. A chunk consists of a chunk type and a set of slots which reference other chunks. Secondly, the human brain is organized in a modular structure; that is, information processing is localized differently wrt. how information is processed. There are different modules for perception, interaction with an environment, and internal mental processes. When requested, each module can process one chunk at a time, and the processing of chunks needs time to complete. The processed chunk is made accessible by the module through an associated buffer. A state in cognitive processing (cognitive state) is the set of chunks made accessible by modules through their associated buffers. Thirdly, there are cognitive processing steps, i.e. changing a cognitive state by altering or deleting chunks which are made accessible by modules, or requesting new chunks from modules. This
is accomplished by the execution of production rules. A production rule consists of a precondition, which characterizes the cognitive states in which the production rule is executable, and an action, which describes changes to cognitive states when the production rule is executed. Basically, actions request modules to process certain chunks. Which chunk is processed, and how long this processing takes, depends on the implementation of psychological assumptions within the modules and may be controlled by global parameters.
Formalization of Symbolic Cognitive Architectures. To the best of our knowledge, there is no formal abstract description of ACT-R. Other works try to make cognitive modelling more accessible by utilizing other modelling formalisms, like GOMS (Card, Moran and Newell 1983), as high-level languages for ACT-R (Salvucci, Lee 2003; St. Amant, Freed and Ritter 2005; St. Amant, McBride and Ritter 2006). In other approaches the authors propose high-level languages which can be compiled into different cognitive architectures, e.g. ACT-R and SOAR (Laird, Newell and Rosenbloom 1987). This includes HERBAL (Cohen, Ritter and Haynes 2005; Morgan, Haynes, Ritter and Cohen 2005) and HLSR (Jones, Crossman, Lebiere and Best 2006). None of these approaches reports a formal description of ACT-R; they only describe the high-level language and compilation principles. In Schultheis (2009), the author introduces a formal description of ACT-R in order to prove Turing completeness. However, this formal description is too abstract to be used as a complete formal semantics for ACT-R. In Stewart and West (2007) the authors analyze the architectural space of ACT-R. In general, this idea is similar to the idea presented in this work. However, the result of their analysis is a new implementation of ACT-R in the Python programming language. Therefore, it is not abstract and, e.g., not suitable for applying formal methods from software engineering.
A Formal Definition of ACT-R
In this section, we describe the basic building blocks of our formalization of ACT-R. The formalization complies with the ACT-R theory as defined by the ACT-R interpreter. Firstly, we provide an abstract syntax for ACT-R models which includes chunk instantiations, abstract modules, and production rules. In our sense, an ACT-R model is simply a syntactic representation of a cognitive process. Secondly, we formally introduce the notion of an architecture as an interpretation of the syntactic entities of an ACT-R model with respect to psychological assumptions, i.e. subsymbolic mechanisms. This yields a representation of cognitive states and finite sequences thereof. Thirdly, for a given model we introduce an infinite-state, timed transition system over cognitive states which is induced by an architecture.
Abstract Syntax. We consider a set of abstract modules as a generalization of the particular modules provided by the ACT-R tool. A module M consists of a finite set of buffers B, a finite set of module queries Q, and a finite set of action symbols A. Buffers are represented as variables which can be assigned to chunks. Module queries are represented as Boolean variables. As action symbols we consider the standard ACT-R action symbols '+', '=', and '−'. In order to describe the ACT-R syntax we define the signature of an ACT-R model, which is basically the set of syntactic elements of a model. The signature of an ACT-R model consists of a set of modules, a set of chunk types, a set of ACT-R variables, and a set of relation symbols.
A chunk type consists of a type name and a finite set of slot names (or slots for short). An abstract production rule consists of a precondition and an action. A precondition is basically a set of expressions over a model's signature, i.e. over the content of buffers of a module, parameterized by ACT-R variables and module queries. An action is also an expression over the model's signature which uses action symbols of modules. An abstract ACT-R model consists of a finite set of production rules R, a finite set of initial buffer actions A0, which define the initial state, and a finite set of chunk type instantiations C0.
Architecture. In this section we describe the formal interpretation of an abstract ACT-R model with respect to psychological assumptions. We propose to denote by architecture a structure which provides all necessary means to interpret an abstract ACT-R model. To this end, an architecture defines chunks as the building blocks of declarative knowledge, i.e. instantiations of chunk types, an interpretation function for each action symbol of a module, and a production rule selection mechanism. In order to describe all computations of an ACT-R model as a transition system, we introduce the notion of a cognitive state and of finite sequences thereof, which are induced by the execution of production rules. The introduction of finite sequences of cognitive states is necessary as the interpretation of actions depends on the history of a model. As the most prominent example, consider base-level learning in ACT-R. A request to the declarative module yields one chunk as a result. In general, it is possible that more than one chunk fits the request. Which chunk is returned by the declarative module depends on how often and when all fitting chunks were processed by the model before. We use D to denote the set of chunks, where a chunk c ∈ D is a unique entity which has a chunk type and maps each slot (as defined by the chunk type) to another chunk. A cognitive state c is a function which maps each buffer b ∈ B to a chunk c ∈ D and a delay d ∈ R+0. The delay corresponds to the timing behavior of modules. By mapping buffer b to a delay d > 0 we indicate that there is an action pending and that it will be completed in d time units. If there is no action pending, d is 0. This is a slightly different view than is common in ACT-R, where a chunk is accessible in a cognitive state only after it has been processed by the module, i.e. if the module's delay d is 0. Intuitively, in our representation, an interpreter is able to look ahead when scheduling actions. In the following, we use C to denote the set of all cognitive states and Cpart to denote the set of all partial cognitive states, i.e., functions which do not necessarily assign all buffers. A finite trace p is simply a finite sequence c0, c1, …, cn ∈ C of cognitive states. In the following, we use P to denote the set of all finite traces. Given an action symbol a ∈ A of a module M, an interpretation of a is a function I⟨a⟩ : P → 2^(Cpart × K × 2^D) which assigns to each finite trace p a set of possible effects of the action. An effect is a triple (cpart, k, C) consisting of a partial cognitive state cpart, a valuation k ∈ K of module queries, and a set C ∈ 2^D of chunks. The partial cognitive state cpart defines an update of the buffer contents, k provides new values for module queries, and C comprises the chunks which are removed from buffers and which have to be considered for an update of the declarative memory. Similarly, the production rule selection mechanism is formally a function S : P → 2^R which yields a set of production rules. The production selection mechanism decides whether a precondition is satisfied in a cognitive state given an interpretation of relation symbols and an assignment of ACT-R variables to chunks. Note that our notion of architecture provides a clear interface between the symbolic layer of ACT-R, i.e. the world of rules and chunks, and the sub-symbolic layer, i.e.
formal principles corresponding to human cognitive processing captured by the interpretation functions of actions symbols associated with modules. Furthermore, each choice of global parameters e.g. the decay rate in base-level learning corresponds to exactly one architecture as defined above. Architectures differ in the definitions of the interpretation functions I, i.e. which effects are obtained for a given finite trace, and in the production rule selection function S. Behavior of ACT-R Models. In this section we introduce the computational space of an ACT-R model given an ACT-R architecture. This is done by introducing a labelled, timed transition system as induced by a model and an architecture. To this end, we define the following transition relation. Two time-stamped cognitive states (c, t) and (c0 , t0 ) are in transition relation wrt. a production rule r 2 R, an execution delay s 2 Rþ 0 for production rule r, a set of chunks x 2 D, and a finite trace p 2 P, i.e.
(γ, t) →_π^{r,τ,x} (γ′, t′)

if and only if production rule r is executable in cognitive state γ w.r.t. the finite trace π, i.e. if r ∈ S(π, γ), if the effect of the actions in r according to the interpretation functions of the action symbols yields γ′, and if the time-stamp t′ is t + τ. The introduced transition relation corresponds to a cognitive processing step in ACT-R, i.e. the execution of one production rule. The transition relation '→' induces an (infinite-state) timed transition system whose initial state is the cognitive state given by the initial buffer actions A0. Given an ACT-R model, there is a one-to-one correspondence between the set of simulation runs obtainable from the ACT-R tool (for a given set of parameters) and the computation paths in the timed transition system induced by the architecture corresponding to the chosen parameters. We validated the formal description by comparing a prototype implementation to the ACT-R interpreter for several models described in the ACT-R tutorial.

Conclusion. In this work, we presented the first comprehensive, high-level formal semantics for the ACT-R programming language as defined by the ACT-R interpreter. By our notion of ACT-R architectures, we have precisely captured the architectural space of ACT-R. Our formalization lays the groundwork for approaching a number of known issues with the ACT-R modelling language. Firstly, our notion of architecture can be used to explicitly state all assumptions under which cognitive models are created and evaluated. The architectures used for different cognitive models can then be compared precisely, thanks to the formal nature of our definition. We expect such comparisons to provide deeper insights into human cognition as such. Today, the mechanisms and parameter settings employed for modelling and evaluation are often neither reported nor discussed, mainly due to the opaque integration of these principles in the ACT-R interpreter. Secondly, our formal semantics allows us to compare different ACT-R models: whether two models with (syntactically) different rule sets describe the same behavior now amounts to proving that the induced timed transition systems are equivalent. Thirdly, our formal view on ACT-R models allows us to go beyond today's quantitative evaluation of ACT-R models with the ACT-R interpreter towards a qualitative evaluation. Today, the ACT-R interpreter is typically used to compute abstract quantitative figures like the average time needed by the model to solve a certain task. Our formalization provides a stepping stone towards, e.g., formal analysis techniques. With these techniques we can, for instance, analyze whether and under what conditions certain aspects of psychological theories (Albrecht, Westphal 2014) can possibly predict empirical data, or check whether and under what conditions a certain cognitive state which is crucial for a modelled psychological theory is reachable. Last but not least, formal techniques can be applied to improve the software engineering aspect of ACT-R modelling, which is often perceived in the literature to be rather inefficient and error-prone (Morgan et al. 2005; Jones et al. 2006). Furthermore, the scope of our work is not limited to ACT-R but has a clear potential to affect the whole domain of rule-based cognitive architectures. Firstly, efforts to provide alternative ACT-R interpreters, like that of Stewart and West (2007), can refer to a common reference semantics.
Secondly, we are able to formally establish connections between different cognitive architectures, ranging from general-purpose architectures like SOAR to special-purpose architectures like CasCas (Lüdtke et al. 2006). In future work, our formalization needs to be extended to cover probabilistic aspects. Furthermore, we plan to extend the prototype implementation of our formal description (Albrecht, Gießwein and Westphal 2014) to support more ACT-R models before we investigate options for improved high-level model description languages that are explicitly suitable for the ACT-R theory.
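To illustrate the induced transition system, here is a deliberately simplified Python rendering of one transition step. The selection and interpretation functions are assumption-laden stubs standing in for a concrete architecture; this is our sketch of the idea, not the reference semantics itself.

```python
# Schematic rendering of one transition step (gamma, t) -> (gamma', t + tau)
# in the induced timed transition system. `select` and `interpret` are
# stand-ins for an architecture's rule selection S and interpretation I.
def step(trace, t, select, interpret):
    """Expand one time-stamped cognitive state into all successor states."""
    gamma = trace[-1]                       # current cognitive state
    successors = []
    for rule in select(trace):              # r in S(pi): executable rules
        for partial, tau in interpret(rule, trace):
            gamma2 = {**gamma, **partial}   # apply the partial-state update
            successors.append((trace + [gamma2], t + tau))
    return successors

# Toy usage with stub functions standing in for a concrete architecture.
select = lambda trace: ["r1"] if trace[-1].get("goal") == "start" else []
interpret = lambda rule, trace: [({"goal": "done", "retrieval": "c42"}, 0.05)]
print(step([{"goal": "start"}], 0.0, select, interpret))
```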
References
Albrecht R, Gießwein M, Westphal B (2014) Towards formally founded ACT-R simulation and analysis. In: Proceedings of KogWis 2014, to appear
Albrecht R, Westphal B (2014) Analyzing psychological theories with F-ACT-R. In: Proceedings of KogWis 2014, to appear
Anderson JR (1983) The architecture of cognition, vol 5. Psychology Press
Anderson JR (2007) How can the human mind occur in the physical universe? Oxford University Press
Card SK, Moran TP, Newell A (1983) The psychology of human–computer interaction. CRC Press
Cohen MA, Ritter FE, Haynes SR (2005) Herbal: a high-level language and development environment for developing cognitive models in Soar. In: Proceedings of the 14th conference on behavior representation in modeling and simulation, pp 177–182
Jones RM, Crossman JA, Lebiere C, Best BJ (2006) An abstract language for cognitive modeling. In: Proceedings of the 7th international conference on cognitive modeling. Lawrence Erlbaum, Mahwah
Laird JE, Newell A, Rosenbloom PS (1987) Soar: an architecture for general intelligence. Artif Intell 33(1):1–64
Lüdtke A, Cavallo A, Christophe L, Cifaldi M, Fabbri M, Javaux D (2006) Human error analysis based on cognitive architecture. In: HCI-Aero, pp 40–47
Morgan GP, Haynes SR, Ritter FE, Cohen MA (2005) Increasing efficiency of the development of user models. In: SIEDS, pp 82–89
Salvucci DD, Lee FJ (2003) Simple cognitive modeling in a complex cognitive architecture. In: CHI, pp 265–272
Schultheis H (2009) Computational and explanatory power of cognitive architectures: the case of ACT-R. In: Proceedings of the 9th international conference on cognitive modeling, pp 384–389
St. Amant R, Freed AR, Ritter FE (2005) Specifying ACT-R models of user interaction with a GOMS language. Cogn Syst Res 6(1):71–88
St. Amant R, McBride SP, Ritter FE (2006) An AI planning perspective on abstraction in ACT-R modeling: toward an HLBR language manifesto. In: Proceedings of the ACT-R Workshop
Stewart TC, West RL (2007) Deconstructing and reconstructing ACT-R: exploring the architectural space. Cogn Syst Res 8(3):227–236
Defining distance in language production: extraposition of relative clauses in German
Markus Bader
Goethe-Universität Frankfurt, Institut für Linguistik, Frankfurt am Main, Germany

Abstract This paper presents results from a corpus study and two language production experiments that investigated the position of relative clauses in German. A relative clause can appear either adjacent to its head noun or extraposed behind the clause-final verb. The corpus data show that the major factor deciding whether or not to extrapose is the distance that has to be crossed by extraposition. Relative clause length has an effect too, but a much smaller one. The experimental results show that distance is defined not as a number of words but in terms of new discourse referents, in the sense of the Dependency Locality Theory of Gibson (2000). Keywords Sentence production, Extraposition, Dependency length, Dependency Locality Theory (DLT)

Introduction. A large body of research into word order variation has shown that constituent weight is a major factor determining the choice between competing syntactic alternatives (e.g., Hawkins 1994; Wasow 2002).
More recently, it has become common to define weight in terms of the length of syntactic dependencies, such as the dependencies between verbs and their arguments (e.g., Hawkins 2004; Gildea, Temperley 2010). This raises the question of how dependency length is to be measured. The syntactic alternation considered in this paper concerns the position of relative clauses in German. As shown in (1), a relative clause in German can appear either adjacent to its head noun (1-a) or extraposed behind the clause-final verb (1-b).

When deciding whether to keep the relative clause adjacent to its head noun or to extrapose it behind the clause-final verb, two dependencies have to be considered. One is the dependency between the head noun and the relative clause; the second is the dependency between the head noun and the clause-final verb. As shown in (2) and (3), these two dependencies stand in a trade-off relation. When the relative clause is adjacent to the head noun, the head-noun–relative-clause dependency (solid arrow) is optimal, whereas the head-noun–verb dependency (dashed arrow) is not, because the relative clause intervenes between head noun and verb. Extraposition of the relative clause shortens the head-noun–verb dependency but lengthens the head-noun–relative-clause dependency; that is, while the former dependency improves, the latter becomes worse. Corpus studies (Hawkins 1994; Uszkoreit et al. 1998) show that the decision to extrapose depends on both dependencies. First, the rate of extraposition increases with increasing length of the relative clause. Second, the rate of extraposition declines with increasing extraposition distance, that is, with an increasing amount of material intervening between head noun and relative clause in the extraposed variant. In (3), for example, extraposition has to cross two words (Geschenke geben). In studies of language production (Stallings, MacDonald 2011) and in corpus research (Gildea, Temperley 2010), distance is measured as a number of words. The same is true for the efficiency theory proposed by Hawkins (2004), which is neutral with regard to language production versus language comprehension. This contrasts with the Dependency Locality Theory (DLT) of Gibson (2000), which is a theory of processing load during language comprehension. According to the DLT, dependency length is measured not in number of words but in number of new discourse referents. The aim of the present work is to test the hypothesis that dependency length for purposes of language production is defined in the same way as proposed by the DLT for language comprehension, namely in terms of new discourse referents, not in terms of words.

Corpus Analysis. About 2000 sentences containing a relative clause in either adjacent or extraposed position were drawn from the deWaC corpus (Baroni, Bernardini, Ferraresi and Zanchetta 2009) and analyzed. Preliminary results of the ongoing analysis are shown in Figs. 1 and 2. Figure 1 shows the effect of relative clause length. Figure 2 shows the effect of the post-head-noun region, which is the region between the head noun/relative clause and the clause-final verb (Geschenke in (3)). The verb is not included because the verb always has to be crossed when extraposing, and additional analyses show that the length of the verbal complex has only very small effects on the rate of extraposition. When only the total extraposition distance is considered, as in the older corpus literature, one misses the point that it is the length of the post-head-noun region which is crucially involved in determining extraposition. In accordance with earlier corpus studies of relative clause placement in German, the rate of extraposition increases with increasing length of the relative clause and decreases with increasing length of the post-head-noun region.
Fig. 1 Proportion of extraposition depending on the length of the relative clause (in words)
Fig. 2 Proportion of extraposition depending on the length of the pre-verbal material (in words)
Furthermore, the length of the post-head-noun region is a much more important predictor of relative clause placement than the length of the relative clause. When the post-head-noun region is empty, extraposition is almost obligatory, but already a post-head-noun region of four words drives the extraposition rate down to less than 10 %. In the following experiments, the post-head-noun region ranges from 0 to 2 words. As shown in Fig. 2, this relatively small increase has strong effects on the decision to extrapose when averaged across all different kinds of intervening material: the extraposition rate goes down from ca. 90 % for 0 words to 60 % for one word and to 35 % for two words. The question addressed by the following two experiments is whether more fine-grained distinctions show up when looking at particular types of intervening material.

Experiment 1. In order to decide between defining dependency length in terms of number of words or number of new discourse referents, 32 students participated in an oral production experiment, a variant of the production-from-memory task (Bock, Warren 1985). Participants first read a main clause as in (4). After a visual prompt like Max said that, the initial main clause had to be repeated orally from memory in the form of an embedded clause. While the initial main clause fixed the lexical content of the to-be-produced embedded clause, participants were completely free with regard to the position of the relative clause.
The experiment varied the amount of material that had to be crossed by extraposition in addition to the verb: nothing (4-a), a bare NP object (4-b), or an NP object containing a determiner (4-c). The latter two conditions differ in number of words but are identical in number of new discourse referents. As shown by the corpus analysis, a difference of one word has a strong effect on the rate of extraposition in the length range under consideration. The percentages of sentences with extraposed relative clauses are presented in Table 1. Table 1 shows that the rate of extraposition decreases substantially in the presence of an object, but the difference between one- and two-word objects is quite small. The results were analyzed by means of mixed-effects logistic regression using the R package lme4 (Bates, Maechler 2010). The experimental factors were coded in such a way that all contrasts test whether differences between adjacent means are significant (so-called contrast coding; see the sketch after Table 3 below). Table 2 shows the results of the statistical analysis. The difference between 0 words and 1 word was significant, but the difference between 1 word and 2 words was not. In sum, the results of Experiment 1 suggest that distance is defined as number of new discourse referents, as in the DLT, and not as number of words.

Experiment 2. To corroborate the results of Experiment 1, Experiment 2 used the same experimental procedure to test material that differs in only one respect from the material investigated in the first experiment. As shown in (5), the condition with one additional word before the verb now contains the indefinite pronoun etwas ('something') instead of a bare noun.
Both the indefinite pronoun and a bare noun introduce a new discourse referent and should thus block extraposition in the same way. However, because the indefinite pronoun lacks lexical content, it causes lower semantic processing costs. Since the cost of semantic processing is the underlying reason why distance is measured in terms of new discourse referents in the DLT, it could be expected that it is easier to extrapose across an indefinite pronoun than across a bare noun. 27 students participated in Experiment 2. The results, which are also shown in Table 1, reveal a 14 % drop in extraposition rate in the presence of a one-word object and a further 9 % drop when going from one- to two-word objects. The results were analyzed as described for Experiment 1. The results of the logistic mixed-effects regression are shown in Table 3. The difference between 0 words and 1 word was significant, but the difference between 1 word and 2 words failed to reach significance.

Discussion. The experimental results presented in this paper show that the decision between keeping a relative clause adjacent to its head noun and extraposing it behind the clause-final verb is strongly affected by the amount of material that intervenes between the head noun (including the relative clause) and the clause-final verb. When a new discourse referent intervenes, the rate of extraposition is substantially reduced. Whether the new discourse referent was introduced by a one-word NP or a two-word NP had no significant effect, although numerically there were some differences in the expected direction. The results thus suggest that dependency length is defined in the same way for language production and language comprehension, namely in terms of new discourse referents. This in turn argues that the DLT has a broader coverage than just processing load during language comprehension. An alternative to defining weight in terms of new discourse referents is the prosodic theory proposed by Anttila, Adams and Speriosu (2010) in their analysis of the English dative alternation. In a nutshell, Anttila et al. (2010) propose that dependency length should be measured as the number of intervening phonological phrases, where a phonological phrase consists of an accented lexical word possibly preceded by unaccented function words.
Table 1 Percentages of extraposition in Experiments 1 and 2

  Structure     % Extraposed in Exp 1   % Extraposed in Exp 2
  Ø             38                      54
  N             15                      40
  Det + N       11                      31
Table 2 Results of mixed-effects model for Experiment 1

  Contrast        Estimate   Std. Error   z value   Pr(>|z|)
  Ø vs. N         2.3233     0.6387       3.638     0.0002
  N vs. Det + N   0.5776     0.8060       0.717     0.4735
Table 3 Results of mixed-effects model for Experiment 2

  Contrast        Estimate   Std. Error   z value   Pr(>|z|)
  Ø vs. N         1.0587     0.3855       2.747     0.0060
  N vs. Det + N   0.5054     0.3313       1.526     0.1271
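The contrast-coded analysis behind Tables 2 and 3 can be sketched as follows. This is our own illustration in Python rather than the original lme4 analysis: it simulates data from the Experiment 1 rates in Table 1 and omits the random effects that the real mixed-effects models included.

```python
# Sketch of a contrast-coded logistic regression of the kind reported in
# Tables 2 and 3. Data are simulated from the Experiment 1 rates in Table 1;
# the actual analysis used R's lme4 with random effects, omitted here.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from patsy.contrasts import Diff

rng = np.random.default_rng(1)
levels = ["0_none", "1_bare_N", "2_Det_N"]                    # intervening material
rates = {"0_none": 0.38, "1_bare_N": 0.15, "2_Det_N": 0.11}   # Table 1, Exp 1

data = pd.DataFrame({"structure": np.repeat(levels, 200)})
data["extraposed"] = rng.binomial(1, data["structure"].map(rates).to_numpy())

# Backward-difference ("sliding") contrasts test adjacent level differences,
# i.e. 0 vs. 1 word and 1 vs. 2 words, mirroring the reported contrast coding.
model = smf.glm("extraposed ~ C(structure, Diff)",
                data=data, family=sm.families.Binomial()).fit()
print(model.summary())
```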
According to this definition, an NP consisting of a bare noun like Gedichte ('poems') and an NP consisting of a determiner and a noun like einige Gedichte ('some poems') both constitute a single phonological phrase. This would be compatible with the finding of Experiment 1 that the rate of extraposition did not differ significantly between these two types of NPs. In contrast to a bare noun like Gedichte, an indefinite pronoun like etwas ('something') does not form a phonological phrase, because etwas is an unaccented function word. This predicts that an intervening indefinite pronoun etwas should be invisible with regard to extraposition. However, as shown by the results of Experiment 2, the rate of extraposition decreased significantly when etwas was present. The rate of extraposition decreased even further when an NP consisting of a determiner and a noun intervened, but this further decrease was not significant. The results of Experiment 2 thus do not support the prosodic definition of distance proposed by Anttila et al. (2010). In sum, the results of the two experiments reported in this paper favor a definition of dependency length in terms of intervening new discourse referents. The two alternatives that were considered, distance measured as number of words or as number of phonological phrases, could not account for the complete pattern of results.

References
Anttila A, Adams M, Speriosu M (2010) The role of prosody in the English dative alternation. Lang Cogn Process 25(7–9):946–981
Baroni M, Bernardini S, Ferraresi A, Zanchetta E (2009) The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang Resour Eval J 23(3):209–226. doi:10.1007/s10579-009-9081-4
Bates DM, Maechler M (2010) lme4: linear mixed-effects models using S4 classes
Bock JK, Warren RK (1985) Conceptual accessibility and syntactic structure in sentence formulation. Cognition 21:47–67
Gibson E (2000) The dependency locality theory: a distance-based theory of linguistic complexity. In: Marantz A, Miyashita Y, O'Neil W (eds) Image, language, brain. Papers from the first mind articulation project symposium. MIT Press, Cambridge, pp 95–126
Gildea D, Temperley D (2010) Do grammars minimize dependency length? Cogn Sci 34:286–310
Hawkins JA (1994) A performance theory of order and constituency. Cambridge University Press, Cambridge
Hawkins JA (2004) Efficiency and complexity in grammars. Oxford University Press, Oxford
Stallings LM, MacDonald MC (2011) It's not just the "heavy NP": relative phrase length modulates the production of heavy-NP shift. J Psycholing Res 40(3):177–187
Uszkoreit H, Brants T, Duchier D, Krenn B, Konieczny L, Oepen S, Skut W (1998) Studien zur performanzorientierten Linguistik: Aspekte der Relativsatzextraposition im Deutschen. Kognitionswissenschaft 7:129–133
Wasow T (2002) Postverbal behavior. CSLI Publications, Stanford
How is information distributed across speech and gesture? A cognitive modeling approach
Kirsten Bergmann, Sebastian Kahl, Stefan Kopp
Bielefeld University, Germany

Abstract In naturally occurring speech and gesture, meaning is organized and distributed across the modalities in different ways. The underlying cognitive processes are largely unexplored.
We propose a model based on activation spreading within dynamically shaped multimodal memories, in which coordination arises from the interplay of visuo-spatial and linguistically shaped representations under given cognitive resources. A sketch of this model is presented together with simulation results. Keywords Speech, Gesture, Conceptualization, Semantic coordination, Cognitive modeling

Introduction. Gestures are an integral part of human communication, and they are inseparably intertwined with speech (McNeill, Duncan 2000). The detailed nature of this connection, however, is still a matter of considerable debate. The data that underlie this debate have for the most part come from studies on the coordination of overt speech and gestures showing that the two modalities are coordinated in their temporal arrangement and in meaning, but with considerable variation. When occurring in temporal proximity, the two modalities express the same underlying idea, though not necessarily identical aspects of it: iconic gestures can be redundant with the information encoded verbally (e.g., 'round cake' + gesture depicting a round shape), supplement it (e.g., 'cake' + gesture depicting a round shape), or even complement it (e.g., 'looks like this' + gesture depicting a round shape). These variations in meaning coordination, together with temporal synchrony, have led to different hypotheses about how the two modalities encode aspects of meaning and what mutual influences between the two modalities could underlie this. However, a concrete picture of this, and in particular of the underlying cognitive processes, is still missing. A number of studies have investigated how the frequency and nature of gesturing, including its coordination with speech, is influenced by cognitive factors. There is evidence that speakers indeed produce more gestures at moments of relatively high load on the conceptualization process for speaking (Kita, Davies 2009; Melinger, Kita 2007). Moreover, supplementary gestures are more likely in cases of problems of speech production (e.g. disfluencies) or when the information conveyed is introduced into the dialogue (and thus conceptualized for the first time) (Bergmann, Kopp 2006). Likewise, speakers are more likely to produce non-redundant gestures in face-to-face dialogue as opposed to speaking to addressees who are not visible (Bavelas, Kenwood, Johnson and Philips 2002). Chu et al. (2013) provided data from an analysis of individual differences in gesture use demonstrating that poorer visual/spatial working memory is correlated with a higher frequency of representational gestures. On the other hand, Hostetter and Alibali (2007) report findings suggesting that speakers who have stronger visual-spatial skills than verbal skills produce higher rates of gestures than other speakers. A follow-up study demonstrated that speakers with high spatial skills also produced a higher proportion of non-redundant gestures than other speakers, whereas verbal-dominant speakers tended to produce such gestures more in cases of speech disfluency (Hostetter, Alibali 2011). Taken together, this suggests that non-redundant gesture–speech combinations are the result of speakers simultaneously having strong spatial knowledge and weak verbal knowledge, and avoiding the effort of transforming the one into the other. In the literature, different models of speech and gesture production have been proposed.
One major distinguishing feature is the point in the production process at which cross-modal coordination can take place. Growth Point Theory (McNeill, Duncan 2000) assumes that gestures arise from idea units combining imagery and categorical content. Assuming that gestures are generated 'pre-linguistically', Krauss, Chen and Gottesman (2000) hold that the readily planned and executed gesture facilitates lexical retrieval through cross-modal priming.
De Ruiter (2000) proposed that speech–gesture coordination arises from a multimodal conceptualization process that selects the information to be expressed in each modality and assigns a perspective for the expression. Kita and Özyürek (2003) agree that gesture and speech are two separate systems interacting during the conceptualization stage. Based on cross-linguistic evidence, their account holds that language shapes iconic gestures, such that the content of a gesture is determined by bidirectional interactions between speech and gesture production processes at the level of conceptualization, i.e. the organization of meaning. Finally, Hostetter and Alibali (2008) proposed the Gestures as Simulated Action framework, which emphasizes how gestures may arise from an interplay of mental imagery, embodied simulations, and language production. According to this view, language production evokes enactive mental representations which give rise to motor activation. Although a consistent theoretical picture is starting to emerge, many questions about the detailed mechanisms remain open. A promising approach to explicating and testing hypotheses is cognitive models that allow for computational simulation. However, such modeling attempts for the production of speech and gestures are almost nonexistent. Only Breslow, Harrison and Trafton (2010) have proposed an integrated production model, based on the cognitive architecture ACT-R (Anderson, Bothell, Byrne, Lebiere and Qin 2004). This model, however, has difficulty explaining gestures that clearly complement or supplement verbally encoded meaning.

A Cognitive Model of Semantic Coordination. In recent and ongoing work we have been developing a model of multimodal conceptualization that accounts for the range of semantic coordination seen in real-life speech–gesture combinations.
This account is embedded into a larger production model that comprises three stages: (1) conceptualization, where a message generator and an image generator work together to select and organize the information to be encoded in speech and gesture, respectively; (2) formulation, where a speech formulator and a gesture formulator determine appropriate verbal and gestural forms; and (3) motor control and articulation to finally execute the behaviors. Motor control, articulation, and formulation have been the subject of earlier work (Bergmann, Kopp 2009). In the following we provide a sketch of the model; details can be found in (Kopp, Bergmann and Kahl 2013; Bergmann, Kahl and Kopp 2013).

Multimodal Memory. The central component in our model is a multimodal memory which is accessible to the modules of all processing stages. We assume that language production requires a preverbal message to be formulated in a symbolic-propositional representation that is linguistically shaped (Levelt 1989) (SPR, henceforth). During conceptualization the SPR, e.g. a function-argument structure denoting a spatial property of an object, needs to be extracted from visuo-spatial representations (VSR), i.e. the mental image of this object. We assume this process to involve the invocation and instantiation of memorized supramodal concepts (SMC, henceforth), e.g. the concept 'round', which links the corresponding visuo-spatial properties to a corresponding propositional denotation. Figure 1 illustrates the overall relation of these tripartite multimodal memory structures. To realize the VSR and part of the SMC, we employ a model of visuo-spatial imagery called Imagistic Description Trees (IDT) (Sowa, Kopp 2003). The IDT model unifies models from (Marr, Nishihara 1978; Biederman 1987; Lang 1989) and was designed, based on empirical data, to cover the meaningful visuo-spatial features in shape-depicting iconic gestures. Each node in an IDT contains an imagistic description which holds a schema representing the shape of an object or object part. Important aspects include (1) a tree structure for shape decomposition, with abstracted object schemas as nodes, (2) extents in different dimensions as an approximation of shape, and (3) the possibility for dimensional information to be underspecified.
Fig. 1 Overall production architecture
The latter occurs, e.g., when the axes of an object schema cover fewer than the three dimensions of space, or when an exact dimensional extent is left open and only a coarse relation between axes, such as 'dominates', is given. This allows us to represent the visuo-spatial properties of SMCs such as 'round', 'left-of' or 'longish'. Applying an SMC to a VSR is realized through graph unification and similarity matching between object schemas, yielding similarity values that assess how well a certain SMC applies to a particular visuo-spatially represented entity (cf. Fig. 1). SPRs are implemented straightforwardly as predicate-argument sentences.

Overall production process. Figure 1 shows an outline of the overall production architecture. Conceptualization consists of cognitive processes that operate upon the above-mentioned memory structures to create a more or less coherent multimodal message. These processes are constrained by principles of memory retrieval, which we assume can be modeled by principles of activation spreading (Collins, Loftus 1975). As in cognitive architectures like ACT-R (Anderson et al. 2004), activations float dynamically, spread across linked entities (in particular via SMCs), and decay over time. Activations of more complex SMCs are assumed to decay more slowly than activations in the lower VSR or SPR. Production starts with the message generator and image generator inducing local activations of modal entries, evoked by a communicative goal. VSRs that are sufficiently activated invoke matching SMCs, leading to an instantiation of SPRs representing the corresponding visuo-spatial knowledge in linguistically shaped ways. The generators independently select modal entries and pass them on to the formulators. As in ACT-R, highly activated features or concepts are more likely to be retrieved and thus to be encoded. Note that, since activation is dynamic, feature selection depends on the time of retrieval and thus on the available resources. The message generator has to map activated concepts in the SPR onto grammatically determined categorical structures, anticipating what the speech formulator is able to process (cf. Levelt 1989). Importantly, interaction between generators and formulators in each modality can run top-down and bottom-up. For example, a proposition being encoded by the speech formulator results in reinforced activation of the concept in the SPR, and thus in increased activation of associated concepts in the VSR. As a result, semantic coordination emerges from the local choices that generators and formulators make, based on the activation dynamics in multimodally linked memory representations. Redundant speech and gesture result from focused activation of supramodally linked mental representations, whereas non-redundant speech and gesture arise when activations scatter over entries not connected via SMCs.

Results and outlook. To quantify our modeling results we ran simulation experiments in which we manipulated the available time (in terms of memory update cycles) before the model had to come up with a sentence and a gesture (Kopp et al. 2013; Bergmann et al. 2013). We analyzed the resulting multimodal utterances with respect to semantic coordination: supplementary (i.e., non-redundant) gestures were dominant in runs with stricter temporal limitations, while redundant ones became more likely when the available time was increased. The model thus offers a natural account for the empirical finding that non-redundant gestures are more likely when conceptualization load is high, based on the assumption that memory-based cross-modal coordination consumes resources (memory, time) and is reduced or compromised when such resources are limited.
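The activation dynamics just described can be illustrated with a small sketch. The following is our own simplified Python rendering under assumed parameter values; the node names, decay rates, and spreading fraction are all hypothetical and not taken from the authors' model.

```python
# Minimal sketch of the described activation dynamics: activations spread
# across linked memory entries (VSR <-> SMC <-> SPR) and decay over time,
# with slower decay for supramodal concepts (SMCs). All values are assumed.
from collections import defaultdict

# Hypothetical multimodal memory graph: entries linked via an SMC.
links = {
    "VSR:round_shape": ["SMC:round"],
    "SMC:round":       ["VSR:round_shape", "SPR:round(x)"],
    "SPR:round(x)":    ["SMC:round"],
}
decay = {"VSR": 0.6, "SPR": 0.6, "SMC": 0.9}   # SMCs decay more slowly
spread = 0.3                                    # fraction passed to neighbours

def update(act):
    """One memory update cycle: decay each entry, then spread along links."""
    new = defaultdict(float)
    for node, a in act.items():
        new[node] += a * decay[node.split(":")[0]]
        for neigh in links.get(node, []):
            new[neigh] += a * spread
    return dict(new)

# A communicative goal injects local activation into a visuo-spatial entry.
act = {"VSR:round_shape": 1.0}
for cycle in range(5):
    act = update(act)
    print(cycle, {k: round(v, 3) for k, v in act.items()})
```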
To enable a direct evaluation of our simulation results against empirical data, we are currently conducting experiments to set up a reference data corpus. In this study, participants are engaged in a dyadic description task, and we manipulate the preparation time available for utterance planning. The verbal output will subsequently be analyzed with respect to the semantic coordination of speech and gestures, based on a semantic feature coding approach as already applied in (Bergmann, Kopp 2006). In ongoing work we are extending the model to also account for complementary speech–gesture ensembles in which deictic expressions in speech refer to their co-speech gesture, as in 'the window looks like this'.
To this end, we are advancing and refining the feedback signals provided by the behavior generators to allow for the fine-grained coordination necessary for the production of this kind of utterance. With this extension the model will allow us to further investigate predictions postulated in the lexical retrieval hypothesis (Krauss, Chen and Chawla 1996; Rauscher, Krauss and Chen 1996; Krauss et al. 2000). Although that model was set up on the basis of empirical data, it has been subject to much criticism based on psycholinguistic experiments and data. Data from detailed simulation experiments based on our cognitive model can provide further arguments in this debate.
References
Anderson J, Bothell D, Byrne M, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060
Bavelas J, Kenwood C, Johnson T, Philips B (2002) An experimental study of when and how speakers use gestures to communicate. Gesture 2(1):1–17
Bergmann K, Kahl S, Kopp S (2013) Modeling the semantic coordination of speech and gesture under cognitive and linguistic constraints. In: Aylett R, Krenn B, Pelachaud C, Shimodaira H (eds) Proceedings of the 13th international conference on intelligent virtual agents. Springer, Berlin, pp 203–216
Bergmann K, Kopp S (2006) Verbal or visual: how information is distributed across speech and gesture in spatial dialog. In: Proceedings of SemDial 2006, pp 90–97
Bergmann K, Kopp S (2009) GNetIc: using Bayesian decision networks for iconic gesture generation. In: Proceedings of IVA 2009. Springer, Berlin, pp 76–89
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94:115–147
Breslow L, Harrison A, Trafton J (2010) Linguistic spatial gestures. In: Proceedings of cognitive modeling 2010, pp 13–18
Chu M, Meyer AS, Foulkes L, Kita S (2013) Individual differences in frequency and saliency of speech-accompanying gestures: the role of cognitive abilities and empathy. J Exp Psychol Gen 143(2):694–709
Collins AM, Loftus EF (1975) A spreading-activation theory of semantic processing. Psychol Rev 82(6):407–428
de Ruiter J (2000) The production of gesture and speech. In: McNeill D (ed) Language and gesture. Cambridge University Press, Cambridge, pp 284–311
Hostetter A, Alibali M (2007) Raise your hand if you're spatial: relations between verbal and spatial skills and gesture production. Gesture 7:73–95
Hostetter A, Alibali M (2008) Visible embodiment: gestures as simulated action. Psychon Bull Rev 15(3):495–514
Hostetter A, Alibali M (2011) Cognitive skills and gesture-speech redundancy. Gesture 11(1):40–60
Kita S, Davies TS (2009) Competing conceptual representations trigger cospeech representational gestures. Lang Cogn Process 24(5):761–775
Kita S, Özyürek A (2003) What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. J Memory Lang 48:16–32
Kopp S, Bergmann K, Kahl S (2013) A spreading-activation model of the semantic coordination of speech and gesture. In: Proceedings of the 35th annual conference of the cognitive science society (CogSci 2013). Cognitive Science Society, Austin, pp 823–828
Krauss R, Chen Y, Chawla P (1996) Nonverbal behavior and nonverbal communication: what do conversational hand gestures tell us? Adv Exp Soc Psychol 28:389–450
Krauss R, Chen Y, Gottesman R (2000) Lexical gestures and lexical access: a process model. In: McNeill D (ed) Language and gesture. Cambridge University Press, Cambridge, pp 261–283
Lang E (1989) The semantics of dimensional designation of spatial objects. In: Bierwisch M, Lang E (eds) Dimensional adjectives: grammatical structure and conceptual interpretation. Springer, Berlin, pp 263–417
Levelt WJM (1989) Speaking: from intention to articulation. MIT Press
Marr D, Nishihara H (1978) Representation and recognition of the spatial organization of three-dimensional shapes. In: Proceedings of the Royal Society of London, vol 200, pp 269–294
McNeill D, Duncan S (2000) Growth points in thinking-for-speaking. In: Language and gesture. Cambridge University Press, Cambridge, pp 141–161
Melinger A, Kita S (2007) Conceptualisation load triggers gesture production. Lang Cogn Process 22(4):473–500
Rauscher F, Krauss R, Chen Y (1996) Gesture, speech, and lexical access: the role of lexical movements in speech production. Psychol Sci 7:226–231
Sowa T, Kopp S (2003) A cognitive model for the representation and processing of shape-related gestures. In: Proceedings of the European cognitive science conference
Towards formally well-founded heuristics in cognitive AI systems
Tarek R. Besold
Institute of Cognitive Science, University of Osnabrück, Germany

Abstract We report on work towards the development of a framework for the application of formal methods of analysis to cognitive systems and computational models (putting special emphasis on aspects concerning the notion of heuristics in cognitive AI) and explain why this requires the development of novel theoretical methods and tools. Keywords Cognitive Systems, Heuristics, Complexity Theory, Approximation Theory

Heuristics in Cognitive Systems. An ever-growing number of researchers in cognitive science and cognitive psychology, starting in the 1970s with Kahneman, Slovic and Tversky's (1982) 'heuristics and biases' program and today prominently represented, for instance, by Gigerenzer, Hertwig and Pachur (2011), argue that humans in their common-sense reasoning do not apply any full-fledged form of logical or probabilistic reasoning to possibly highly complex problems, but instead rely on heuristics as (mostly automatic and unconscious) mechanisms that allow them to circumvent the impending complexity explosion and nonetheless reach acceptable solutions to the original problems. All of these mechanisms are commonly subsumed under the general term 'heuristics' and, following the paradigmatic example given by Newell and Simon's (1976) notion of 'heuristic search', are under this label also often (re)implemented in cognitive AI (see the footnote below). Still, on theoretical grounds, from a computational point of view at least two quite different general types of approach can be imagined: either the complexity of solving a problem can be reduced by reducing the problem instance under consideration to a simpler (but solution-equivalent) one, or the problem instance stays untouched but, instead of being solved perfectly (i.e., precisely), is dealt with in a good enough (i.e., approximate) way.
Footnote: Whilst this type of work has clearly lost some of its popularity over the years, and has been replaced by efforts invested in finding answers to questions where an optimal solution can provably be achieved (although under possibly unrealistic or impractical time and/or space requirements), the study of heuristics-based approaches and techniques is still a lively field of active research; see, for example, (Bridewell, Langley 2011; MacLellan 2011).
Against this background, two crucial questions arise: which problems can actually be solved by applying heuristics? And how can the notion of heuristics be theoretically modeled at a sufficiently high level so as to allow for a general description? In what follows we provide a sketch of work towards an approach to answering these questions using techniques originating from complexity theory and hardness-of-approximation analysis. This choice of formal methods is justified by the observation that, although computational in nature, systems as developed in cognitive AI and cognitive systems research can be considered physical systems which need to perform their tasks in limited time and with a limited amount of space at their disposal; formal computational properties (and restrictions on these) are thus relevant parameters.

Two and a Half Formal Perspectives on Heuristics in Cognitive Systems. Returning to the two different types of heuristics identified above and looking at recent work in complexity and approximation theory, we find a natural correspondence between the outlined conceptual approaches and well-known concepts from the respective fields.

The Reduction Perspective: Over the last years, complexity theory has turned its attention more and more towards problems whose algorithms have worst-case exponential behavior but tend to work quite well in practice if certain parameters of the problem are restricted. This has led to the introduction of the class of fixed-parameter tractable problems FPT (see, e.g., Downey, Fellows 1999):

Definition 1 (FPT) A problem P is in FPT if P admits an O(f(κ)·n^c) algorithm, where n is the input size, κ is a parameter of the input constrained to be 'small', c is an independent constant, and f is some computable function.

A non-trivial corollary can be derived from FPT-membership: any instance of a problem in FPT can be reduced to a problem kernel.

Definition 2 (Kernelization) Let P be a parameterized problem. A kernelization of P is an algorithm which takes an instance x of P with parameter κ and maps it in polynomial time to an instance y such that x ∈ P if and only if y ∈ P, and the size of y is bounded by f(κ) (f a computable function).

Theorem 1 (Kernelizability (Downey, Fellows 1999)) A problem P is in FPT if and only if it is kernelizable.

This essentially entails that, if a positive FPT result can be obtained, then (and only then) there is a 'downward reduction' of the underlying problem to some sort of smaller or less complex instance of the same problem, which can then be solved. Returning to the initial quest for a formal characterization of reduction-based heuristics, we notice that, by categorizing problems according to kernelizability, we can establish a distinction between problem classes which are solvable by the presented type of reduction and those which are not, and can thus decide a priori whether a system implementing a mechanism based on a kernelization account is generally (un)able to solve a certain class. What remains to be shown is the connection between kernelization and the notion of reduction-based heuristics (or rather the suitability of kernelization as a conceptual characterization of the notion of reduction in the examined type of heuristics).
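As a concrete illustration of Definition 2, the following Python sketch implements the textbook Buss kernelization for Vertex Cover, the standard FPT example. This is our own example, chosen for familiarity; the abstract itself argues only at the level of problem classes.

```python
# Illustrative sketch of kernelization (Definition 2): Buss's kernel for
# Vertex Cover, the textbook fixed-parameter tractable problem.
def vc_kernelize(edges, k):
    """Reduce (G, k) to an equivalent Vertex Cover instance with <= k^2 edges.

    Rule 1: a vertex of degree > k must be in any size-k cover; take it.
    Rule 2: if the reduced graph still has more than k^2 edges, no size-k
            cover exists (each remaining vertex covers at most k edges).
    Returns (kernel_edges, k') or None for a no-instance.
    """
    edges = set(edges)
    changed = True
    while changed and k >= 0:
        changed = False
        degree = {}
        for u, v in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        for v, d in degree.items():
            if d > k:                        # Rule 1: v is forced into the cover
                edges = {e for e in edges if v not in e}
                k -= 1
                changed = True
                break
    if k < 0 or len(edges) > k * k:          # Rule 2: reject oversized kernels
        return None
    return edges, k

# Example: a star with 5 leaves plus one extra edge, parameter k = 2.
print(vc_kernelize({(0, i) for i in range(1, 6)} | {(7, 8)}, 2))
```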
The connection is explicated by the correspondence between FPT-membership and kernelizability of a problem: if heuristics are to be as fast and frugal as commonly claimed, considering them anything but (at worst) polynomial-time bounded processes seems questionable. But then, if the reduced problem is to be solvable under resource-critical conditions, using the line of argument introduced above, we can only hope for it to be in FPT.
Finally, combining the FPT-membership of the reduced problem with the polynomial-time complexity of the reduction process (i.e., the presumed heuristics), the original problem already had to be fixed-parameter tractable. This should not come as a surprise, as the contrary (i.e., a heuristic reducing the overall complexity of solving a superpolynomial problem to polynomial-time computation by means of a reduction of the original problem within the same class) would contradict the class membership of the original problem and thus break the class hierarchy (assuming P ≠ NP). Still, kernelization-based heuristics are not trivialized by these considerations: although the original and the reduced problem are both in FPT, the respective size of the parameters may still differ between instances (making an important difference in application scenarios for implemented systems).

The Approximation Perspective: The second perspective on heuristics uses approximation algorithms: instead of precisely solving a kernel, as proposed by reduction-based heuristics, we try to compute an approximate solution to the original problem (i.e., the solution to a relaxed problem). The idea is no longer to solve the problem perfectly (or an equivalent instance of the same class), but to solve the problem to some 'satisfactory degree'. A possible analog to FPT in the Tractable AGI thesis is APX, the class of problems allowing polynomial-time approximation algorithms:

Definition 3 (APX) An optimization problem P is in APX if P admits a constant-factor approximation algorithm, i.e., there is a constant ε > 0 and an algorithm which takes an instance of P of size n and, in time polynomial in n, produces a solution that is within a factor 1 + ε of being optimal (or 1 − ε for maximization problems).

In practice this notion crucially depends on the bounding constant for the approximation ratio: if the latter is meaningfully chosen with respect to the problem, constant-factor approximation allows for quantifying the 'good enough' aspect of the solution and thus might even offer a way of modeling the notion of 'satisficing' introduced by Simon (1956) (which in turn is central to many heuristics considered in cognitive science and psychology, providing additional empirical grounding for the computational systems in cognitive AI).

Joining Perspectives: What if the system architect, instead of deciding whether to solve a certain type of task by applying one of the two types of heuristic and then conducting the respective analysis, just wants to check directly whether the problem at hand might be solvable by either of the two paradigms? Luckily, FPT and APX can be integrated via the concept of fixed-parameter approximability and the corresponding problem class FPA:

Definition 4 (FPA) The fixed-parameter version P of a minimization problem is in FPA if, for a recursive function f, a constant c, and some fixed recursive function g, there exists an algorithm such that, for any given problem instance I with parameter k and question OPT(I) ≤ k, the algorithm, which runs in O(f(k)·n^c) (where n = |I|), either outputs 'no' or produces a solution of cost at most g(k).

As shown by Cai and Huang (2006), both polynomial-time approximability and fixed-parameter tractability with witness (Cai, Chen 1997) independently imply the more general fixed-parameter approximability. On the level of interpretation, too, FPA naturally combines both views of heuristics, accommodating the notion of satisficing through its approximability character and accounting for the possibility of complexity reduction by kernelization (whilst keeping key parameters of the problem fixed) through its fixed-parameter character.

The Wrong Type of Approximation?
Approximation-based heuristics were introduced above as solution procedures which produce solutions that are not optimal but (at least when using a standard like the proposed APX) fall within a certain defined neighborhood of the optimal one. Here, the degree of optimality of a solution is measured in terms of the proximity of the solution's value to the optimal value of the optimization problem at hand. But this is not the only possible way of conceptualizing approximation: what if the emphasis were put on finding a solution which is structurally as similar as possible to the optimal one, i.e. what if the quality of an approximation were measured in terms of similarity of structure instead of proximity of values?
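For concreteness, here is a small Python sketch of a classic value-based, constant-factor approximation of the kind APX captures: the maximal-matching 2-approximation for Vertex Cover. Again this is our own illustrative example; note that it guarantees closeness in cover size (value), not similarity in which vertices are chosen (structure), which is exactly the distinction drawn below.

```python
# Sketch of a constant-factor (APX-style, value-based) approximation:
# the classic maximal-matching 2-approximation for Vertex Cover.
def vc_approx(edges):
    """Greedily build a maximal matching; both endpoints of each matched edge
    join the cover. The result is at most twice the optimal cover size."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))     # take both endpoints of an uncovered edge
    return cover

# Example: a triangle plus a pendant edge (optimal cover has 2 vertices).
print(vc_approx([(0, 1), (1, 2), (0, 2), (2, 3)]))
```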
At first sight this seems to be either a trivial issue or no issue at all, depending on whether it is assumed that value similarity and structural similarity coincide, or it is decided that structure is not of interest. Still, we believe that dismissing the issue this easily would be ill-advised: especially in the context of cognitive systems and high-level AI, the structure of a problem's solution can in many cases be of great relevance. As an example, consider a cognitive system built for maintaining a network of maximally coherent beliefs about complex domains as, e.g., presented by Thagard (2000). Whilst, for instance, Millgram (2000) has shown that the required form of maximal coherence over this type of network is, in its full form, NP-hard, Thagard and Verbeurght (1998) proposed several (value-based) approximation algorithms. Still, a mere value-based approximation scheme does not yield the desired results: as also demonstrated by Millgram (2000), two belief assignments can be arbitrarily close in coherence value and at the same time still arbitrarily far from each other in terms of which beliefs are accepted and which are rejected. Unfortunately, whilst our knowledge and command of value-based approximation has greatly developed over the last decades, structure-based approximation has rarely been studied. Hamilton, Müller, van Rooij and Wareham (2007) present initial ideas and define basic notions possibly forming the foundations of a formal framework for structure-based approximation. And although these are still only very first steps towards a complete and well-studied theory, the presented concepts already allow for several important observations. The most relevant for the cognitive systems setting introduced here is the following: value approximation and structural approximation are distinct in general, and whilst very careful use of the tools of value-based approximation might partially mitigate this divergence (the most naive ad-hoc remedy being the use of problem-specific and highly non-generalizable optimization functions which also take into account some basic form of structural similarity and not only the outcome values of solutions), it cannot be assumed in general that both notions coincide in a meaningful way.

Future Work. In the long run we therefore want to develop the presented roots into an overall framework addressing empirically inspired aspects of cognitive systems in general. In parallel to the corresponding theoretical work, we also want to put emphasis on showing the usefulness and applicability of the proposed methods in different prototypical examples from relevant fields (such as, for example, models of epistemic reasoning and interaction, cognitive systems in general problem solving, or models of particular cognitive capacities), allowing for a mutually informed development process between foundational theoretical work and application studies.

Acknowledgments I owe an ever-growing debt of gratitude to Robert Robere (University of Toronto) for introducing me to the fields of parameterized complexity theory and approximation theory, reliably providing me with theoretical/technical backup, and serving as a willing partner for feedback and discussion.

References
Bridewell W, Langley P (2011) A computational account of everyday abductive inference. In: Proceedings of the 33rd annual meeting of the cognitive science society, pp 2289–2294
Cai L, Chen J (1997) On fixed-parameter tractability and approximability of NP optimization problems. J Comput Syst Sci 54(3):465–474
Cai L, Huang X (2006) Fixed-parameter approximation: conceptual framework and approximability results. In: Bodlaender H, Langston M (eds) Parameterized and exact computation. Springer, pp 96–108
Downey RG, Fellows MR (1999) Parameterized complexity. Springer
Gigerenzer G, Hertwig R, Pachur T (eds) (2011) Heuristics: the foundation of adaptive behavior. Oxford University Press
Hamilton M, Müller M, van Rooij I, Wareham T (2007) Approximating solution structure. In: Dagstuhl seminar proceedings Nr. 07281. IBFI, Schloss Dagstuhl
Kahneman D, Slovic P, Tversky A (1982) Judgment under uncertainty: heuristics and biases. Cambridge University Press
MacLellan C (2011) An elaboration account of insight. In: AAAI fall symposium: advances in cognitive systems
Millgram E (2000) Coherence: the price of the ticket. J Philos 97:82–93
Newell A, Simon HA (1976) Computer science as empirical inquiry: symbols and search. Commun ACM 19(3):113–126
Simon HA (1956) Rational choice and the structure of the environment. Psychol Rev 63:129–138
Thagard P (2000) Coherence in thought and action. MIT Press
Thagard P, Verbeurght K (1998) Coherence as constraint satisfaction. Cogn Sci 22:1–24
Action planning is based on musical syntax in expert pianists: ERP evidence
Roberta Bianco1, Giacomo Novembre2, Peter Keller2, Angela Friederici1, Arno Villringer1, Daniela Sammler1
1 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; 2 MARCS Institute, University of Western Sydney, Australia

The planning of temporally ordered action elements within a coherent structure is a key element of communication. The specifically human ability of the brain to variably combine discrete meaningful units into rule-based hierarchical structures is what is referred to as 'syntactic processing' and has been defined as a core aspect of language and communication (Friederici 2011; Hauser et al. 2002; Lashley 1952). While similarities in the syntactic organization of language and Western tonal music have been increasingly consolidated (Katz, Pesetsky 2011; Patel 2003; Rohrmeier, Koelsch 2012), analogies with the domain of action, in terms of hierarchical and combinatorial organization (Fitch, Martins 2014; Pastra, Aloimonos 2012; Pulvermüller 2014), remain conceptually controversial (Moro 2014). For investigating the syntax of actions, piano performance based on tonal music is an ideal substrate. First, playing chord progressions is the direct motoric translation of musical syntax, a theoretically established hierarchical system of rules governing musical structure (Rohrmeier 2011). Second, it makes it possible to investigate action planning at different levels of the action hierarchy, from lower, immediate levels of movement selection to higher levels of distal goals (Grafton, Hamilton 2007; Haggard 2008; Uithol et al. 2012). Finally, it offers the opportunity to investigate the influence of expertise on the relative weighting of different action features (i.e., goal and manner) in motor programming (Palmer, Meyer 2000; Wohlschläger et al. 2003). Novembre and Keller (2011) and Sammler et al. (2013) have shown that expert pianists, during intense practice, may motorically learn the syntactic regularities governing musical sequences and therefore generate motor predictions based on their acquired long-term syntactic knowledge. In a priming paradigm, pianists were asked to imitate on a mute piano silent videos of a hand playing chord sequences. The last chord was either syntactically congruent or incongruent with the preceding musical context. Despite the absence of sound, the authors found slower imitation times for syntactically incongruent chords as well as motor facilitation (i.e. faster responses) for the syntactically congruent chords. In the ERPs (Sammler et al. 2013), the imitation of the incongruent chord elicited a late posterior negativity, an index of the reprogramming of an anticipated motor act (Leuthold, Jentzsch 2002) primed by the syntactic structure of the musical sequence (i.e. the musical goal).
musical sequence (i.e. the musical goal). In line with models of incremental planning of serial actions (Palmer, Pfordresher 2003), these findings suggest that the notion of syntax translates into a "grammar of musical action" in expert pianists. According to the notion of goal priority over means in the action hierarchy (Bekkering et al. 2000; Grafton 2009; Wohlschläger et al. 2003), in musical motor acts the musical goal determined by the context (Syntax) should take priority over the specific movement selection adopted for execution (Manner), especially at advanced skill levels (Novembre, Keller 2011; Palmer, Meyer 2000). However, through intensive musical training, frequently occurring musical patterns (i.e., scales, chord progressions) may have become codified with fixed, matching fingering configurations (Gellrich, Parncutt 2008; Sloboda et al. 1998). From this perspective, it may thus also be that motor pattern familiarity plays a role in motor predictions during the execution of common chord progressions. To what extent motor predictive mechanisms operate at the level of musical syntax or arise from motor pattern familiarity is addressed here. Whether progressively more syntax-based motor control, independent of the manner, correlates with expertise will also be discussed. To this end, we asked pianists to watch, and simultaneously execute on a mute piano, chord progressions played by a performing pianist's hand presented as a series of pictures on a screen. To preclude exogenously driven auditory predictive processes, no sound was used. To explore the effect of expertise on syntax-based predictions, pianists ranging from 12 to 27 years of experience were tested behaviorally and with electroencephalography (EEG). To induce different strengths of prediction, we used 5-chord or 2-chord sequences (long/short Context), presenting the target chord in the last position. In a 2 × 2 factorial design, we manipulated the target chord of the sequences in terms of key (Syntax congruent/incongruent), to violate the syntactic structure of the sequence, and in terms of fingering (Manner correct/incorrect), to violate motor familiarity. Crucially, manipulating the manner while keeping the syntax congruent allowed us to dissociate the behavioral and neural patterns elicited by executing either a violation of the syntactic structure of the sequence (Syntax) or a general violation of familiar movements (Manner). Additionally, the 2 × 2 factorial design permitted us to investigate syntax-related mechanisms on top of a concurrent manner violation, in order to test whether, in motor programming, high-level syntactic operations are prioritized over mechanisms of movement parameter specification. We hypothesized that, if motor predictions during the execution of musical chord sequences are driven by musical syntax rather than motor pattern familiarity, then violations of the Syntax should evoke specific behavioral and electrophysiological patterns, different from those related to the Manner. Also, we expected to observe syntax-based prediction effects irrespective of the fingering used to play, thus even in the presence of a concurrent manner violation. Finally, if at advanced skill levels the more abstract musical motor goals gain weight in motor programming, we expected to observe a positive dependency between the strength of syntax-based prediction and expertise level.
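For illustration, the resulting factorial structure can be sketched in a few lines of Python; this is a minimal reconstruction with placeholder condition labels, not the study's actual stimulus coding.

from itertools import product

# Sketch of the 2 x 2 (x Context) design described above; the labels are
# illustrative placeholders rather than the study's stimulus codes.
contexts = ["long (5-chord)", "short (2-chord)"]   # length of the priming context
syntax = ["congruent", "incongruent"]              # key of the target chord
manner = ["correct", "incorrect"]                  # fingering of the target chord

conditions = [{"context": c, "syntax": s, "manner": m}
              for c, s, m in product(contexts, syntax, manner)]

for cond in conditions:   # 8 cells: 2 contexts x 2 syntax x 2 manner levels
    print(cond)

Syntax and Manner are properties of the final (target) chord only, whereas Context is a property of the preceding sequence; crossing all three yields the cells over which response delays and ERPs were compared.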
We found that the production of syntactically incongruent compared to congruent chords showed a response delay that was larger in the long than in the short context, and that was accompanied by a central posterior negativity (520–800 ms) in the long but not in the short context. Conversely, the execution of the unconventional manner was not delayed as a function of Context and elicited an opposite electrophysiological pattern (a posterior positivity between 520 and 800 ms). Hence, while the effects associated with the Syntax might reflect a signal of movement reprogramming of a prepotent response in the face of the incongruity to be executed (Leuthold, Jentzsch 2002; Sammler et al. 2013), the effects associated with the Manner were stimulus- rather than response-related and might reflect the perceptual surprise
(Polich 2007) of the salient fingering manipulation, which the pianists recognized as an obvious target manipulation. Finally, the syntax-related effects held when only the manner-incorrect trials were considered, and their context dependency became sharper with increasing expertise level (computed as cumulative training hours across all years of piano playing). This suggests that syntactic mechanisms take priority over movement specifications, especially in more expert pianists, who were more affected by the priming effect of the contextual syntactic structure. Taken together, these findings indicate that, given a contextual musical structure, motor plans for the distal musical goal are generated coherently with the context and take precedence over those underlying specific, immediate movement selection. Moreover, the increase of syntax-based motor control with expertise might hint at action planning based on musical syntax being a slowly acquired skill, built on top of the acquisition of motor flexibility. More generally, these findings indicate that, similarly to music perception, music production too relies on generative syntactic rules.
References
Bekkering H, Wohlschläger A, Gattis M (2000) Imitation of gestures in children is goal-directed. Q J Exp Psychol A 53(1):153–164. doi:10.1080/713755872
Fitch WT, Martins MD (2014) Hierarchical processing in music, language, and action: Lashley revisited. Ann N Y Acad Sci 1–18. doi:10.1111/nyas.12406
Friederici AD (2011) The brain basis of language processing: from structure to function. Physiol Rev 91(4):1357–1392. doi:10.1152/physrev.00006.2011
Gellrich M, Parncutt R (2008) Piano technique and fingering in the eighteenth and nineteenth centuries: bringing a forgotten method back to life. Br J Music Educ 15(1):5–23. doi:10.1017/S0265051700003739
Grafton ST, Hamilton AFDC (2007) Evidence for a distributed hierarchy of action representation in the brain. Hum Mov Sci 26(4):590–616. doi:10.1016/j.humov.2007.05.009
Haggard P (2008) Human volition: towards a neuroscience of will. Nat Rev Neurosci 9(12):934–946. doi:10.1038/nrn2497
Hauser MD, Chomsky N, Fitch WT (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298(5598):1569–1579. doi:10.1126/science.298.5598.1569
Katz J, Pesetsky D (2011) The identity thesis for language and music. Unpublished manuscript
Lashley K (1952) The problem of serial order in behavior. In: Jeffress LA (ed) Cerebral mechanisms in behavior. Wiley, New York, pp 112–131
Leuthold H, Jentzsch I (2002) Spatiotemporal source localisation reveals involvement of medial premotor areas in movement reprogramming. Exp Brain Res 144(2):178–188. doi:10.1007/s00221-002-1043-7
Moro A (2014) On the similarity between syntax and actions. Trends Cogn Sci 18(3):109–110. doi:10.1016/j.tics.2013.11.006
Novembre G, Keller PE (2011) A grammar of action generates predictions in skilled musicians. Conscious Cogn 20(4):1232–1243. doi:10.1016/j.concog.2011.03.009
Palmer C, Meyer RK (2000) Conceptual and motor learning in music performance. Psychol Sci 11(1):63–68
Palmer C, Pfordresher PQ (2003) Incremental planning in sequence production. Psychol Rev 110(4):683–712. doi:10.1037/0033-295X.110.4.683
Pastra K, Aloimonos Y (2012) The minimalist grammar of action. Philos Trans R Soc Lond B Biol Sci 367(1585):103–117. doi:10.1098/rstb.2011.0123
Polich J (2007) Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol 118(10):2128–2148. doi:10.1016/j.clinph.2007.04.019
Pulvermüller F (2014) The syntax of action. Trends Cogn Sci 18(5):219–220. doi:10.1016/j.tics.2014.01.001
Rohrmeier M (2011) Towards a generative syntax of tonal harmony. J Math Music 5(1):35–53. doi:10.1080/17459737.2011.573676
Rohrmeier M, Koelsch S (2012) Predictive information processing in music cognition. A critical review. Int J Psychophysiol 83(2):164–175. doi:10.1016/j.ijpsycho.2011.12.010
Sammler D, Novembre G, Koelsch S, Keller PE (2013) Syntax in a pianist's hand: ERP signatures of "embodied" syntax processing in music. Cortex 49(5):1325–1339. doi:10.1016/j.cortex.2012.06.007
Sloboda JA, Clarke EF, Parncutt R, Raekallio M (1998) Determinants of finger choice in piano sight-reading. J Exp Psychol Hum Percept Perform 24(1):185–203. doi:10.1037//0096-1523.24.1.185
Uithol S, van Rooij I, Bekkering H, Haselager P (2012) Hierarchies in action and motor control. J Cogn Neurosci 24(5):1077–1086. doi:10.1162/jocn_a_00204
Wohlschläger A, Gattis M, Bekkering H (2003) Action generation and action perception in imitation: an instance of the ideomotor principle. Philos Trans R Soc Lond B Biol Sci 358(1431):501–515. doi:10.1098/rstb.2002.1257
Motor learning in dance using different modalities: visual vs. verbal models Bettina Bläsing1, Jenny Coogan2, José Biondi2, Liane Simmel3, Thomas Schack1 1 Neurocognition and Action Research Group & Center of Excellence Cognitive Interaction Technology (CITEC), Bielefeld University, Germany; 2 Palucca Hochschule für Tanz Dresden, Germany; 3 tamed Tanzmedizin Deutschland e.V., Fit for Dance Praxis und Institut für Tanzmedizin, München, Germany
Keywords Motor learning, Observation, Visual model, Verbal instruction, Dance
Introduction Observational learning is viewed as the major mode of motor learning (Hodges et al. 2007). Empirical evidence shows that observational learning primarily takes place in an implicit way, by activating shared neural correlates of movement execution, observation and simulation (Jeannerod 2004; Cross et al. 2006, 2009). It has been shown that the use of language (in terms of verbal cues) can facilitate or enhance motor learning by guiding attention towards relevant features of the movement and making these aspects explicit (see Wulf and Prinz 2001). In dance training (and other movement disciplines), observational learning from a visual model is most commonly applied, and is often supported by verbal cue-giving. Evidence from practice suggests that explicit verbal instructions and movement descriptions play a major role in movement learning by supporting the understanding, internalizing and simulating of movement phrases. In modern and contemporary dance, however, choreographers often do not expect the dancers to simply reproduce movement phrases in adequate form, but to develop movement material on their own, in accordance with a given idea, description or instruction, aiming at a more personal expression and higher artistic quality of the developed movement material. In this study, we investigate dancers' learning of movement phrases based on the exclusive and complementary use of visual model observation and verbal instruction (movement description).
Dance students learned comparable movement material via two different modes: via observation of a model and via listening to a verbal movement description (as an example, part of a model sequence is displayed in Fig. 1). In a second step, the complementary mode was added. After both learning steps, the students' performance of the learned movement phrases was recorded and rated by independent experts. A retention test was applied to evaluate long-term effects of the learning processes. We expected the dance students to learn successfully from the visual model, their most commonly practiced mode of movement learning. From the verbal instruction, we expected that the performed movement phrases would vary more strongly, but could possibly be performed with more artistic quality. We also expected performance after the second learning step to be improved compared to the first learning step in both conditions. Method Learning task: Eighteen students (age: 18.4 ± 1.0 years, 11 female) from the BA Dance study program at the Palucca Hochschule für Tanz Dresden learned two dance phrases of similar length (approx. 30 s) and complexity, one via visual observation of a demonstration video, the other via a recorded verbal description (see Fig. 1). In a first learning step (Step 1), one of the dance phrases was presented five times, either visually (video) or verbally (audio), and the participant was instructed to learn it by watching or listening, and by marking movements as required. After a short practice, the participant performed the learned dance phrase while being recorded on video. In a second learning step (Step 2), the participant was presented twice with the same dance phrase in the complementary presentation mode (i.e., video for the verbally learned phrase and vice versa), and the performance was recorded again. The other dance phrase was then learned and performed using the same procedure, but was presented in
Fig. 1 Images illustrating approximately two-thirds of Phrase 1, choreographed by Jenny Coogan and performed by Robin Jung. The phrase was presented as a video of 26 s and as an audio recording of a verbal description (speaker: Alex Simkins). Phrase 2, choreographed by José Biondi, was of similar length and complexity and contained similar movement elements as Phrase 1, and was performed and spoken by the same dancer and speaker in the video and audio recording, respectively. The verbal description of the dance sequence shown in the pictures reads as follows: "Stand facing the front left diagonal of the room in first position. At the same time extend your left leg forward and your two arms sideways to the horizontal. Allow your right hand to continue moving until it arrives to a high diagonal. Gradually let the shape melt back into its beginning position as you shift your weight into the right hip, bending both knees, sinking your head to the left to make a big C-curve. Continue into falling, then catch the weight with a step of the left leg crossing to the right. Follow with two steps sideward, in the same direction while throwing both arms in front of your shoulders. Keeping your arms close to you, spiral to the right diagonal, then, kick your right leg, left arm and head forward as you throw your right arm behind you. Bring the energy back into you quickly bending both elbows and the right knee close to the body, spine vertical. Drop your arms and take a step back onto your right leg turning fully around while dragging your left leg behind you. Finish with the weight low, left leg behind, spine rounded forward, arms wrapped around the body, right arm front, left arm back. Stretch your legs and gradually lengthen your spine horizontally. Allow your arms to follow the succession of your spine, right front, left back"
the remaining learning mode (verbal or visual) in Step 1, complemented by the other mode in Step 2. The order of the dance phrases (Phrase 1, Phrase 2) and of the initial learning modes (visual, verbal) was balanced between participants (the experimental design of the study is illustrated in Table 1). The experimental procedure took place in a biomechanics laboratory and lasted approximately one hour for each participant. In addition to the evaluation of the recorded performances, questionnaires and psychometric tests were applied to investigate the students' learning success and their personal impressions of the different learning processes. Expert ratings of the reproduced material: Two independent experts rated the recorded performance trials from the recorded and cut video clips, one from each demonstration condition (visual, visual + verbal, verbal, verbal + visual). The experts rated each recorded performance by filling out a questionnaire consisting of six-point Likert-scale questions assigned to two categories, accordance with the model (AM; 10 questions) and artistic performance quality (PQ; 5 questions). For each category, ratings of the questions were averaged to obtain general measures for the two main criteria, AM and PQ. Each expert independently watched the recordings of the students' performances and marked one answer for each question, without knowing the learning condition of the recorded performance. Non-parametric tests (Wilcoxon signed-rank, Mann–Whitney U) were used to compare the averaged ratings of the two experts between the different conditions (visual, visual + verbal, verbal, verbal + visual) within each criterion (AM, PQ) and between the two criteria within each demonstration condition. Retention test: Thirteen of the dance students (8 female) participated in a retention test that was carried out 10–13 days after the experimental learning task. The retention test included the video-recorded performance of the remembered movement material, psychometric tests and questionnaires. In the performance part of the test, each student was asked to perform both dance phrases as completely as possible. Students were allowed to practice for several minutes before being recorded, but were not given any assistance in reproducing the phrases. Each student was recorded individually and on his/her own in a separate dance studio. The video recordings of the students' performance in the retention test were annotated for the completeness of the phrases by two annotators. Each phrase was segmented into eleven partial phrases, or elements, of similar content (note that the phrases had been choreographed to resemble each other in complexity, duration and structure). The annotators independently watched the recordings and marked the completeness of each of the eleven elements as a value between 0 and 1 (0: the element was not danced at all, or was not recognizable; 1: the element was clearly recognizable and was performed without error); ratings of the two annotators were then averaged. Each student thereby received, for each of the two phrases, a value between 0 (no partial phrase was reproduced at all) and 11 (all partial phrases were reproduced perfectly). Non-parametric tests (Wilcoxon signed-rank, Mann–Whitney U) were used to compare averaged completeness scores between dance phrases (Phrase 1, Phrase 2) and learning modes (visual first, verbal first).
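The retention-test scoring and the non-parametric comparisons described above can be summarized in a short sketch; this is a minimal illustration in Python (NumPy/SciPy) using randomly generated placeholder ratings, not the study's data.

import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

rng = np.random.default_rng(0)

# Placeholder ratings: 2 annotators x 13 students x 2 phrases x 11 elements,
# each element marked between 0 (not recognizable) and 1 (performed without error).
ratings = rng.uniform(0, 1, size=(2, 13, 2, 11))

# Average the two annotators, then sum over the 11 elements, so that each
# student receives a completeness score between 0 and 11 per phrase.
completeness = ratings.mean(axis=0).sum(axis=-1)        # shape: (13, 2)

# Within-subject comparison of the two phrases (Wilcoxon signed-rank test)
stat_w, p_phrases = wilcoxon(completeness[:, 0], completeness[:, 1])

# Between-group comparison (e.g. verbal-first vs visual-first; Mann-Whitney U)
group = np.array([0] * 5 + [1] * 8)                     # placeholder group labels
stat_u, p_groups = mannwhitneyu(completeness[group == 0, 0],
                                completeness[group == 1, 0])
print(p_phrases, p_groups)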
Results Expert ratings: Ratings of the two experts were positively correlated for both criteria, AM (r = 0.528; p < .001) and PQ (r = 0.513; p < .001). After Step 1, ratings of PQ were significantly better than ratings of AM (visual: 3.82, 3.33; Z = -2.987, p = .003; verbal: 3.73, 2.69; Z = -3.529, p < .001), whereas ratings did not differ after Step 2. AM ratings after learning only from the verbal description were lower (2.69) than after all other conditions (verbal + visual: 3.48, Z = -3.724, p < .001; visual: 3.33, Z = -3.624, p < .001; visual + verbal: 3.65, Z = -3.682, p < .001), and AM ratings after visual + verbal learning were higher than after visual learning only (Z = -2.573, p = .01). PQ ratings did not differ between any of the learning conditions.
Table 1 Experimental design of the learning task

Learning task             Group 1a (N = 4)       Group 2a (N = 4)       Group 2b (N = 5)       Group 1b (N = 5)
Pre-test questionnaires   (all groups)
Step 1                    Phrase 1 Verbal (5x)   Phrase 1 Visual (5x)   Phrase 2 Verbal (5x)   Phrase 2 Visual (5x)
Performance               Record 1–3x            Record 1–3x            Record 1–3x            Record 1–3x
Step 2                    +Visual (2x)           +Verbal (2x)           +Visual (2x)           +Verbal (2x)
Performance               Record 1–3x            Record 1–3x            Record 1–3x            Record 1–3x
Step 1                    Phrase 2 Visual (5x)   Phrase 2 Verbal (5x)   Phrase 1 Visual (5x)   Phrase 1 Verbal (5x)
Performance               Record 1–3x            Record 1–3x            Record 1–3x            Record 1–3x
Step 2                    +Verbal (2x)           +Visual (2x)           +Verbal (2x)           +Visual (2x)
Performance               Record 1–3x            Record 1–3x            Record 1–3x            Record 1–3x
Post-test questionnaire, psychometric tests, interview   (all groups)
Retention test            N = 3                  N = 4                  N = 4                  N = 2
Performance               Phrases 1, 2: Record 1x each
Retention questionnaire, psychometric tests   (all groups)

Step 1, 2: successive learning steps; Phrase 1, 2: movement material; visual, verbal: demonstration mode; Performance: video-recorded performance of the learned dance phrase

Retention test: Completeness scores given by the two annotators were highly correlated for both sequences (Phrase 1: r = 0.942, p < .001; Phrase 2: r = 0.930, p < .001). No differences were found between the groups (Group 1: Phrase 1 verbal first, N = 5; Group 2: Phrase 1 visual first, N = 8) in general, and no differences were found between the two sequences (Phrase 1: 7.64; Phrase 2: 6.90). Scores were better for phrases initially learned from the visual model (8.32) than for phrases initially learned from the verbal description (6.23) (Z = -1.992, p = .046). When the sequences were regarded separately, the groups differed for Phrase 2 (Group 1: 9.17; Group 2: 5.48), but not for Phrase 1 (Group 1: 7.42; Group 2: 7.78), with Group 1 performing better than Group 2 (Z = -2.196, p = .028) (see Fig. 2). When ratings for the individual elements (1 to 11) were compared, primacy effects were found for both dance phrases, in terms of higher scores for the first 3 and 2 elements in Phrase 1 and Phrase 2, respectively (Phrase 1: element 1 differed from 6, 7, 8, 9 and 11; 2 differed from 5, 6, 7, 8; 3 differed from 4, 5, 6, 7, 8, 9 and 10; Phrase 2: 1 differed from 3, 4, 5, 6, 7, 8, 9, 10 and 11; 2 differed from 4, 7, 9 and 10; all p ≤ .05). Discussion Interdisciplinary projects linking dance and neurocognitive research have recently attracted increasing attention in artistic and scientific communities (see Bläsing et al. 2012; Sevdalis, Keller 2011). The presented project on observational (implicit) and verbal (explicit) movement learning in dance has been developed within
Fig. 2 Left Mean expert ratings of students’ performance for accordance with the model (AM; dark grey columns) and performance quality (PQ; light grey columns) after learning from one (visual, verbal) and two (visual + verbal, verbal + visual) modalities (ratings for both dance phrases are pooled); right completeness scores for students’ performance in the retention test for Phrases 1 and 2; dark grey columns Group 1 (Phrase 1 verbal, verbal + visual; Phrase 2 visual, visual + verbal); light grey columns Group 2 (Phrase 1 visual, visual + verbal; Phrase 2 verbal, verbal + visual)
an interdisciplinary network (Dance engaging Science; The Forsythe Company | Motion Bank), motivated by scientific, artistic and (dance-)pedagogical questions. We compared expert ratings for the recorded performances of two different movement phrases in 18 dance students who had learned one phrase initially via verbal description and the other via observation of a video model. After dancing the phrase and being recorded, students received the complementary modality to learn from, and were recorded performing again. Ratings for performance quality were better than ratings for model reproduction after the first learning step (one modality), but not after the second learning step (two modalities). After learning from only one modality, ratings for accordance with the model were better if the first learning modality was visual rather than verbal, whereas ratings for performance quality did not differ for visual vs. verbal learning. When the students had to reproduce the learned movement material in a retention test, the (initially) visually learned material was reproduced more completely than the verbally learned material; however, when the dance phrases were regarded separately, this result was significant for only one of the phrases. The results corroborate findings regarding observational learning of movements in dance and other disciplines or tasks, but also suggest a dissociation between the exact execution of a model phrase and the artistic quality of dance, even in the learning phase. As expected, accordance with the model phrases was stronger after visual learning and after two modalities compared to one (which might as well have been influenced by the additional practice, as this was always the second learning step). Regarding artistic quality of performance, the students danced the newly learned material after learning from the verbal description as well as they did after learning from visual observation, but not better, as we had expected. Questionnaires and psychometric tests are currently being analyzed to complement the reported findings of this study. We expect the outcomes to contribute to our understanding of explicit and implicit motor learning on the basis of different modalities, and also to yield potential implications for teaching and training in dance-related disciplines. While explicit learning (via verbal instruction) and implicit learning (via observation and practice) have been found to work synergistically in skilled motor action (Taylor and Ivry 2013), the situation might be different for dance, and potentially for dance-like movement in general (see Schachner and Carey 2013), in which skilful movement execution largely depends on kinesthetic awareness; further research is needed at this point. Further implications could be derived for
learning in general, specifically regarding the potential benefit of combining different modes (or modalities) for conveying information in order to shape and optimize learning success.
References
Bläsing B, Calvo-Merino B, Cross ES, Jola C, Honisch J, Stevens CJ (2012) Neurocognitive control in dance perception and performance. Acta Psychol 139:300–308
Cross ES, Hamilton AF, Grafton ST (2006) Building a motor simulation de novo: observation of dance by dancers. NeuroImage 31:1257–1267
Cross ES, Kraemer DJ, Hamilton AF, Kelley WM, Grafton ST (2009) Sensitivity of the action observation network to physical and observational learning. Cereb Cortex 19:315–326
Hodges NJ, Williams AM, Hayes SJ, Breslin G (2007) What is modelled during observational learning? J Sport Sci 25:531–545
Jeannerod M (2004) Actions from within. Int J Sport Exercise Psychol 2:376–402
Schachner A, Carey S (2013) Reasoning about 'irrational' actions: when intentional movements cannot be explained, the movements themselves are seen as the goal. Cognition 129:309–327
Sevdalis V, Keller PE (2011) Captured by motion: dance, action understanding, and social cognition. Brain Cogn 77:231–236
Taylor JA, Ivry RB (2013) Implicit and explicit processes in motor learning. Action Sci:63–87
Wulf G, Prinz W (2001) Directing attention to movement effects enhances learning: a review. Psychon B Rev 8:648–660
A frontotemporoparietal network common to initiating and responding to joint attention bids Nathan Caruana, Jon Brock, Alexandra Woolgar ARC Centre of Excellence in Cognition and its Disorders, Department of Cognitive Science, Macquarie University, Sydney, Australia
Joint attention is the ability to interactively coordinate attention with another person towards objects of mutual interest, and is a fundamental component of daily interpersonal relationships and communication. According to the Parallel Distributed Processing model (PDPM; Mundy, Newell 2007), responding to joint attention bids (RJA) is supported by posterior-parietal cortical regions, while initiating joint attention (IJA) involves frontal regions. Although the model emphasizes their functional and developmental divergence, it also suggests that the integration of frontal and posterior-parietal networks is crucial for the emergence of complex joint attention behavior, allowing individuals to represent their own attentional perspective as well as the attentional focus of their social partner in parallel. However, little is known about the neural basis of these parallel joint attention processes, due to a lack of ecologically valid paradigms. In the present study, we used functional magnetic resonance imaging to directly test the claims of the PDPM. Thirteen subjects (9 male, mean age = 24.85, SD = 5.65) were scanned as they engaged with an avatar whom they believed was operated by another person outside the scanner, but who was in fact controlled by a gaze-contingent computer algorithm. The task involved catching a burglar who was hiding inside one of six houses displayed on the screen. Each trial began with a 'search phase', during which there was a division of labor between the subject and their virtual partner. Subjects were required to search a row of three houses located at either the top or the bottom of the screen, whilst the avatar searched the other row. When the subject fixated one of their designated houses, the door opened to reveal an empty house or the burglar (see Fig. 1a). The location of the subject's designated
houses was counterbalanced across acquisition runs. Subjects were instructed that whoever found the burglar on each trial had to guide their partner to that location by first establishing mutual gaze and then looking at the appropriate house. On RJA trials, subjects searched their designated houses, each of which would be empty. The avatar would then complete his search and guide the subject to the burglar's location. Once the subject responded and joint attention was achieved, positive feedback was provided, with the burglar appearing behind bars to symbolize that he had been successfully captured. On IJA trials, the subject would find the burglar inside one of their designated houses. Once the avatar had completed his search and mutual gaze was established, the subject was required to initiate joint attention by saccading towards the correct location. The avatar responded by gazing at the location fixated by the subject, regardless of whether it was correct or not. Again, positive feedback was provided when joint attention was achieved at the burglar's location. Negative feedback was provided if the subject failed to make a responsive eye movement within three seconds, or if they responded or initiated by fixating an incorrect location. During the search phase, the avatar's gaze behavior was controlled so that he only completed his search after the subject had completed their own search and fixated back on the avatar. This meant that subjects were required to monitor the avatar's attention during their interaction before responding to, or initiating, a joint attention bid. In this paradigm—as in ecological interactions—establishing mutual gaze was therefore essential in determining whether the avatar was ready to guide the subject, or to respond to the subject's initiation of joint attention. The onset latencies of the avatar's gaze behavior (i.e. alternating between search houses, establishing mutual gaze, and executing responding or initiating saccades) were also jittered with a uniform distribution between 500 and 1,000 ms. This served to enhance the avatar's ecological appearance. The subject's social role as a 'responder' or 'initiator' only became apparent over the course of each trial. Our paradigm thereby created a social context that (1) elicited intentional, goal-driven joint attention, (2) naturally informed subjects of their social role without overt instruction, and (3) required subjects to engage in social attention monitoring. In order to account for the effect of non-social task features, the neural correlates of RJA and IJA were investigated relative to non-social control conditions that were matched for attentional demands, number of eye movements elicited and task complexity. During these trials, the avatar remained on the screen with his eyes closed, and subjects were told that both partners were completing the task independently. In the IJA control condition (IJAc), subjects found the burglar, looked back to a central fixation point and, when this turned green, saccaded towards the burglar's location. In the RJA control condition (RJAc), the fixation point became an arrow directing them to the burglar's location (see Fig. 1b).
Fig. 1 a An example of the stimuli used in the social conditions (i.e. RJA and IJA). b An example of the stimuli used in the control conditions (i.e. RJAc and IJAc). Note that for a and b, the eye-shaped symbol represents the subject's eye movement resulting in joint attention; it was not part of the stimulus visible to subjects
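The gaze-contingent control of the avatar during the search phase, as described above, can be sketched schematically; this is a simplified illustration in Python in which the eye-tracker and display calls (get_current_fixation, set_avatar_gaze, open_door) are hypothetical stubs, not the API actually used in the study.

import random
import time

def get_current_fixation():
    # Stub: in the experiment this would query the eye tracker.
    return random.choice(["house1", "house2", "house3", "avatar"])

def set_avatar_gaze(target):
    print("avatar looks at", target)      # stub for the display routine

def open_door(house):
    print(house, "opens")                 # reveals an empty house or the burglar

def jittered_onset():
    # Avatar gaze-event onsets were jittered uniformly between 500 and 1000 ms.
    return random.uniform(0.5, 1.0)

def run_search_phase(subject_houses, avatar_houses):
    searched = set()
    while searched != set(subject_houses):        # subject searches own row
        fix = get_current_fixation()
        if fix in subject_houses and fix not in searched:
            open_door(fix)
            searched.add(fix)
    while get_current_fixation() != "avatar":     # avatar waits for mutual gaze
        pass
    for house in avatar_houses:                   # avatar completes its search
        time.sleep(jittered_onset())
        set_avatar_gaze(house)
    time.sleep(jittered_onset())
    set_avatar_gaze("subject")                    # re-establish mutual gaze

run_search_phase(["house1", "house2", "house3"], ["house4", "house5", "house6"])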
A synchronization pulse was used at the beginning of each acquisition run to allow the BOLD and eye-tracking data to be temporally aligned.
Fig. 2 Threshold maps are displayed for a responding to joint attention (RJA − RJAc), b initiating joint attention (IJA − IJAc), c initiating over and above responding [(IJA − IJAc) − (RJA − RJAc)], and d activation common to responding and initiating; t > 3.70, equivalent to p < 0.05 FDR correction in a, with extent threshold 10 voxels. The threshold for p < 0.05 FDR correction would have been 2.87, 3.18 and 3.10 in b, c and d, respectively. No voxels survived FDR correction for the responding over and above initiating contrast [(RJA − RJAc) − (IJA − IJAc)]
Our analyses of BOLD data focused on the 'joint attention phase' of each trial. Accordingly, event onset times were defined as the time at which the participant opened the last empty house (RJA and RJAc) or found the burglar (IJA and IJAc). Events were modelled as boxcars lasting until the time at which joint attention was achieved and the burglar captured (a schematic sketch of this regressor construction follows the discussion below). This assisted in accounting for variation in reaction times between trials. All second-level t-images were thresholded at t > 3.70, equivalent to p < 0.05 with a false discovery rate (FDR) correction for multiple comparisons in the comparison of RJA and RJAc (see Fig. 2a). This threshold was more conservative than p < 0.05 with FDR correction in any other contrast tested. The use of a single threshold for visualization allowed the results to be more easily compared. Relative to their corresponding control conditions, both RJA (Fig. 2a) and IJA (Fig. 2b) activated a broad frontotemporoparietal network, largely consistent with previous findings (Redcay et al. 2010; Schilbach et al. 2010). Additionally, IJA resulted in more distributed activation across this network relative to RJA, after controlling for non-social attention (Fig. 2c). A conjunction analysis identified a right-lateralized subset of this network that was common to both RJA and IJA, over and above activation associated with the non-social control conditions (Fig. 2d). Regions included the dorsal portion of the middle frontal gyrus (MFG), inferior frontal gyrus (IFG), middle temporal gyrus (MTG), precentral gyrus, posterior superior temporal sulcus (pSTS), temporoparietal junction (TPJ) and precuneus. The existing literature associates many of these regions with tasks involving perspective-taking processes. Specifically, TPJ has been implicated in tasks where subjects form representations of others' mental states (Samson, Apperly, Chiavarino and Humphreys 2004). The precuneus has been recruited in tasks that involve representing first-person (self) and third-person (other) visual perspectives (Vogeley et al. 2004). Involvement of IFG has been reported in dyadic tasks where subjects make competitive profit-oriented decisions which intrinsically involve self–other comparisons (Halko, Hlushchuk, Hari and Schürmann 2009). Finally, modulation of pSTS activation has been reported during tasks where
subjects determine the intentionality of another's behavior (Morris, Pelphrey and McCarthy 2008). Together with previous findings, the frontotemporoparietal network identified in our study is consistent with the PDPM's claim that RJA and IJA have a shared neural basis in adulthood. This may support the ability to simultaneously represent the attentional state of the self and of others during interactions (Mundy, Newell 2007). These self–other representations are essential for the achievement of joint attention in ecological contexts, as one must represent the attentional focus of one's partner to determine when one can respond to or initiate joint attention. One must also represent one's own attentional focus so as to plan initiations of joint attention, and to shift one's attentional focus when guided. Furthermore, a portion of the frontoparietal network common to RJA and IJA—including IFG, TPJ and precuneus—revealed additional activation during IJA trials compared to RJA trials (see Fig. 2c). This is again consistent with the role of this network in simultaneously representing self- and other-oriented attentional perspectives, as IJA trials required subjects to represent an additional shift in their partner's attentional focus (avatar searches, then waits for guidance, then responds) relative to RJA trials (avatar searches, then guides). Our data contribute to ongoing debates in the social neuroscience literature concerning the social specificity of many of the regions included in this network, such as TPJ (Kincade, Abrams, Astafiev, Shulman and Corbetta 2005). Due to the implementation of closely matched non-social conditions, the present study provides further evidence that these substrates may be particularly sensitive to social engagement. This is the first imaging study to directly investigate the neural correlates common to RJA and IJA engagement and thus to support the PDPM's claim that a broad integrated network supports the parallel aspects of both initiating and responding to joint attention. These data inform a neural model of joint attention in adults and may guide future clinical applications of our paradigm to investigate whether the developmental delay of joint attention in autism is associated with a differential organization of this integrated network.
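The variable-duration boxcar modelling referenced in the analysis section above can be sketched as follows; this is a minimal illustration in Python (NumPy/SciPy) with placeholder timings and a simplified double-gamma HRF, not the study's actual analysis pipeline.

import numpy as np
from scipy.stats import gamma

TR, n_scans = 2.0, 200                      # placeholder acquisition parameters
frame_times = np.arange(n_scans) * TR

# Placeholder events: onset = last empty house opened (RJA) or burglar found
# (IJA); duration = time until joint attention was achieved.
onsets = np.array([10.0, 50.0, 90.0])
durations = np.array([3.2, 1.8, 2.5])

# Boxcar regressor: 1 from event onset until the burglar is captured.
boxcar = np.zeros(n_scans)
for onset, dur in zip(onsets, durations):
    boxcar[(frame_times >= onset) & (frame_times < onset + dur)] = 1.0

# Convolve with a simplified canonical (double-gamma) haemodynamic response.
t = np.arange(0, 32, TR)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
regressor = np.convolve(boxcar, hrf)[:n_scans]
print(regressor[:10])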
References
Halko M-L, Hlushchuk Y, Hari R, Schürmann M (2009) Competing with peers: mentalizing-related brain activity reflects what is at stake. NeuroImage 46:542–548. doi:10.1016/j.neuroimage.2009.01.063
Kincade JM, Abrams RA, Astafiev SV, Shulman GL, Corbetta M (2005) An event-related functional magnetic resonance imaging study of voluntary and stimulus-driven orienting of attention. J Neurosci 25:4593–4604. doi:10.1523/JNEUROSCI.0236-05.2005
Morris JP, Pelphrey KA, McCarthy G (2008) Perceived causality influences brain activity evoked by biological motion. Soc Neurosci 3:16–25. doi:10.1080/17470910701476686
Mundy P, Newell L (2007) Attention, joint attention and social cognition. Curr Dir Psychol Sci 16:269–274
Redcay E, Dodell-Feder D, Pearrow MJ, Mavros PL, Kleiner M, Gabrieli JDE, Saxe R (2010) Live face-to-face interaction during fMRI: a new tool for social cognitive neuroscience. NeuroImage 50:1639–1647. doi:10.1016/j.neuroimage.2010.01.052
Samson D, Apperly IA, Chiavarino C, Humphreys GW (2004) Left temporoparietal junction is necessary for representing someone else's belief. Nat Neurosci 7:499–500. doi:10.1038/nn1223
Schilbach L, Wilms M, Eickhoff SB, Romanzetti S, Tepest R, Bente G, Vogeley K (2010) Minds made for sharing: initiating joint attention recruits reward-related neurocircuitry. J Cogn Neurosci 22:2702–2715. doi:10.1162/jocn.2009.21401
Vogeley K, May M, Ritzl A, Falkai P, Zilles K, Fink GR (2004) Neural correlates of first-person perspective as one constituent of human self-consciousness. J Cogn Neurosci 16:817–827. doi:10.1162/089892904970799
Action recognition and the semantic meaning of actions: how does the brain categorize different social actions? Dong-Seon Chang1, Heinrich H. Bülthoff1, Stephan de la Rosa1 1 Max Planck Institute for Biological Cybernetics, Dept. of Human Perception, Cognition and Action, Tübingen, Germany
Introduction The visual recognition of actions occurs at different levels (Jellema and Perrett 2006; Blake and Shiffrar 2007; Prinz 2013). At a kinematic level, an action can be described as the physical movement of a body part in space and time, whereas at a semantic level, an action can carry various social meanings, such as the goals or intentions behind the action. In the past decades, a substantial amount of neuroscientific research has been devoted to various aspects of action recognition (Casile and Giese 2005; Blake and Shiffrar 2007; Prinz 2013). Still, the question at which level the representations of different social actions are encoded and categorically ordered in the brain remains largely unanswered. Does the brain categorize different actions according to their kinematic similarities, or in terms of their semantic meanings? In the present study, we wanted to find out whether different actions are ordered according to their semantic meaning or their kinematic motion, employing a visual action adaptation aftereffect paradigm as used in our previous studies (de la Rosa et al. 2014). Materials and methods We used motion capture technology (MVN motion capture suit, Xsens, Netherlands) to record different social actions often observed in everyday life. The four social actions chosen as our experimental stimuli were handshake, wave, punch and yopunch (fist bump). Each action was similar to or different from the other actions either in terms of its semantic meaning (e.g. handshake and wave both meant a greeting, whereas punch meant an attack and yopunch meant a greeting) or its kinematic motion (e.g. the movements of a punch and a yopunch were similar, whereas the movements of a punch and a wave were very different). To quantify these similarities and differences between the actions, a total of 24 participants rated the four social actions pairwise in terms of their perceived differences in either semantic meaning or kinematic motion on a visual analogue scale ranging from 0 (exactly the same) to 10 (completely different). All actions were processed into short movie clips (< 2 s) showing only the joint movements of an actor (point-light stimuli) from the side view. Then, the specific perceptual bias for each action was determined by measuring the size of the action adaptation aftereffect in each participant. Each of the four social actions was shown as a visual adaptor in each block (30 s prolonged exposure at the start, three repetitions on each trial) while participants engaged in a two-alternative forced-choice (2AFC) task in which they had to judge which action was shown. The test stimuli in the 2AFC task were action morphs in seven steps between two actions, which were presented repeatedly (18 repetitions per block) and in randomized order. Finally, the previously obtained meaning and motion ratings were used to predict the measured adaptation aftereffect for each action using linear regression. Results The perceived differences in the ratings of semantic meaning significantly predicted the differences in the action adaptation aftereffects (p < 0.001).
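The regression analysis can be sketched as follows; this is a minimal illustration in Python (statsmodels) with placeholder numbers rather than the study's data: pairwise rating differences in meaning and motion, plus their interaction, predict the pairwise differences in adaptation aftereffects (six pairs from four actions).

import numpy as np
import statsmodels.api as sm

# Placeholder values: one entry per action pair (C(4,2) = 6 pairs).
meaning_diff = np.array([2.1, 7.8, 6.9, 3.0, 8.2, 2.5])
motion_diff = np.array([4.5, 6.0, 2.2, 3.8, 7.1, 2.9])
aftereffect_diff = np.array([0.8, 2.9, 2.4, 1.1, 3.1, 0.9])

# Predictors: meaning, motion, and their interaction, plus an intercept.
X = sm.add_constant(np.column_stack([meaning_diff, motion_diff,
                                     meaning_diff * motion_diff]))
model = sm.OLS(aftereffect_diff, X).fit()
print(model.params, model.pvalues)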
The rated differences in kinematic motion alone were not able to significantly predict the differences in the action adaptation aftereffects, although the interaction of meaning and motion did significantly predict the changes in the action adaptation aftereffect for each action (p < 0.01). Discussion Previous results have demonstrated that the action adaptation aftereffect paradigm can be a useful paradigm for determining the specific perceptual bias for recognizing an action since, depending on the adaptor
stimulus (e.g. if the adaptor was the same action as one of the test stimuli), a significant shift of the point of subjective equality (PSE) was consistently observed in the psychometric curve for judging the difference between two actions (de la Rosa et al. 2014). This shift of the PSE represents a specific perceptual bias for each recognized action, because it is assumed that the shift (adaptation aftereffect) would not be found if there were no specific adaptation of the underlying neuronal populations recognizing each action (Clifford et al. 2007; Webster 2011). Using this paradigm, we showed for the first time that perceived differences between distinct social actions might be encoded in the brain in terms of their semantic meaning rather than their kinematic motion. Future studies should establish the neuroanatomical correlates of this action adaptation aftereffect. The current experimental paradigm also serves as a useful method for further mapping the relationships between different social actions in the human brain.
References
Blake R, Shiffrar M (2007) Perception of human motion. Annu Rev Psychol 58:47–73. doi:10.1146/annurev.psych.57.102904.190152
Casile A, Giese MA (2005) Critical features for the recognition of biological motion. J Vis 5(4):348–360. doi:10.1167/5.4.6
Clifford CWG, Webster MA, Stanley GB et al (2007) Visual adaptation: neural, psychological and computational aspects. Vision Res 47:3125–3131. doi:10.1016/j.visres.2007.08.023
De la Rosa S, Streuber S, Giese M et al (2014) Putting actions in context: visual action adaptation aftereffects are modulated by social contexts. PLoS ONE 9:e86502. doi:10.1371/journal.pone.0086502
Jellema T, Perrett DI (2006) Neural representations of perceived bodily actions using a categorical frame of reference. Neuropsychologia 44:1535–1546. doi:10.1016/j.neuropsychologia.2006.01.020
Prinz W (2013) Action representation: crosstalk between semantics and pragmatics. Neuropsychologia 1–6. doi:10.1016/j.neuropsychologia.2013.08.015
Webster MA (2011) Adaptation and visual coding. J Vis 11(5):1–23. doi:10.1167/11.5.3
Understanding before language [5] Anna Ciaunica Institute of Philosophy, London, England & Institute of Philosophy, Porto, Portugal
Abstract How can an infant unable to articulate meaning in verbal communication be an epistemic agent capable of attributing false beliefs? Onishi, Baillargeon (2005) demonstrated false-belief understanding in young children through completely nonverbal measures such as the violation-of-expectation (VOE) [6] looking paradigm and showed that children younger than 3 years of age, who consistently fail the standard verbal false-belief task (SFBT), can anticipate others' actions based on the false beliefs attributed to them. This gave rise to the so-called "Developmental Paradox" (DP): if preverbal human infants have the capacity to respond to others' false beliefs from at least 15 months, why should they be unable to verbally express their capacity to recognize false
[5] An extended version of this paper has recently been accepted for publication in the Review of Philosophy and Psychology, under the title 'Under Pressure: Processing Representational Decoupling in False-Belief Tasks'. Springer Science+Business Media Dordrecht 2014.
[6] The VOE task tests whether children look longer when agents act in a manner that is inconsistent with their false beliefs; it relies on the basic assumption that when an individual's expectations are violated, she is surprised and thus looks longer at an unexpected event than at an expected event.
beliefs until they are 4 years old, a full 33 months later? The DP teaches us that visual perception plays a crucial role in processing the implicit false-belief condition as opposed to the explicit/verbal-report condition. But why is perception, in some cases, "smarter" than explicit and verbalized thinking? In this paper I briefly sketch the solution proposed by De Bruin, Kästner (2012), Dynamic Embodied Cognition, and I raise an objection regarding their use of the term "metarepresentation" in explaining the puzzle. Recently, evidence has been mounting to suggest that infants have much more sophisticated social-cognitive skills than previously suspected. The issue at stake is crucial since, as Sommerville, Woodward (2010:84) pointed out, assessing infants' understanding of others' behavior provides not only a "snapshot of the developing mind of the child, but also a panorama of the very nature of cognition itself." Consider this challenge:
P1. Empirical evidence strongly suggests that basic cognition is smart (since 15-month-olds understand false beliefs).
P2. Smart cognition necessarily involves computations and representations (of false beliefs).
C. Hence, basic cognition necessarily involves computations and representations (of false beliefs).
De Bruin, Kästner (2012) recently proposed a reconciliatory middle-ground solution between representationalist and enactivist accounts, i.e. Dynamic Embodied Cognition (DEC). They claim that the Developmental Paradox is best addressed in terms of the relation between "coupled (online) and decoupled (offline) processes for basic and advanced forms of (social) cognition", as opposed to merely representing/not representing false beliefs. They argue that rephrasing the issue in terms of online/offline processing provides us with an explanation of the Developmental Paradox. How exactly does this work? First, the authors take for granted the premise that infants are equipped with "implicit" abilities that start out as grounded in basic online processes, albeit partly decoupled. It is crucial for their project that these basic implicit abilities already involve decoupling. This is in line with the cognitivist distinction between (a) sub-doxastic mental states that do not possess truth-evaluable propositional content and (b) robust mental states (Spaulding 2010:123). In a second step, they hold that infants' "implicit" abilities develop gradually into more sophisticated explicit abilities that rely on offline processes to a much larger extent. The coupling and decoupling relations between agent and environment advocated by DEC are dynamic in the sense that "they are a matter of degree and never an end in itself. (…) The dynamic interplay of decoupled and coupled processes may be used for optimization of cognitive processing" (De Bruin and Kästner 2012:552, emphasis added). There is definitely much more to be said about DEC, but this gives us the basic flavor. Clearly, DEC borrows from weak-strategy theorists such as Apperly and Butterfill (2009) the idea that early mechanisms are "cheap" and "efficient", while late-emerging mechanisms are "costly" but "flexible". But it also borrows from rich theorists (Baillargeon et al. 2010) the idea that preverbal human infants are already capable of decoupling, i.e. of taking their own reality-congruent perspective offline, albeit in a very limited way. An important concern regards the use of the term "metarepresentation". As S.
Scott (2001) pointed out, there is a danger of confusion—with serious consequences for the debate about the nature of higher-level cognition—between two distinct notions of "metarepresentation", as defined by philosophers (Dennett 1998) and by psychologists dealing with the question of autistic disorders (Leslie 1991). According to Dennett (1998), representations are themselves objects in the world, and therefore potential objects of (second-order or meta-) representations. Call this metarepresentation1. For example, drawing a cat on a piece of paper is a type of non-mental representation, which is represented in the mind of the person viewing it. The mental representation is of the drawing, but since the drawing is itself a representation, the viewer has a (mental) metarepresentation of whatever it is that the drawing represents, namely a cat. By contrast,
Leslie uses the term "metarepresentation" to mean "(e.g., in the case of understanding pretence-in-others) an internal representation of an epistemic relation (PRETEND) between a person, a real situation and an imaginary situation (represented opaquely)…" (Leslie 1991:73). Call this metarepresentation2. This definition does not sound at all like the definition of metarepresentation1 as second-order representation pursued above. There is nothing metarepresentational, in the sense of "higher-order representation", in Leslie's formulation of the semantics of psychological predicates. Building on this distinction, S. Scott insightfully argues that a representation can contain other representations without being a metarepresentation1. Consider (P): (P) The child BELIEVES that Sally BELIEVES that the marble is in the basket. In what follows, I shall argue that although (P) is a straight-up second-order belief, it does not necessarily involve second-order representation, or metarepresentation1 in Dennett's sense. Against De Bruin and Kästner (2012), I hold that there are no additional second-order "metarepresentational" skills involved in the SFBT as compared with VOE trials. Much of what I have to say in this section parallels arguments from Scott (2001), with which I am in close agreement. Scott convincingly argued that second-order beliefs do not necessarily require metarepresentations1. It is only necessary to have the ability to represent first-order beliefs in order to have second-order beliefs (Scott 2001:940). Take the following example of a first-order belief: (1) Melissa BELIEVES that her dog is dead. The crucial point here is that to simply hold a belief, Melissa need not be aware of her belief or hold an explicit representation of it. In other words, she need not think to herself: "I believe my dog is dead" or "It is I who believes that my dog is dead". At this level of interpretation, we can speak of animals having this kind of online implicit belief, although we may find the idea of dogs having implicit beliefs uncomfortable. Now, consider the following example of a second-order belief: (2) Anne BELIEVES that Melissa BELIEVES that her dog is dead. As Scott rightly points out, in order to get (2) Anne needs the representation of Melissa's dog, the predicate DEAD, and so on. What she does not need is a representation of Melissa's representation of her dog, the predicate DEAD, and so on. That is, she does not need a second-order representation of any of these things. She can get by with her own first-order representations. Given that neither Melissa nor Anne has any particular need of belief representation in order to be a believer, Anne's representation of Melissa's belief need not be second-order. In addition, it would seem that what Anne also needs in order to get (2) is a "representation of Melissa's BELIEF. That is to say, she needs a representation of Melissa's mental state of believing in a way that Melissa does not" (Scott 2001:939, emphasis added). The question is: is there any metarepresentation1 involved here? Indeed, one might object that Melissa's belief state already involves implicit or sub-personal representational processing. Now, the distinction between explicit and implicit or sub-personal mental representations is a complicated issue and need not concern us here.
For present purposes, it is sufficient to insist on the idea that Anne's representation of Melissa's first-order belief (regardless of whether the latter involves sub-personal representational processing in Melissa's mind) does not amount to a second-order metarepresentation1 (in Anne's mind). But let us suppose, for the sake of the argument, that Anne holds a representation of Melissa's first-order implicit belief (B), which in turn involves a certain sub-personal representational processing (S) in Melissa's brain. Now, if (S) is an implicit, sub-personal representation (in Melissa's mind), then one consequence would be that in metarepresenting Melissa's belief (B) [which involves (S)], Anne is only half-aware of what she is metarepresenting. Indeed, given that one member of the double
representational layer, namely (S), remains opaque to her, Anne is aware only of what she is representing, namely (B). Note that this is not a problem per se. One could label this half-blind metarepresenting metarepresentation3, say. If this is so, then it is difficult to see why metarepresenting3 in this sense should be cognitively more demanding (for Anne) than mere representing. In contrast, recall that in Dennett's drawing example the viewer is fully aware of the double representational layer: he forms a mental representation of a drawing of a cat, and this makes his metarepresenting1 a genuine second-order cognitive achievement. Hence, it is not clear that metarepresenting1 is at work in the Sally/Anne scenario, and this casts doubt on the idea that elicited-response tasks (ERTs) require that infants not only represent but metarepresent. To sum up, according to De Bruin and Kästner, ERTs involve a stronger form of decoupling (precisely because they involve metarepresentational skills and language processing), hence explaining the Developmental Paradox. Although I agree with De Bruin and Kästner in saying that (a) SFBTs require decoupling, and that (b) the verbal interaction with the experimenter during the SFBT plays a crucial role in 3-year-olds' failure to report false-belief understanding, there is still something missing from the picture. Indeed, I fail to see how (a) and (b) alone can solve the Developmental Paradox since, as the authors themselves have insisted, decoupling is supposed to lead to an optimization of cognitive processing. Everybody agrees that strong decoupling is an important evolutionary advantage. But the mystery of the Developmental Paradox stems from the opposite situation. In order to truly solve the DP, they need to answer the following question: why does stronger decoupling impair (at least in some cases), rather than improve, the "mental gymnastics" of representational manipulation? In other words: why do weaker forms of decoupling do a better job in a complex task such as false-belief understanding? Unlike De Bruin and Kästner, I reject the idea that basic forms of mentality are representational and that, during VOE scenarios, infants must rely on internal representations of visual information that is available to the other agent but not to them. Rather, infants understand others' intentional attitudes as currently and readily available (i.e. directly observable) in the environment. To support this claim, I appeal to empirical findings illustrating that infants' ability to understand other minds is rooted in their capacity to actively engage in interactive scenarios. Consistent with a burgeoning literature suggesting a common basis for both the production and the perception of action, evidence has been mounting to illustrate that infants' understanding of others is more robust within interactive contexts. In other words, the more engaged the infant–agent interactions are, the more robust the infants' understanding of others becomes. Children first learn to discern or establish reference in situations that are defined not by differences in how self and other visually perceive agents and objects but by differences in their shared experiential backgrounds, i.e. in what they did, witnessed or heard.
For example, Moll and Tomasello (2007) tested children's ability to track an adult's knowledge of what she had experienced, in three conditions: (1) the child and the adult together interacted with a toy; (2) the infant handled the toy with another experimenter while the adult watched (and the infant was alerted to this several times); (3) the adult handled a toy alone while the infant watched. As Wilby (2012) pointed out, one might describe the difference in the evidence available to the infant as follows: (1) X is aware that [I am aware that [X is aware that [p]]]. (2) X is aware that [I am aware that [p]]. (3) X is aware that p. Now, if we apply De Bruin and Kästner's "degrees of decoupling" explanatory strategy to this specific case, one would expect infants to find the first condition (1) the hardest, since it involves several embedded "layers" of decoupling. Yet the evidence suggests the complete opposite. Hence, it is not clear that crediting infants with an implicit representational decoupling ability is the best strategy here.
References
Apperly I, Butterfill S (2009) Do humans have two systems to track beliefs and belief-like states? Psychol Rev 116:953–970
Baillargeon R, Scott RM, Zijing H (2010) False-belief understanding in infants. Trends Cogn Sci 14(3):110–118
Ciaunica A (in press) Under pressure: processing representational decoupling in false-belief tasks. Rev Philos Psychol. doi:10.1007/s13164-014-0195-2
De Bruin LC, Kästner L (2012) Dynamic embodied cognition. Phenomenol Cogn Sci 11(4):541–563
Dennett D (1998) Making tools for thinking. In: Sperber D (ed) (2000) Metarepresentation. Oxford University Press, New York
Leslie AM (1991) Precursors to a theory of mind. In: Whiten A (ed) Natural theories of mind: evolution, development, and simulation of everyday mindreading. Blackwell, Oxford, pp 63–78
Moll H, Tomasello M (2007) How 14- and 18-month-olds know what others have experienced. Dev Psychol 43(2):309–317
Onishi KH, Baillargeon R (2005) Do 15-month-old infants understand false beliefs? Science 308:255–258
Scott S (2001) Metarepresentations in philosophy and psychology. In: Moore J, Stenning K (eds) Proceedings of the twenty-third annual conference of the Cognitive Science Society, University of Edinburgh. LEA Publishers, London
Sommerville JA, Woodward A (2010) In: Grammont F et al (eds) Naturalizing intention in action. MIT Press, Cambridge
Wilby M (2012) Embodying the false-belief tasks. Phenomenol Cogn Sci 11(4):519–540 (special issue on 'Debates on Embodied Mindreading', ed S Spaulding)
An embodied kinematic model for perspective taking
Stephan Ehrenfeld, Martin V. Butz
Cognitive Modeling, Department of Computer Science, University of Tübingen, Germany

Abstract
Spatial perspective taking (PT) is an important part of many social capabilities, such as imitation or empathy. It enables an observer to experience the world from the perspective of another actor. Research results from several disciplines suggest that the capability for PT is partially grounded in the postural structure of one's own body. We investigate the option of enabling PT by employing a (potentially learned) kinematic model of one's own body. In particular, we investigate whether the modular modality frame model (MMF), a computational model of the brain's postural representation of its own body, can be used for PT. Our results confirm that MMF is indeed capable of PT. In particular, we show that MMF can recruit its own embodied kinematic knowledge to infer a probabilistic estimate of the spatial transformation necessary for PT, as well as to deduce object positions and orientations from the actor's egocentric perspective.

Keywords Perspective taking, Embodiment, Frame of reference

Introduction
Perspective taking (PT) may be defined as the ability to put oneself into another person's spatial, bodily, social, emotional, or even logical reasoning perspective. Taking on an actor's perspective in one or several of these respects seems mandatory to be able to interact with the observed actor socially, to imitate the actor, to cooperate with the actor, to imagine situations, events, or episodes the actor has been in or may experience in the future, and to show and experience empathy (Buckner and Carroll, 2007). PT has often been associated with the mirror neuron system (Rizzolatti and Craighero, 2004). However, to date it is still under debate where mirror neurons come from—with
suggestions ranging from purely associative learning mechanisms, over adaptations for action understanding, to a consequence of epigenetic, evo-devo interactions (Heyes, 2010; Ferrari et al. 2013). Although PT has been addressed in various disciplines, few explicit computational models exist. For example, PT is often simply attributed to the concept of mirror neurons, without specifying how exactly mirror neurons accomplish PT and which mechanisms are involved in the process. Fleischer et al. (2012) simulated the development of mirror neurons based on relative object-interaction encodings, suggesting and partially confirming that many mirror neurons should be viewpoint dependent. However, their neural model was mainly hard-coded and did not include any interactive information-processing mechanisms; purely feedforward processing allowed the inference of particular types of object interactions. We believe that PT is an essential ingredient for learning an interactive mirror-neuron system during cognitive development. Essentially, for establishing view-independent mirror neuron activities, two requirements need to be met. First, in order to understand complex actions, the observer must encode an embodied representation of the actor and its surrounding environment. Second, seeing that the sensory information arrives from the observer's egocentric perspective, a change in the frame of reference (FoR) becomes necessary for transferring the perspective into the actor's egocentric perspective. Biologically and computationally, both requirements may be met by employing attributes of a model of one's own body acquired during cognitive development. Doing so, the own body model can be used to continually filter and combine multiple (visual) cues into a rich estimate of an actor's position and orientation. Moreover, the own body model can be used to compute the necessary spatial translations and rotations, both of which are inevitably part of the kinematics of the own body model. Therefore, we propose that a PT mechanism may recruit a modular body model and its inherent, bodily kinematic mappings. To the best of our knowledge, no explicit computational models have been applied to model an embodied PT mechanism; a non-embodied approach to PT may be found in Cabido-Lopes and Santos-Victor (2003). To fill this gap, we show here that the modular modality frame model (MMF) (Ehrenfeld and Butz, 2013; Ehrenfeld et al. 2013), which constitutes a biologically inspired model of body state estimation, can exhibit embodied PT.

The modular modality frame model (MMF)
At its core, MMF is a Bayesian model of modular body state estimation. MMF distributes the body state over a set M of local modules m_i, such that each module encodes a part p(x|m_i) of the whole probabilistic body state p(x|M). Bayesian filtering reduces noise over time, and kinematic mappings connect all m_i, leading to a continuous information flow between the modules. The modules and their connections are visualized in Fig. 1. Multiple frames of reference (FoRs) are used: a head-centered FoR encodes joint positions (first row) and limb orientations (second row), where a limb orientation consists of three orthogonal vectors, one parallel to the limb and the other two specifying its intrinsic rotation. An additional FoR is centered on each body limb and is used to encode the relative orientation of the next distal limb (third row). Finally, Tait-Bryan angles between adjacent limbs are encoded (fourth row).
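To convey the flavor of this kind of modular probabilistic fusion, the following minimal Python sketch (illustrative only, not the authors' implementation) shows the standard precision-weighted fusion of two independent Gaussian estimates of the same quantity, the kind of per-module operation such a model performs once estimates have been mapped into a common frame of reference:

def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    # Bayesian product of two independent Gaussian estimates:
    # the fused precision is the sum of the individual precisions,
    # and the fused mean is the precision-weighted average.
    prec_a, prec_b = 1.0 / var_a, 1.0 / var_b
    var = 1.0 / (prec_a + prec_b)
    mu = var * (prec_a * mu_a + prec_b * mu_b)
    return mu, var

# Example: fuse an accurate visual cue with a noisy limb-relative cue
# for the same fingertip coordinate (hypothetical numbers, in units
# of limb length).
mu, var = fuse_gaussian(0.98, 0.02**2, 0.80, 0.2**2)
print(mu, var)  # the fused estimate stays close to the reliable cue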
An in-depth description can be found elsewhere (Ehrenfeld and Butz, 2013; Ehrenfeld et al. 2013). In summary, MMF executes transitions between FoRs and implements complex information fusion. These characteristics make MMF ideal for modeling PT in an embodied way. When observing the body of an actor, MMF features two ways of accomplishing the FoR transformation necessary for PT. First, any visual information arriving in the first two rows, i.e. position or orientation information relative to the observer's body, can be projected along the inverse kinematics (gray dotted arrows) to the third row (rectangles), thus inferring limb-relative representations. The result can be projected back along the forward kinematics (yellow dash-dotted arrows).
When the actor's shoulder position and torso orientation (filled circles) are substituted with a base frame of reference during this process (e.g. position (0, 0, 0) and orientation (1, 0, 0), (0, 1, 0), (0, 0, 1)), the result represents the actor's limbs in the actor-relative FoR. Second, any visual information arriving in an observer-relative FoR (first two rows of MMF) may also be transformed directly into actor-relative FoRs. As before, the model accomplishes this transformation by projecting the sensory information along the inverse kinematics (gray dotted) into limb-relative orientation FoRs; in this case, however, the next proximal input is substituted with the position and orientation of the actor's shoulder and torso, respectively. Due to these substitutions, and because no normalization to the limb length is done, the result in the relative-orientation FoR is equal to the actor's limbs in the actor-relative FoR. Equally, the second method can be used to transform objects in the environment from the observer's egocentric perspective to the actor's egocentric perspective. As both methods rely exclusively on interactions that are initially built for establishing a distributed representation of the observer's own body, the observer can simply recruit its own body model to infer the actor's perspective. When the actor's shoulder position and orientation are not visible to the observer, this base perspective, too, can be inferred by MMF, given at least one shoulder- and torso-relative position and orientation signal. By transforming multiple cues, particularly along the distal-to-proximal kinematic mappings, and fusing them, MMF can build a robust estimate of the actor's shoulder and torso. In the following, we evaluate these three capabilities of MMF in more detail.

[Fig. 1: The body state is distributed over modules (depicted as circles, filled circles, rectangles and crossed-out rectangles). Along the horizontal axis, different body limbs are shown; along the vertical axis, different modalities (positions, orientations and angles), which also use different FoRs: relative to a base (first two rows) or relative to the next proximal limb (third and fourth rows). Arrows show the kinematic mappings that connect the modules (dash-dotted yellow: forward kinematics; dotted gray: inverse kinematics; solid red: distal-to-proximal mappings).]
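The frame-of-reference substitution described above can be pictured as a rigid transform. The following sketch (illustrative only; the function name and the numbers are assumptions, not MMF code) re-expresses a point given in the observer's egocentric frame in an actor-relative frame defined by the actor's torso position and orientation:

import numpy as np

def to_actor_frame(p_obs, actor_pos, actor_rot):
    # p_obs: 3D point in the observer's egocentric frame.
    # actor_pos: actor torso position in the observer's frame.
    # actor_rot: 3x3 rotation whose columns are the actor's torso
    #            axes expressed in the observer's frame.
    # Substituting this base pose amounts to the rigid transform:
    return actor_rot.T @ (p_obs - actor_pos)

# Example: an object one unit in front of the observer; the actor
# stands two units away, facing the observer (rotated 180 degrees
# about the vertical axis).
R = np.array([[-1.0, 0.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])
p = to_actor_frame(np.array([1.0, 0.0, 0.0]),
                   np.array([2.0, 0.0, 0.0]), R)
print(p)  # [1. 0. 0.]: the object also lies in front of the actor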
Simulations
An observer might not always be able to perceive an actor's torso while still being able to perceive other parts of the actor's body. The torso might be occluded, or additional cues might be available for other body parts (e.g. the observer could touch the actor's hand, providing additional cues; the actor's hand could be placed on a well-established landmark, such as a door handle; or attention could be focused on the hand). In the following, we show how MMF can use cues from the actor's hand state and relative relations between adjacent limbs to build a probabilistic estimate of the actor's torso orientation and shoulder position. In the simulations, we assume that the actor's torso is static and its hand moves along an unknown trajectory. To this end, we model the hand's movement as Gaussian noise with mean zero and a standard deviation of 0.2 rad per angle. The arm has nine degrees of freedom (three per joint). In each time step, noisy sensory input arrives in all modules depicted in Fig. 1 with a crossed-out rectangle (standard deviation of 0.02 per dimension, in units of limb length) or a non-crossed-out rectangle (standard deviation of 0.2). Thus, while the fingertip position and hand orientation are perceived rather accurately in the observer's egocentric perspective, relations between adjacent limbs are perceived rather inaccurately. In each time step, MMF projects the sensory information along the solid red arrows to the torso's position and orientation (filled circles), where Bayesian filtering reduces the sensor noise. The Euclidean distance of the resulting torso estimate from the real torso state is shown in Fig. 2. The results show that, despite the high sensory noise in the relative FoRs and the high movement noise, the orientation of the actor's torso can be inferred with a lower estimation error than the one inherent in most of the perceived FoRs. Results are averaged over 100 individual runs, where each run samples a different shoulder position, torso orientation, and arm trajectory.

[Fig. 2: Error of the estimation of an actor's shoulder position and torso orientation in an observer's egocentric FoR. The shoulder and torso themselves are occluded and are inferred via the body model. The vertical axis is in units of limb lengths; error bars are SEM.]

To infer the actor-relative orientation of objects, the second projection method is evaluated. For this purpose, in each run a new object with random position and orientation is created within a sphere of one limb length around the actor's torso. The error of the object's projection into the actor's egocentric FoR is shown in Fig. 3. It depends on both the shoulder position and the torso orientation estimates (cf. Fig. 2); in accordance with the improvement of those estimates, the object's projection into the actor's FoR improves as well.

[Fig. 3: The estimate of the actor's torso in the observer's FoR is used to project an object from the observer's FoR to the actor's FoR. As the torso estimate improves (cf. Fig. 2), the object projection improves as well. The vertical axis is in units of limb lengths; error bars are SEM.]

Last, we evaluate the effects of multimodal sensor fusion and Bayesian filtering on the representation of the actor's fingertip position in the actor's egocentric perspective. At first glance, sensory inputs of the relative relations between adjacent limbs (non-crossed-out rectangles in Fig. 1) seem sufficient to infer the fingertip position; the resulting estimation error is shown in Fig. 4 (red). It is, however, advantageous to also include the eye-relative measurements (crossed-out rectangles in Fig. 1). These are projected into the actor's egocentric perspective in the same way environmental objects are projected, but in this case the result is fused with the fingertip estimate inferred from the relative relations. The error of this fused estimate is shown in Fig. 4 (green). The improvement of the green over the red performance is only possible because the torso estimate is filtered over time. The results show how continuous filtering and information fusion can improve the body state estimate.

[Fig. 4: Estimation error of the fingertip position, without vs. with the global (eye-relative) measurement. The vertical axis is in units of limb lengths; error bars are SEM.]

Conclusion
Recently, we applied MMF to multimodal sensor fusion, Bayesian filtering, and sensor error detection and isolation. As shown in Butz et al. (2014), MMF is also well suited to model the Rubber Hand Illusion. Two important characteristics of MMF are its modularly distributed state representation and its rigorous multimodal Bayesian fusion, making it highly suitable for modeling PT in an embodied way. Our results show that MMF is able to infer position and orientation estimates of an actor's body and of objects in the environment from the actor's egocentric perspective. We showed that this is possible even when the actor's head and torso are occluded. Moreover, we showed that Bayesian filtering improves the process. All results are obtained exclusively with the observer's own body model, i.e. no "new abilities" are required; thus, the proposed PT approach is fully embodied. The resulting PT capability sets the stage for many skills that at least partially rely on PT. As an estimate of the actor's whole body
state is maintained over time, angular estimates and changes in these angular estimates, which allow the inference of current motor activities, are readily available. Because MMF represents these estimates body-relatively, the inferred motor activities are the same regardless of whether the observer or another actor acts. As a consequence, motor primitives can be activated, and movements may be classified according to these activities. The observer could, for example, recognize a motion as biological motion, or even infer the desired effect an actor is trying to achieve. Even more so, the close integration of PT in the body model should allow for easy online imitation learning. Overall, MMF can be considered either to precede the mirror neuron system and provide it with input, or to be part of the mirror neuron system, simulating and supporting the understanding of the bodily motions of an observed actor.

References
Buckner RL, Carroll DC (2007) Self-projection and the brain. Trends Cogn Sci 11:49–57
Butz MV, Kutter EF, Lorenz C (2014) Rubber hand illusion affects joint angle perception. PLoS ONE 9(3):e92854
Cabido-Lopes M, Santos-Victor J (2003) Visual transformations in gesture imitation: what you see is what you do. In: IEEE international conference on robotics and automation, vol 2, pp 2375–2381
Ehrenfeld S, Butz MV (2013) The modular modality frame model: continuous body state estimation and plausibility-weighted information fusion. Biol Cybern 107(1):61–82
Ehrenfeld S, Herbort O, Butz MV (2013) Modular neuron-based body estimation: maintaining consistency over different limbs, modalities, and frames of reference. Front Comput Neurosci 7
Ferrari PF, Tramacere A, Simpson EA, Iriki A (2013) Mirror neurons through the lens of epigenetics. Trends Cogn Sci 17(9):450–457
Fleischer F, Christensen A, Caggiano V, Thier P, Giese MA (2012) Neural theory for the perception of causal actions. Psychol Res 76(4):476–493
Heyes C (2010) Where do mirror neurons come from? Neurosci Biobehav Rev 34(4):575–583
Rizzolatti G, Craighero L (2004) The mirror-neuron system. Annu Rev Neurosci 27:169–192
The under-additive effect of multiple constraint violations
Emilia Ellsiepen, Markus Bader
Goethe-Universität, Institut für Linguistik, Frankfurt am Main, Germany

Keywords Gradient grammaticality, Harmonic grammar, Quantitative linguistics

Introduction
The quest for quantitative evidence in syntactic and semantic research has led to the development of experimental methods for investigating linguistic intuitions with experimental rigor (see the overview in Schütze and Sprouse, 2014). This in turn has inspired a renewed interest in grammar formalisms built on constraints with numerical weights (Smolensky and Legendre, 2006; see the overview in Pater, 2009). These formalisms assign each sentence a numerical harmony value as defined in (1): the harmony H of a sentence S is the negative weighted sum over all grammatical constraints Ci that are violated by S, where w(Ci) is the weight of constraint Ci and v(S, Ci) the number of its violations in S.

(1) Harmony of sentence S: H(S) = −Σi w(Ci) · v(S, Ci)
As discussed in detail by Pater (2009), such formalisms have great potential for bringing generative linguistics and cognitive science into close contact again.
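As a concrete illustration of how (1) is computed, the following Python sketch scores four candidate sentences from constraint weights and violation counts; the weights anticipate the values estimated for AuxFirst and Agree later in this abstract, and the candidate profiles mirror the experimental conditions:

def harmony(violations, weights):
    # Harmony per (1): negative weighted sum of constraint violations.
    # violations: dict constraint -> v(S, Ci); weights: dict -> w(Ci).
    return -sum(weights[c] * v for c, v in violations.items())

weights = {"AuxFirst": 0.72, "Agree": 0.59}
candidates = {
    "a": {"AuxFirst": 0, "Agree": 0},  # no violation
    "b": {"AuxFirst": 1, "Agree": 0},  # auxiliary-position violation
    "c": {"AuxFirst": 0, "Agree": 1},  # agreement violation
    "d": {"AuxFirst": 1, "Agree": 1},  # both violations
}
for name, profile in candidates.items():
    print(name, harmony(profile, weights))
# prints 0, -0.72, -0.59 and -1.31 (up to floating-point rounding)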
One of the challenges brought about by these developments is how harmony relates to quantitative linguistic evidence. With regard to corpus frequencies, a large body of research has shown that this relationship is non-linear (e.g. Goldwater and Johnson, 2003). With regard to gradient linguistic judgments, the most transparent relationship between constraint weight and perceived grammaticality is postulated by Linear Optimality Theory (LOT) (Keller, 2006), which explicitly aims at providing a model of gradient grammaticality judgments, in particular as obtained by the method of magnitude estimation (ME). This method allows participants to judge sentences on an open-ended continuous numerical scale relative to a predefined reference sentence. Following Bard et al. (1996), magnitude estimation has become a kind of gold standard for assessing grammaticality, although more recently its validity has been questioned (e.g. Sprouse, 2011). LOT claims that the weight of a constraint reflects the decrease in acceptability that results from violating the constraint. A further claim of LOT is that multiple constraints combine additively: if a sentence contains two constraint violations, its decrease in acceptability should be the sum of the acceptability decreases incurred by violating each constraint in isolation. Some evidence against this assumption was found by Hofmeister et al. (2014): the combined effect of two syntactic constraint violations was under-additive, that is, less than the sum of the separate effects of the two constraints. However, Hofmeister et al. (2014) used a non-standard judgment procedure (the thermometer judgment methodology of Featherston, 2007), and the interaction between the two constraints was only marginally significant. We ran two experiments using a standard magnitude estimation procedure in order to investigate the effect of multiple constraint violations. Experiment 1 investigates the effect of two severe (hard) constraint violations; Experiment 2 investigates the effect of a severe constraint violation coinciding with a mild (soft) constraint violation. In both cases, we find evidence for under-additivity. In the last part, we therefore develop the idea of mapping harmony values to acceptability judgments via a sigmoid linking function, preserving linear cumulativity in harmony while allowing for under-additive effects in acceptability judgments.

Experiment 1
Experiment 1 tested German sentences that contained either no violation at all (2-a), a violation of the position of the finite auxiliary (2-b), an agreement violation (2-c), or both violations at once (2-d). While sentences (2-b)–(2-d) are all ungrammatical in a binary system, LOT predicts that sentence (2-d) is even less acceptable than (2-b) and (2-c). The corresponding constraints AuxFirst and Agree are both considered hard constraints in the sense of Sorace and Keller (2005), that is, both violations should cause severe decreases in acceptability.

(2) Ich finde, dass die Eltern im Winter an die See…
    'I think that the parents in winter at the sea…'
    a. hätten reisen sollen. ('have travel should')
    b. *reisen sollen hätten.
    c. *hätte reisen sollen.
    d. *reisen sollen hätte.

Method
The ME procedure closely follows the description of the ME method in Bard et al. (1996) and consisted of a customization phase, in which participants were acquainted with the method by judging the length of lines and the acceptability of ten training sentences, and the experimental phase. In each phase, participants first saw the reference stimulus (either a line or a sentence) and assigned it a numerical value. Afterwards, the experimental stimuli were displayed one by one, and participants judged each stimulus relative to the reference stimulus, which remained visible throughout the experiment. The reference sentence (3), almost literally taken from Keller (2000, sentence (B.18), p. 377), is a sentence with non-canonical word order.

(3) Ich glaube, dass den Bericht der Chef in seinem Büro gelesen hat.
    'I believe that the.ACC report the.NOM boss in his office read has'

32 sentences were created, all appearing in the four versions according to the two violation types introduced in (2). The experimental sentences were distributed onto four lists according to a Latin square design and combined with 278 filler sentences. 36 students, all native speakers of German, took part in the study.

Results
The acceptability judgments obtained by the ME procedure were first normalized with the judgment of the reference sentence and then log-transformed, as is standard practice with ME data. A repeated measures ANOVA revealed significant main effects for both factors (AuxFirst and Agree), as well as a significant interaction between the two (F(35,1) = 14.7, p < .001). As illustrated on the left-hand side of Fig. 1, the effects were not additive (the predicted mean for fully additive effects is indicated by the x). The difference between conditions (2-b) and (2-d), however, was still significant, as indicated by a paired t-test (t(35) = 3.55, p < .01).

[Fig. 1: Results of Experiments 1 and 2: acceptability (log ratios) by verb cluster order (Aux first vs. Aux last); the x marks the means predicted under full additivity.]

Experiment 2
Experiment 2 has a similar design to Experiment 1, the difference being that instead of an agreement violation, a violation of subject-object order (S>O) is investigated, again in combination with a violation of AuxFirst. As shown in (4), the normal order between subject and object is SO. When the order constraint on subject and object is violated, sentence acceptability decreases. However, in contrast to Agree, this is not a hard but a soft constraint in the sense of Sorace and Keller (2005), resulting in a comparatively mild decrease in acceptability.

(4) Ich glaube, dass…
    'I think that…'
    a. der Doktor dem Patienten hätte helfen können. ('the doctor the patient have helped could')
    b. *der Doktor dem Patienten helfen können hätte.
    c. ?dem Patienten der Doktor hätte helfen können.
    d. *dem Patienten der Doktor helfen können hätte.

Method
The procedure was the same as in Experiment 1. 32 sentences were created, all appearing in the four versions according to the two violation types introduced in (4). The experimental sentences were distributed onto four lists according to a Latin square design and combined with 62 filler sentences. 36 students took part in the study.

Results
Similar to Experiment 1, a repeated measures ANOVA yielded two main effects of AuxFirst and S>O as well as an interaction (F(35,1) = 20.32, p < .001). As can be seen on the right-hand side of Fig. 1, the difference between conditions (4-b) and (4-d) is even smaller than in Experiment 1, but it was still significant (t(35) = 3.32, p < .01). We can conclude that both hard and soft constraints affect acceptability in a cumulative fashion, in that additional violations systematically lead to lower acceptability. The effects of the two constraint violations in isolation, however, do not add up in sentences containing both violations; instead, they combine in an under-additive way.

Modelling under-additivity: a sigmoid function to link harmony with acceptability
While the results above suggest a cumulative effect of constraint violations, in that additional violations always lower acceptability, this cumulativity does not result in a linear decrease. There are at least two explanations for this under-additivity effect: either harmony is not a linear combination of constraint weights multiplied by the number of violations, or acceptability is not proportional to harmony. In the latter case, it is not unlikely that acceptability can still be derived from harmony, but via a linking function that accounts for the apparent floor effect we found in the experiments. In this section we explore the possibility of using a sigmoid function to link harmony values to acceptability judgments. If we assume that the constraint weights, and thus the harmony values, are given, the appropriate linking function should intuitively preserve the relative distance between harmony values in the upper range while progressively reducing the difference between harmony values in the lower range, possibly leveling off at a horizontal asymptote corresponding to the lower bound of acceptability. A sigmoid function with its inflection point at 0 and an asymptote corresponding to the maximal difference in acceptability serves this requirement: the 0-point (a structure that violates no constraint) is mapped to zero, while increasingly lower values are mapped to values that lie closer together. If we want to estimate the weights from the acceptability data itself, however, matters get more complicated. If we were to use the differences between acceptability judgments directly as the weights, we would subsequently predict higher acceptability than observed for structures that exhibit only one violation. This problem can be avoided by first applying the inverse of the sigmoid linking function to the weights derived from the acceptability judgments. Choosing the correct asymptote seems to be an empirical question. As an example, we chose the hyperbolic tangent function with a reduced asymptote of −0.75 instead of −1, together with its corresponding inverse (Fig. 2):

(5) a. linking(xH) = tanh(xH) · 0.75
    b. linking⁻¹(xH) = artanh(xH / 0.75)

[Fig. 2: The linking function between harmony and adjusted harmony, tanh(x) · 0.75.]

To estimate the weights to be used, we first run a linear model on the acceptability judgment data of Experiment 1 and extract the coefficients of the two simple effects of AuxFirst and Agree, disregarding the interaction term. We then apply the inverse of the linking function to these coefficients to obtain the weights for our model (see the table below), which allow us to calculate the harmony values for our four candidates:
Constraint    Coefficient    Weight
AuxFirst      0.46           0.72
Agree         0.40           0.59
H(a) = −(0 + 0·0.72 + 0·0.59) = 0
H(b) = −(0 + 1·0.72 + 0·0.59) = −0.72
H(c) = −(0 + 0·0.72 + 1·0.59) = −0.59
H(d) = −(0 + 1·0.72 + 1·0.59) = −1.31
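The estimation-plus-linking pipeline just described can be sketched as follows (Python/NumPy; the coefficients are those reported in the table above, everything else is illustrative):

import numpy as np

ASYMPTOTE = 0.75

def linking(h):
    # (5-a): map harmony to predicted acceptability (log ratios).
    return np.tanh(h) * ASYMPTOTE

def linking_inv(x):
    # (5-b): map a single-violation acceptability decrement to a weight.
    return np.arctanh(x / ASYMPTOTE)

# Coefficients of the two simple effects from the linear model:
coefficients = {"AuxFirst": 0.46, "Agree": 0.40}
weights = {c: linking_inv(v) for c, v in coefficients.items()}
# -> approximately {"AuxFirst": 0.72, "Agree": 0.59}

# Predicted acceptability for the double-violation condition (d):
h_d = -(weights["AuxFirst"] + weights["Agree"])  # cf. H(d) above
print(linking(h_d))  # about -0.65: under-additive, as observed

By construction, the two single-violation conditions are matched exactly (linking(linking_inv(x)) = x), so any divergence between model and data is concentrated in the multiple-violation conditions.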
To compare the harmony values to the acceptability judgments, we now apply the linking function to the harmony values and plot the resulting values in Fig. 3, after aligning the 0-point of the transformed harmony scale (the axis on the right) with the zero-violation condition. Because the weights were rescaled with the inverse linking function, the predicted values for the two single-violation conditions match exactly. For the condition with two violations, the predicted value comes close to the measured value, and it is much closer than the value predicted under the assumption of direct proportionality between harmony and acceptability, i.e. a linear decrease in acceptability. While it is possible to determine a function that yields an exact match by choosing a different multiplier in (5), we leave this step to further research, as ideally such a function should be fitted to a variety of experiments and not only a single one.

[Fig. 3: Measured vs. predicted values in Experiment 1.]

The existence of a single linking function for all acceptability judgment experiments
presupposes a fixed lower bound of acceptability relative to the fully grammatical candidate. To test whether this assumption is reasonable, we apply the same linking function as above to the results of Experiment 2 and plot all values in Fig. 4. The transformed harmony value for the two-violation case diverges slightly more from the measured mean here, but it is still much closer than the one predicted by a linear combination of weights.

[Fig. 4: Measured vs. predicted values in Experiment 2.]

Discussion
The current study makes two independent contributions to the area of gradient grammaticality research. Firstly, it provides strong evidence that multiple constraint violations combine in an under-additive fashion in acceptability judgments measured by magnitude estimation. This holds both for concurrent violations of two hard constraints, that is, constraints that are context-independent and cause severe unacceptability, and for soft constraints, which can depend on context and cause only mild degradation. Secondly, we demonstrated that, using an appropriate linking function that maps harmony to ME acceptability judgments, we are able to model the under-additive effect in judgments while preserving fully additive cumulativity in harmony values. It remains to be tested whether this linking function, or an alternative function based on more data, can account for the whole range of ME judgment data. The under-additivity in ME judgments that we observed suggests the existence of a lower bound of perceived grammaticality. If such a lower bound exists, this questions the appropriateness of ME in two ways: if there is a cognitive lower bound, the motivation for using an open-ended scale rather than a bounded scale, such as a Likert scale of suitable size, seems to disappear. Alternatively, it is possible that the method itself is not well suited to capture differences below a certain threshold, as the perception of differences might not be linear in general, as is the case with loudness.

References
Bard EG, Robertson D, Sorace A (1996) Magnitude estimation of linguistic acceptability. Language 72(1):32–68
Featherston S (2007) Data in generative grammar: the stick and the carrot. Theor Ling 33(3):269–318
Goldwater S, Johnson M (2003) Learning OT constraint rankings using a maximum entropy model. In: Spenader J, Eriksson A, Dahl S (eds) Proceedings of the Stockholm workshop on variation within optimality theory. University of Stockholm, pp 111–120
Hofmeister P, Casasanto LS, Sag IA (2014) Processing effects in linguistic judgment data: (super-)additivity and reading span scores. Lang Cogn 6(1):111–145
Keller F (2000) Gradience in grammar: experimental and computational aspects of degrees of grammaticality. PhD thesis, University of Edinburgh
Keller F (2006) Linear optimality theory as a model of gradience in grammar. In: Fanselow G, Féry C, Vogel R, Schlesewsky M (eds) Gradience in grammar: generative perspectives. Oxford University Press, New York, pp 270–287
Pater J (2009) Weighted constraints in generative linguistics. Cogn Sci 33:999–1035
Schütze CT, Sprouse J (2014) Judgment data. In: Podesva RJ, Sharma D (eds) Research methods in linguistics. Cambridge University Press, Cambridge, pp 27–50
Smolensky P, Legendre G (2006) The harmonic mind: from neural computation to optimality-theoretic grammar (2 volumes). MIT Press, Cambridge
Sorace A, Keller F (2005) Gradience in linguistic data. Lingua 115(11):1497–1524
Sprouse J (2011) A test of the cognitive assumptions of magnitude estimation: commutativity does not hold for acceptability judgments. Language 87(2):274–288
Strong spatial cognition
Christian Freksa
University of Bremen, Germany

Motivation
The ability to solve spatial tasks is crucial for everyday life and thus of great importance for cognitive agents. A common approach to modeling this ability in artificial intelligence has been to represent spatial configurations and spatial tasks in the form of knowledge about space and time. Augmented by appropriate algorithms, such representations allow the computation of knowledge-based solutions to spatial problems. In comparison, natural embodied and situated cognitive agents often solve spatial tasks without detailed knowledge of the underlying geometric and mechanical laws and relationships; they can directly relate actions and their effects due to spatio-temporal affordances inherent in their bodies and their environments. Against this background, we argue that spatial and temporal structures in the body and the environment can substantially support (or even replace) reasoning effort in computational processes. While the principle underlying this approach is well known—for example, it is applied in descriptive geometry for geometric problem solving—it has not been investigated as a paradigm of cognitive processing. The relevance of this principle lies not only in overcoming the need for the detailed knowledge required by a knowledge-based approach; it also lies in understanding the efficiency of natural problem-solving approaches.

Architecture of cognitive systems
Cognitive agents such as humans, animals, and autonomous robots comprise brains (resp. computers) connected to sensors and actuators. These are arranged in their (species-specific) bodies to interact with their (species-typical) environments. All of these components need to be well tuned to one another to function in a fully effective manner. For this reason, it is appropriate to view the entire aggregate (the cognitive agent including body and environment) as a 'full cognitive system' (Fig. 1).

[Fig. 1: Structure of a full cognitive system.]

Our work aims at investigating the distribution, coordination, and execution of tasks among the system components of embodied and situated spatial cognitive agents. From a classical information processing/AI point of view, the relevant components outside the brain or computer would be formalized in some knowledge representation language or associated pattern in order to allow the computer to perform formal reasoning or other computational processing on this representation. In effect, physical, topological, and geometric relations are transformed into abstract information about these relations, and the tasks are then performed entirely on the information-processing level, where true physical, topological, and geometric relations no longer persist. This classical information-processing-oriented division between brain/computer on the one hand and perception, action, body, and environment on the other is only one way of distributing the
activities involved in cognitive processing [Wintermute and Laird, 2008]. Alternative ways would be (1) to maintain some of the spatial relations in their original form or (2) to use only 'mild abstraction' for their representation. Maintaining relations in their original form corresponds to what Norman [1980] called knowledge in the world. Using knowledge in the world requires perceiving the world in order to solve a problem. The best-known example of mild abstraction is the geographic paper map: certain spatial relations can be represented by identical spatial relations (e.g. orientation relations), while others may be transformed (e.g. absolute distances may be scaled). As a result, physical operations such as perception, route-following with a finger, and manipulation remain enabled much as in the original domain. Again, perception is required to use these mildly abstracted representations—but the perception task can be easier than the same task under real-world conditions, for example due to the modified scale. A main research hypothesis for studying physical operations and processes in spatial and temporal form, in comparison to formal or computational structures, is that spatial and temporal structures in the body and the environment can substantially support reasoning effort in computational processes. One major observation we can make when comparing the use of such different forms of representation (formal, mild abstraction, original) is that the processing structures of problem-solving processes differ [Marr 1982]. Different processing structures facilitate different ease of processing [Sloman 1985]. Our hypothesis can be plainly formulated as: manipulation + perception simplify computation. While the principle underlying this hypothesis is well known—for example, it is applied in descriptive geometry for geometric problem solving—it has not been investigated as a principle of cognitive processing. Reasoning about the world can be considered the most advanced level of cognitive ability; this ability requires a comprehensive understanding of the mechanisms responsible for the behavior of bodies and environments. But many natural cognitive agents (including adults, children, and animals) lack a detailed understanding of their environments and still are able to interact with them rather intelligently. For example, they may be able to open and close doors in a goal-directed fashion without understanding the mechanisms of the doors or locks on a functional level. This suggests that knowledge-based reasoning may not be the only way to implement problem solving in cognitive systems. In fact, alternative models of perceiving and moving goal-oriented autonomous systems have been proposed in biocybernetics and AI research to model aspects of cognitive agents [e.g. Braitenberg 1984; Brooks 1991; Pfeifer and Scheier, 2001]. These models physically implement perceptual and cognitive mechanisms rather than describing them formally and coding them in software. Such systems are capable of dealing intelligently with their environments without encoding knowledge about the mechanisms behind the actions. The background of the present work is discussed in detail in [Freksa 2013; Freksa and Schultheis, in press].

Approach
With our present work, we go an important step beyond previous embodied cognition approaches to spatial problem solving.
We introduce a paradigm shift which not only aims at preserving spatial structure, but also makes use of identity preservation; in other words, we represent spatial objects and configurations by themselves or by physical spatial models of themselves, rather than by abstract representations. This has a number of advantages. We can avoid loss of information due to early representational commitments: we do not have to decide prematurely which aspects of the world to represent and which aspects to abstract from. This can be decided partly during the problem-solving procedure, at which stage additional contextual information may have become available to guide the choice of the specific representation to be used.
Perhaps more importantly, objects and configurations frequently are aggregated in a natural and meaningful way; for example, a chair may consist of a seat, several legs, and a back. If I move one component of a chair, I automatically (and simultaneously!) move the other components and the entire chair, and vice versa. This property is not intrinsically given in abstract representations of physical objects, but it may be a very useful property from a cognitive point of view, as no computational processing cycles are required for simulating the physical effects or for reasoning about them. Thus, the manipulability of physical structures may become an important feature of cognitive processing, not merely a property of physical objects. Similarly, we aim at dealing with perception dynamically, for example allowing for the "on-the-fly" creation of suitable spatial reference frames: by making direct use of spatial configurations, we can avoid deciding a priori on a specific spatial reference system in which to perceive a configuration. As we know from problem solving in geometry and from spatial cognition, certain reference frames may allow a spatial problem to collapse in dimensionality and difficulty. For example, determining the shortest route between two points on a map boils down to a 1-dimensional problem [Dewdney 1988]. However, it may be difficult or impossible to algorithmically determine a reference frame that reduces the task given on a 2- or 3-dimensional map to a 1-dimensional problem, whereas a spatial reconfiguration approach that makes use of the physical affordance 'shortcut' easily reduces the problem from 3D or 2D to 1D. In other cases, it may be easier to identify suitable spatial perspectives empirically in the field than analytically by computation. Therefore, we may be better off allowing certain operations to be carried out situation-based in the physical spatial configuration as part of the overall problem-solving process. In other words, our project investigates an alternative architecture of artificial cognitive systems that may be more closely based on the role models of natural cognitive systems than our purely knowledge-based AI approaches to cognitive processing. We focus on solving spatial and spatio-temporal tasks, i.e. tasks having physical aspects that are directly accessible to perception and can be manipulated by physical action. This permits 'outsourcing' some of the 'intelligence' for problem solving into spatial configurations. Our approach is to first isolate and simplify the specific spatial problem to be solved, for example by identifying an appropriate task-specific spatial reference system, by removing task-irrelevant entities from the spatial configuration, or by reconstructing the essence of the spatial configuration through minimal abstraction. In general, it may be difficult to prescribe the precise steps for preprocessing a task; for the special case of spatial tasks, however, it will be possible to provide rules or heuristics for useful preprocessing steps, and these can serve as the meta-knowledge necessary to control actions on the physical level. After successful preprocessing, it may be possible in some cases to 'read' an answer to the problem directly off the resulting configuration through perception; in other cases, the resulting spatial configuration may be a more suitable starting point for a knowledge-based approach to solving the problem.
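For contrast, the kind of knowledge-based route computation that the physical 'shortcut' affordance can sidestep looks like the following sketch: a standard Dijkstra search over an assumed toy graph abstraction of a map (node names and distances are invented for illustration):

import heapq

def dijkstra(graph, source, target):
    # Knowledge-based shortest route over an abstract graph of the map.
    # graph: dict node -> list of (neighbor, distance) pairs.
    # A physical model (e.g., a string network pulled taut between two
    # knots) would 'compute' the same answer without this bookkeeping.
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return float("inf")

graph = {"A": [("B", 2.0), ("C", 5.0)],
         "B": [("A", 2.0), ("C", 1.5)],
         "C": [("A", 5.0), ("B", 1.5), ("D", 1.0)],
         "D": [("C", 1.0)]}
print(dijkstra(graph, "A", "D"))  # 4.5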
Discussion
The main hypothesis of our approach is that the 'intelligence' of cognitive systems is located not only in specific abstract problem-solving approaches, but also—and perhaps more importantly—in the capability of recognizing characteristic problem structures and of selecting particularly suitable problem-solving approaches for given tasks. Formal representations may not facilitate the recognition of such structures, due to a bias inherent in the abstraction. This is where mild abstraction can help: mild abstraction may abstract from only a few aspects while preserving important structural properties. The insight that spatial relations and physical operations are strongly connected to cognitive processing may lead to a different division of labor between the perceptual, the representational, the
computational, and the locomotive parts of cognitive interaction than the one we currently pursue in AI systems: rather than putting all the 'intelligence' of the system into the computer, the proposed approach aims at putting more intelligence into the interactions between the components and structures of the full cognitive system. More specifically, we aim at exploiting intrinsic structures of space and time to simplify the tasks to be solved. We hypothesize that this flexible assignment of physical and computational resources for cognitive problem solving may be closer to natural cognitive systems than the almost exclusively computational approach. For example, when we as cognitive agents search for certain objects in our environment, we have at least two different strategies at our disposal: we can represent the object in our mind and try to imagine and mentally reconstruct where it could or should be (this would correspond to the classical AI approach), or we can visually search for the object in our physical environment. Which approach is better (or more promising) depends on a variety of factors, including memory and physical effort; frequently, a clever combination of both approaches may be best. Although the general principle outlined may apply to a variety of domains, we constrain our work in the proposed project to the spatio-temporal domain. This is the domain we understand best in terms of computational structures, and it has the advantage that we have well-established and universally accepted reference systems to describe and compute spatial and temporal relations. Our research aims at identifying a bag of cognitive principles and ways of combining them to obtain cognitive performance in spatio-temporal domains. We bring together three different perspectives in this project: (1) the cognitive systems perspective, which addresses cognitive architecture and trade-offs between explicit and implicit representations; (2) the formal perspective, which characterizes and analyzes the resulting structures and operations; and (3) the implementation perspective, which constructs and explores varieties of cognitive system configurations. In the long term, we see potential technical applications of physically supported cognitive configurations, for example in the development of future intelligent materials (e.g. 'smart skin', where distributed spatio-temporal computation is required but needs to be minimized with respect to computation cycles and energy consumption). Naturally, the proposed approach will not be as broadly applicable as some of the approaches we pursue in classical AI. But it might uncover broadly applicable cognitive engineering principles that will help the design of tomorrow's intelligent agents. Our philosophy is to understand and exploit pertinent features of space and time as modality-specific properties of cognitive systems that enable powerful specialized approaches in the specific domain of space and time. Space and time are, however, most basic for perception and action and ubiquitous in cognitive processing; therefore, we believe that understanding and using their specific structures may be particularly beneficial. In analogy to the notion of 'strong AI' (implementing intelligence rather than simulating it [Searle 1980]), we call this approach 'strong spatial cognition', as we employ real space rather than simulating its structure.
Acknowledgments
I acknowledge discussions with Holger Schultheis, Ana-Maria Olteteanu, and the R1-[ImageSpace] project team of the SFB/TR 8 Spatial Cognition. This work was generously supported by the German Research Foundation (DFG).

References
Braitenberg V (1984) Vehicles: experiments in synthetic psychology. MIT Press, Cambridge
Brooks RA (1991) Intelligence without representation. Artif Intell 47:139–159
Dewdney AK (1988) The armchair universe. W.H. Freeman & Company, San Francisco
Freksa C (2013) Spatial computing—how spatial structures replace computational effort. In: Raubal M, Mark D, Frank A (eds) Cognitive and linguistic aspects of geographic space. Springer, Heidelberg
Freksa C, Schultheis H (in press) Three ways of using space. In: Montello DR, Grossner KE, Janelle DG (eds) Space in mind: concepts for spatial education. MIT Press, Cambridge
Marr D (1982) Vision. MIT Press, Cambridge
Norman DA (1980) The psychology of everyday things. Basic Books, New York
Pfeifer R, Scheier C (2001) Understanding intelligence. MIT Press, Cambridge
Searle J (1980) Minds, brains and programs. Behav Brain Sci 3(3):417–457
Sloman A (1985) Why we need many knowledge representation formalisms. In: Bramer M (ed) Research and development in expert systems. Cambridge University Press, New York, pp 163–183
Wintermute S, Laird JE (2008) Bimodal spatial reasoning with continuous motion. In: Proceedings of AAAI, pp 1331–1337
Inferring 3D shape from texture: a biologically inspired model architecture
Olman Gomez, Heiko Neumann
Institute of Neural Information Processing, Ulm University, Germany

Abstract
A biologically inspired model architecture for inferring 3D shape from texture is proposed. The model is hierarchically organized into modules roughly corresponding to visual cortical areas in the ventral stream. Initial orientation-selective filtering decomposes the input into low-level orientation and spatial frequency representations. Grouping of spatially anisotropic orientation responses builds sketch-like representations of surface shape. Gradients in the orientation fields and their subsequent integration infer local surface geometry and globally consistent 3D depth.

Keywords 3D shape, Texture, Gradient, Neural surface representation

Introduction
The representation of depth structure can be computed from various visual cues, such as binocular disparity, kinetic motion, and texture gradients. Based on findings from experimental investigations (Liu et al. (2004); Tsutsui et al. (2002)), we suggest that the depth of textured surfaces is inferred from monocular images by a series of processing stages along the ventral stream in visual cortex, each related to an individual cortical area or a strongly clustered group of areas (Markov et al. 2013). Building on previous work that developed generic computational mechanisms of visual cortical network processing (Thielscher and Neumann (2003); Weidenbacher et al. (2006)), we propose a model that transforms initial texture gradient patterns into representations of the intrinsic structure of curved surfaces (lines of minimal curvature, local self-occlusions) and of 3D depth (Li and Zaidi (2000); Todd (2004)).

Previous work
Visual texture can assume different component structure, which suffers from compression along the direction of surface slant where the apparent object surface curves away from the viewer's line of sight. Texture gradients provide a potent cue to local relative depth (Gibson, 1950). Several studies have investigated how the size, orientation, or density of texture elements conveys texture gradient information (Todd and Akerstrom, 1987). Evidence suggests that patterns of changing energy convey the
basic information to infer shape from texture, which needs to be integrated along characteristic intrinsic surface lines (Li and Zaidi 2000). Previous computational models try to estimate surface orientation from distortions of the apparent optical texture in the image. The approaches can be subdivided according to their task specificity and the computational strategies involved. Geometric approaches attempt to reconstruct the metric surface geometry (e.g., Aloimonos and Swain 1985; Bajcsy and Lieberman 1976; Super and Bovik 1995). Neural models, on the other hand, infer the relative or even ordinal structure from initial spatial frequency selective filtering, subsequent grouping of the resulting output responses, and a depth mapping step (Grossberg et al. 2007; Sakai and Finkel 1997). The LIGHTSHAFT model of Grossberg et al. (2007) utilizes scale-selective initial orientation filtering and subsequent long-range grouping. Relative depth in this model is inferred by a depth-to-scale mapping that associates coarse-to-fine filter scales with depth, using orientation sensitive grouping cells which define scale-sensitive spatial compartments to fill in qualitative depth. Grouping mechanisms can be utilized to generate a raw surface sketch that establishes lines of minimal surface curvature as a ridge-based qualitative geometry representation (Weidenbacher et al. 2006). Texture gradients can be integrated to derive local maps of relative surface orientation (as suggested in Li and Zaidi (2000); Sakai and Finkel (1997)), and these local gradient responses may in turn be integrated to generate globally consistent relative depth maps (Liu et al. 2004). The above-mentioned models are limited to simple objects, most of them dealing only with regular textures, and they do not explain how the visual system mechanistically produces a multiple depth order representation of complex objects.
Model description Our model architecture consists of a multi-stage network of interacting areas that are coupled bidirectionally (extension of Weidenbacher et al. (2006); Fig. 1). The architecture is composed of four functional building blocks, or modules, each of which consists of three stages corresponding to the compartment structure of cortical areas: feedforward input is initially filtered by a mechanism specific to the model area, the resulting activity is then modulated by multiplicative feedback signals to enhance its gain, and finally a normalization via surround competition utilizes a pool of cells in the space-feature domain. The different stages can be formally denoted by the following steady-state equations, with the filter output modulated by feedback and inhibited by activities from a pool of cells (Eq. 1) and the inhibitory pool integration (Eq. 2):

$$r^{I}_{i,\mathrm{feat}} = \frac{b\, f(F(r^{0}))\,\bigl(1 + \mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}}\bigr) + g\, q^{I,\mathrm{in}}_{i,\mathrm{feat}}}{a + c\, f(F(r^{0}))\,\bigl(1 + \mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}}\bigr) + q^{I,\mathrm{in}}_{i,\mathrm{feat}}} \qquad (1)$$

$$q^{I,\mathrm{in}}_{i,\mathrm{feat}} = d \sum_{j} \sum_{\mathrm{feat}} \Bigl( r^{I}_{j,\mathrm{feat}} + e \max_{\mathrm{feat}} r^{I}_{j,\mathrm{feat}} \Bigr)\, K^{\mathrm{pool}}_{ij} \qquad (2)$$
where the feedback signal is defined by $\mathrm{net}^{I,\mathrm{FB}}_{i,\mathrm{feat}} = [\,k_{\mathrm{FB}}\, r^{II}_{i,\mathrm{feat}}\,]^{+} + \sum_{z \in \{\mathrm{feat},\,\mathrm{loc}\}} r^{II}_{z}$. Here $r^{I}$, $r^{II}$ denote the output activations of the generic modules (I, II: two subsequent modules in the hierarchy). The different three-stage modules roughly correspond to different cortical areas with different feature dimensions represented neurally (compare Fig. 1): Cortical area V1 computes orientation selective responses using a spatial frequency decomposition of the input; area V2 accomplishes orientation sensitive grouping of initial items into boundaries in different frequency channels to generate representations of surface curvature properties. Different sub-populations of cells in V4/IT are proposed to detect different surface features from the distributed responses: one is used to extract discontinuities in the orientation fields (indicative of self-occlusions), another extracts and analyzes anisotropies in the orientation fields of grouping responses to
Fig. 1 General overview of the model schematics. Texture inputs are decomposed into a space-orientation-frequency domain representation. The cascaded processing utilizes stages of filtering, top-down modulation via feedback, and competition with activity normalization
determine slanted surface regions, and one that integrates patches of anisotropic orientation field representations in order to infer local 3D depth. The approach suggests that the generation of a 2D sketch representation of surface invariants serves to enhance surface border lines, while integrating regions with high response anisotropies in the orientation domain (over spatial frequencies) allows the inference of qualitative depth from texture gradients. The proposed network architecture is composed of four blocks, or modules, each of which defines a cascade of processing stages as depicted in Fig. 1. Module I employs 2D Gabor filters resembling simple cells in area V1. In module II, output responses from the previous module are grouped to form extended contour arrangements. Activations are integrated by pairs of 2D anisotropic Gaussian filters separated along the target orientation axis of each orientation band (as in area V2). Grouping is computed separately in each frequency band. Up to this point the model is similar to the LIGHTSHAFT model (Grossberg et al. 2007), which computes initial spatial frequency-selective responses and subsequently groups them into internal boundaries. Unlike LIGHTSHAFT, we employ frequency-related response normalization such that the relative frequency energy in different channels provides direct input for gradient estimation; the sum of the responses gives a measure of texture compression. In module III the grouping responses are in turn filtered by mechanisms that employ oriented dark-light-dark anisotropic Gaussian spatial weightings with subsequent normalization (as in Thielscher and Neumann (2003)). The output is fed back to module II to selectively enhance occlusion boundaries and edges of the apparent object surface shape. This recurrence helps to extract a sketch-like representation of the surface structure, similar to Weidenbacher et al. (2006). Module IV combines the output of the previous modules and serves as a gradient
detector using coarse-grained oriented filters with on/off-subfields (like area V4). In addition, model area IT functions as a directed integrator of gradient responses, using pairs of anisotropic Gaussian long-range grouping mechanisms truncated by a sigmoid function. These integrate the gradient cell responses to generate an activation that is related to the surface depth profile.
Results We show a few results in order to demonstrate the functionality of the newly proposed model architecture. Fig. 2 shows the result of computing surface representations from initial orientation sensitive filtering and subsequent grouping to create a sketch-like shape representation, together with a map of strong anisotropy in the texture energy. These anisotropies refer to locations of local slant in the surface orientation relative to the observer's viewpoint and are detected independently of the particular texture pattern that appears on the surface. Fig. 3 shows the result of orientation sensitive integration of texture gradient responses, which leads to a viewer-centric surface depth representation. These results are compared against the ground-truth surface height map in order to demonstrate that the inferred shape is invariant to the texture pattern in the input.
Discussion and conclusion A neural model is proposed that extracts 3D relative depth shape representations of complex textured objects. The architecture utilizes a hierarchical computational scheme of different stages referring to cortical areas V1, V2, V4 and IT along the ventral pathway to generate representations of shape for the recognition of objects. The model also generates a 2D surface sketch from texture images. Such a sketch contains depth cues such as T-junctions or occlusion
boundaries as well as ridge-like structures depicting lines of minimum surface curvature. Unlike previous approaches, the model goes beyond a simple detection of local energies of oriented filtering and explains how such localized responses are integrated into a coherent depth representation. It also does not rely on a heuristic scale-to-depth mapping, like LIGHTSHAFT, to assign relative depth to texture gradients, and it does not require diffusive filling-in of depth (steered by a boundary web representation). Instead, responses distributed anisotropically in the orientation feature domain are selectively integrated for different orientations to generate qualitative surface depth.
Acknowledgments O.G. is supported by a scholarship of the German DAAD, ref. no. A/10/90029.
Fig. 2 Result of grouping initial filter responses in the space-orientation domain (separately for individual frequency channels) for the input image (upper left). Texture gradient information is calculated over the normalized responses of cells in different frequency channels (upper right). Stronger response anisotropies are mapped to white. The short axis of the anisotropies (strongest compression) coheres with the slant direction (surface tilt). The maximum responses over frequency and orientation (white) create a sketch-like representation of the ridges of a surface corresponding with the orientation of local minimal curvature (bottom left). Local junctions also occur due to self-occlusions generated by concave surface geometry. The result of orientation contrast detection (bottom right) is fed back to enhance the sketch edges
Fig. 3 3D depth structure computed for different input textures for the same surface geometry (left). Results of the inferred depth structure are shown (right) for the given ground-truth pattern (bottom). Relative error (RE) measures are calculated to determine the deviation of the depth estimate from the true shape
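To make the generic module concrete, the following is a minimal NumPy sketch of one steady-state update in the spirit of Eqs. 1 and 2; the Gabor front end is omitted (the function takes rectified filter responses as input), the pool kernel K is approximated by a Gaussian, and all parameter values are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def module_steady_state(filt, feedback, a=1.0, b=2.0, c=1.0,
                        d=0.5, e=0.5, g=0.1, k_fb=1.0):
    """One generic three-stage module: feedforward filter responses are
    gain-enhanced by multiplicative feedback and then normalized by an
    inhibitory pool over the space-feature domain (cf. Eqs. 1-2).

    filt, feedback -- arrays of shape (n_feat, H, W)
    """
    # Stage 2: multiplicative feedback modulation, (1 + net_FB).
    net_fb = np.maximum(k_fb * feedback, 0.0)
    driven = filt * (1.0 + net_fb)

    # Eq. 2: inhibitory pool -- sum over features plus a feature maximum,
    # spatially integrated by a Gaussian stand-in for the kernel K_pool.
    pooled = filt.sum(axis=0) + e * filt.max(axis=0)
    q = d * gaussian_filter(pooled, sigma=3.0)

    # Eq. 1: steady-state divisive normalization.
    return (b * driven + g * q) / (a + c * driven + q)
```

Stacking four such modules and feeding each module's output back to its predecessor would reproduce, in outline, the bidirectionally coupled hierarchy described above.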
References
Aloimonos J, Swain MJ (1985) Shape from texture. In: Proceedings of the 9th IJCAI, Los Angeles, CA, pp 926–931
Bajcsy R, Lieberman L (1976) Texture gradient as a depth cue. Comput Graph Image Process 5(1):52–67
Gibson JJ (1950) The perception of the visual world. Houghton Mifflin
Grossberg S, Kuhlmann L, Mingolla E (2007) A neural model of 3D shape-from-texture: multiple-scale filtering, boundary grouping, and surface filling-in. Vision Res 47(5):634–672
Li A, Zaidi Q (2000) Perception of three-dimensional shape from texture is based on patterns of oriented energy. Vision Res 40(2):217–242
Liu Y, Vogels R, Orban GA (2004) Convergence of depth from texture and depth from disparity in macaque inferior temporal cortex. J Neurosci 24(15):3795–3800
Markov NT, Ercsey-Ravasz M, Van Essen DC, Knoblauch K, Toroczkai Z, Kennedy H (2013) Cortical high-density counterstream architectures. Science 342(6158):1238406
Sakai K, Finkel LH (1997) Spatial-frequency analysis in the perception of perspective depth. Netw Comput Neural Syst 8(3):335–352
Super BJ, Bovik AC (1995) Shape from texture using local spectral moments. IEEE Trans PAMI 17(4):333–343
Thielscher A, Neumann H (2003) Neural mechanisms of cortico-cortical interaction in texture boundary detection: a modeling approach. Neuroscience 122(4):921–939
Todd JT (2004) The visual perception of 3D shape. Trends Cogn Sci 8(3):115–121
Todd JT, Akerstrom RA (1987) Perception of three-dimensional form from patterns of optical texture. J Exp Psychol Human Percept Performance 13(2):242
Tsutsui KI, Sakata H, Naganuma T, Taira M (2002) Neural correlates for perception of 3D surface orientation from texture gradient. Science 298(5592):409–412
Weidenbacher U, Bayerl P, Neumann H, Fleming R (2006) Sketching shiny surfaces: 3D shape extraction and depiction of specular surfaces. ACM Trans Appl Percept 3(3):262–285
An activation-based model of execution delays of specific task steps
Marc Halbrügge, Klaus-Peter Engelbrecht
Quality and Usability Lab, Telekom Innovation Laboratories, Technische Universität Berlin, Germany
Abstract When humans use devices like ticket vending machines, their actions can be categorized into task-oriented (e.g. selecting a ticket) and device-oriented (e.g. removing the bank card after having paid). Device-oriented steps contribute only indirectly to the user's goal; they take longer than their task-oriented counterparts and are more likely to be forgotten. A promising explanation is provided by the activation-based memory for goals model (Altmann and Trafton 2002). The objectives of this paper are, first, to replicate the step prolongation effect of device-orientation in a kitchen assistance context, and second, to investigate whether the activation construct
can explain this effect using cognitive modeling. Finally, a necessity and sensitivity analysis provides more insights into the relationship between goal activation and device-orientation effects.
Keywords Cognitive Modeling, Human-Computer Interaction, ACT-R, Memory, Human Error
Introduction and related work While research on task completion times in human-computer interaction (HCI) has produced many results of both theoretical and practical nature during the last decades (see John and Kieras 1996, for an overview), the relationship between interface design and user error is still unclear in many respects. Notable exceptions are post-completion errors, which occur when users fail to perform an additional step in a procedure after they have already reached their main goal (Byrne and Davis 2006). This concept can be extended to any step that does not directly support the users' goals, independently of its position in the action sequence, and has been termed device-orientation in this context (Ament et al. 2009). The opposite (i.e. steps that do contribute to the goal) is analogously called task-orientation. Device-oriented steps take longer and are more prone to omission than task-oriented ones (Ament 2011). A promising theoretical explanation for the effects of device-orientation is provided by the memory for goals model (MFG; Altmann and Trafton 2002). The main assumption of the MFG is that goals underlie effects that are usually connected to memory traces, namely time-dependent activation and associative priming. Within the theoretical framework of the MFG, post-completion errors and increased execution times for post-completion steps are caused by a lack of activation of the respective sub-goal. A computational implementation of the MFG that can be used to predict sequence errors has been created by Trafton et al. (2009). This paper investigates the concept of device-orientation against the background of the MFG using cognitive modeling with ACT-R (Anderson et al. 2004). The basic research question is whether human memory constructs as formalized within ACT-R can explain the completion time differences between task- and device-oriented steps found in empirical data.
Experiment As the empirical basis for our investigation, we decided not to rely on synthetic laboratory tasks like the Tower of Hanoi game, but instead to use an application that could be used by everyone in an everyday environment. Our choice fell on an HTML-based kitchen assistant that had been created for research on ambient assisted living. Among other things, the kitchen assistant allows searching for recipes by regional cuisine (French, Italian, German, Chinese) and type of dish (main dish, appetizer, dessert, pastry). Our experiment was built around this search feature. 12 subjects (17 % female, Mage = 28.8, SDage = 2.4) were invited into the lab kitchen and performed 34 individual search tasks of varying difficulty in five blocks. The user interface (UI) of the kitchen assistant was presented on a personal computer with an integrated touch screen. Task instructions were given verbally and all user clicks were recorded by the computer system. (The experiment described here was embedded in a larger usability study; see Quade et al. (2014) for more details. The instructions are available for download at http://www.tu-berlin.de/?id=135088.) Individual trials consisted of five phases: 1. Listening to and memorizing the instructions for the given trial. 2. Entering the search criteria (e.g. ''German'' and ''Main dish'') by clicking on the respective buttons on the screen. This could also involve deselecting criteria from previous trials. 3. Initiating the search using a dedicated ''Search'' button.
This also initiated a switch to a new screen containing the search results list, if this list was not present yet.
4. Selecting the target recipe (e.g. ''Lamb chops'') in the search results list. 5. Answering a simple question about the recipe (e.g. ''What is the preparation time?'') as displayed by the kitchen assistant after having selected the recipe. We did not analyze the first and last phases as they do not create observable clicks on the touch screen. Of the remaining three phases, entering search criteria and recipe selection are task-oriented, while the intermediate ''Search'' click is device-oriented.
Results We recorded a total of 18 user errors. Four were intrusions, nine were omissions, and five were selections of wrong recipes. The application logic of the kitchen assistant inhibits overt errors during the device-oriented step. We therefore focused on completion time as the dependent variable and discarded all erroneous trials. As our focus is on memory effects, we concentrated on steps that tax only the memory and motor systems. We removed all subtasks that need visual search and encoding (phase 4: searching for the target recipe in the results list and clicking on it), and steps that incorporated substantial computer system response times (i.e. moving to another UI screen). 817 clicks remained for further analysis; 361 (44 %) of these were device-oriented. The average time to perform a click was 764 ms (SD = 381) for task-oriented and 977 ms (SD = 377) for device-oriented steps. As the kitchen assistant was created for research in an area different from HCI, it introduces interfering variables that need to be controlled. The motor time needed to perform a click on a target element (i.e. button) depends strongly on the size and distance of the target, as formalized in Fitts' law (Fitts 1954). Fitts' index of difficulty (ID) cannot be held constant for the different types of clicks; we therefore introduced it into the analysis. As the click speed (i.e. the Fitts' law parameters) differs between subjects, we used linear mixed models (NLME; Pinheiro et al. 2013) with subject as grouping factor and Fitts' law intercept and slope varying within subject. We also observed a small but consistent speed-up during the course of the experiment, which led us to introduce the trial block as an additional interfering variable. The analysis of variance was conducted using R (R Core Team 2014). All three factors yielded significant results; we obtained a prolongation effect for device-oriented steps of 104 ms. The results are summarized in Table 1.
Discussion The first objective of this paper is met: we identified a significant execution time delay for device-oriented steps. How does this effect relate to the existing literature? Ament et al. (2009) report a non-significant difference of 181.5 ms between task-oriented and device-oriented steps. This fits well with the empirical averages reported at the beginning of the results section, although the experimental procedure used there (a flight simulation game) led to longer steps with completion times well above two seconds. What remains open is whether the proposed cognitive mechanism, namely lack of activation, can account for this time difference. The next section addresses this question.
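The mixed-model analysis summarized in Table 1 can be sketched with standard tooling. The snippet below uses Python's statsmodels rather than the R package nlme employed by the authors, and the data file and column names (click_time, fitts_id, block, device_oriented, subject, distance, width) are hypothetical placeholders for the click log.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

clicks = pd.read_csv("kitchen_clicks.csv")  # hypothetical click log

# Fitts' index of difficulty in its original (1954) formulation:
# ID = log2(2 * distance / target width), one value per click.
clicks["fitts_id"] = np.log2(2 * clicks["distance"] / clicks["width"])

# Fixed effects: Fitts' ID, trial block, device-orientation;
# by-subject random intercept and random Fitts' law slope.
model = smf.mixedlm(
    "click_time ~ fitts_id + block + device_oriented",
    data=clicks,
    groups=clicks["subject"],
    re_formula="~fitts_id",
)
# The device_oriented coefficient corresponds to the ~104 ms effect.
print(model.fit().summary())
```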
Table 1 Regression coefficients (coef.) with confidence intervals (CI) and analysis of variance results for the experiment

Factor          Coef.    95 % CI of coef.   F(1,802)   p
Fitts' ID       165 ms   126 to 204 ms      111.1      <.001
Trial block     -55 ms   -71 to -39 ms      45.9       <.001
Device-orient.  104 ms   53 to 154 ms       16.4       <.001

Individual slopes for Fitts' difficulty (ID) ranged from 121 to 210 ms/bit

The MFG model We implemented the memory for goals theory based on the mechanisms provided by the cognitive architecture ACT-R (Anderson et al. 2004), as the MFG is originally based on the ACT-R theory (Altmann and Trafton 2002). Within ACT-R, memory decay is implemented based on a numerical activation property belonging to every chunk (i.e. piece of knowledge) in declarative memory. Associative priming is added by a mechanism called spreading activation. This led to the translation of the tasks used in our experiment into chains of goal chunks. Every goal chunk represents one step towards the target state of the current task. One element of the goal chunk (''slot'' in ACT-R speak) acts as a pointer to the next action to be taken. After completion of the current step, this pointer is used to retrieve the following goal chunk from declarative memory. The time required for this retrieval depends on the activation of the chunk to be retrieved. If the activation is too low, the retrieval may fail completely, resulting in an overt error. The cognitive model receives the task instructions through the auditory system, just like the human participants did. For reasons of simplicity, we reduced the information as much as possible. The user instruction ''Search for German dishes and select lamb chops'', for example, translates to the model instruction ''German on; search push; lamb-chops on''. The model uses this information to create the necessary goal chunks in declarative memory. No structural information about the kitchen assistant is hard-coded into the model, only the distinction that some buttons need to be toggled on, while others need to be pushed. While the model should in principle be able to complete the recipe search tasks of our experiment with the procedural knowledge described above, it actually breaks down due to lack of activation. Using unaltered ACT-R memory parameters, the activation of the goal chunks is too low to reach the target state (i.e. recipe) of a given task. We therefore need to strengthen our goals, and spreading activation is the ACT-R mechanism that helps us do so. How we apply spreading activation in our context is inspired by close observation of one of our subjects, who used self-vocalization for memorizing the current task information. The self-vocalization contained only the most relevant parts of the task, which happen to be identical to the task-oriented steps of the procedure. We analogously theorize that the goal states representing task-oriented steps receive more spreading activation than their device-oriented counterparts. This assumption is also in line with the discussion of post-completion errors on the basis of the memory for goals model in Altmann and Trafton (2002). For the evaluation of the model, we used ACT-CV (Halbrügge 2013) to connect it directly to the HTML-based user interface of the kitchen assistant. In order to be able to study the effect of spreading activation in isolation, we disabled activation noise and manipulated the value of the ACT-R parameter that controls the maximum amount of spreading activation (mas). The higher this parameter, the more additional activation is possible. (The ACT-R code of the model is available for download at http://www.tu-berlin.de/?id=135088.)
Results We evaluated the overall fit of the model by dividing the clicks into eight groups by the screen areas of the origin and target click positions (e.g. from type of dish to search; from search to recipe selection) and compared the average click times per group between our human sample and the model. Besides the traditional goodness-of-fit measures R2 and root mean squared error (RMSE), we applied the maximum likely scaled difference (MLSD; Stewart and West 2010), which also takes the uncertainty in the human data into account. The relative difference between the empirical means and the model predictions is given in percent (%diff). The results for five different amounts of activation spreading are given in Table 2.

Table 2 Average click time (Mtime), average memory retrieval time (Mmem), determination coefficient (R2), root mean squared error (RMSE), maximum likely scaled difference (MLSD), and maximum relative difference (%diff) for different amounts of activation spreading (mas)

mas   Mtime     Mmem     R2     RMSE     MLSD   %diff
2     1785 ms   591 ms   .759   982 ms   16.5   66 %
4     1509 ms   315 ms   .738   687 ms   12.1   58 %
6     1291 ms   99 ms    .881   477 ms   8.5    50 %
8     1231 ms   37 ms    .912   422 ms   7.9    48 %
10    1210 ms   15 ms    .893   406 ms   7.8    48 %

The model is overall slower than the human participants, resulting in moderately high values for RMSE, MLSD, and relative difference. The explained variance (R2), on the other hand, is very promising and hints at the model capturing the differences between different clicks quite well.
Sensitivity and necessity analysis In order to test whether our model also displays the device-orientation effect, we conducted a statistical analysis identical to the one used on the human data and compared the resulting regression coefficients. While an acceptable fit of the model is necessary to support the activation spreading hypothesis, it is not sufficient to prove it. By manipulating the amount of activation spreading, we can perform a sensitivity and necessity analysis that provides additional insight into the consequences of our theoretical assumptions (Gluck et al. 2010). Average coefficients from a total of 400 model runs are displayed in Fig. 1. It shows an inverted U-shaped relationship between spreading activation and the device-orientation effect. For intermediate spreading activation values, the time delay predicted by the model falls within the confidence interval of the empirical coefficient, meaning a perfect fit given the uncertainty in the data.

Fig. 1 Device orientation effect size (ms) as a function of the ACT-R activation spreading parameter (mas). The shaded area between the dotted lines demarks the 95 % confidence interval of the effect in the human sample
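For orientation, here is a minimal sketch of the standard ACT-R retrieval mathematics that the mas manipulation acts on; it is a simplification (no activation noise, a single spreading source with full weight), and the base-level activation, fan, and latency-factor values are illustrative assumptions, not the model's actual parameters.

```python
import math

def retrieval_time_ms(base_level, fan, mas, latency_factor=1000.0):
    """ACT-R retrieval latency for a goal chunk.

    Total activation A = B + W * S, with spreading strength
    S = mas - ln(fan); retrieval time = F * exp(-A).
    """
    spreading = 1.0 * (mas - math.log(fan))  # source weight W = 1
    activation = base_level + spreading
    return latency_factor * math.exp(-activation)

# Raising mas drives retrieval time toward zero -- the ceiling effect
# that diminishes the device-orientation effect for large mas
# (compare the Mmem column of Table 2).
for mas in (2, 4, 6, 8, 10):
    print(mas, round(retrieval_time_ms(base_level=-1.5, fan=3, mas=mas)))
```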
Discussion The MFG model is able to replicate the effects that we found in our initial experiment. The model being overall slower than the human participants could be caused by the rather low Fitts' law parameter used within ACT-R (100 ms/bit) compared to the 165 ms/bit that we observed. Spreading activation is not only necessary for the model to be able to complete the tasks, but also for it to display the device-orientation effect (Fig. 1). We can infer that the activation assumption is a sound explanation of the disadvantage of device-oriented steps. Too much spreading activation reduces the effect again, though. This can be explained by a ceiling effect: the average retrieval time gets close to zero for high values of mas (Mmem in Table 2), thereby diminishing the possibility for timing differences. How relevant is a 100 ms difference in real life? Probably not much by itself. What makes it important is its connection to user errors. Errors themselves are hard to provoke in the lab without adding secondary tasks that interrupt the user or create strong working memory strain, thereby substantially lowering external validity.
Conclusions The concept of device-orientation versus task-orientation is an important aspect of human-computer interaction. We could replicate that the device-oriented parts of simple goal-directed action sequences take approximately 100 ms longer than their task-oriented counterparts. With the help of cognitive modeling, associative priming could be identified as a possible explanation for this effect.
Acknowledgments The authors gratefully acknowledge financial support from the German Research Foundation (DFG) for the project ''Automatische Usability-Evaluierung modellbasierter Interaktionssysteme für Ambient Assisted Living'' (AL-561/13-1).
References
Altmann EM, Trafton JG (2002) Memory for goals: an activation-based model. Cogn Sci 26(1):39–83
Ament MG, Blandford A, Cox AL (2009) Different cognitive mechanisms account for different types of procedural steps. In: Taatgen NA, van Rijn H (eds) Proceedings of the 31st annual conference of the cognitive science society, Amsterdam, NL, pp 2170–2175
Ament MGA (2011) The role of goal relevance in the occurrence of systematic slip errors in routine procedural tasks. Dissertation, University College London
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060
Byrne MD, Davis EM (2006) Task structure and postcompletion error in the execution of a routine procedure. Hum Factors 48(4):627–638
Fitts PM (1954) The information capacity of the human motor system in controlling the amplitude of movement. J Exp Psychol 47(6):381–391
Gluck KA, Stanley CT, Moore LR, Reitter D, Halbrügge M (2010) Exploration for understanding in cognitive modeling. J Artif Gen Intell 2(2):88–107
Halbrügge M (2013) ACT-CV: bridging the gap between cognitive models and the outer world. In: Brandenburg E, Doria L, Gross A, Günzler T, Smieszek H (eds) Grundlagen und Anwendungen der Mensch-Maschine-Interaktion—10. Berliner Werkstatt Mensch-Maschine-Systeme, Universitätsverlag der TU Berlin, Berlin, pp 205–210
John BE, Kieras DE (1996) Using GOMS for user interface design and evaluation: which technique? ACM Trans Comput Hum Interact (TOCHI) 3(4):287–319
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2013) nlme: linear and nonlinear mixed effects models. R package version 3.1-113
Quade M, Halbrügge M, Engelbrecht KP, Albayrak S, Möller S (2014) Predicting task execution times by deriving enhanced cognitive models from user interface development models. In: Proceedings of the 2014 ACM SIGCHI symposium on engineering interactive computing systems, ACM, New York, pp 139–148
R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org. Accessed 7 May 2014
Stewart TC, West RL (2010) Testing for equivalence: a methodology for computational cognitive modelling. J Artif Gen Intell 2(2):69–87
Trafton JG, Altmann EM, Ratwani RM (2009) A memory for goals model of sequence errors. In: Howes A, Peebles D, Cooper RP (eds) Proceedings of the 9th international conference on cognitive modeling, Manchester, UK
How action effects influence dual-task performance
Markus Janczyk, Wilfried Kunde
Department of Psychology III, University of Würzburg, Germany
Doing multiple tasks at once typically involves performance costs in at least one of these tasks. This unspecific dual-task interference occurs regardless of the exact nature of the tasks. On top of that, several task characteristics determine how well tasks fit with each other. For example, if two tasks require key press responses with the left and right hand, performance—even in the first performed Task 1—is better if both responses entail the same spatial characteristic (i.e., if two left or two right responses are required compared with when one left and one right response is required), the so-called ''backward-crosstalk effect'' (BCE; Hommel, 1998). Similarly, a mental rotation is faster when it is preceded by or performed simultaneously with a manual rotation in the same direction compared to when both rotations go in opposite directions (Wexler, Kosslyn, Berthoz, 1998; Wohlschläger, Wohlschläger, 1998). These examples are cases of specific dual-task interference. Given that the aforementioned tasks require some form of motor output, one may ask: how is this motor output selected? A simple solution to this question was offered already by philosophers of the 19th century (e.g., Harleß, 1861) and has experienced a revival in psychology in recent decades (e.g., Hommel, Müsseler, Aschersleben, Prinz, 2001): the ideomotor theory (IT). The basic idea of IT is that, first, bidirectional associations between motor output and its consequences (= action effects) are learned. Later on, this bidirectionality is exploited for action selection: motor output is accessed by mentally anticipating the action effects. Conceptually, action effects can be distinguished as being environment-related (e.g., a light that is switched on by pressing a key) or body-related (e.g., the proprioceptive feedback from bending the finger). Against this background, consider again the case of mental and manual rotations. Turning a steering wheel clockwise gives rise to body-related proprioceptive feedback resembling a clockwise turn and even to obvious environment-related action effects, because one sees one's hand and the wheel turning clockwise. According to IT, exactly these effects are anticipated to select the motor output. However, the rotation directions of the (anticipated) effects and of the actual motor output are then confounded. Consequently, one may wonder whether the manual rotation or rather the (anticipated) effect rotation is what determines the specific interference with a mental rotation. The same argument applies to the BCE: pressing the left of two response keys requires anticipation of, for example, a left body-
related action effect, which is thus similarly confounded with the spatial component of the actual motor output. Given the importance IT attributes to action effects for action selection, we hypothesized that action effects determine the size and direction of specific interference in such cases. We here present results from two studies that aimed to disentangle the contributions of motor output and the respective action effects. Because it is rather difficult to manipulate body-related action effects, the approach was to couple the motor output with environment-related action effects. In a first study we investigated the interplay of manual and mental rotations (Janczyk, Pfister, Crognale, Kunde, 2012). To disentangle the directions of manual and effect rotations, we resorted to an instrument from aviation known as the attitude indicator or artificial horizon. This instrument provides the pilot with information about deviations from level flight (perfectly horizontal flying). Notably, two versions of this instrument are available (Previc, Ercoline, 1999; see also Fig. 1). In a plane-moving display, the horizon remains fixed and turns of the steering wheel are visualized by corresponding turns of the plane. Consequently, turning the steering wheel counter-clockwise results in an action effect rotating in the same direction. Obviously, manual and effect rotation are confounded here, but this condition provides a benchmark against which the critical condition using a horizon-moving display can be compared. In this display, the plane remains fixed but the horizon rotates. Consequently, turning the steering wheel counter-clockwise gives rise to an action effect turning clockwise. In our experiments, a mental rotation task (Shepard, Metzler, 1971) was followed by a manual rotation task that required turning a steering wheel. The plane's curve due to this steering wheel turn was visualized either with the plane-moving or with the horizon-moving display. First, with the plane-moving display the manual rotation was initiated faster when the preceding mental rotation went in the same direction (essentially replicating the Wohlschläger, Wohlschläger, 1998, and Wexler et al. 1998, results, but with the reversed task order). Second, with the horizon-moving display, the manual rotation was initiated faster when the preceding mental rotation went in the opposite direction (Exp 3). Here, however, the mental and the effect rotation were in the same direction. Thus, these results suggest that what matters for the specific interference between mental and manual rotations is not so much the motor output itself but rather what follows from this motor output as a consequence. In a second study we used a similar strategy to investigate the origin of the BCE (Janczyk, Pfister, Hommel, Kunde, 2014). In Experiment 1 of this study, participants were presented with colored letters as stimuli (a green or red H or S). Task 1 required a response to the color with a key press of the left hand (index or middle finger), and Task 2 required a subsequent key press depending on the letter identity with the right hand (index or middle finger). The BCE in this case would be reflected in better performance in Task 1 if both tasks required a left or a right response (compatible R1-R2 relations) compared to when one task required a left response and the other a right response (incompatible R1-R2 relations). The critical addition
was that pressing a response key with the right hand briefly flashed a left or right light (i.e., an environment-related action effect), and this was the participants' goal. One group of participants (the R2-E2 compatible group; see Fig. 2, left part) flashed the left light with a left key press (of the right hand) and the right light with a right key press (of the right hand). This group produced a BCE, that is, better Task 1 performance was observed with compatible R1-R2 relations (see Fig. 2, middle part). Again, though, the relative locations of motor output and action effects were confounded. Therefore, another group of participants (the R2-E2 incompatible group; see Fig. 2, right part) flashed the left light with a right key press (of the right hand) and the right light with a left key press (of the right hand). Now Task 1 performance was better with incompatible R1-R2 relations (see Fig. 2, middle part). This, however, means that the relative locations of the (body-related) action effects of the Task 1 response and the environment-related action effects of Task 2 were compatible. This basic outcome was replicated with continuous movements and action effects (Exp 2) and also when both tasks resulted in environment-related action effects (Exp 3). The generative role of anticipated action effects for action selection, a pillar of IT, has been investigated in single-task settings in numerous studies. The studies summarized in this paper extend this basic idea to dual-task situations and tested our assertion that mainly the (anticipated) action effects determine the size and direction of specific interference phenomena. In sum, the results presented here provide evidence for this (see also Janczyk, Skirde, Weigelt, Kunde, 2009, for converging evidence). In broader terms, action effects can be construed as action goals. Thus, it is not so much the compatibility of motor outputs and effectors but rather the compatibility/similarity of action goals that induces performance costs or facilitation. Such an interpretation also bears potential for improving dual-task performance and ergonomic aspects in, for example, working environments.
Fig. 1 Illustration of the tasks used by Janczyk et al. (2012)
Acknowledgments This research was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG; projects KU 1964/2-1, 2).
Fig. 2 Illustration of the tasks used by Janczyk et al. (2014) and the results of their Experiment 1. Error bars are within-subject standard errors (Pfister, Janczyk 2013), computed separately for each R2-E2 relation group (see also Janczyk et al. 2014)
References
Harleß E (1861) Der Apparat des Willens. Z Philos philos Kri 38:50–73
Hommel B (1998) Automatic stimulus-response translation in dual-task performance. J Exp Psychol Human 24:1368–1384
Hommel B, Müsseler J, Aschersleben G, Prinz W (2001) The theory of event coding: a framework for perception and action. Behav Brain Sci 24:849–878
Janczyk M, Pfister R, Crognale MA, Kunde W (2012) Effective rotations: action effects determine the interplay of mental and manual rotations. J Exp Psychol Gen 141:489–501
Janczyk M, Pfister R, Hommel B, Kunde W (2014) Who is talking in backward crosstalk? Disentangling response- from goal-conflict in dual-task performance. Cognition 132:30–43
Janczyk M, Skirde S, Weigelt M, Kunde W (2009) Visual and tactile action effects determine bimanual coordination performance. Hum Movement Sci 28:437–449
Pfister R, Janczyk M (2013) Confidence intervals for two sample means: calculation, interpretation, and a few simple rules. Adv Cogn Psychol 9:74–80
Previc FH, Ercoline WR (1999) The ''outside-in'' attitude display concept revisited. Int J Aviat Psychol 9:377–401
Shepard RN, Metzler J (1971) Mental rotations of three-dimensional objects. Science 171:701–703
Wexler M, Kosslyn SM, Berthoz A (1998) Motor processes in mental rotation. Cognition 68:77–94
Wohlschläger A, Wohlschläger A (1998) Mental and manual rotations. J Exp Psychol Human 24:397–412
Introduction of an ACT-R based modeling approach to mental rotation
Fabian Joeres, Nele Russwinkel
Technische Universität Berlin, Department of Cognitive Modeling in Dynamic Human-Machine Systems, Berlin, Germany
Introduction The cognitive processes of mental rotation as postulated by Shepard and Metzler (1971) have been extensively studied throughout the last decades. With the introduction of numerous human-machine interface concepts that are integrated into the human's spatial environment (e.g. augmented-reality interfaces such as Google Glass or virtual-reality interfaces such as Oculus Rift), human spatial competence and its understanding have become more and more important. Mental rotation is seen as one of three main components of human spatial competence (Linn and Petersen, 1985). A computational model of mental rotation was developed to help understand the involved cognitive processes. This model integrates a wide variety of empirical findings on mental rotation. It was validated in an experimental study and can be seen as a promising approach for further modeling of more complex, application-oriented tasks that include spatial cognitive processes.
Mental Rotation In an experiment on object recognition, Shepard and Metzler (1971) displayed two abstract three-dimensional objects from different perspectives to their participants. Both images showed either the same object (same-trials) or mirrored versions of the same object (different-trials). The objects were rotated around either the vertical axis or within the picture plane. Subjects were asked to determine whether both images showed the same object. The authors found that the reaction time needed to match two objects forms a linear function of the angular disparity between those objects. The slope of that linear function is called the rotation rate. Following Shepard and Metzler's interpretation of an analogue rotation process, this means that a high rotation rate represents slow rotation, whereas fast rotation is expressed by a low rotation rate. Since Shepard and Metzler's (1971) experiment, numerous studies have been conducted on the influences that affect the rotation rate of mental rotation. Based on these findings and on process concepts suggested by various authors, a cognitive model has been developed. The following section summarizes the three main assumptions that the model is based on. First, it is assumed that the linear dependence of angular displacement and reaction time is based on an analogue transformation of mental
images. Besides this widely found linearity, this claim is supported by psychophysiological findings that are summarized in Kosslyn (1996). The second assumption stems from findings about the influence of object complexity (e.g. Bethell-Fox and Shepard, 1988; Yuille and Steiger, 1982). It is assumed that objects can be rotated holistically if they are sufficiently familiar. If an object is not, it will be broken down into its components until these components are simple (i.e., familiar) enough to rotate. These components will then be rotated sequentially. Third, the mental images that are maintained and transformed throughout the rotation task are assumed to be subject to activation processes. This means that they have to be reactivated during the process. This assumption is suggested by Kosslyn (1996) and fits Cowan's (1999) activation of working memory contents. It is furthermore supported by Just and Carpenter's (1976) results. Analyzing eye movements during a mental rotation task, the authors found frequent fixation changes between both object images.
Cognitive Model A full description of the model is beyond the scope of this paper. This section gives a short overview of the process steps that were derived from the above-mentioned assumptions. The described process applies to mental rotation tasks in which both stimuli are presented simultaneously.
1. Stimulus encoding: The first image is encoded and a three-dimensional object representation (mental image) is created.
2. Memory retrieval: Based on the three-dimensional representation, long-term memory is queried to check whether the encoded object is familiar enough to process its representation. If so, the representation is stored in working memory and the second image is encoded. The created representation is used as reference in the following process steps (reference image). If the object is not familiar enough, the same retrieval is conducted for an object component and the information about the remaining component(s) is stored in working memory.
3. Initial search: Several small transformations (i.e., rotations around different axes by only a few degrees) are applied to the mental image that was created first (target image). After each small rotation, the model evaluates whether the rotation reduced the angular disparity between both mental images. The most promising rotation axis is chosen. The decision (as well as the monitoring process in the following step) is based on previously identified corresponding elements of the object representations.
4. Transform and compare: After the rotation axis has been defined in step 3, the target image is rotated around this axis. During this process, the target representation's orientation is constantly monitored and compared to the reference representation. The rotation is stopped when both representations are aligned.
5. Confirmation: If the object was processed piecemeal, the previously defined rotation is applied to the remaining object components. After that, propositional descriptions of all object parts are created for both mental images. A comparison of these delivers the decision for ''same object'' or ''different objects''.
6. Reaction: Based on the decision, a motor response is triggered.
Steps 3, 4, and 5 are inspired by the equally named process steps suggested by Just and Carpenter (1976). However, although their purpose is similar to that of Just and Carpenter's steps, the details of these sub-processes are different.
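As a toy illustration of why the transform-and-compare loop yields the linear reaction-time signature described above, consider the following sketch; the increment size and step durations are invented for demonstration and are not the model's ACT-R parameters.

```python
def transform_and_compare(disparity_deg, step_deg=10.0,
                          rotate_ms=60.0, compare_ms=25.0):
    """Rotate the target image in small increments until it is aligned
    with the reference image (cf. steps 3-4).  Each increment costs a
    fixed rotate + compare time, so total time grows linearly with
    angular disparity -- the classic rotation-rate effect."""
    time_ms, remaining = 0.0, abs(disparity_deg)
    while remaining > 0.0:
        remaining -= min(step_deg, remaining)  # one incremental rotation
        time_ms += rotate_ms + compare_ms      # rotate, then compare
    return time_ms

# Slope (rotation rate) = (rotate_ms + compare_ms) / step_deg = 8.5 ms/deg.
print([transform_and_compare(a) for a in (40, 80, 120, 160)])
```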
Furthermore, due to the above-mentioned activation processes, steps 3 to 5 can be interrupted if the activation of one or both mental images falls below a threshold. In that case, a reactivation sub-process is triggered that includes re-encoding of the corresponding stimulus. The model was implemented within the cognitive architecture ACT-R (Anderson et al. 2004). Since ACT-R does not provide structures for modeling spatial cognitive processes, an architecture extension based on Gunzelmann and Lyon's (2007) concept was developed. Adding the structures for spatial processes to the architecture will enable ACT-R modelers to address a broad range of applied tasks that rely on spatial competence.
Experiment Empirical validation is an integral part of the model development process. Therefore, an experimental study was conducted to test the assumptions about training effects and to predict these effects with the above-mentioned model.
Experimental approach The experimental task was a classic mental rotation task with three-dimensional objects as in Shepard and Metzler's (1971) study. In this task, two images were displayed simultaneously. Reaction times were measured for correctly answered same-trials. Different-trials and human errors involve cognitive processes that are not addressed by the discussed cognitive model. To test the assumption of object-based learning, the objects occurred with different frequencies. The entire stimulus set consisted of nine objects, adopted from the stimulus collection of Peters and Battista (2008). One third of all trials included the same object, making this familiar object occur four times as often as each of the other eight unfamiliar objects. The object used as familiar was balanced over the participants. To capture training effects, the change of rotation rates was monitored. Following the approach of Tarr and Pinker (1989), the experiment was divided into eight blocks. In each block, one rotation rate for the familiar object and one rotation rate for all unfamiliar objects were calculated, based on the measured reaction times. The described model is designed to predict learning-induced changes in the rotation rate. As Schunn and Wallach (2005) suggest, two measures of the model's goodness of fit were used to evaluate the model. As the proportion of data variance that can be accounted for by the model, r2 is a measure of how well the model explains trends in the experimental data. RMSSD (Root Mean Squared Scaled Deviation), in contrast, represents the data's absolute deviation, scaled to the experimental data's standard error.
Experimental Design The study had a two-factorial within-subjects design with repeated measurements. The first independent variable was the experimental block, with eight levels. This variable represents the participants' state of practice. The second independent variable was object familiarity (two levels: familiar object and unfamiliar objects). As dependent variable, two rotation rates were calculated per block and subject (one for each class of objects).
Sample 27 subjects (18 female, 9 male) participated in the study. The participants' age ranged from 20 to 29 years (m = 26.1). Two persons received course credit; the others were paid 10 € for participation.
Procedure After receiving instructions, the participants were required to complete eight experimental blocks, each including 48 trials. Of these 48 trials, 16 trials displayed the familiar object.
Half the trials were same-trials, the other half different-trials.
Results The experiment was repeated for two subjects because the number of correct same-trials was too low to calculate valid rotation rates in
numerous blocks. Also, the last two experimental blocks were excluded from data analysis because fatigue interfered with the training effects. Generally, the expected training and object familiarity effects occurred, as reported in Joeres and Russwinkel (accepted). The effects that were found in the experiment (Ex) and predicted by the model (M) are displayed in Fig. 1. It can be seen that the predicted rotation rates are considerably lower than the experimentally found ones. A possible explanation for this disparity can be found in the above-mentioned reactivation process that includes re-encoding of the stimuli. The model, however, does not claim to address stimulus encoding validly. Therefore, duration differences in this process can cause the deviation in the data. Nevertheless, the trends, i.e. the shape of the learning curves, are validly predicted by the model. This is the case for the familiar object and for the unfamiliar objects, respectively. This first impression is confirmed by the goodness-of-fit measures, as listed in Table 1. Although no gold standard exists for these measures, it can be said that the absolute value deviation is rather high, with a mean RMSSD = 4.53. The data trends, however, were matched rather well, as indicated by the high r2 values (Fig. 1).
Discussion The presented study showed that the model can validly replicate certain training effects in mental rotation. It can therefore be seen as a promising approach for modeling mental rotation and, with further research, mental imagery. As briefly discussed, the model assumptions are partially based on eye movement data. Therefore, further model validation data should be provided in a follow-up study in which eye movement during a mental rotation task is predicted by the model and evaluated experimentally.
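The two fit measures reported in Table 1 are straightforward to compute; the sketch below assumes three equal-length arrays (per-block empirical rotation rates, their standard errors, and the model's predictions) with made-up values, and follows the usual reading of Schunn and Wallach's (2005) definitions, so it is an illustration rather than the authors' analysis code.

```python
import numpy as np

def goodness_of_fit(data, se, model):
    """r2: proportion of data variance explained by the model's trend.
    RMSSD: root mean squared deviation, scaled by the data's standard
    error (values near 1 mean deviations within measurement noise)."""
    r2 = np.corrcoef(data, model)[0, 1] ** 2
    rmssd = np.sqrt(np.mean(((np.asarray(model) - data) / se) ** 2))
    return r2, rmssd

# Made-up example: six per-block rotation rates (ms/deg) and predictions
# that match the trend but sit well below the data, as in Fig. 1.
data = np.array([24.0, 21.5, 19.8, 18.6, 17.9, 17.3])
se = np.array([1.2, 1.1, 1.0, 1.0, 0.9, 0.9])
model = np.array([18.0, 16.2, 15.1, 14.4, 13.9, 13.6])
print(goodness_of_fit(data, se, model))  # high r2, large RMSSD
```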
Table 1 Goodness-of-fit measures

Condition           RMSSD   r2
Familiar object     5.03    .74
Unfamiliar object   4.02    .80
Mean                4.53    .77
Fig. 1 Experimental (Ex) and model (M) data
If that study is successful, the model can be extended to further types of stimuli and to more complex, application-oriented tasks including mental imagery.
References
Anderson JR, Bothell D, Byrne MD, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychol Rev 111(4):1036–1060. doi:10.1037/0033-295X.111.4.1036
Bethell-Fox CE, Shepard RN (1988) Mental rotation: effects of stimulus complexity and familiarity. J Exp Psychol Human Percept Performance 14(1):12–23
Cowan N (1999) An embedded-process model of working memory. In: Miyake A, Shah P (eds) Models of working memory. Mechanisms of active maintenance and executive control. Cambridge University Press, Cambridge, pp 62–101
Gunzelmann G, Lyon DR (2007) Mechanisms for human spatial competence. In: Barkowsky T, Knauff M, Ligozat G, Montello DR (eds) Spatial cognition V: reasoning, action, interaction, pp 288–308
Joeres F, Russwinkel N (accepted) Object-related learning effects in mental rotation. In: Proceedings of the Spatial Cognition 2014, Bremen
Just MA, Carpenter PA (1976) Eye fixations and cognitive processes. Cogn Psychol 8(4):441–480. doi:10.1016/0010-0285(76)90015-3
Kosslyn SM (1996) Image and brain: the resolution of the imagery debate, 1st edn. A Bradford book. MIT Press, Cambridge
Linn MC, Petersen AC (1985) Emergence and characterization of sex differences in spatial ability: a meta-analysis. Child Dev 1479–1498
Peters M, Battista C (2008) Applications of mental rotation figures of the Shepard and Metzler type and description of a mental rotation stimulus library. Brain Cogn 66(3):260–264. doi:10.1016/j.bandc.2007.09.003
Schunn CD, Wallach D (2005) Evaluating goodness-of-fit in comparison of models to data. In: Psychologie der Kognition: Reden und Vorträge anlässlich der Emeritierung von Werner Tack
Shepard RN, Metzler J (1971) Mental rotation of three-dimensional objects. Science 171:701–703
Shepard S, Metzler D (1988) Mental rotation: effects of dimensionality of objects and type of task. J Exp Psychol Human Percept Performance 14(1):3–11
Tarr MJ, Pinker S (1989) Mental rotation and orientation-dependence in shape recognition. Cogn Psychol 21(2):233–282. doi:10.1016/0010-0285(89)90009-1
Yuille JC, Steiger JH (1982) Nonholistic processing in mental rotation: some suggestive evidence. Percept Psychophys 31(3):201–209
Processing linguistic rhythm in natural stories: an fMRI study
Katerina Kandylaki1, Karen Bohn1, Arne Nagels1, Tilo Kircher1, Ulrike Domahs2, Richard Wiese1
1 Philipps-Universität Marburg, Germany; 2 Universität zu Köln, Germany
Keywords Rhythm rule, Speech comprehension, Rhythmic irregularities, fMRI
Abstract Language rhythm is assumed to involve an alternation of strong and weak beats within a certain linguistic domain, although the beats are not necessarily isochronously distributed in natural language. However, in certain contexts, as for example in compound words, rhythmically induced stress shifts occur in order to comply with the so-called Rhythm Rule (Liberman, Prince 1977). This rule operates when two stressed
adjacent syllables create a stress clash or adjacent unstressed syllables (a stress lapse) occur. Experimental studies on speech production, judgment of stress perception, and event-related potentials (ERPs) (Bohn, Knaus, Wiese, Domahs 2013) have found differences in production, ratings, and ERP components, respectively, between well-formed structures and rhythmic deviations. The present study builds on these findings by using functional magnetic resonance imaging (fMRI) in order to localize rhythmic processing (within the concept of the Rhythm Rule) in the brain. Other fMRI studies on linguistic stress found effects in the supplementary motor area, insula, precuneus, superior temporal gyrus, parahippocampal gyrus, calcarine gyrus and inferior frontal gyrus (Domahs, Klein, Huber, Domahs 2013; Geiser, Zaehle, Jancke, Meyer 2008; Rothermich, Kotz 2013). However, what other studies have not yet investigated is rhythm processing in natural contexts, i.e. in the course of a story that is not further controlled for a metrically isochronous speech rhythm. Here we examine the hypotheses that (a) well-formed structures are processed differently than rhythmic deviations in compound words in German, and (b) this happens during the speech processing of stories in the absence of a phonologically related task (implicit rhythm processing). Our compounds consisted of three parts (A(BC)) that build a premodifier-noun combination. The premodifier was either a monosyllabic noun (''Holz'', wood) or a bisyllabic noun (''Plastik'', plastic) with lexical stress on the initial syllable. The premodifier was followed by a disyllabic noun bearing compound stress on the initial syllable in isolation (''Spielzeug'', toy). When these two word structures are combined, the premodifier bears overall compound stress, and the initial stress of the disyllabic noun should be shifted rightwards to its final syllable in order to be in accordance with the Rhythm Rule: Holz-spiel-zeug (wooden toy(s)). On the other hand, if the disyllabic noun is combined with a preceding disyllabic noun bearing initial stress, a shift is unnecessary, allowing for the stress pattern: Pla-stik-spiel-zeug (plastic toy(s)). The first condition we call SHIFT and the second NO SHIFT. In contrast to these well-formed conditions we induced rhythmically ill-formed conditions: CLASH for the case that Holz-spiel-zeug keeps the initial stress of its second constituent, and LAPSE when we introduce the unnecessary shift in Pla-stik-spiel-zeug. We constructed 20 word pairs following the same stress patterns as ''Holz-/Plastikspielzeug'' and embedded them in 20 two-minute-long stories. Our focus when embedding the conditions was the naturalness of the stories. For example, the word pair ''Holzspielzeug'' vs. ''Plastikspielzeug'' would appear in the following contexts: 'The clown made funny grimaces, reached into his red cloth bag and threw a small wooden toy to the lady in the front row.' vs. 'The toys, garden chairs and pillows remained outside, however. The mother wanted to tidy up the plastic toys from the garden after dinner.' We obtained images (3T) of 20 healthy right-handed German monolinguals (9 male) employing a 2x2 design: well-formedness (rhythmically well-formed vs. ill-formed) x rhythm-trigger (monosyllabic vs. disyllabic premodifier). Subjects were instructed to listen carefully and were asked two comprehension questions after each story. On the group level we analyzed the data in the 2x2 design mentioned above.
Our critical events were the whole compound words. We report clusters of p < .005 and volumes of at least 72 voxels (Monte Carlo corrected). For the main effect of well-formedness we found effects in the left cuneus, precuneus and calcarine gyrus. For the main effect of rhythm-trigger we found no significant differences at this supra-threshold level, which was expected since we did not hypothesize an effect of the length of the premodifier. Our main finding is the interaction of well-formedness and rhythm-trigger in the precentral gyrus bilaterally and in the right supplementary motor area (SMA). Since the interaction was significant, we calculated theoretically motivated pairwise contrasts within each rhythm-trigger level. For the monosyllabic premodifier, CLASH vs. SHIFT revealed no significant clusters, but, interestingly, the opposite contrast (SHIFT vs. CLASH) showed differences in the right superior frontal gyrus, right inferior frontal gyrus
(rIFG, BA 45), right lingual and calcarine gyrus, bilateral precentral gyrus (BA 6, BA 4), and left precentral gyrus (BA 3a). For the bisyllabic premodifier, LAPSE vs. NO SHIFT significantly activated the left inferior temporal gyrus, left parahippocampal gyrus, left insula, bilateral superior temporal gyrus (STG), and right pre- and postcentral gyrus. NO SHIFT vs. LAPSE significantly activated the right lingual gyrus and the calcarine gyrus bilaterally. We finally compared the two rhythmically ill-formed structures, LAPSE vs. CLASH, and found significant activation in the right supplementary motor area and premotor cortex. Our findings are in line with previous fMRI findings on rhythmic processing. Firstly, the superior temporal gyrus is robustly involved in rhythmic processing irrespective of the task of the study: a semantic and a metric task (Rothermich, Kotz 2013), speech perception of violated vs. correctly stressed words (Domahs, Klein, Huber, Domahs 2013), and explicit and implicit isochronous speech rhythm tasks (Geiser, Zaehle, Jancke, Meyer 2008). To these we can add our careful-listening task, which is comparable to the semantic task of Rothermich and Kotz (2013). Our contribution is that we found activations for the implicit task of careful listening which had previously only been found for explicit tasks: these include the left insula, the bilateral precentral gyrus, the precuneus and the parahippocampal gyrus. Lastly, the activation in the supplementary motor areas completes the picture of rhythm processing regions in the brain. This finding is of special interest since it was strong for the comparison within the rhythmically ill-formed conditions, LAPSE vs. CLASH. This might be due to the fact that stress lapse structures contain two violations, i.e. a deviation from word stress which is not rhythmically licensed, while the clash structures contain only a rhythmic deviation but keep the original word stress. The differences in activations found for well-formedness show that even in implicit rhythmic processing the language parser is sensitive to subtle deviations in the alternation of strong and weak beats. This is particularly evident in the STG activation associated with the processing of linguistic prosody, the SMA activation which has been suggested to be involved in temporal aspects of the processing of sequences of strong and weak syllables, and the IFG activation associated with tasks requiring more demanding processing of suprasegmental cues. References Bohn K, Knaus J, Wiese R, Domahs U (2013) The influence of rhythmic (ir)regularities on speech processing: evidence from an ERP study on German phrases. Neuropsychologia 51(4):760–771 Domahs U, Klein E, Huber W, Domahs F (2013) Good, bad and ugly word stress: fMRI evidence for foot structure driven processing of prosodic violations. Brain Lang 125(3):272–282 Geiser E, Zaehle T, Jancke L, Meyer M (2008) The neural correlate of speech rhythm as evidenced by metrical speech processing. J Cogn Neurosci 20(3):541–552 Liberman M, Prince A (1977) On stress and linguistic rhythm. Linguist Inq 8:249–336 Rothermich K, Kotz SA (2013) Predictions in speech comprehension: fMRI evidence on the meter-semantic interface. Neuroimage 70:89–100
Numbers affect the processing of verbs denoting movements in vertical space Martin Lachmair1, Carolin Dudschig2, Susana Ruiz Fernández1, Barbara Kaup2 1 Leibniz Knowledge Media Research Center (KMRC), 2 Psychology, University of Tübingen, Germany Recent studies have shown that nouns referring to objects that typically appear in the upper or lower visual field (e.g., roof vs. root) or
verbs referring to movements in vertical space (e.g., rise vs. fall) facilitate upwards or downwards oriented sensorimotor processes, depending on the meaning of the word that is being processed (Lachmair, Dudschig, De Filippis, de la Vega and Kaup 2011; Dudschig, Lachmair, de la Vega, De Filippis and Kaup 2012). This finding presumably reflects an association of words with experiential traces in the brain that stem from the reader's past interactions with the respective objects and events. When the words are later processed in isolation, the respective experiential traces become reactivated, providing the possibility of interactions between language processing and the modal systems (cf. Zwaan and Madden 2005). Such interactions are also known from other cognitive domains, for instance number processing (Fischer, Castel, Dodd and Pratt 2003). Here, high numbers facilitate sensorimotor processes in upper vertical space and low numbers in lower vertical space (Schwarz and Keus 2004). The question arises whether the observed spatial-association effects in the two domains are related. A recent study conducted in our lab investigated this question. The reasoning was as follows: If number processing activates spatial dimensions that are also relevant for understanding words, then we can expect that processing numbers may influence subsequent lexical access to words. Specifically, if high numbers relate to upper space, then they can be expected to facilitate understanding of an "up-word" such as bird. The opposite should hold for low numbers, which should facilitate the understanding of a "down-word" such as root. This is exactly what we found in an experiment in which participants saw one of four digits (1, 2, 8, 9) prior to the processing of up- and down-nouns in a lexical decision task (Lachmair, Dudschig, de la Vega and Kaup 2014). In the present study we aimed at extending these findings by investigating whether priming effects can be observed for the processing of verbs referring to movements in the vertical dimension (e.g., rise vs. fall). Method Participants (N = 34) performed a lexical decision task with 40 verbs denoting an up- or downwards oriented movement (e.g., rise vs. fall) and 40 pseudowords. Verbs were controlled for frequency, length and denoted movement direction. The words were preceded by a number from the set {1, 2, 8, 9}. Correctly responding to the verbs required a key press on the left in half of the trials and on the right in the other half. The order of the response mapping was balanced across participants. Each trial started with a centered fixation cross (500 ms), followed by a number (300 ms). Immediately afterwards, the verb/pseudoword stimulus appeared and stayed on screen until response. Response times (RTs) were measured from stimulus onset to the key press response. Each stimulus was presented eight times, resulting in a total of 640 experimental trials (320 verb trials + 320 pseudoword trials), subdivided into 8 blocks separated by self-paced breaks with error information. Each experimental half started with a short practice block. To ensure the processing of the digits, the participants were informed beforehand that they would be asked to report the numbers they had seen in a short questionnaire at the end of the experiment. The design of the experiment was a 2 (number magnitude: low vs. high) x 2 (verb direction: up vs. down) x 2 (response mapping) design with repeated measurements on all variables.
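As an aside for readers who want to reproduce this kind of analysis, the following Python sketch shows one way to run the by-participant (F1) repeated-measures ANOVA on such a design with statsmodels; the file and column names are illustrative assumptions, and the trial exclusions mirror those reported in the Results below.

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    trials = pd.read_csv("trials.csv")  # one row per trial (hypothetical file)

    # Keep correct verb responses slower than 200 ms (cf. Results below)
    trials = trials[(trials.kind == "verb") & trials.correct & (trials.rt >= 200)]

    # One mean RT per participant and design cell
    cells = (trials.groupby(["subject", "magnitude", "direction", "mapping"],
                            as_index=False)["rt"].mean())

    # 2 (number magnitude) x 2 (verb direction) x 2 (response mapping),
    # all factors within-subject
    result = AnovaRM(cells, depvar="rt", subject="subject",
                     within=["magnitude", "direction", "mapping"]).fit()
    print(result)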
Results The data of six participants were excluded due to a high number of errors (>10 %) in all conditions. Responses to pseudowords, responses faster than 200 ms, and errors were excluded from further analyses. We found no main effect of number magnitude (Fs < 1.8), no effect of response mapping (Fs < 1), but a main effect of verb direction with faster responses for down- compared to up-verbs (F1(1,26) = 5.61, p < .05; F2 < 1; 654 ms vs. 663 ms). Interestingly, we also found a significant interaction of number magnitude and verb direction, F1(1,26) = 5.23, p < .05; F2(1,38) = 3.46, p = .07, with slower responses in congruent compared to incongruent trials (up-verbs: 668 ms vs. 658 ms; down-verbs: 654 ms vs. 654 ms). To obtain
more information with regard to whether this effect depends on how deeply the number primes were being processed, we conducted post hoc analyses. Participants were subdivided into two groups, with Group 1 including all participants who had correctly reported the numbers at the end of the experiment (N = 14), and Group 2 including the remaining participants. Group 1 again showed an interaction between number magnitude and verb direction (F1(1,26) = 9.47, p < .01; F2(1,38) = 3.38, p = .07), whereas Group 2 did not (Fs < 1). Mean RTs of both groups are displayed in Fig. 1. Discussion The present findings show that reading verbs denoting an up- or downwards oriented movement is affected by the preceding processing of high and low numbers. As such, the presented findings provide evidence for the view that the spatial associations observed in number and word processing may share a common basis (Barsalou 2008). Interestingly, in contrast to Lachmair et al. (2014), the results show interference instead of facilitation in spatially congruent conditions. Possibly, this deviating finding reflects the fact that verbs referring to movements in vertical space (such as rise or fall) are rather complex and implicitly refer to two spatial locations, namely the starting point and the end point of the movement. Maybe participants dynamically simulated the described movements beginning with the starting point. Considering that verbs are assumed to trigger complex comprehension processes (see Vigliocco, Vinson, Druks, Barber and Cappa 2011), it seems plausible to assume that our experimental task may have tapped into early rather than late simulation processes. This in turn may explain why interference rather than facilitation was observed in the present experiments. One could of course argue that this explanation is not very convincing considering that the study by Dudschig et al. (2012) also presented participants with verbs referring to upwards or downwards directed movements (as in the current study) and nevertheless observed facilitation, not interference, in spatially congruent conditions. However, we think that differences concerning temporal aspects of the experimental task may explain the different results. The study by Dudschig et al. (2012) investigated the speed with which upwards
or downwards directed movements could be initiated by the participants after having processed the motion verbs, whereas the current study investigated the effects of number primes that were presented prior to the processing of the motion verbs. Thus, it seems quite possible that the task in Dudschig et al. tapped into later simulation processes than the task in the current study. Of course, future studies are needed to directly investigate this post hoc explanation of our results. One possibility would be to change the temporal aspects of the experimental task in the current paradigm such that participants spend more time processing the verbs, giving later simulation processes a chance to occur. Another possibility would be to present participants with verbs in the past perfect denoting a movement that has already taken place in the past (e.g., gestiegen [had risen]). Maybe participants would then focus more on the end point of the denoted movement, leading to facilitation effects in spatially congruent conditions. One further aspect of the results obtained in the present study calls for discussion. The interaction effect between number and word processing was only observed for those participants who could correctly recall the number primes at the end of the experiment. One possible explanation is that the number primes need to be processed at a certain level of processing in order for them to affect the subsequent processing of direction-associated words. This would suggest that the interaction effect between number and word processing is not fully automatic. Another possibility is that those participants who did not recall the number primes correctly at the end of the experiment simply did not adequately follow the instructions and strategically ignored the number primes because they were of no relevance to the experimental task. If so, it would be no surprise that these participants did not experience any interference from the number primes, and no further conclusions could be drawn. One interesting manipulation for follow-up studies to the current experiment would be to present the number primes for a very short duration and/or framed by a visual mask (see Dudschig et al. 2014). An interaction effect between number and word processing under these conditions would provide strong evidence for the view that spatial associations in number and word processing indeed share a common basis, independent of any strategic behavior of inter-relating the two domains. Acknowledgments We thank Elena-Alexandra Plaetzer for her assistance in data collection. This work was supported by a grant from the German Research Foundation (SFB 833/B4 [Kaup/Leuthold]).
Fig. 1 Mean RT of correct responses as a function of verb direction (up vs. down) and number magnitude (high vs. low). Participants in Group 1 correctly recalled the number primes at the end of the experiment, participants in Group 2 did not. Error bars represent the 95 % confidence interval for within-subject designs (Masson and Loftus 2003)
References Barsalou LW (2008) Grounded cognition. Annu Rev Psychol 59:617–645 Dudschig C, de la Vega I, De Filippis M, Kaup B (2014) Language and vertical space: on the automaticity of language action interconnections. Cortex 58:151–160 Dudschig C, Lachmair M, de la Vega I, De Filippis M, Kaup B (2012) Do task-irrelevant direction-associated motion verbs affect action planning? Evidence from a Stroop paradigm. Mem Cogn 40(7):1081–1094 Fischer MH, Castel AD, Dodd MD, Pratt J (2003) Perceiving numbers causes spatial shifts of attention. Nat Neurosci 6(6):555–556 Lachmair M, Dudschig C, de la Vega I, Kaup B (2014) Relating numeric cognition and language processing: do numbers and words share a common representational platform? Acta Psychol 148:107–114 Lachmair M, Dudschig C, De Filippis M, de la Vega I, Kaup B (2011) Root versus roof: automatic activation of location information during word processing. Psychon Bull Rev 18:1180–1188 Masson MEJ, Loftus GR (2003) Using confidence intervals for graphically based data interpretation. Can J Exp Psychol 57:203–220
Schwarz W, Keus IM (2004) Moving the eyes along the mental number line: comparing SNARC effects with saccadic and manual responses. Percept Psychophys 66:651–664 Vigliocco G, Vinson DP, Druks J, Barber H, Cappa SF (2011) Nouns and verbs in the brain: a review of behavioural, electrophysiological, neuropsychological and imaging studies. Neurosci Biobehav Rev 35(3):407–426 Zwaan RA, Madden CJ (2005) Embodied sentence comprehension. In: Pecher D, Zwaan RA (eds) Grounding cognition: the role of perception and action in memory, language, and thinking. Cambridge University Press, Cambridge, pp 224–245
Is joint action necessarily based on shared intentions? Nicolas Lindner, Gottfried Vosgerau Department of Philosophy, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany Abstract: Is joint action necessarily based on shared intentions? Regarding joint action, the majority of researchers in the field assumes that underlying collective or joint intentions are the glue that holds the respective actions of the participants together (Searle 1990; Bratman 1993; Tuomela, Miller 1988). A major part of the debate thus focuses on the nature of these particular intentions. In this talk, we will describe one major account and argue that it cannot explain joint action as displayed by small children. Based on this critique, we will formulate an alternative view, which suggests that some non-demanding cases of (seemingly) joint action (including those displayed by small children) are rather effects of the lack of representing one's own intentions as one's own (the intention is just represented as an intention that is there). This account has the advantage of offering a way to specify the pivotal role that joint action is supposed to play in the acquisition of socio-cognitive abilities. A prominent approach to joint intentions by Michael Bratman (1993, 2009) construes shared intentions, as he calls them, as being derived from singular intentions, a conception of which he developed in his book "Intention, Plans, and Practical Reason" from 1987. In a nutshell, Bratman characterizes intentions in this book as conduct-controlling pro-attitudes, a term by Davidson (1980) describing an agent's mental attitude directed toward an action under a certain description. For Bratman, intentions are typically parts of larger plans concerning future actions. He regards these plans as mental states, which often are only partial and involve a hierarchy of general and more specific intentions. Bratman's account of shared intention (1993) relies on his conception of individual intentions, the attitudes of the participants in joint action, and their interrelations, and is thus constructivist in nature. On his account, a shared intention doesn't consist in a particular type of intention. He proposes a complex of intentions and attitudes that—if they have the appropriate content and function properly—do the job of a shared intention and can be identified with it. This complex is supposed to do three interrelated jobs: It should 1) coordinate the agents' intentional actions in such a way that the joint goal can be achieved by acting together, 2) help in coordinating the relevant planning of the participants, and 3) provide a framework that helps to structure relevant bargaining. According to Bratman, fulfilling this three-fold function is a necessary condition for any kind of shared intention. With regard to a complex that does all of these jobs, Bratman suggests three sufficient conditions to describe a substantial account of shared intention: 1) There should be an individual intention of each participant in the joint action of the form 'I intend that we J'. 2) These individual intentions should be held in part because of, and in accordance with, the relevant intentions of the other partakers in the joint action. 3) The two aforementioned conditions have to be common knowledge between the participants.
It is Bratman's main argument that the described complex of interrelated intentions and attitudes functions together as one characteristic form of shared intention (2009). Due to the constructivist and functionalist nature of his approach, it may yet not be the only kind of shared intention. The author himself admits the possibility that there may be other kinds and that shared intention may thus be multiply realizable. Bratman's conception of shared intention seems to be a convincing characterization of how cognitively mature agents act together. In contrast to this, some researchers doubt whether his approach is suited to account for joint action in young children. This issue is closely related to the developmental onset of socio-cognitive abilities. The common knowledge condition of Bratman's substantial account presupposes that the system of intentions in question is in the public domain. Furthermore, there has to be mutual knowledge of the others' intentions plus knowledge of the others' knowledge. The cognitive basis for common knowledge thus rests on a variety of capacities. The agents in joint action ought to have: a) the ability to form beliefs and higher-order beliefs (beliefs about beliefs), b) the ability to attribute mental states to themselves and others, and c) the capacities needed for recursive mindreading. All in all, they must thus have a robust theory of mind. With respect to this, critics of Bratman's account state that he characterizes shared intention in a way that is too complex to accommodate joint action of young children (Tollefsen 2005; Pacherie 2011; Butterfill 2012). Tollefsen's (2005) critique is based on evidence suggesting that young children lack a robust theory of mind, particularly a proper understanding of others' beliefs. This evidence comes from different false-belief tasks (Wellman et al. 2001). Without such a proper understanding of other agents' beliefs, so Tollefsen argues, the common knowledge condition in Bratman's conception could not be fulfilled. Hence, children could not take part in shared intentional activities of such a sort. Similarly, Pacherie (2011) claims that Bratman's shared intention requires cognitively sophisticated agents who have both concepts of mental states like intentions and attitudes and the ability to represent the mental states of others. From her point of view, small children lack such fully developed mentalizing and meta-representational capacities. Therefore, shared intention cannot account for joint action in young children. A problem for Bratman's account thus stems from the fact that there is evidence of children engaging in joint activities before they develop the putatively necessary socio-cognitive abilities. Findings from developmental psychology (Brownell 2011) suggest that children engage in different forms of joint action together with adults from around 18 months of age and, from the end of the 2nd year of life, also with peers. We will show that these criticisms rest on rather shaky grounds. First, they both attack Bratman's substantial account, which only presents sufficient conditions for the presence of a shared intention. Thus, there might be other constructions in a Bratmanian sense that avoid these flaws. Furthermore, both critiques rely on controversial empirical claims about the onset of children's mindreading capacities—for example, with respect to the starting point of false belief understanding in children (Baillargeon et al.
2010; De Bruin and Newen 2012) and the development of an early understanding of common knowledge (Carpenter 2009). Thus, the critiques by Pacherie and Tollefsen do not present convincing arguments against Bratman's account per se. Still, they highlight an important issue by questioning the cognitive standards imposed on participating agents in joint action. Butterfill (2012) takes a different route in criticizing Bratman's approach. His objection focuses on the necessary conditions for shared intention: the functional roles that shared intention is supposed to play. Butterfill claims that the coordinating and structuring of relevant bargaining, which shared intention is supposed to ensure, sometimes requires monitoring or manipulating other agents' intentions. With regard to accounts that stress the importance of joint action for cognitive and socio-cognitive development
in infants (Tomasello et al. 2005; Moll and Tomasello 2007), joint action would thus presuppose psychological concepts and capacities whose development it is supposed to explain in the first place. The contribution of joint activities to the development of our cognitive capacities is the core argument of Tomasello and colleagues' hypothesis on shared intentionality. As long as one stresses its role in cognitive and socio-cognitive development, Butterfill claims, early joint action of children cannot involve shared intention in Bratman's sense. Bratman's conception can thus not account for children's joint actions, at least if it is supposed to explain the development of their understanding of minds. Yet his approach is suited to explain joint action of adults as mature cognitive agents. This is especially the case for those kinds of joint action that involve planning, future-directed intentions and deliberation. We will conclude our talk by offering an alternative account of children's ability for joint action, which turns, in a way, the circularity upside down: If joint action is indeed pivotal for the development of socio-cognitive abilities, these abilities cannot already be developed in small children. Thus, joint action as displayed by small children has to be grounded in other abilities. Our proposal is that it is the lack of the concept of a mental state (especially intentions) that produces behavior which looks like joint action (we will not discuss whether the term should be applied to these cases or not). If a child has not yet learned that a mental state is something that "belongs" to single persons, it cannot be said to have acquired the concept of a mental state. However, the child might at the same time be able to introspect the content of its own intentions, such that the child's introspection can be paraphrased as "there is the intention to J". In other words, the child has not yet learned to distinguish between its own intentions and those of others. This lack of abilities will result in behavior that looks like joint action (at least in cases in which the intention of the adult and the child match). Such behavior might be initiated by different triggers in the surrounding world that establish a common goal in the first place. Candidates for this could be pointing gestures, affordances and alignment between the agents. This account not only offers new perspectives for the explanation of autism (Frith 1989; Vosgerau 2009), it also offers a way to specify the thesis that (seemingly) joint action is pivotal to the acquisition of socio-cognitive abilities: Joint action sets up an environment in which children are able to gradually learn that intentions can differ between individuals. The result of this learning phase will ultimately be the acquisition of the concept of a mental state, which includes that mental states belong to persons and that mental states can thus differ between individuals (this "knowledge" is then tested in the false-belief task). In other words, the learning of a theory of mind starts with acquiring the concept of a mental state, and this concept can be best acquired in (seemingly) joint action scenarios, in which children directly experience the effects of differing mental states (intentions and beliefs). Accordingly, empirical research has already suggested that the acquisition of mental state concepts is dependent on the use of mental state terms (Rakoczy et al.
2006), which are presumably most often used in joint action scenarios. Some empirical results have been interpreted to show that very young children already possess the socio-cognitive abilities needed for cooperative activities and act on a rather sophisticated understanding of the mental states of self and other (Carpenter 2009). Following this line of argument, researchers propose that infants already understand others' knowledge and ignorance (Liszkowski et al. 2008), can act on a shared goal (Warneken et al. 2006; Warneken and Tomasello 2007), and can exploit the common ground they share with an adult (Liebal et al. 2009; Moll et al. 2008). While appreciating the importance of this research as such, we will present alternative interpretations of these findings that are cognitively less demanding and thus consistent with our proposal.
Our alternative account is primarily designed to explain the behavior of small children. However, we point to the possibility that non-demanding cases of cooperation (e.g., buying an item in a grocery store) can be explained by similar mechanisms in adults. In such cases, adults would not explicitly represent their own intentions as their own intentions, thereby generating actions that are structurally similar to those of small children. Nevertheless, other more complex cases of joint action certainly also exist in adults. In the light of our proposal, we thus also conclude that Bratman's account of shared intention should not be abandoned altogether. Although a uniform account of joint action for both children and mature agents would have the benefit of being parsimonious, candidates for such a comprehensive explanation (Tollefsen and Dale 2012; Vesper et al. 2010; Gold and Sugden 2007) do not seem to have the resources to explain the development of qualitatively differing stages of joint action. References Baillargeon R, Scott RM, He Z (2010) False-belief understanding in infants. Trends Cogn Sci 14(3):110–118. doi:10.1016/j.tics.2009.12.006 Bratman M (1987) Intention, plans, and practical reason. CSLI Publications/Center for the Study of Language & Information Bratman M (1993) Shared intention. Ethics 104(1):97–113 Bratman M (2009) Shared agency. In: Philosophy of the social sciences: philosophical theory and scientific practice. Cambridge University Press Brownell CA (2011) Early developments in joint action. Rev Philos Psychol 2(2):193–211. doi:10.1007/s13164-011-0056-1 Butterfill S (2012) Joint action and development. Philos Quart 62(246):23–47. doi:10.1111/j.1467-9213.2011.00005.x Carpenter M (2009) Just how joint is joint action in infancy? Top Cogn Sci 1(2):380–392. doi:10.1111/j.1756-8765.2009.01026.x Davidson D (1980/2001) Essays on actions and events, 2nd edn. Oxford University Press, USA De Bruin LC, Newen A (2012) An association account of false belief understanding. Cognition 123(2):240–259. doi:10.1016/j.cognition.2011.12.016 Frith U (1989/2003) Autism: explaining the enigma, 2nd edn. Blackwell Publ, Malden Gold N, Sugden R (2007) Collective intentions and team agency. J Philos 104(3):109–137 Liebal K, Behne T, Carpenter M, Tomasello M (2009) Infants use shared experience to interpret pointing gestures. Dev Sci 12(2):264–271. doi:10.1111/j.1467-7687.2008.00758.x Liszkowski U, Carpenter M, Tomasello M (2008) Twelve-month-olds communicate helpfully and appropriately for knowledgeable and ignorant partners. Cognition 108(3):732–739. doi:10.1016/j.cognition.2008.06.013 Moll H, Carpenter M, Tomasello M (2007) Fourteen-month-olds know what others experience only in joint engagement. Dev Sci 10(6):826–835. doi:10.1111/j.1467-7687.2007.00615.x Moll H, Tomasello M (2007) Cooperation and human cognition: the Vygotskian intelligence hypothesis. Philos Trans R Soc B Biol Sci 362(1480):639–648. doi:10.1098/rstb.2006.2000 Pacherie E (2011) Framing joint action. Rev Philos Psychol 2(2):173–192 Rakoczy H, Tomasello M, Striano T (2006) The role of experience and discourse in children's developing understanding of pretend play actions. Br J Dev Psychol 24(2):305–335. doi:10.1348/026151005X36001 Searle J (1990) Collective intentions and actions. In: Cohen P, Morgan J, Pollack ME (eds) Intentions in communication. Bradford Books, MIT Press, Cambridge Tollefsen D (2005) Let's pretend! Children and joint action. Philos Soc Sci 35(1):75–97.
doi:10.1177/0048393104271925
Tollefsen D, Dale R (2012) Naturalizing joint action: a process-based approach. Philos Psychol 25(3):385–407. doi:10.1080/09515089.2011.579418 Tomasello M, Carpenter M, Call J, Behne T, Moll H (2005) Understanding and sharing intentions: the origins of cultural cognition. Behav Brain Sci 28(5):675–691 Tuomela R, Miller K (1988) We-intentions. Philos Stud 53(3):367–389 Vesper C, Butterfill S, Knoblich G, Sebanz N (2010) A minimal architecture for joint action. Neural Netw 23(8–9):998–1003. doi:10.1016/j.neunet.2010.06.002 Vosgerau G (2009) Die Stufentheorie des Selbstbewusstseins und ihre Implikationen für das Verständnis psychiatrischer Störungen. J für Philos Psychiatrie 2 Warneken F, Chen F, Tomasello M (2006) Cooperative activities in young children and chimpanzees. Child Dev 77(3):640–663. doi:10.1111/j.1467-8624.2006.00895.x Warneken F, Tomasello M (2007) Helping and cooperation at 14 months of age. Infancy 11(3):271–294. doi:10.1111/j.1532-7078.2007.tb00227.x Wellman HM, Cross D, Watson J (2001) Meta-analysis of theory-of-mind development: the truth about false belief. Child Dev 72(3):655–684. doi:10.1111/1467-8624.00304
A general model of the multi-level architecture of mental phenomena. Integrating the functional paradigm and the mechanistic model of explanation Mike Lüdmann University of Duisburg-Essen, Germany The central aim of this contribution is to provide a conceptual foundation of psychology in the form of a general model of the architecture of mental phenomena. It will be shown that the mechanistic model of explanation (Bechtel, Richardson 1993; Machamer, Darden and Craver 2000; Bechtel 2007, 2008, 2009; Craver 2007) offers an appropriate founding approach to psychology as well as to its integration within the framework of the cognitive and brain sciences. Although the computational model of mind provides important models of mental properties and abilities, it fails to provide an adequate multi-level model of mental properties. The mechanistic approach, however, can be regarded as a conceptually coherent and scientifically plausible extension of the functional paradigm (see Polger 2004; Eronen 2010). While a functionalist conception of the mind mostly focuses on the mysterious relationship of mental properties, as abstract or second-order properties, to their physical realizers (if such issues are not excluded altogether), the mechanistic approach allows establishing a multi-level architecture of mental properties and their unambiguous localization in the overall scientific system. The functionalist models of the mind are usually based on the computer metaphor of man, which construes human beings as information processing systems. They postulate relatively abstract theoretical models of mental processes that allow generally very reliable predictions of the subsequent behavior of the system under consideration of known input variables. The models provide a way to put cognitive (functionalist) operators, such as thinking, decision making, and planning, into the black box of behaviorism. Taking into account current interdisciplinary research on the mind, the functionalist conception of mind, which defines these operators as abstract information processing that can be described independently of neuroscientific constraints, is problematic. If the question is raised how the connection between functional models and the mind is established, Marr (1982) proposes that the computational
processes of his model of visual information processing (e.g., the generation of a three-dimensional depth structure) are specified by particular formal algorithms, which are physically implemented in the human brain. It is therefore recognized that functional processes also have a physical reality, but functional models fail to provide a framework for the exact circumstances, conditions, constraints, etc. of such implementation relations. Admittedly, the connectionist approach has fulfilled this task better by generating models of neural networks that are more likely to describe the actual processes in our minds (see Rumelhart, McClelland 1986; Smolensky 1988), but it ultimately does not offer a clear multi-level model of the mind either. It is important to note that physical implementation as described by Marr is usually understood in terms of physical realization. Therefore, the causal profile of an abstract functional property (behavioral inputs and outputs) must be determined by a conceptual analysis in order to identify those physical (neural) structures that have exactly that causal profile (cf. Levine 1993; Kim 1998, 2005). The realization theory is perhaps intended to provide an explanatory approach to how abstract, functionally characterized properties as postulated by the cognitive sciences can be part of the physical world. An abstract, theoretical phenomenon is realized (quasi materialized) in this sense through concrete physical conditions, while different physical systems can bring the computational or connectionist formalism into the world (see Fodor 1974). The ontological status of an abstract functional description or a second-order property remains highly questionable. In contrast, much is gained if the functionalist approach is extended and partially adjusted by the mechanistic rendition of mental properties. A mechanism can be understood as "a set of activities organized such that they exhibit the phenomenon to be explained" (Craver 2007, p 5). The mechanistic approach individuates a phenomenon by the tasks or causal roles it performs for the system concerned. So if the mechanism behind a phenomenon is uncovered, one has explained the phenomenon itself. As Bechtel (2008) says, a mechanism is a structure "performing a function in virtue of its component parts, component operations, and their organization" (p 13). Figure 1 shows the general formal structure of mechanistic levels. At the top of the mechanism stands the explanatory phenomenon: the system S ψ-ing. The suffix "-ing" and the course of the arrows are meant to express the process-related nature of mechanisms. The phenomenon ψ can be decomposed into subcomponents: Craver uses X as a term for the entities functioning as components of S, and φ as a name for their activity patterns. While functionalism, or realization theory, focuses on the relationship between abstract information processing and certain processes in the brain, the
Fig. 1 Formal structure of a mechanism (from Craver 2007, p 189)
mechanistic approach extends this concern to the question of embedding a given (mental) phenomenon in a structural hierarchy of natural levels of organization characterized by the part-whole relationship. If we take a cognitive property like spatial orientation or spatial memory, it is not simply a question of which brain structure realizes this property; rather, it has to be shown which causally relevant mechanisms are installed at various levels of a mereologically construed mechanistic hierarchy (see Craver 2007). Thus the functional structure, as described by cognitive science, is undoubtedly an explanatorily essential description of this mental property. We can, for example, explain the behavior of a person in a given situation in terms of the components and predictions of working memory theory (Baddeley 1986). But the same mental event can be described at different levels of organization. In this way the mental event has a neuronal structure which, among other things, consists of hippocampal activity. In addition, the mental property has a molecular "reality" which is primarily characterized by NMDA receptor activation, and so on. So a mental phenomenon has a (potentially infinite) sequence of microstructures, none of which can be understood as the actual reality of the target property. From the fact that the part-whole relation implies a spatio-temporal coextensivity of the different microstructures, I will argue, it can be deduced that we have a mereologically based form of psychophysical identity. Nevertheless, this identity thesis does not have the crude reductionistic implications of the "classical" philosophical thesis of psychophysical identity (see Place 1956; Smart 1959). Likewise, it can be shown that the dictum that only functionalism guarantees the "autonomy" of psychology (Fodor 1974), and that this autonomy is jeopardized by every conception of psychophysical identity, is fundamentally wrong. Quite the opposite is true. If we strictly follow Fodor, then psychological concepts and theories that have little inter-theoretical fit or a low degree of correspondence to physical processes are to be preferred. Especially under these conditions, psychology risks falling prey to crudely reductionist programs such as new wave reductionism, according to which an inferior theory that does not have good inter-theoretical fit should be replaced by lower-level theories (Bickle 1998, 2003). Even worse, because of the rejection of any conception of psychophysical identity, psychologists would have to accept a microphysicalism entailing that micro levels have an ontological and explanatory priority. On the basis of the mechanistic approach (and its identity-theoretical interpretation), both the integrity of psychology and the inter-theoretical fit of its concepts and theories can be justified. Mental properties form a higher level in the natural organization of a (human) organism, but at the same time they form a mutually inseparable unit with their physical microstructures. It is the mental properties that characterize the diffuse nexus of neuronal events in terms of certain functional units in the first place. In this sense the mind is the structure-forming or shaping principle at all levels of the natural organization of the brain. Not despite but because of its coextensivity with diverse natural organizational levels is the mental both a real and a causally potent phenomenon. Despite the fact that, with recourse to these micro levels of, e.g.,
neurobiology, some characteristics of mental phenomena can be explained well, there is neither an ontological nor an explanatory primacy of the micro levels or their explanations. The adoption of such primacy is merely the product of a cognitive bias, a misguided interpretation of scientific explanations and of the process of scientific knowledge discovery (Wimsatt 1976, 1980, 2006).
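To make the multi-level picture concrete, here is a toy Python rendering of a mereological mechanistic hierarchy; the class design is an illustrative assumption, while the example levels (spatial memory, hippocampal activity, NMDA receptor activation) follow the text above.

    from dataclasses import dataclass, field

    @dataclass
    class Mechanism:
        entity: str                            # the S (or X) doing the work
        activity: str                          # its psi-ing (or phi-ing)
        components: list = field(default_factory=list)  # lower-level mechanisms

        def describe(self, depth=0):
            # Print the hierarchy; each level is a part of the level above.
            print("  " * depth + f"{self.entity}: {self.activity}")
            for part in self.components:
                part.describe(depth + 1)

    spatial_memory = Mechanism(
        "organism", "remembering spatial layouts",
        [Mechanism("hippocampus", "generating place-specific activity",
                   [Mechanism("NMDA receptors", "mediating synaptic plasticity")])])

    spatial_memory.describe()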
References Baddeley AD (1986) Working memory. Oxford University Press, Oxford
Bechtel W (2007) Reducing psychology while maintaining its autonomy via mechanistic explanation. In: Schouten M, Looren de Jong H (eds) The matter of the mind: philosophical essays on psychology, neuroscience and reduction. Blackwell, Oxford, pp 172–198 Bechtel W (2008) Mental mechanisms: philosophical perspectives on cognitive neuroscience. Psychology Press, New York Bechtel W (2009) Looking down, around, and up: mechanistic explanation in psychology. Philos Psychol 22:543–564 Bechtel W, Richardson RC (1993) Discovering complexity: decomposition and localization as strategies in scientific research. MIT Press, Cambridge Bickle J (1998) Psychoneural reduction: the new wave. MIT Press, Cambridge Bickle J (2003) Philosophy and neuroscience: a ruthlessly reductive account. Kluwer, Dordrecht Craver CF (2007) Explaining the brain: mechanisms and the mosaic unity of neuroscience. Clarendon Press, Oxford Eronen MI (2010) Replacing functional reduction with mechanistic explanation. Philosophia Naturalis 47/48:125–153 Fodor JA (1974) Special sciences (or the disunity of science as a working hypothesis). Synthese 28:97–115 Kim J (1998) Mind in a physical world. MIT Press, Cambridge Kim J (2005) Physicalism, or something near enough. Princeton University Press, Princeton Levine J (1993) On leaving out what it's like. In: Davies M, Humphreys GW (eds) Consciousness: psychological and philosophical essays. Blackwell, Oxford, pp 121–136 Machamer P, Darden L, Craver CF (2000) Thinking about mechanisms. Philos Sci 67:1–25 Marr D (1982) Vision. Freeman and Company, New York Place UT (1956) Is consciousness a brain process? Br J Psychol 47:44–50 Polger TW (2004) Natural minds. MIT Press, Cambridge Rumelhart DE, McClelland JL (1986) Parallel distributed processing: explorations in the microstructure of cognition. MIT Press, Cambridge Smart JJC (1959) Sensations and brain processes. Philos Rev 68:148–156 Smolensky P (1988) On the proper treatment of connectionism. Behav Brain Sci 11:1–23 Wimsatt WC (1976) Reductionism, levels of organization, and the mind–body problem. In: Globus G, Maxwell G, Savodnik I (eds) Consciousness and the brain. Plenum, New York, pp 205–267 Wimsatt WC (1980) Reductionistic research strategies and their biases in the units of selection controversy. In: Nickles T (ed) Scientific discovery: case studies. D Reidel, Dordrecht, pp 213–259 Wimsatt WC (2006) Reductionism and its heuristics: making methodological reductionism honest. Synthese 151:445–475
A view-based account of spatial working and long-term memories: Model and predictions Hanspeter A. Mallot, Wolfgang G. Röhrich, Gregor Hardiess Cognitive Neuroscience, Dept. of Biology, University of Tübingen, Germany Abstract Space perception provides egocentric, oriented views of the environment from which working and long-term memories are constructed. "Allocentric" (i.e. position-independent) long-term memories may be organized as graphs of recognized places or views, but the interaction of such cognitive graphs with egocentric working memories is unclear. Here, we present a simple coherent model of view-based working and long-term memories, and review supporting evidence
from behavioral experiments. The model predicts (i) that within a given place, memories for some views may be more salient than others, (ii) that imagery of a target place should depend on the location where the recall takes place, and (iii) that recall favors views of the target place which would be obtained when approaching it from the current recall location. Keywords Spatial cognition, Working memory, Imagery, View-based representation, Spatial updating Introduction Sixteen years before his famous paper on the cognitive map, Edward C. Tolman gave an account of rat spatial learning in terms of what he called the "means-ends-field" (Tolman 1932, 1948), of which a key diagram is reproduced in Fig. 1. The arrows indicate "means-ends-relations", i.e. expectations that a rat has learned about which objects can be reached from which other ones, and how. In modern terms, the "means objects" (MO in the figure) are intermediate goals or representational states that the rat is in or expects to get into. This graph approach to spatial memory was later elaborated by Kuipers (1978) and is closely related to the route vs. map distinction discussed by O'Keefe and Nadel (1978). Behavioral evidence for a graph-like organization of human spatial memory has been reviewed, e.g., by Wang, Spelke (2002) or Mallot, Basten (2009). The graph-based approach to cognitive mapping, powerful as it may appear, leaves open a number of important questions, two of which will be addressed in this paper. First, what is the nature of the nodes of the graph? In Tolman's account, the "means objects" are "intervening" objects, the passage along each object being a "means" to reach the next one or the eventual goal. Kuipers (1978) thinks of the nodes as places defined by a set of sensory stimuli prevailing at each place. The resulting idea of a place-graph can be relaxed to a view-graph in which each node represents an observer pose (position plus orientation), again characterized by sensory input, which now consists of egocentric, oriented views (Schölkopf, Mallot 1995). The second question concerns the working memory stage needed, among other things, as an interface between the cognitive graph and perception and behavior, particularly in the processes of planning routes from long-term memory and of encoding new spatial information into long-term memory. For such working memory structures, local, metric maps are generally assumed, representing objects and landmarks at certain egocentric locations (Byrne et al. 2007; Tatler, Land 2011; Loomis et al. 2013). While these models offer plausible explanations for many effects in spatial behavior, they are hard to reconcile with a view-based rather than object-based organization of long-term memory, which will have to interact with the working
Fig. 1 Tolman's notion of spatial long-term memory as a "means-ends-field" (from Tolman 1932). This seems to be the first account of the "cognitive map" as a graph of states ("objects") and actions ("means-ends-relations") in which alternative routes can be found by graph search
memory. As a consequence, computationally costly transformations between non-egocentric long-term memories and egocentric working memories are often assumed. In this paper, we give a consistently view-based account of spatial working and long-term memories and discuss a recent experiment supporting the model. View-based spatial memory Places By the term "view", we denote an image of an environment taken at a view-point x and oriented in a direction φ. Both x and φ may be specified with respect to a reference frame external to the observer, but this is not of great relevance for our argument. Rather, we assume that each view is stored in relation to other views taken at the same place x but with various viewing directions φ. The views of one place combine into a graph with a simple ring topology in which views taken with neighboring viewing directions are connected by a graph link (see Fig. 2a). This model of a place representation differs from the well-known snapshot model from insect navigation (Cartwright, Collett 1982; for the role of snapshots in human navigation, see Gillner et al. 2008) by replacing the equally sampled, panoramic snapshot with a set of views that may sample different viewing directions with different numbers of views. It is thus similar to view-based models of object recognition, where views may also be sampled inhomogeneously over the sides or aspects of an object (Bülthoff, Edelman 1992). As in object recognition, places may therefore have "canonical views" from which they are most easily recognized. Long-term memory The graph approach to spatial long-term memory has been extended from place-graphs to graphs of oriented views by Schölkopf, Mallot (1995). As compared to the simple rings sufficient to model place memory, we now also allow for view-to-view links representing movements with translatory components such as "turn left and move ahead" or "walk upstairs". The result is a graph of views with links labeled by egocentric movements. Schölkopf, Mallot (1995) provide a formal proof that this view-graph contains the same information as a graph of places with geocentric movement labels.
Fig. 2 Overview of view-based spatial memory. a Memory for places is organized as a collection of views obtained from a place, arranged in a circular graph. View multiplicity models the salience of view orientation. b Spatial long-term memory organized as a graph of views and movements leading to the transition from one view to another. c View-based spatial working memory consisting of a subgraph of the complete view-graph, centered at the current view and including an outward neighborhood of the current view. For further explanation see text. (Tübingen Holzmarkt icons are sections of a panoramic image retrieved by permission from www.kubische-panoramen.de. Map source: Stadtgrundkarte der Universitätsstadt Tübingen, Stand: 17.3.2014.)
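As a minimal Python illustration of the representations just described (and of the working-memory neighborhood introduced in the next subsection), the following sketch connects the views of a place into a ring, adds a movement-labeled link between places, and extracts a local subgraph; place names and movement labels are illustrative assumptions.

    from collections import defaultdict

    graph = defaultdict(list)  # view -> list of (movement_label, next_view)

    def add_place(name, n_views=4):
        """Connect the views of one place into a ring over viewing directions."""
        views = [f"{name}/dir{i}" for i in range(n_views)]
        for i, v in enumerate(views):
            graph[v].append(("turn_right", views[(i + 1) % n_views]))
            graph[v].append(("turn_left", views[(i - 1) % n_views]))
        return views

    holzmarkt = add_place("Holzmarkt")
    marktplatz = add_place("Marktplatz")

    # A translatory link is directed, unlike the within-place turns:
    graph[holzmarkt[0]].append(("move_ahead", marktplatz[2]))

    def working_memory(current_view, k=2):
        """Local subgraph: all views reachable within k movement steps."""
        frontier, seen = {current_view}, {current_view}
        for _ in range(k):
            frontier = {v for u in frontier for _, v in graph[u]} - seen
            seen |= frontier
        return seen

    print(sorted(working_memory(holzmarkt[0], k=1)))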
A sketch of the view-graph for an extended area is given in Fig. 2b. The place transitions are shown as directed links whereas the turns within a place work either way. In principle, the view-graph works without metric information or a global, geocentric reference frame, but combinations with such data types are possible. Working memory Spatial working memory tasks may or may not involve interaction with long-term memory. Examples of "stand-alone" processes in working memory include path integration, perspective taking, and spatial updating, while spatial planning requires interaction with spatial long-term memory. Models of spatial working memory presented, e.g., by Byrne et al. (2007), Tatler, Land (2011), or Loomis et al. (2013) assume a local egocentric map in which information about landmarks and the environment is inscribed. In contrast, Wiener, Mallot (2003) suggested a working-memory structure formed as a local graph of places in which more distant places are collapsed into regional nodes. In order to reconcile this approach with the view-graph model for long-term memory, we consider a local subgraph of the view-graph containing (i) the current view, (ii) all views connected to this view by a fixed number of movement steps, and (iii) some local metric information represented either by egocentric position-labeling of the included views or by some view transformation mechanism similar to the one suggested in object recognition by Ullman, Basri (1991), or both. This latter component is required to account for spatial updating, which is a basic function of spatial working memory. Frames of reference While perception and spatial working memory are largely organized in an egocentric way, long-term memory must be independent of the observer's current position and orientation and is therefore often called "allo"- or "geo"-centric. These terms imply that a frame of reference is used, much like a mathematical coordinate system within which places are represented by their coordinates (e.g., Gallistel 1990). Clearly, the assumption of an actual coordinate frame in the "mental map" leads to severe problems, not the least of which are the representation of (coordinate) numbers by neurons and the choice of the global coordinate origin. The view-graph approach avoids these problems. Long-term memory is independent of ego's position and orientation since the views and their connections are carried around like a portfolio, i.e. as abstract knowledge that does not change upon ego's movements. Working memory may rightly be called egocentric since it collects views as they appear from the local or a close-by position. In the view-based model, the "transform" between the pose-independent long-term memory and the pose-dependent ("egocentric") working memory reduces to a simple selection of the views corresponding to the current pose and their transfer into working memory. Predictions and experimental results The sketched model of the interplay of spatial memories makes predictions about the recollections that subjects may make of distant places. The task of imagining a distant place is a working-memory task in which an image of that place may be built by having an imagined "ego" move to the target place. Bisiach, Luzzatti (1978) show that hemilateral neglect in the recall of landmarks around the Piazza del Duomo in Milan, Italy, affects the landmarks appearing on the left when viewed from an imagined view-point, but not the landmarks on the respective right side.
This result can be expressed by assuming that neglect entails a loss of the left side of spatial working memory, into which no long-term memory items can be loaded; the long-term memory items themselves are unaffected by the neglect condition. For the imagery of distant places, two mechanisms can be assumed. In a "mental-travel" mechanism, an observer might imagine traveling from his or her current position to the requested target place, generate a working memory, and recall the image from this working memory. In a "recall from index" mechanism, place names might be recalled from long-term memory without mental travel, e.g.,
by some sort of indexing mechanism, which is then likely to recall a canonical view of the target place. The mental-travel mechanism is illustrated in Fig. 3. Assume that the subject is currently located at position A in a familiar downtown environment. When the subject is asked to recall a view of the central square appearing in Fig. 3, mental travel will generate a southward view in spatial working memory, which is then recalled. In contrast, when asked at position B, the mental-travel mechanism will yield a westward view, and so on. We therefore predict that recall, or imagery, of a distant place will result in oriented views whose orientation depends on the interview location. Preliminary data (Röhrich et al. 2013) support this prediction: passers-by who were approached in downtown Tübingen and asked to sketch a map of the "Holzmarkt" (a landmark square in central Tübingen) produced maps whose orientation depended on the interview site. As predicted, orientations were preferred that coincided with the direction of approach from the current interview location. This effect was not found for additional interview locations some 2 km away from downtown, indicating that here a different recall mechanism might operate. Oriented recall can also be triggered by explicitly asking subjects to perform a mental travel before sketching a map. Basten et al. (2012) asked subjects to imagine walking one of two routes in downtown Tübingen, passing the Holzmarkt square in either westward or eastward direction. In this phase of the experiment, the Holzmarkt was not mentioned explicitly. When asked afterwards to draw sketches of the Holzmarkt, subjects produced view orientations that were clearly biased towards the view orientation occurring in the respective direction of the mental travel carried out by each subject. This indicates that oriented view-like memories are generated during mental travel and affect subsequent recall and imagery. Conclusion We suggest that spatial long-term memory consists of a graph of views linked together according to the movements effecting each view transition. Working memory contains local views as well as those nearby views which are connected to one of the local views. When walking onwards, views of approached places are added from long-term memory, thereby maintaining orientation continuity (spatial updating). In recall, views are selected from either working or
Fig. 3 "Mental-travel mechanism" of spatial recall. When located at a nearby place, but out of sight of the target (places A–D), recall by mental travel towards the target place will result in different views. Preliminary data suggest that this position-dependence of spatial recall exists. (For image sources see Fig. 2.)
Acknowledgment
WGR was supported by the Deutsche Forschungsgemeinschaft within the Center for Integrative Neuroscience (CIN) Tübingen.
References
Basten K, Meilinger T, Mallot HA (2012) Mental travel primes place orientation in spatial recall. Lecture Notes Artif Intell 7463:378–385
Bisiach E, Luzzatti C (1978) Unilateral neglect of representational space. Cortex 14:129–133
Bülthoff HH, Edelman S (1992) Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proc Natl Acad Sci 89:60–64
Byrne P, Becker S, Burgess N (2007) Remembering the past and imagining the future: a neural model of spatial memory and imagery. Psych Rev 114:340–375
Cartwright BA, Collett TS (1982) How honey bees use landmarks to guide their return to a food source. Nature 295:560–564
Gallistel CR (1990) The organization of learning. The MIT Press, Cambridge
Gillner S, Weiß AM, Mallot HA (2008) Visual place recognition and homing in the absence of feature-based landmark information. Cognition 109:105–122
Kuipers B (1978) Modeling spatial knowledge. Cogn Sci 2:129–153
Loomis JM, Klatzky RL, Giudice NA (2013) Representing 3D space in working memory: spatial images from vision, hearing, touch, and language. In: Lacey S, Lawson R (eds) Multisensory imagery: theory and applications. Springer, New York
Mallot HA, Basten K (2009) Embodied spatial cognition: biological and artificial systems. Image Vision Comput 27:1658–1670
O'Keefe J, Nadel L (1978) The hippocampus as a cognitive map, chapter 2: Spatial behaviour. Clarendon, Oxford
Röhrich WG, Binder N, Mallot HA (2013) Imagery of familiar places varies with interview location. In: Proceedings of the 10th Göttingen meeting of the German Neuroscience Society, pp T24–2C. www.nwg-goettingen.de/2013/upload/file/Proceedings NWG2013.pdf
Schölkopf B, Mallot HA (1995) View-based cognitive mapping and path planning. Adapt Behav 3:311–348
Tatler BW, Land MF (2011) Vision and the representation of the surroundings in spatial memory. Phil Trans R Soc Lond B 366:596–610
Tolman EC (1932) Purposive behavior in animals and men, chapter XI. The Century Co., New York
Tolman EC (1948) Cognitive maps in rats and men. Psych Rev 55:189–208
Ullman S, Basri R (1991) Recognition by linear combinations of models. IEEE Trans Pattern Anal Mach Intell 13:992–1006
Wang RF, Spelke ES (2002) Human spatial representation: insights from animals. Trends Cogn Sci 6:376–382
Wiener JM, Mallot HA (2003) 'Fine-to-coarse' route planning and navigation in regionalized environments. Spatial Cogn Comput 3:331–358
Systematicity and Compositionality in Computer Vision
Germán Martín García, Simone Frintrop, Armin B. Cremers
Institute of Computer Science III, Universität Bonn, Germany
Abstract
The systematicity of vision is a topic that has been discussed thoroughly in the cognitive science literature; however, few accounts of it
exist in relation to computer vision (CV) algorithms. Here, we argue that the implications of the systematicity of vision, in terms of what behavior is expected from CV algorithms, are important for the development of such algorithms. In particular, the fact that systematicity is a strong argument for compositionality should be relevant when designing computer vision algorithms and the representations they work with. In this paper, we discuss compositionality and systematicity in CV applications and present a CV system that is based on compositional representations.
Keywords
Systematicity, Compositionality, Computer Vision
Systematicity and Compositionality
In their seminal paper, Fodor and Pylyshyn (1988) address the question of the systematicity of cognition. Systematicity is the property by which related thoughts or sentences are understood. Anyone able to understand the sentence "John loves the girl" should be able to understand the related sentence "The girl loves John". This can be explained because the two sentences are syntactically related. It is because sentences have structure that language, and thought, exhibit systematic behavior. The compositionality principle states that the meaning, or the content, of a sentence is derived from the semantic contributions of its constituents and the relations between them (Szabó 2013). It is because John, the girl, and loves make the same semantic contribution to the sentence "John loves the girl" and to "The girl loves John" that we are able to systematically understand both of them. In the case of language, systematicity is achieved by a compositional structure of constituents. In general, systematicity is a strong argument for compositionality (Szabó 2013): we are able to understand an immense number of sentences which we have never seen before. This can be extended to vision: we are able to make sense of scenes we have never seen before because they are composed of items we know. The systematicity of vision is defended by several authors. Already in their 1988 paper, Fodor and Pylyshyn foresee that systematicity is probably a general property of cognition that is not limited to verbal capabilities. In the cognitive science literature, there are several arguments that support that vision is systematic (Aparicio 2012; Tacca 2010): "if a subject is capable of visually representing a red ball then he must be capable of representing: i) the very same red ball from a large number of different viewpoints (and retinal inputs); ii) a number of similar red balls […]; and iii) red objects and ball-shaped objects in general." (Aparicio 2012). In this paper, we are concerned with the sort of systematic behavior that should be expected when a scene is observed from different points of view: a systematic CV algorithm should be able to determine the visual elements that compose the images and find the correspondences between them over time. Some authors claim that systematicity in vision can be achieved without compositionality (Edelman and Intrator 2000, 2003). However, the models they provide have not been shown to be applicable to real-world CV problems. We argue that, from a computer scientist's point of view, resorting to compositionality is beneficial when designing CV algorithms.
Compositionality in Computer Vision Algorithms
The systematicity problem is rarely addressed in computational models of vision.
In Edelman and Intrator (2000), the authors acknowledge that structural descriptions are the preferred theory of human vision that allows for viewpoint abstraction and novel shape recognition. In the structural approaches to vision, the visual information is explained in terms of atomic elements and the spatial relations that hold between them (Edelman 1997). One example is the Recognition-by-Components theory of Biederman (1987). In this theory, object primitives are represented by simple geometric 3D components called geons. However, extracting such primitive elements from images is by no means a trivial task in CV. Approaches that attempt to extract such primitives to explain the visual phenomena are hard to realize in practice; according to Andreopoulos and Tsotsos (2013), no method works reliably with natural images.
Here, we suggest to generate such primitive elements by grouping mechanisms realized by segmentation methods, which are well investigated in CV. In the following section, we propose a computer vision system that is based on such perceptually coherent segments to represent scenes in a compositional way.
A Compositional Approach for Visual Scene Matching
Here, we present a compositional vision system that is able to represent a scene in terms of perceptually coherent components and the relations between them, with the help of a graph representation. A graph matching algorithm makes it possible to match components between different viewpoints of a scene and thus enables a scene representation that is temporally consistent. In contrast to geons, our segments are easily extracted with standard segmentation algorithms; we use the well-known Mean Shift segmentation algorithm (Comaniciu and Meer 2002). Mean Shift produces a segmentation based on the proximity of pixels in spatial and color spaces. We construct a graph where the nodes represent segments and the edges the neighborhood of segments. We use labeled edges, where the labels correspond to the relations between segments. These are of two types, part of and attached to, and can be obtained automatically from the image by simple procedures. To determine whether two segments share a common border (the attached to relation), it is enough to perform two morphological operations on both segments: first dilate, then intersect. The remaining pixels constitute the shared contour and indicate that this relation is present. To determine whether segment A is part of segment B, it is enough to check whether the outer contour of segment B is the same as the outer contour of the union of A and B.
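As an illustration, the two relation tests can be sketched as follows (a hedged Python example assuming binary segment masks; scipy's morphological operators stand in for whatever implementation is actually used):

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def attached_to(seg_a, seg_b):
    # segments share a border if their dilated masks overlap
    return bool(np.any(binary_dilation(seg_a) & binary_dilation(seg_b)))

def contour(mask):
    # boundary pixels: in the mask but not in its erosion
    return mask & ~binary_erosion(mask)

def part_of(seg_a, seg_b):
    # A is part of B if adding A to B leaves B's contour unchanged
    union = seg_a | seg_b
    return bool(np.array_equal(contour(union), contour(seg_b)))

a = np.zeros((6, 6), dtype=bool); a[2:4, 2:4] = True  # inner square
b = np.zeros((6, 6), dtype=bool); b[1:5, 1:5] = True  # surrounding square
print(attached_to(a, b), part_of(a, b))  # True True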
Once the graphs are built, we can apply a graph matching algorithm to establish correspondences between nodes and, thus, between segments. Suppose we have two graphs G1 = (V1, E1, X1) and G2 = (V2, E2, X2), defined by a set of nodes V, edges E, and attributes X measured on the nodes. We want to find a labelling function f that assigns nodes from G1 to nodes in G2: f: G1 → G2. We base our approach for matching on Wilson and Hancock (1997). The authors propose a relaxation algorithm for graph matching that locally updates the label of each node based on an energy functional F defined on the labelling function f. Defining F(f) as the maximum a posteriori probability of the labelling given the measurements, F(f) = P(f | X1, X2), and applying Bayes' rule, we get:

P(f | X1, X2) = P(X1, X2 | f) P(f) / P(X1, X2)    (1)

Hereby, P(X1, X2 | f) is the appearance term that denotes the probability that the nodes of a given match f have certain attributes X1 and X2; we used the colour average and the dimensions of the minimum fitting rectangle as attributes. P(f) is the structural term and is high if a matching preserves the structure of the graph; for this term to have a high value, if node A is mapped to A', then the neighbors of A should be mapped to the neighbors of A'. The algorithm works by iteratively assigning to each node u in G1 the node v in G2 that maximises Equation 1:

f(u) = argmax_{v ∈ V2} p(x_u, x_v | u, v) P(f)    (2)
We extended the original algorithm so that it is able to deal with directed graphs as well as with labeled edges. The labels represent the two different relations, part of and attached to. The directions of the edges denote the order in which the segments appear in the relation predicates: in the part of relation, the edge points towards the node that contains the other, and in the attached to relation, the edge points towards the node that is either under or on the right side of the other. The details of the algorithm are outside the scope of this paper and can be found in Garcia (2014). We evaluated the algorithm on a real-world video sequence recorded at our office by matching pairs of consecutive and non-consecutive frames. In the first case, 84 % of the segments were correctly matched; in the second case, 57 %. Some non-consecutive frames are shown in Fig. 1: the matched segments are displayed with the same color, and those that were missed are displayed in black. It can be seen that some missing matches originate from non-repeatable segmentations over frames, i.e., the boundaries of the segments are not always consistent when the viewpoint changes (see, for example, the segmentation of the sponge in frames d and e in Fig. 1). This is a known problem of image segmentation algorithms (Hedau et al. 2008) that has two effects: a segment in frame 1 is segmented as two in frame 2, or the other way round. As a consequence, the graphs that are built on top of these segmentations are structurally different.
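For illustration, a much-simplified sketch of the iterative relaxation update behind Equation 2 might look as follows (Python; the appearance and structural terms are illustrative stand-ins, not the exact probability model of Wilson and Hancock 1997):

import numpy as np

def match(nodes1, nodes2, attrs1, attrs2, adj1, adj2, iters=10, sigma=1.0):
    # initialise each node by best attribute similarity alone
    f = {u: min(nodes2, key=lambda v: np.linalg.norm(attrs1[u] - attrs2[v]))
         for u in nodes1}
    for _ in range(iters):
        for u in nodes1:
            def score(v):
                # appearance term: Gaussian similarity of node attributes
                app = np.exp(-np.linalg.norm(attrs1[u] - attrs2[v]) ** 2 / sigma)
                # structural term: fraction of u's neighbours already
                # mapped onto neighbours of v
                nbrs = adj1.get(u, [])
                ok = sum(1 for n in nbrs if f[n] in adj2.get(v, []))
                struct = (1 + ok) / (1 + len(nbrs))
                return app * struct
            f[u] = max(nodes2, key=score)
    return f

nodes1, nodes2 = ["a", "b"], ["x", "y"]
attrs1 = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
attrs2 = {"x": np.array([1.0, 0.1]), "y": np.array([0.1, 1.0])}
adj1, adj2 = {"a": ["b"], "b": ["a"]}, {"x": ["y"], "y": ["x"]}
print(match(nodes1, nodes2, attrs1, attrs2, adj1, adj2))  # {'a': 'x', 'b': 'y'}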
Fig. 1 First row: original non-consecutive images. Rows 2 & 3: results of the matching between the corresponding pair of frames. Matches are displayed with the same colors. Segments for which no match was found are shown in black
In future work, we will extend the matching algorithm so that merging of segments is performed. With the presented system, we show in an exemplary way how the concept of compositionality can be integrated into CV algorithms and how, by making use of well-established segmentation and graph-matching methods, a simple visual representation can be achieved that is coherent over time.
References
Andreopoulos A, Tsotsos JK (2013) 50 years of object recognition: directions forward. Comput Vis Image Understand
Aparicio VMV (2012) The visual language of thought: Fodor vs. Pylyshyn. Teorema: Revista Internacional de Filosofía 31(1):59–74
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94(2):115
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Edelman S (1997) Computational theories of object recognition. Trends Cogn Sci, pp 296–304
Edelman S, Intrator N (2000) (coarse coding of shape fragments) + (retinotopy) approximately = representation of structure. Spatial Vision 13(2–3):255–264
Edelman S, Intrator N (2003) Towards structural systematicity in distributed, statically bound visual representations. Cogn Sci 27(1):73–109
Fodor JA, Pylyshyn ZW (1988) Connectionism and cognitive architecture: a critical analysis. Cognition 28(1):3–71
Garcia GM (2014) Towards a graph-based method for image matching and point cloud alignment. Tech. rep., University of Bonn, Institute of Computer Science III
Hedau V, Arora H, Ahuja N (2008) Matching images under unstable segmentations. In: IEEE conference on computer vision and pattern recognition (CVPR). IEEE
Szabó ZG (2013) Compositionality. In: Zalta EN (ed) The Stanford Encyclopedia of Philosophy, fall 2013 edn
Tacca MC (2010) Seeing objects: the structure of visual representation. Mentis
Wilson RC, Hancock ER (1997) Structural matching by discrete relaxation. IEEE Trans Pattern Anal Mach Intell 19:634–648
Control and flexibility of interactive alignment: Möbius syndrome as a case study
John Michael1,2,3, Kathleen Bogart4, Kristian Tylén3,5, Joel Krueger6, Morten Bech3, John Rosendahl Østergaard7, Riccardo Fusaroli3,5
1 Department of Cognitive Science, Central European University, Budapest, Hungary; 2 Center for Subjectivity Research, Copenhagen University, Copenhagen, Denmark; 3 Interacting Minds Centre, Aarhus University, Aarhus, Denmark; 4 School of Psychological Science, Oregon State University, Corvallis, USA; 5 Center for Semiotics, Aarhus University, Aarhus, Denmark; 6 Department of Sociology, Philosophy, and Anthropology, University of Exeter, Amory, Exeter, UK; 7 Aarhus University Hospital, Aarhus, Denmark
Keywords
Möbius Syndrome, social interaction, social cognition, alignment
When we interact with others, there are many concurrent layers of implicit bodily communication and mutual responsiveness at work: from the spontaneous temporal synchronization of movements (Richardson et al. 2007), to gestural and postural mimicry (Chartrand and Bargh 1999; Bernieri and Rosenthal 1991), and to multiple dimensions of linguistic coordination (Garrod and Pickering 2009; Clark 1996; Fusaroli and Tylen 2012). These diverse processes may serve various important social functions. For example, one individual's facial expressions, gestures, bodily postures, and tone and tempo of voice can provide others with information about her emotions, intentions, and other mental states, and thereby help to sustain interpersonal understanding and support joint actions.
And when such information flows back and forth among two or more mutually responsive participants in an interaction, the ensuing alignment can promote social cohesion, enhancing feelings of connectedness and rapport (Lakin and Chartrand 2003; Bernieri 1988; Valdesolo et al. 2010). Indeed, by enhancing rapport, interactive alignment may also increase participants' willingness to cooperate with each other (van Baaren et al. 2004; Wiltermuth and Heath 2009) and, equally importantly, their mutual expectations of cooperativeness even when interests are imperfectly aligned, as in scenarios such as the prisoners' dilemma (Rusch et al. 2013). Moreover, interactive alignment may even enhance interactants' ability to understand each other's utterances (Pickering and Garrod 2009) and to communicate their level of confidence in their judgments about situations (Fusaroli et al. 2012), thereby enhancing performance on some joint actions. Finally, interactive alignment may also increase interactants' ability to coordinate their contributions to joint actions (Valdesolo et al. 2010), because synchronization increases interactants' attention to one another's movements, and because it may be easier to predict and adapt to the movements of another person moving at a similar tempo and initiating movements of a similar size, duration, and force as oneself. It is no surprise, then, that recent decades have seen a dramatic increase in the amount of attention paid to various kinds of interactive alignment in the cognitive sciences. However, although there is a broad consensus about the importance of interactive alignment processes for social interaction and social cognition, there are still many open questions. How do these diverse processes influence each other? Which ones contribute, and in what ways, to interpersonal understanding, cooperativeness, and/or performance in joint actions? Is alignment sometimes counterproductive? To what extent can alignment processes be deliberately controlled and flexibly combined, replaced, tweaked, or enhanced? This latter question may be especially relevant for individuals who have impairments in some form of bodily expressiveness, and who therefore may benefit by compensating with some other form of expressiveness. In the present study, we investigated social interactions involving just such individuals, namely a population of teenagers with Möbius Syndrome (MS), a form of congenital, bilateral facial paralysis resulting from maldevelopment of the sixth and seventh cranial nerves (Briegel et al. 2006). Since people with MS are unable to produce facial expressions, it is unsurprising that they often experience difficulties in their social interactions and in terms of general social well-being. We therefore implemented a social skills intervention designed to train individuals with facial paralysis owing to MS to adopt alternative strategies to compensate for the unavailability of facial expression in social interactions (e.g. expressive gesturing and prosody). In order to evaluate the effectiveness of this intervention, each of the 5 participants with MS ('MS-participants') engaged in interactions before and after the intervention with partners who did not have MS ('Non-MS-participants').
These social interactions consisted of two separate tasks, a casual getting-to-know-you task and a task designed to tap interpersonal understanding. Participants filled out rapport questionnaires after each interaction. In addition, the interactions were videotaped and analyzed by independent coders, and we extracted two kinds of linguistic data relating to the temporal organization of the conversational behavior: prosody (fundamental frequency) and speech rate. From these data we calculated indices of individual behavioral complexity and of alignment using cross-recurrence quantification analysis (CRQA).
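To give an impression of what such an alignment index measures, the following toy sketch (Python; hypothetical, not the analysis code used in the study) computes a cross-recurrence rate between two normalized behavioral time series; a full CRQA additionally uses delay embedding and line-based measures:

import numpy as np

def cross_recurrence_rate(x, y, radius=0.1):
    # z-score both series so the radius is scale-free
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    # cross-recurrence plot: 1 wherever the two series come closer than radius
    rp = np.abs(x[:, None] - y[None, :]) < radius
    return rp.mean()

rng = np.random.default_rng(0)
a = np.sin(np.linspace(0, 6, 200)) + 0.1 * rng.standard_normal(200)
b = np.sin(np.linspace(0, 6, 200) - 0.5) + 0.1 * rng.standard_normal(200)
print(cross_recurrence_rate(a, b))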
We found several interesting results. First, the intervention increased observer-coded rapport. Secondly, observer-coded gesture and expressivity increased in participants with and without MS after the intervention. Thirdly, fidgeting and repetitiveness of verbal behavior decreased in both groups after the intervention. Fourthly, while we did in general observe alignment (compared to surrogate pairs), overall linguistic alignment actually decreased after the intervention, and pitch alignment was negatively correlated with rapport. These results suggest that the intervention had an impact on MS interlocutors, which in turn impacted non-MS interlocutors, making them less nervous and more engaged. Behavioral dynamics can statistically predict observer-coded rapport, suggesting a direct link between these dynamics and the experience of the interaction. This pattern of findings provides initial support for the conjecture that a social skills workshop like the one employed here can affect not only the participants with MS but also, and perhaps even more importantly, the interaction as a whole as well as the participants without MS. One reason why this is important is that some of the difficulties experienced by individuals with MS in social interactions may arise from other people's discomfort or uncertainty about how to behave. In other words, individuals without MS who interact with individuals with MS may interrupt the smooth flow of interaction through their uncertainty about how to interact in what is for them a new and sensitive situation. Moreover, this may also be true in other instances in which people interact with others who appear different or foreign to them (because of other forms of facial difference, skin color, etc.). Thus, this issue points to a possible direction in which further research may be conducted that would extend the findings far beyond the population of individuals with MS. More concretely, one obvious comparison would be to individuals with expressive impoverishment due to Parkinson's disease. Do these individuals also employ some of the same kinds of compensatory strategies as individuals with MS? If so, what effects does that have upon interactive alignment within social interactions? What difference does it make that their condition is an acquired rather than a congenital one? Finally, one additional question for further research is whether some compensatory strategies are more easily automated than others. For example, it is possible that increasing hand gesturing or eye contact can be quickly learned and routinized, but that modulating one's prosody cannot. If there are such differences among the degrees to which different processes can be automated, it would be important to understand just what underlies them. On a theoretical level, this could provide useful input to help us understand the relationship between automatic and controlled processes. On a more practical level, this could be important for three concrete reasons. First of all, it may be taxing and distracting to employ deliberate strategies for expressing oneself in social interactions, and people may therefore find it tiring and be less likely to continue doing it. Secondly, it may be important that some interactive alignment processes occur without people's awareness. Thus, attempting to bring them about deliberately may actually interfere with the implicit processes that otherwise generate alignment. Indeed, there is evidence that behavioral mimicry actually undermines rapport if people become aware that it is being enacted deliberately (Bailenson et al. 2008). Thirdly, it would be important for future social skills workshops to examine whether some compensatory strategies are more effectively taught indirectly; e.g.,
rather than telling people to use more gestures, it may be advantageous to employ some other means which does not require them to deliberately attend to their gestures or prosody, for example by using more gestures and prosody when interacting with children with MS, by asking them to watch videos in which actors are highly expressive in their gestures and prosody, or by engaging them in role-playing games in which a high level of gesture and/or prosody is appropriate.
References
Bailenson JN, Yee N, Patel K, Beall AC (2008) Detecting digital chameleons. Comput Hum Behav 24:66–87
Bernieri FJ, Rosenthal R (1991) Interpersonal coordination: behavior matching and interactional synchrony. In: Feldman RS, Rime B (eds) Fundamentals of nonverbal behavior. Cambridge University Press, Cambridge, pp 401–432
Bogart KR, Tickle-Degnen L, Ambady N (2012) Compensatory expressive behavior for facial paralysis: adaptation to congenital or acquired disability. Rehabil Psychol 57(1):43–51
Bogart KR, Tickle-Degnen L, Joffe M (2012) Social interaction experiences of adults with Moebius syndrome: a focus group. J Health Psychol. Advance online publication
Bogart KR, Matsumoto D (2010) Facial mimicry is not necessary to recognize emotion: facial expression recognition by people with Moebius syndrome. Soc Neurosci 5(2):241–251
Bogart KR, Matsumoto D (2010) Living with Moebius syndrome: adjustment, social competence, and satisfaction with life. Cleft Palate-Craniofac J 47(2):134–142
Briegel W (2007) Psychopathology and personality aspects of adults with Moebius sequence. Clin Genet 71:376–377
Chartrand TT, Bargh JA (1999) The chameleon effect: the perception-behavior link and social interaction. J Person Soc Psychol 76:893–910
Clark HH (1996) Using language. Cambridge University Press, Cambridge
Derogatis LR (1977) SCL-90-R: administration, scoring and procedures manual-I for the revised version. Johns Hopkins University School of Medicine, Baltimore
Fahrenberg J, Hampel R, Selg H (2001) FPI-R. Das Freiburger Persönlichkeitsinventar, 7th edn. Hogrefe, Göttingen
Garrod S, Pickering MJ (2009) Joint action, interactive alignment, and dialog. Top Cogn Sci 1(2):292–304
Helmreich R, Stapp J (1974) Short forms of the Texas Social Behavior Inventory (TSBI), an objective measure of self-esteem. Bull Psychon Soc
Kahn JB, Gliklich RE, Boyev KP, Stewart MG, Metson RB, McKenna MJ (2001) Validation of a patient-graded instrument for facial nerve paralysis: the FaCE scale. Laryngoscope 111(3):387–398
Lakin J, Chartrand T (2003) Using nonconscious behavioral mimicry to create affiliation and rapport. Psychol Sci 14:334–339
Mattick RP, Clarke JC (1998) Development and validation of measures of social phobia scrutiny fear and social interaction anxiety. Behav Res Ther 36(4):455–470
Meyerson MD (2001) Resiliency and success in adults with Moebius syndrome. Cleft Palate Craniofac J 38:231–235
Oberman LM, Winkielman P, Ramachandran VS (2007) Face to face: blocking facial mimicry can selectively impair recognition of emotional expressions. Soc Neurosci 2(3):167–178
Richardson MJ, Marsh KL, Isenhower RW, Goodman JR, Schmidt RC (2007) Rocking together: dynamics of intentional and unintentional interpersonal coordination. Hum Mov Sci 26:867–891
Robinson E, Rumsey N, Partridge J (1996) An evaluation of social interaction skills training for facially disfigured people. Br J Plast Surg 49:281–289
Rosenberg M (1965) Rosenberg self-esteem scale (RSE). Acceptance and Commitment Therapy. Measures Package, 61
Tickle-Degnen L, Lyons KD (2004) Practitioners' impressions of patients with Parkinson's disease: the social ecology of the expressive mask. Soc Sci Med 58:603–614
Valdesolo P, Ouyang J, DeSteno D (2010) The rhythm of joint action: synchrony promotes cooperative ability. J Exp Soc Psychol 46:693–695
van Baaren RB, Holland RW, Kawakami K, van Knippenberg A (2004) Mimicry and pro-social behavior. Psychol Sci 15:71–74
Zigmond AS, Snaith R (1983) The hospital anxiety and depression scale. Acta Psychiatr Scand 67(6):361–370
Efficient analysis of gaze behavior in 3D environments
Thies Pfeiffer1, Patrick Renner, Nadine Pfeiffer-Lessmann
1 Center of Excellence Cognitive Interaction Technology, Bielefeld University, Germany; 2 SFB 673: Alignment in Communication, Bielefeld University, Germany
Abstract
We present an approach, coined EyeSee3D, to identify the 3D point of regard and the fixated object in real time based on 2D gaze videos, without the need for manual annotation. The approach does not require additional hardware except for the mobile eye tracker. It is currently applicable to scenarios with static target objects and requires fiducial markers to be placed in the target environment. The system has already been tested in two different studies. Possible applications are visual world paradigms in complex 3D environments, research on visual attention, or human–human/human–agent interaction studies.
Keywords
3D eye tracking, Natural environments
Introduction
Humans have evolved to live in a 3D spatial world. This affects our perception, our cognition, and our action. If human behavior, and in particular visual attention, is analyzed in scientific studies, however, practical reasons often force us to reduce the three-dimensional world to two dimensions within a small field of view presented on a computer screen. In many situations, such as spatial perspective taking, situated language production, or the understanding of spatial references, to name just a few, a restriction to 2D experimental stimuli can render it impossible to transfer findings to our natural everyday environments. One of the reasons for this methodological compromise is the effort required to analyze gaze data in scenarios where the participant is allowed to move around and inspect the environment freely. Current mobile eye-tracking systems use a scene camera to record a video from the perspective of the user. Based on one or two additional cameras directed at the participant's eyes, the gaze fixation of the participant is then mapped onto the video of the scene camera. While binocular systems are already able to compensate for parallax by estimating the distance of the fixation from the observer, they have no representation of the 3D world and still only work on the 2D projection of the world visible in the scene camera video. The most important part then is identifying, in the video stream, the particular object the participant has been fixating. This currently requires manual annotation, which takes several times as long as the recorded material. Depending on the complexity of the annotation (target object count and density), we had cases where the annotation of one minute of recorded video required fifteen minutes of annotation or more. With our EyeSee3D approach, we provide a software tool that is able to identify the fixated objects automatically, provided that the environment may be covered with some visible markers that do not affect the visual behavior and that the target objects remain static.
Related Work
There are approaches for semi-automatic gaze annotation based on 2D computer vision, such as the SemantiCode approach by Pontillo et al. (2010), which still requires manual annotation but achieves a speed-up by incrementally learning the labeling of the targets using machine learning and computer vision techniques. Still, the experimenter has to at least validate every label. Approaches that also use 3D models are Toyama et al. (2012), who target human–computer interaction rather than scientific studies, and Paletta et al. (2013), who use a 3D scan of the target environment to later identify the target position.
Their approach requires much more effort during preparation but then does not require instrumenting the environment with markers.
Application Areas
The presented EyeSee3D approach can be applied as a method to accurately annotate fixations in 3D environments, as required for scientific studies. We have already tested this approach in two studies. Both studies involve settings with two interacting interlocutors (no confederates) sitting face-to-face at a table. In the first study, we were interested in gaze patterns of joint attention (Pfeiffer-Lessmann, Pfeiffer, Wachsmuth 2013). We placed 23 figures of a LEGO Duplo set on a table, each facing one of the interlocutors. The experimenter then describes a certain figure, and the interlocutors have to team up to identify it. The task, however, is not as simple as it sounds: the information given might only be helpful for one of the interlocutors, as it might refer to features of the figure only visible from a certain perspective. Moreover, the interlocutors are instructed neither to speak nor to gesture to communicate. This way, we force the participants to use their gaze to guide their partner's attention towards the correct figure. The set-up used in this experiment will be used later in this paper to illustrate the EyeSee3D method. In the second study, we were interested in creating computational models for predicting the targets of pointing gestures and, more generally, areas which will be occupied by a human interlocutor in the near future during interaction (Renner, Pfeiffer, Wachsmuth 2014). This research is motivated by human–robot interaction, in which we want to enable robots to anticipate human movements in order to be more responsive, e.g., in collision-avoidance behavior. Besides eye tracking, in this study we also combined the EyeSee3D approach with an external motion-tracking system to track the hands and the faces of the interlocutors. Using the same principles as presented in the next section, the targets of pointing gestures as well as gazes towards the body of the interlocutor can also be identified computationally without the need for manual annotation.
EyeSee3D
The EyeSee3D approach is easy to set up. Fig. 1, left, shows a snapshot from one of our own studies on joint attention between two human interlocutors (Pfeiffer-Lessmann, Pfeiffer, Wachsmuth 2013). In this study we had 12 pairs of interaction partners and a total of about 160 min of gaze video recordings. It would have taken about 40 h to manually annotate the gaze videos, excluding any additional second annotations to test for annotation reliability. The process followed by EyeSee3D is presented in Fig. 2. In a preparation phase, we covered the environment with so-called fiducial markers, highly visible printable structures that are easy to detect using computer-vision methods (see Fig. 1, mid upper half). We verified that these markers did not attract significant attention from the participants. As a second step, we created proxy geometries for the relevant stimuli, in this example small toy figures (see Fig. 3). For our set-up, a simple approximation using bounding boxes is sufficient, but arbitrarily complex approximations of the targets may be used. When aiming for maximum precision, it is possible to use 3D scans with exact replications of the hull of the target structure. The whole process of setting up such a table takes about 30 min. These preparations have to be made only once, as the created model can be used for all study recordings.
Based on these preparations, we are now able to conduct the study and record the eye-tracking data (gaze videos and gaze data). EyeSee3D then automatically annotates the recorded gaze videos. For each frame of the video, the algorithms detect fiducial markers in the image and estimate the position and orientation of the scene camera in 3D space. For this process to succeed, at least one fiducial marker has to be fully visible in each frame. The camera position and orientation are then used, together with the gaze information provided by the eye tracker itself, to cast a gaze ray into the 3D proxy geometries. This gaze ray intersects the 3D proxy geometries exactly at the point (see Fig. 1, right) that is visualized by the gaze cursor in the scene camera video provided by the standard eye-tracking software (see Fig. 1, left). As each of the proxy geometries is labeled, we can identify the target object automatically.
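The gaze-ray casting step can be illustrated with a small sketch (Python; the names and the axis-aligned slab test are our illustrative assumptions, since the actual EyeSee3D implementation is not reproduced here):

import numpy as np

def ray_box_t(origin, direction, box_min, box_max):
    # slab test; zero direction components replaced by a tiny epsilon
    d = np.where(direction == 0, 1e-12, direction)
    t1 = (box_min - origin) / d
    t2 = (box_max - origin) / d
    t_near = np.minimum(t1, t2).max()
    t_far = np.maximum(t1, t2).min()
    # distance along the ray where it enters the box, or None on a miss
    return max(t_near, 0.0) if t_near <= t_far and t_far >= 0 else None

def fixated_object(origin, direction, proxies):
    # proxies: label -> (box_min, box_max); the nearest hit wins
    hits = {}
    for label, (lo, hi) in proxies.items():
        t = ray_box_t(origin, direction, lo, hi)
        if t is not None:
            hits[label] = t
    return min(hits, key=hits.get) if hits else None

proxies = {"girl": (np.array([0.2, 0.0, 0.1]), np.array([0.3, 0.1, 0.2]))}
print(fixated_object(np.zeros(3), np.array([0.25, 0.05, 0.15]), proxies))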
Fig. 1 The left snapshot is taken from a 2D mobile eye-tracking video recorded from the egocentric perspective of the scene camera. The point of regard is visualized using a green circle, and a human annotator would have to manually identify the fixated object, here the figure of a girl. With EyeSee3D, gaze rays can be computed and cast into a 3D abstract model of the environment (simple white boxes around the figures); the intersection with the fixation target (the box corresponding to the figure of the girl) is computed automatically and in real time
Fig. 2 The EyeSee3D method requires a one-time preparation phase. During study recording there are two alternatives: either (a) use the standard tools and run EyeSee3D offline to annotate the data, or (b) use EyeSee3D online during the study
Fig. 3 The 3D proxy geometries that had to be created to determine the fixated objects. The different figures are textured with the original pictures, which is not needed for the process but useful for controlling the orientation of the figures when setting up the experiment
This annotation process can either be used online during the study, so that the annotation results are already available when the study session is completed, or, alternatively, EyeSee3D can be used in offline mode to analyze previously recorded gaze videos and data files. The offline mode has the advantage that it can be applied repeatedly to the same data. This is useful in cases where the number and placement of the proxy geometries are not known beforehand and are incrementally refined as the understanding of the problem domain progresses. For example, at the moment we are only interested in locating the target figure. Later on, we might be working together with psycholinguists on language processing following a visual-world paradigm. We might then also be interested in whether the participants have looked at the headdress, the head, the upper body, or the lower body of the figures during sentence processing.
After updating the 3D proxy models, we could use EyeSee3D to re-annotate all videos and have the more fine-grained annotation ready within minutes. In our example study, we were able to cover about 130 min of the 160 min of total recordings using this technique. In the remaining 30 min, participants were either moving their head so quickly that the scene camera only provided a motion-blurred image, or they turned towards the interaction partner or the experimenter for questions, so that no marker was visible in the image (but also no target stimuli). Thus, the remaining 30 min were not relevant for the evaluation of the study. More technical details about the EyeSee3D approach have been presented at ETRA 2014 (Pfeiffer and Renner 2014).
Discussion and Future Work
The presented initial version of our EyeSee3D approach can already significantly speed up the annotation of mobile eye-tracking studies. There are no longer economic reasons to restrict oneself to short sessions and low numbers of participants. The accuracy of the system depends, on the one hand, on the accuracy of the eye-tracking system; in this respect, the accuracy of EyeSee3D does not differ from normal 2D video-based analysis. On the other hand, the accuracy depends on the quality with which the fiducial markers are detected: the larger the detected marker and the better the contrast, the higher the accuracy of the estimated camera position and orientation. EyeSee3D is not only applicable to small set-ups, as the selected example of two interaction partners sitting at a table might suggest at first glance. The size of the environment is not restricted as long as at least one fiducial marker is in the field of view for every relevant target object. The markers might, for example, be sparsely distributed in a museum just around the relevant exhibits. We are currently working on further improving the speed and the accuracy of the system. In addition, we are planning to integrate other methods for tracking the scene camera's position and orientation in 3D space, based, e.g., on tracking arbitrary but distinctive images. In certain settings, such as a museum or a shelf in a shopping center, this would allow for automatic tracking without any dedicated markers. In future work, we are planning to compare the results obtained by human annotators with those calculated by EyeSee3D. In a pilot evaluation we were able to identify situations of disagreement, i.e., situations in which EyeSee3D comes to slightly different results than a human annotator, when two target objects overlap in space (which is more likely to happen with a freely moving participant than in traditional screen-based experiments) and the fixation is somewhere in between. Such situations are likewise difficult to annotate consistently between human annotators, because of their ambiguity. Investigating the differences between the systematic and repeatable annotations provided by EyeSee3D and the interpretations of human annotators, which might depend on different aspects, such as personal preferences or the history of preceding fixations, could be very informative. Besides the described speed-up achieved by EyeSee3D, it might also provide more objective and consistent annotations. In summary, using EyeSee3D, the analysis of mobile gaze-tracking studies has become as easy as desktop-computer-based studies using remote eye-tracking systems.
Acknowledgments
This work has been partly funded by the DFG in the SFB 673 Alignment in Communication.
References
Paletta L, Santner K, Fritz G, Mayer H, Schrammel J (2013) 3D attention: measurement of visual saliency using eye tracking glasses. In: CHI '13 Extended Abstracts on Human Factors in Computing Systems, pp 199–204. ACM, Paris, France
Pfeiffer T, Renner P (2014) EyeSee3D: a low-cost approach for analysing mobile 3D eye tracking data using augmented reality technology. In: Proceedings of the Symposium on Eye Tracking Research and Applications. ACM
Pfeiffer-Lessmann N, Pfeiffer T, Wachsmuth I (2013) A model of joint attention for humans and machines. In: Book of Abstracts of the 17th European Conference on Eye Movements (vol 6), p 152, Lund, Sweden
Pontillo DF, Kinsman TB, Pelz JB (2010) SemantiCode: using content similarity and database-driven matching to code wearable eyetracker gaze data. In: ACM ETRA 2010, pp 267–270. ACM
Renner P, Pfeiffer T, Wachsmuth I (2014) Spatial references with gaze and pointing in shared space of humans and robots. In: Proceedings of Spatial Cognition 2014
Toyama T, Kieninger T, Shafait F, Dengel A (2012) Gaze guided object recognition using a head-mounted eye tracker. In: ACM ETRA 2012, pp 91–98. ACM
The role of the posterior parietal cortex in relational reasoning
Marco Ragni, Imke Franzmeier, Flora Wenczel, Simon Maier
Center for Cognitive Science, Freiburg, Germany
Abstract
Inferring information from given relational assertions is at the core of human reasoning ability. The cognitive processes involved include the understanding and integration of relational information into a mental model and the drawing of conclusions. In this study we are interested in identifying the role of the associated brain regions. Hence, (i) we reanalyzed 23 studies on relational reasoning from Pubmed, Science Direct, and Google Scholar with healthy participants and focused on a peak-voxel analysis of single subregions of the posterior parietal cortex, allowing a more fine-grained analysis than before, and (ii) the identified regions are interpreted in light of findings on reasoning phases from our own transcranial magnetic stimulation (TMS) and fMRI studies. The results indicate a relevant role of the parietal cortex, especially the lateral superior parietal cortex (SPL), in the construction and manipulation of mental models.
Keywords
Relational Reasoning, Brain Regions, Posterior Parietal Cortex
Introduction
Consider a relational reasoning problem of the following form: The red car is to the left of the blue car. The yellow car is to the right of the blue car. What follows? The assertions formed in reasoning about (binary) relations consist of two premises connecting three terms (the cars above). Participants process each piece of information and integrate it into a mental model (Ragni, Knauff 2013). A mental model (Johnson-Laird 2006) is an analogue representation of the given information. For the problem above we could construct a mental model of the following form:
red car    blue car    yellow car
From this analogical representation (for a complete discussion of how much information might be represented, please refer to Knauff 2013), the missing relational information, namely that the red car is to the left of the yellow car, can easily be inferred. The premise description above is determinate, i.e., it elicits only one mental model. There are, however, indeterminate descriptions, i.e., descriptions with which multiple models are consistent, and sometimes alternative models have to be constructed. The associated mental processes in reasoning are the model construction, model inspection, and model variation phases.
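The construction phase can be illustrated with a toy sketch (Python; a deliberately simplified stand-in, not the computational model of Ragni, Knauff 2013) that inserts each term of the car problem into a one-dimensional array and reads off the inferred relation:

def build_model(premises):
    model = []
    for a, rel, b in premises:  # rel is "left-of" or "right-of"
        if a in model and b in model:
            continue
        if rel == "right-of":
            a, b = b, a  # normalise everything to left-of
        if b in model and a not in model:
            model.insert(model.index(b), a)      # insert directly adjacent
        elif a in model and b not in model:
            model.insert(model.index(a) + 1, b)  # insert directly adjacent
        else:
            model.extend([a, b])
    return model

model = build_model([("red", "left-of", "blue"),
                     ("yellow", "right-of", "blue")])
print(model)  # ['red', 'blue', 'yellow']
print(model.index("red") < model.index("yellow"))  # red is left of yellow

Inserting each new term directly adjacent to its reference term, as done here, yields one particular (preferred) model; handling indeterminate descriptions would additionally require the variation phase, i.e., constructing and testing alternative arrangements.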
The neural activation patterns can also help unravel the cognitive processes underlying the reading and processing of the premise information. The first experiments recording neural activation with PET (and later with fMRI) were conducted by Goel et al. in 1998. The initial motivation was to determine which of several then-popular psychological theories was correct (cp. Goel 2001), simply by examining the involvement of the respective brain areas that are connected to specific processing functions. Such an easy answer has not yet been found. However, previous analyses (e.g., Knauff 2006; Knauff 2009) across multiple studies showed the involvement of the frontal and the posterior parietal cortex (PPC), especially for relational reasoning. Roughly speaking, the role of the PPC is to integrate information across modalities (Fig. 1), and its general involvement has been shown consistently in studies (Knauff 2006). In this article we briefly introduce state-of-the-art neural findings on relational reasoning. We present an overview of the current studies and report two studies from our lab. Subregions within the PPC, e.g., the SPC, are differentiated to allow a more fine-grained description of their role in the mental model construction and manipulation process.
The Associated Activation in Relational Reasoning
A meta-analysis conducted by Prado et al. (2011) identified the main activations during reasoning about relations in the PPC and the middle frontal gyrus (MFG). Although we know that these regions are involved in the process of reasoning about relations, the exact functions of these areas, the role of the subregions, and problem-specific differences remain unclear. Studies by Fangmeier, Knauff and colleagues (Fangmeier et al. 2006; Fangmeier, Knauff 2009; Knauff et al. 2003) additionally compared activations across the reasoning process. They analyzed the function of the PPC during the processing and integration of premise information and the subsequent model validation phase. The PPC is mainly active in the last phase, model validation. We included all studies mentioned in Prado et al. (2012) and Knauff (2006), additionally searched the three databases Pubmed, Google Scholar, and Science Direct with the keywords relational reasoning or deductive reasoning in combination with the terms neuroimaging or fMRI, and searched for studies that were cited in the respective articles. Of these 26 studies we included 23 in our analysis: all those which (i) report coordinates (either Talairach or MNI), (ii) had a reasoning vs. non-reasoning contrast, and (iii) used healthy participants, i.e., excluding patient studies. We transformed all coordinates to the MNI coordinate system for the peak voxel analysis. Only a few studies report temporal activation. Mainly activation in the middle temporal gyrus was found, possibly related to language processes. Activation in the occipital cortex is probably due to the visual presentation of the stimuli.
Fig. 1 The posterior parietal cortex and central subregions
Table 1 Key studies and frontal and parietal activation
Anatomical probabilities for the peak coordinates located within the lateral (with the SPL as a subregion) and the medial SPC (with the precuneus as a subregion) according to the SPM anatomy toolbox (Eickhoff et al. 2007) are reported. Reports of SPC activation in the original publications which showed an anatomical probability of less than 30 % for the SPC are depicted in brackets. MC = motor cortex, PMC = premotor cortex, dlPFC = dorsolateral prefrontal cortex, AG = angular gyrus, TPJ = temporoparietal junction, SMG = supramarginal gyrus; left half-circle = left lateral, right half-circle = right lateral, circle = bilateral
Key activations were found in the frontal and parietal lobes (Table 1). Across all studies, only the lateral SPL was consistently involved, while in the frontal regions the activation was more heterogeneous. Hence, we focused on the PPC and its subregions. In Table 1 we report anatomical probabilities for the peak coordinates located within the lateral and medial SPL (incl. the precuneus as a subregion) according to the SPM anatomy toolbox (Eickhoff et al. 2007). Reports of SPL activation in the original publications which showed an anatomical probability of less than 30 % for the region are depicted in brackets.
General Discussion
Table 1 shows that in almost all experimental studies of relational reasoning the PPC is active. Moreover, our more detailed analysis shows the bilateral involvement of the lateral SPL and the inferior parietal lobe in the reasoning process. Additionally, and in accordance with findings from Fangmeier et al. (2006), it shows their importance in the core reasoning phase, the model validation phase. To investigate the role of these regions, we conducted an fMRI study (Maier et al. 2014) and presented participants with indeterminate problems in which they could construct and vary mental models. These processes elicited lateral SPL activation. We argue that in this region the mental model of the premise information is constructed and varied (cp. Goel et al. 2004), a result supported by our study. Thus, the lateral SPL is likely to be involved in the reasoning process. A causal connection can be established if a malfunctioning SPL leads to a decrease in reasoning performance. A method to induce "virtual lesions" is transcranial magnetic stimulation (TMS; Walsh, Pascual-Leone 2003). Hence, in a recent study we investigated the role of the SPL in the construction and alteration of mental models (Franzmeier et al. 2014). TMS of the SPL modulated the performance in deductive reasoning tasks, i.e., participants needed longer if the SPL was stimulated during the model validation phase.
A performance modulation was achieved by unilateral right and by bilateral stimulation. The direction of the modulation, i.e., whether performance was enhanced or disrupted, depended on the stimulation timing. Real lesions can shed additional light on this. A recent study by Waechter et al. (2013) compared patients with lesions in the rostrolateral prefrontal cortex to patients with PPC lesions and to controls on transitive inference problems. These results further support the role of the lateral SPL in drawing inferences and its crucial involvement in mental model construction. All studies show the prominent role of the lateral SPL (and hence of the PPC) in the reasoning process. Premise information is integrated and manipulated in a mental model which is kept at least partially in the lateral SPL and, to a lesser degree, in the inferior parietal cortex.
Acknowledgments
The work has been partially supported by a grant to MR from the DFG within the SFB/TR 8 in project R8-[CSPACE]. The authors are grateful to Barbara Kuhnert for drawing the brain picture and Stephanie Schwenke for proof-reading.
References
Acuna BD, Eliassen JC, Donoghue JP, Sanes JN (2002) Frontal and parietal lobe activation during transitive inference in humans. Cereb Cortex 12(12):1312–1321
Brzezicka A, Sedek G, Marchewka A, Gola M, Jednoróg K, Królicki L, Wróbel A (2011) A role for the right prefrontal and bilateral parietal cortex in four-term transitive reasoning: an fMRI study with abstract linear syllogism tasks. Acta Neurobiol Exp 71(4):479–495
Eickhoff SB, Paus T, Caspers S, Grosbras MH, Evans A, Zilles K, Amunts K (2007) Assignment of functional activations to probabilistic cytoarchitectonic areas revisited. NeuroImage 36(3):511–521
Fangmeier T, Knauff M (2009) Neural correlates of acoustic reasoning. Brain Res 1249:181–190. doi:10.1016/j.brainres.2008.10.025
Fangmeier T, Knauff M, Ruff CC, Sloutsky V (2006) fMRI evidence for a three-stage model of deductive reasoning. J Cogn Neurosci 18(3):320–334
Franzmeier I, Maier SJ, Ferstl EC, Ragni M (2014) The role of the posterior parietal cortex in deductive reasoning: a TMS study. In: OHBM 2014. Human Brain Mapping Conference, Hamburg
Goel V, Gold B, Kapur S, Houle S (1998) Neuroanatomical correlates of human reasoning. J Cogn Neurosci 10(3):293–302
Goel V, Dolan RJ (2001) Functional neuroanatomy of three-term relational reasoning. Neuropsychologia 39(9):901–909
Goel V, Makale M, Grafman J (2004) The hippocampal system mediates logical reasoning about familiar spatial environments. J Cogn Neurosci 16:654–664
Goel V, Stollstorff M, Nakic M, Knutson K, Grafman J (2009) A role for right ventrolateral prefrontal cortex in reasoning about indeterminate relations. Neuropsychologia 47(13):2790–2797
Hinton EC, Dymond S, von Hecker U, Evans CJ (2010) Neural correlates of relational reasoning and the symbolic distance effect: involvement of parietal cortex. Neuroscience 168(1):138–148
Johnson-Laird PN (2006) How we reason. Oxford University Press, New York
Knauff M (2006) Deduktion und logisches Denken. In: Denken und Problemlösen. Enzyklopädie der Psychologie, vol 8. Hogrefe, Göttingen
Knauff M (2009) A neuro-cognitive theory of deductive relational reasoning with mental models and visual images. Spatial Cogn Comput 9(2):109–137
Knauff M, Fangmeier T, Ruff CC, Johnson-Laird PN (2003) Reasoning, models, and images: behavioral measures and cortical activity. J Cogn Neurosci 15(4):559–573
Knauff M, Johnson-Laird PN (2002) Visual imagery can impede reasoning. Memory Cogn 30(3):363–371
Knauff M, Mulack T, Kassubek J, Salih HR, Greenlee MW (2002) Spatial imagery in deductive reasoning: a functional MRI study. Brain Res Cogn Brain Res 13(2):203–212
Knauff M (2013) Space to reason: a spatial theory of human thought. MIT Press
Prado J, Chadha A, Booth JR (2011) The brain network for deductive reasoning: a quantitative meta-analysis of 28 neuroimaging studies. J Cogn Neurosci 23(11):3483–3497
Prado J, Mutreja R, Booth JR (2013) Fractionating the neural substrates of transitive reasoning: task-dependent contributions of spatial and verbal representations. Cereb Cortex 23(3):499–507
Prado J, Noveck IA, Van Der Henst J-B (2010a) Overlapping and distinct neural representations of numbers and verbal transitive series. Cereb Cortex 20(3):720–729
Prado J, Van Der Henst JB, Noveck IA (2010b) Recomposing a fragmented literature: how conditional and relational arguments engage different neural systems for deductive reasoning. NeuroImage 51(3):1213–1221
Ragni M, Knauff M (2013) A theory and a computational model of spatial reasoning with preferred mental models. Psychol Rev 120(3):561–588
Ruff CC, Knauff M, Fangmeier T, Spreer J (2003) Reasoning and working memory: common and distinct neuronal processes. Neuropsychologia 41(9):1241–1253
Shokri-Kojori E, Motes MA, Rypma B, Krawczyk DC (2012) The network architecture of cortical processing in visuo-spatial reasoning. Sci Rep 2. doi:10.1038/srep00411
Waechter RL, Goel V, Raymont V, Kruger F, Grafman J (2013) Transitive inference reasoning is impaired by focal lesions in parietal cortex rather than rostrolateral prefrontal cortex. Neuropsychologia 51(3):464–471
Walsh V, Pascual-Leone A (2003) Transcranial magnetic stimulation: a neurochronometrics of mind. MIT Press, Cambridge
Wendelken C, Bunge SA (2010) Transitive inference: distinct contributions of rostrolateral prefrontal cortex and the hippocampus. J Cogn Neurosci 22(5):837–847
How to build an inexpensive cognitive robot: Mind-R
Enrico Rizzardi1, Stefano Bennati2, Marco Ragni1
1 University of Freiburg, Germany; 2 ETH Zürich, Switzerland
Abstract
Research in Cognitive Robotics depends on standard robotic platforms that are designed to provide the high precision required by classical robotics; such platforms are generally expensive. In most cases the features provided by the robot exceed what is needed to perform the task, and this complexity is not worth the price. In this article we propose a new reference platform for Cognitive Robotics that, thanks to its low price and full-featured set of capabilities, will make research much more affordable and pave the way for more contributions in the field. The article describes the requirements and the procedure to start using the platform and presents some usage examples.
Keywords
Mind-R, Cognitive Robotics, ACT-R, Mindstorms
Introduction
Cognitive Robotics aims to bring human-level intelligence to robotic agents by equipping them with cognitively based control algorithms. This can be accomplished by extending the capabilities of robots with concepts from Cognitive Science, e.g., learning, reasoning, and planning abilities. The main difference to classical robotics lies in the requirements: cognitive robots must show robust and adaptable behavior, while precision and efficiency are not mandatory. Standard robotic platforms are designed to comply with the demanding requirements of classical robotics; therefore, the entry price is high enough to become an obstacle for most researchers. To address this issue we present a new robotic platform targeted at Cognitive Robotics research that we call Mind-R. The advantages of Mind-R over other robotic hardware are its low price and customization capabilities. Its greatest disadvantage is that its sensors and actuators are not nearly as precise as other commercial hardware, but this is not a big issue in Cognitive Robotics, which does not aim at solving tasks efficiently and precisely; instead, flexibility and adaptability are the focus. The article is structured as follows: Section 2 briefly describes the ACT-R theory and how the Mind-R modules fit into its framework, Section 3 provides details of the characteristics of the hardware platform, and Section 4 gives a step-by-step guide on how to install the software and run some examples.
ACT-R
ACT-R (Bothell 2005) is a well-known and widely tested cognitive architecture. It is the implementation of a theory of the mind developed by Anderson et al. (2004) and validated by many experiments over the years. The ACT-R framework has a modular structure that can easily be expanded with new modules, allowing researchers to add new features to the architecture, such as controlling a robotic platform. ACT-R models the structure and behavior of the human brain. Each module has a specific function (e.g., visual, motor) that reflects the functional organization of the cortex. The modules can exchange information through their buffers; each module can read all the buffers, but can write only to its own (i.e., to answer queries).
the buffers, but can write only in its own (i.e. to answer queries). Communication is coordinated by the procedural module, which is a serial bottleneck. Extending ACT-R to read from sensors and control actuators required writing new modules that give cognitive models the capability to communicate with the robot as if it were ACT-R's standard virtual device (Fig. 1).

Fig. 1 ACT-R structure with Mind-R modules

Mind-R robot
The robot used to build Mind-R is the LEGO Mindstorms set (LEGO 2009, Fig. 2). Its core is the central brick, which includes the CPU, batteries, and communication interfaces. Mind-R's design is fully customizable using LEGO Mindstorms bricks. The robot can be programmed through the USB interface using many different languages and tools. To keep the interface with ACT-R straightforward, the chosen programming language is Lisp. The Lisp interpreter can command the robot through the NXT-LSP libraries (Hiraishi 2007), which are at the core of the ACT-R interface. The LEGO Mindstorms in the Mind-R configuration is composed of one ultrasonic sensor, two bumpers, one color sensor, and two engines. The ultrasonic sensor, shown in Fig. 3a, provides an approximate distance to the next obstacle within a solid angle. A bumper, shown in Fig. 3b, provides binary state information; that is, it can distinguish between pressed and released states. The color sensor, shown in Fig. 3c, can distinguish between basic colors, for example blue, red, and green, over a short distance. The engines are stepper motors, able to turn in both directions. Two engines together make up the driving system in Fig. 3d. Each engine drives a wheel, shown in the upper left and upper right-hand corners of Fig. 3d; they can be controlled to navigate in the environment. The third wheel, in the bottom part of Fig. 3d, has no tire and serves only balancing purposes. Mind-R has already shown its effectiveness in spatial planning problems (Bennati and Ragni 2012). As a future development, the effect of complex visual perception, coming from image-processing software, on robotic navigation will be investigated.

Setup
This section provides a step-by-step guide on how to install and configure the software needed to run the Mind-R platform. A more detailed guide, containing examples and a troubleshooting section, can be found on the Mind-R website (http://webexperiment.iig.uni-freiburg.de/mind-r/index.html). The LEGO NXT Fantom driver can be obtained from the LEGO Mindstorms website or from the Mindstorms installation CD and then installed; Intel-based Mac users should make sure to install the correct driver for their platform, and for GNU/Linux no support is provided. A reboot of the computer may be required.
Fig. 2 Mind-R robot
Fig. 3 The Mindstorms robot and its sensors
The recommended interpreter is SBCL x86; the SBCL installer can be downloaded from its website (http://www.sbcl.org/). ACT-R 6.0 can be downloaded from its website and unpacked into a local folder. The Mind-R software, containing the NXT communication library, the peripheral modules, and a demo model, can be downloaded from the Mind-R website.
The source code of the NXT communication library is provided together with a guide on how to compile it. The content of the Mind-R archive has to be unpacked into a local folder. The robot has to be connected to the computer through a USB port before loading the modules with the interpreter. After the robot is recognized by the OS, the interpreter can be launched; make sure to start it from the folder where the Mind-R software has been unpacked. The NXT communication library can be loaded from the Lisp interpreter with the command (load "nxt.lsp"). If the SBCL current working directory is not the one into which Mind-R has been unpacked, loading the NXT communication library will fail, which will prevent the demo or any other Mind-R models from running. If the loading was successful, SBCL returns T; if anything else is returned as the final output, see the troubleshooting section of the Mind-R website. The next step is to load ACT-R with the command (load "/path/to/act-r/load-act-r-6.lisp"), replacing the path with the appropriate one. The Mind-R modules can now be loaded. Within the Mind-R archive four modules are provided: nxt-distance.lsp, nxt-touch.lsp, nxt-motor.lsp, and nxt-vision.lsp. The first allows ACT-R to communicate with the ultrasonic sensor, the second with the bumpers, the third commands the engines, and the last the color sensor. When the modules have been loaded by the interpreter with a load command, the robot setup phase can be concluded by executing the function nxt-setup. Once all this software has been successfully installed and loaded, a demo model, consisting of a series of productions that let Mind-R explore the environment, can be started. The demo model is called demo.lsp and can be loaded with a load command or via the graphical interface. This demo model is very simple and is designed to be a starting point for building more complex models. To run the demo, use the ACT-R run function. The engines of the robot might continue running after a model has terminated; this depends both on the model structure and on its final state. To stop the engines, the function motor-reset has to be called. The demo model contains some productions used to let the robot interact with the environment. Figures 4 and 5 show productions similar to those in the model. Figure 4 shows two simple productions that are used to read the distance measured by the ultrasonic sensor and to print the read distance to the console by invoking !output!. Figure 5 shows two productions that send commands to the engines. The left one makes the robot move straight forward; the value next to the duration field indicates how long the engines have to rotate before stopping. The right one makes the robot turn right; again, the higher the value assigned to duration, the longer the robot will turn in that direction.
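In condensed form, the whole installation-and-run sequence described above amounts to a handful of interpreter commands. The following sketch simply assembles them in order; it assumes the default file layout of the unpacked Mind-R archive, "/path/to/act-r" is a placeholder for the local ACT-R folder, and the 10-second run time passed to run is an arbitrary illustrative choice:

(load "nxt.lsp")                          ; NXT communication library; returns T on success
(load "/path/to/act-r/load-act-r-6.lisp") ; load ACT-R 6.0
(load "nxt-distance.lsp")                 ; module for the ultrasonic sensor
(load "nxt-touch.lsp")                    ; module for the bumpers
(load "nxt-motor.lsp")                    ; module commanding the engines
(load "nxt-vision.lsp")                   ; module for the color sensor
(nxt-setup)                               ; conclude the robot setup phase
(load "demo.lsp")                         ; demo model: productions that explore the environment
(run 10)                                  ; run the model with ACT-R's run function
(motor-reset)                             ; stop the engines if they keep running after the model ends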
Fig. 4 Productions that read the distance from the ultrasonic sensor
Fig. 5 Production rules that control the engines: left moves forward, right turns right
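Since Figs. 4 and 5 themselves are not reproduced here, the following is an illustrative sketch of what such a production pair might look like in ACT-R 6 production syntax. The chunk types, slot names, and buffer names (explore, distance-reading, move-command, the nxt-distance and nxt-motor buffers) are our own placeholders, not necessarily those used in the actual demo model:

(p read-distance                ; cf. Fig. 4: read and print the measured distance
   =goal>
      isa    explore
      state  sense
   =nxt-distance>
      isa    distance-reading
      value  =d
==>
   !output! (=d)                ; print the read distance to the console
   =goal>
      state  move)

(p drive-forward                ; cf. Fig. 5 (left): move straight forward
   =goal>
      isa    explore
      state  move
==>
   +nxt-motor>
      isa        move-command
      direction  forward
      duration   2.0            ; how long the engines rotate before stopping
   =goal>
      state  sense)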
Conclusions
The described platform is a low-priced and flexible cognitive robot that gives researchers in the field of Cognitive Robotics an affordable alternative to the most common, and expensive, robotic platforms. The platform is based on the widely accepted architecture ACT-R 6.0, which controls the robot and interacts with the environment through the LEGO Mindstorms' sensors and actuators. The Mind-R robot has already proven effective in spatial navigation tasks (Bennati and Ragni 2012), where it was able to replicate human navigation results and errors. The flexibility of the platform allows advanced perception capabilities such as computer vision and natural speech to be added. A first step in this direction is Rizzardi (2013), in which a communication module and digital image-processing software were used to acquire information from a retail webcam to improve robotic navigation with the use of visual landmarks. By integrating ACT-R on a LEGO Mindstorms platform it is possible to reuse other ACT-R models, from driver models (Haring et al. 2012) to reasoning (Ragni and Brüssow 2011) and planning (Best and Lebiere 2006), working towards a true unified cognition approach. However, equipping robots with cognitive knowledge is not only important for learning about human and embodied cognition (Clark 1999); it is becoming increasingly important for Human-Robot Interaction, where a successful interaction depends on an understanding of the other agents' behavior (Trafton et al. 2013). Our hope is that an affordable robot and its bridge to ACT-R may be fruitful for research and education purposes.

Acknowledgments
This work has been supported by the SFB/TR 8 Spatial Cognition within project R8-[CSPACE] funded by the DFG. A special thanks to Tasuku Hiraishi, the developer of the NXT-LSP communication library.

References
Anderson J, Bothell D, Byrne M, Douglass S, Lebiere C, Qin Y (2004) An integrated theory of the mind. Psychological Review 111(4):1036–1060
Bennati S, Ragni M (2012) Cognitive robotics: analysis of preconditions and implementation of a cognitive robotic system for navigation tasks. In: Proceedings of the 11th International Conference on Cognitive Modeling, Universitaetsverlag der TU Berlin
Best BJ, Lebiere C (2006) Cognitive agents interacting in real and virtual worlds. In: Cognition and multi-agent interaction: from cognitive modeling to social simulation, pp 186–218
Bothell D (2005) ACT-R. http://act-r.psy.cmu.edu
Clark A (1999) An embodied cognitive science? Trends Cogn Sci 3(9):345–351
Haring K, Ragni M, Konieczny L (2012) Cognitive model of drivers attention. In: Russwinkel N, Drewitz U, van Rijn H (eds) Proceedings of the 11th International Conference on Cognitive Modeling, Universitaetsverlag der TU Berlin, pp 275–280
Hiraishi T (2007) NXT controller in Lisp. http://super.para.media.kyoto-u.ac.jp/~tasuku/index-e.html
LEGO (2009) LEGO Mindstorms. http://mindstorms.lego.com
Ragni M, Brüssow S (2011) Human spatial relational reasoning: processing demands, representations, and cognitive model. In:
Burgard W, Roth D (eds) Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI Press, San Francisco, CA
Rizzardi E (2013) Cognitive robotics: cognitive and perceptive aspects in navigation with landmarks. Master's thesis, Università degli Studi di Brescia, Brescia, Italy
Trafton G, Hiatt L, Harrison A, Tamborello F, Khemlani S, Schultz A (2013) ACT-R/E: an embodied cognitive architecture for human-robot interaction. Journal of Human-Robot Interaction 2(1):30–55
Crossed hands stay on the time-line
Bettina Rolke1, Susana Ruiz Fernández2, Juan José Rahona López3, Verena C. Seibold1
1 Evolutionary Cognition, University of Tübingen, Germany; 2 Leibniz Knowledge Media Research Center (KMRC), Tübingen, Germany; 3 Complutense University, Madrid, Spain

How are objects and concepts represented in our memories? This question has been addressed in the past from two opposing positions. Whereas some theories suppose that representations are coded in amodal "concept nodes" (e.g., Kintsch 1998), more recent embodied theories assume that internal representations include multimodal perceptual and motor experiences. One example of the latter conception is the conceptual metaphor view, which assumes that abstract concepts like "time" are grounded in cognitively more accessible concepts like "space" (Boroditsky et al. 2010). This view is empirically supported by time–space congruency effects, showing faster left-hand responses to past-related words and faster right-hand responses to future-related words compared to responses with the reversed stimulus–response mapping (e.g., Santiago et al. 2007). This congruency effect implies that time is mentally represented along a line, extending horizontally from left to right. Whereas the existence of this "mental time-line" has been supported by several empirical findings (see Bonato et al. 2012), the specific properties of the spatial reference frame are still unclear. The aim of the present study was to shed further light on the specific relationship between temporal and spatial codes. Specifically, we examined whether the frame of reference for the association between temporal and spatial codes is based on the structural "embodied" side of the motor effectors, meaning that left (right) refers to the left (right) hand independent of the actual hand position, or, alternatively, whether the frame of reference is organized along an egocentric spatial frame, which represents things and effectors occurring at the left (right) body side as left (right)-sided. In other words, according to the embodied frame of reference, the left hand represents left irrespective of whether it is placed at the left or right body side, whereas according to an egocentric spatial frame, the left hand represents the left side when it is placed at the left body side, but the right side when it is placed at the right body side.

Method
We employed a spatial discrimination task. Participants (N = 20) had to respond with their right or left hand depending on the presentation side of a rectangle. Specifically, when the rectangle was presented at the left (right) side of fixation, participants had to press a response key on their left (right) side. In the uncrossed-hands condition, participants placed their left (right) index finger on the left (right) response key; in the crossed-hands condition they crossed their hands and thus responded with their right (left) index finger on the left (right) key to left (right)-sided targets. To activate spatial codes by time-related words, we combined the spatial discrimination task with a priming paradigm and presented future- and past-related time words (e.g., yesterday, tomorrow) before the rectangle appeared (see Rolke et al. 2013). To monitor the time course of the time–space congruency effect, we manipulated the SOA between the prime word and the rectangle (SOA = 300, 600, or 1,200 ms).
Whereas the uncrossed condition served as baseline to establish the time–space congruency effect, the crossed-hands
condition allowed us to investigate whether the time–space congruency is based on egocentric spatial codes or whether it depends on body-referenced effector sides. Specifically, if the time–space congruency effect is based on an egocentric frame of reference, we expect faster RT for left (right) key responses following past (future) words regardless of response condition. If, on the other hand, the time–space congruency effect depends on effector side, we expect the pattern to reverse across conditions, i.e., faster RT should result for left (right) key responses following past (future) words in the uncrossed-hands condition, but faster RT for left (right) key responses following future (past) words in the crossed-hands condition. The experiment factorially combined response condition (uncrossed hands, crossed hands), temporal reference (past, future), response key position (left, right), and SOA (300, 600, or 1,200 ms). Repeated measures analyses of variance (ANOVA) were conducted on mean RT of correct responses and percent correct (PC), taking participants (F1) and items (F2) as random factors. P-values were, whenever appropriate, adjusted for violations of the sphericity assumption using the Greenhouse-Geisser correction.

Results
RT results are summarized in Fig. 1, which depicts mean RT as a function of temporal reference, response key position, SOA, and response hands condition. An ANOVA on RT revealed shorter RT for the uncrossed compared to the crossed condition, F1(1,19) = 48.9, p < .001; F2(1,11) = 9198.2, p < .001. SOA exerted an influence on RT, F1(2,38) = 28.3, p < .001; F2(2,22) = 243.4, p < .001. Shorter RTs were observed at shorter SOAs (all contrasts between SOAs p < .05).
Fig. 1 Mean RT depending on response key position, temporal reference of prime words, and SOA. Solid lines represent data of the uncrossed response condition; dotted lines represent data of the crossed response condition. For the sake of visual clarity, no error bars were included in the figure
As one should expect, response condition interacted with response key position, F1(1,19) = 8.3, p = .01; F2(1,11) = 93.5, p < .001, indicating a right-hand benefit for right-hand responses at the left side in the crossed condition and at the right side in the uncrossed condition. Theoretically most important, temporal reference and response key position interacted, F1(1,19) = 9.7, p = .01; F2(1,11) = 17.6, p = .002. This time–space congruency effect was neither modulated by response condition, F1(1,19) = 1.1, p = .31; F2(1,11) = 1.4, p = .27, nor by SOA, F1(2,38) = 1.2, p = .30; F2(2,22) = 1.0, p = .37. All other effects were not significant, all ps > .31. Participants made more errors in the crossed than in the uncrossed response condition, F1(1,19) = 24.3, p < .001; F2(1,11) = 264.3, p < .001. The F2-analysis further revealed an interaction between response key position, SOA, and response condition, F2(2,22) = 3.9, p = .04. There were no other significant effects for PC, all ps > .07.

Discussion
By requiring responses on keys placed on the left or right with crossed and uncrossed hands, we disentangled egocentric spatial space from the effector-related "embodied" space. The presentation of a time word before a lateralized visual target evoked a space–time congruency effect; that is, responses were faster for spatially left (right) responses when a past (future) word preceded the rectangle. Theoretically most important, this space–time congruency effect was not modulated when hands were crossed. This result indicates that temporal codes activate abstract spatial codes rather than effector-related spatial codes.

References
Bonato M, Zorzi M, Umiltà C (2012) When time is space: evidence for a mental time line. Neurosci Biobehav Rev 36:2257–2273. doi:10.1016/j.neubiorev.2012.08.007
Boroditsky L, Fuhrman O, McCormick K (2010) Do English and Mandarin speakers think about time differently? Cognition 118:123–129. doi:10.1016/j.cognition.2010.09.010
Kintsch W (1998) Comprehension: a paradigm for cognition. Cambridge University Press, New York
Rolke B, Ruiz Fernández S, Schmid M, Walker M, Lachmair M, Rahona López JJ, Hervás G, Vázquez C (2013) Priming the mental time-line: effects of modality and processing mode. Cogn Process 14:231–244. doi:10.1007/s10339-013-0537-5
Santiago J, Lupiáñez J, Pérez E, Funes MJ (2007) Time (also) flies from left to right. Psychon Bull Rev 14:512–516
Is the novelty-P3 suitable for indexing mental workload in steering tasks?
Menja Scheer, Heinrich H. Bülthoff, Lewis L. Chuang
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Difficulties experienced in steering a vehicle can be expected to place a demand on one's mental resources (O'Donnell, Eggemeier 1986). While the extent of this mental workload (MWL) can be estimated by self-reports (e.g., NASA-TLX; Hart, Staveland 1988), it can also be physiologically evaluated in terms of how a primary task taxes a common and limited pool of mental resources, to the extent that it reduces the electroencephalographic (EEG) responses to a secondary task (e.g., an auditory oddball task). For example, the participant could be primarily required to control a cursor to track a target while attending to a series of auditory stimuli, which would infrequently present target tones that should be responded to with a button-press (e.g., Wickens, Kramer, Vanasse and Donchin 1983). Infrequently presented targets, termed oddballs, are known to elicit a large positive potential approximately 300 ms after their presentation (i.e., P3).
Indeed, increasing tracking difficulty either by decreasing the predictability of the tracked target or by changing the complexity of the controller dynamics has been shown to attenuate P3 responses in the secondary auditory monitoring task (Wickens et al. 1983; Wickens, Kramer and Donchin 1984). In contrast, increasing tracking difficulty by introducing more frequent direction changes of the tracked target (i.e., including higher frequencies in the function that describes the motion trajectory of the target) has been shown to bear little influence on the secondary task's P3 response (Wickens, Israel and Donchin 1977; Isreal, Chesney, Wickens and Donchin 1980). Overall, the added requirement of a steering task consistently results in a lower P3 amplitude, relative to performing auditory monitoring alone (Wickens et al. 1983; Wickens et al. 1977; Isreal et al. 1980). Using a dual-task paradigm for indexing workload is not ideal. First, it requires participants to perform a secondary task. This prevents it from being applied in real-world scenarios; users cannot be expected to perform an unnecessary task that could compromise their critical work performance. Second, it can only be expected to work if the performance of the secondary task relies on the same mental resources as those of the primary task (Wickens, Yeh 1983), requiring a deliberate choice of the secondary task. Thus, it is fortunate that more recent studies have demonstrated that P3 amplitudes can be sensitive to MWL even if the auditory oddball is ignored (Ullsperger, Freude and Erdmann 2001; Allison, Polich 2008). The ignored oddball is said to induce a momentary and involuntary shift in general attention, especially if recognizable sounds (e.g., a dog bark, as opposed to a pure tone) are used (Miller, Rietschel, McDonald and Hatfield 2011). The current work, comprising two experiments, investigates the conditions that would allow the 'novelty-P3', the P3 elicited by the ignored, recognizable oddball, to be an effective index for the MWL of compensatory tracking. Compensatory tracking is a basic steering task that can be generalized to most implementations of vehicular control. In both experiments participants were required to use a joystick to counteract disturbances of a horizontal plane. To evaluate the generalizability of this paradigm, we depicted this horizontal plane as either a line in a simplified visualization or as the horizon in a real-world environment. In the latter, participants experienced a large field-of-view perspective of the outside world from the cockpit of an aircraft that rotated erratically about its heading axis. The task was the same regardless of the visualization. In both experiments, we employed a full factorial design crossing the visualization (instrument, world) with 3 oddball paradigms (in experiment 1) or 4 levels of task difficulty (in experiment 2), respectively. Two sessions were conducted on separate days for the different visualizations, which were counter-balanced for order. Three trials were presented per oddball paradigm (experiment 1) or level of task difficulty (experiment 2) in blocks, which were randomized for order. Overall, we found that steering performance was worse when the visualization was provided by a realistic world environment in experiments 1 (F(1, 11) = 42.8, p < 0.01) and 2 (F(1, 13) = 35.0, p < 0.01). Nonetheless, this manipulation of visualization had no consequence on our participants' MWL as evaluated by a post-experimental questionnaire (i.e., NASA-TLX) and EEG responses.
This suggests that MWL was unaffected by our choice of visualization. The first experiment, with 12 participants, was designed to identify the optimal presentation paradigm of the auditory oddball. For the EEG analysis, two participants had to be excluded due to noisy electrophysiological recordings (more than 50 % of rejected epochs). Whilst performing the tracking task, participants were presented with a sequence of auditory stimuli that they were instructed to ignore. This sequence would, in the 1-stimulus paradigm, only contain the infrequent oddball stimulus (i.e., the familiar sound of a dog's bark; Fabiani, Kazmerski, Cycowicz and Friedman 1996). In the 2-stimulus paradigm this infrequently presented oddball (0.1) is accompanied by a more frequently presented pure tone (0.9), and in
the 3-stimulus paradigm the infrequently presented oddball (0.1) is accompanied by a more frequently presented pure tone (0.8) and an infrequently presented pure tone (0.1). These three paradigms are widely used in P3 research (Katayama, Polich 1996). It should be noted, however, that the target-to-target interval is 20 s regardless of the paradigm. To obtain the ERPs, the epochs from 100 ms before to 900 ms after the onset of the recognizable oddball stimulus were averaged. Mean amplitude measurements were obtained in a 60 ms window, centered at the group-mean peak latency for the largest positive maximum component between 250 and 400 ms for the oddball P3, for each of the three mid-line electrode channels of interest (i.e., Fz, Cz, Pz). In agreement with previous work, the novelty-P3 response was smaller when participants had to perform the tracking task compared to when they were only presented with the task-irrelevant auditory stimuli, without the tracking task (F(1, 9) = 10.9, p < 0.01). However, the amplitude of the novelty-P3 differed significantly across the presentation paradigms (F(2, 18) = 5.3, p < 0.05), whereby the largest response to our task-irrelevant stimuli was elicited by the 1-stimulus oddball paradigm. This suggests that the 1-stimulus oddball paradigm is most likely to elicit novelty-P3s that are sensitive to changes in MWL. Finally, the attenuation of novelty-P3 amplitudes by the tracking task varied across the three mid-line electrodes (F(2, 18) = 28.0, p < 0.001). Pairwise comparisons, Bonferroni-corrected for multiple comparisons, revealed P3 amplitude to be largest at Cz, followed by Fz, and smallest at Pz (all p < 0.05). This stands in contrast with previous work that found control difficulty to attenuate P3 responses in parietal electrodes (cf. Isreal et al. 1980; Wickens et al. 1983). Thus, the current paradigm that uses a recognizable, ignored sound is likely to reflect an underlying process that is different from previous studies, one which could be more sensitive to the MWL demands of a tracking task. Given the result of experiment 1, the second experiment, with 14 participants, investigated whether the 1-stimulus oddball paradigm would be sufficiently sensitive for indexing tracking difficulty as defined by the bandwidth of frequencies that contributed to the disturbance of the horizontal plane (cf. Isreal et al. 1980). Three different bandwidth profiles (easy, medium, hard) defined the linear increase in the amount of disturbance that had to be compensated for. This manipulation was effective in increasing subjective MWL, according to the results of a post-experimental NASA-TLX questionnaire (F(2, 26) = 14.9, p < 0.001), and demonstrated the expected linear trend (F(1, 13) = 23.2, p < 0.001). This increase in control effort was also reflected in the amount of joystick activity, which grew linearly across the difficulty conditions (F(1, 13) = 42.2, p < 0.001). For the EEG analysis two participants had to be excluded due to noisy electrophysiological recordings (more than 50 % of rejected epochs). A planned contrast revealed that the novelty-P3 was significantly lower in the most difficult condition compared to the baseline viewing condition, where no tracking was done (F(1, 11) = 5.2, p < 0.05; see Fig. 1a). Nonetheless, novelty-P3 did not differ significantly between the difficulty conditions (F(2, 22) = 0.13, p = 0.88), nor did it show the expected linear trend (F(1, 11) = 0.02, p = 0.91). Like Isreal et al.
(1980), we find that EEG responses do not discriminate for MWL that is associated with controlling increased disturbances. It remains to be investigated whether the novelty-P3 is sensitive to the complexity of controller dynamics, as has been shown for the P3. The power spectral density of the EEG data around 10 Hz (i.e., alpha) has been suggested by Smith and Gevins (2005) to index MWL. A post hoc analysis of our current data, at electrode Pz, revealed that alpha power was significantly lower for the medium and hard conditions, relative to the view-only condition (F(1, 11) = 6.081, p < 0.05; F(1, 11) = 6.282, p < 0.05). Nonetheless, the expected linear trend across tracking difficulty was not significant (Fig. 1b). To conclude, the current results suggest that a 1-stimulus oddball task ought to be preferred when measuring general MWL with the
novelty-P3. Although changes in novelty-P3 can identify the control effort required in our compensatory tracking task, it is not sufficiently sensitive to provide a graded response across different levels of disturbances. In this regard, it may not be as effective as self-reports and joystick activity in denoting control effort. Nonetheless, further research can improve upon the sensitivity of EEG metrics to MWL by investigating other aspects that better correlate with the specific demands of a steering task.

Fig. 1 a left Grand average ERP data of Experiment 2 averaged over Fz, Cz, Pz; right averaged amplitude of P3 as a function of tracking difficulty. b left Averaged power spectral density (PSD) at Pz; right averaged PSD as a function of tracking difficulty

Acknowledgments
The work in this paper was supported by the myCopter project, funded by the European Commission under the 7th Framework Program.

References
Allison BZ, Polich J (2008) Workload assessment of computer gaming using a single-stimulus event-related potential paradigm. Biol Psychol 77(3):277–283
Fabiani M, Kazmerski V, Cycowicz Y, Friedman D (1996) Naming norms for brief environmental sounds. Psychophysiology 33:462–475
Hart SG, Staveland LE (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research
Isreal JB, Chesney GL, Wickens CD, Donchin E (1980) P300 and tracking difficulty: evidence for multiple resources in dual-task performance. Psychophysiology 17(3):259–273
Katayama J, Polich J (1996) P300 from one-, two-, and three-stimulus auditory paradigms. Int J Psychophysiol 23:33–40
Miller MW, Rietschel JC, McDonald CG, Hatfield BD (2011) A novel approach to the physiological measurement of mental workload. Int J Psychophysiol 80(1):75–78
O'Donnell RC, Eggemeier TF (1986) Workload assessment methodology. Handbook of Perception and Human Performance, 2:1–49
Smith ME, Gevins A (2005) Neurophysiologic monitoring of mental workload and fatigue during operation of a flight simulator. Defense and Security (International Society for Optics and Photonics), pp 116–126
Ullsperger P, Freude G, Erdmann U (2001) Auditory probe sensitivity to mental workload changes—an event-related potential study. Int J Psychophysiol 40(3):201–209
Wickens CD, Kramer AF, Vanasse L, Donchin E (1983) Performance of concurrent tasks: a psychophysiological analysis of the reciprocity of information-processing resources. Science 221(4615):1080–1082
Wickens CD, Israel J, Donchin E (1977) The event related potential as an index of task workload. Proceedings of the Human Factors Society Annual Meeting 21:282–286
Wickens CD, Kramer AF, Donchin E (1984) The event-related potential as an index of the processing demands of a complex target acquisition task. Annals of the New York Academy of Sciences 425:295–299
Wickens CD, Yeh Y-Y (1983) The dissociation between subjective workload and performance: a multiple resource approach. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 27(3):244–248
Modeling perspective-taking by forecasting 3D biological motion sequences
Fabian Schrodt, Martin V. Butz
Cognitive Modeling, Computer Science Department, University of Tübingen, Germany

Abstract
The mirror neuron system (MNS) is believed to be involved in social abilities like empathy and imitation. While several brain regions have been linked to the MNS, it remains unclear how the mirror neuron property itself develops. Previously, we have introduced a recurrent neural network, which enables mirror-neuron capabilities by learning an embodied, scale- and translation-invariant model of biological motion (BM). The model allows the derivation of the orientation of observed BM by (i) segmenting BM in a common positional and angular space and (ii) generating short-term, top-down predictions of subsequent motion. While our previous model generated short-term motion predictions, here we introduce a novel forecasting algorithm, which explicitly predicts sequences of BM segments. We show that the model scales to a 3D simulation of a humanoid walker and is robust against variations in body morphology and postural control.

Keywords
Perspective Taking; Embodiment; Biological Motion; Self-Supervised Learning; Sequence Forecasting; Mirror-Neurons; Recurrent Neural Networks

Introduction
This paper investigates how we may be able to recognize BM sequences and mentally transform them to the egocentric frame of reference to bootstrap mirror neuron properties. Our adaptive, self-supervised, recurrent neural network model (Schrodt et al. 2014) might contribute to the understanding of the MNS and its implied capabilities. With the previous model, we were able to generate continuous mental rotations to learned canonical views of observed 2D BM—essentially taking on the perspective of an observed person. This self-supervised perspective taking was accomplished by backpropagating errors stemming from top-down, short-term predictions of the BM progression. In this work, we introduce an alternative or complementary, time-independent forecasting mechanism of motion segment sequences to the model. In the brain, prediction and forecasting mechanisms may be realized by the cerebellum, which is involved in the processing of BM (Grossman et al. 2000). In addition, it has been suggested that the cerebellum may also support the segmentation of motion patterns via
the basal ganglia, thereby influencing the learning of motor sequences in parietal and (pre-)motor cortical areas (Penhune and Steele 2012). Along these lines, the proposed model learns to predict segments of motion patterns given embodied, sensorimotor motion signals. Due to the resulting perspective-taking capabilities, the model essentially offers a mechanism to activate mirror neuron capabilities.

Neural Network Model
The model consists of three successive stages, illustrated in the overview given in Fig. 1. The first stage processes relative positional and angular values into mentally rotated, motion-direction-sensitive population codes. The second stage performs a modulatory normalization and pooling of those. Stage III is a self-supervised pattern segmentation network with sequence forecasting, which enables the back-propagation of forecast errors. We detail the three stages and the involved techniques in the following sections.

Stage I: Feature Preprocessing
The input of the network is driven by a number of (not necessarily all) relative joint positions and joint angles of a person. Initially, the network can be driven by self-perception to establish an egocentric perspective on self-motion. In this case, the relative joint positions may be perceived visually, while the perception of the joint angles may be supported by proprioception in addition to vision. When actions of others are observed, joint angles may be identified solely visually. In each single interstage Ia in the relative position pathway, a single positional body-landmark relation is transformed into a directional velocity by time-delayed inhibition, whereby the model becomes translation-invariant. Interstage Ib implements a mental rotation of the resulting directional velocity signals using a neural rotation module Rl. It is driven by auto-adaptive mental rotation angles (Euler angles in 3D space), which are implemented by bias neurons. The rotational module and its influence on the directional velocity signals are realized by gain-field-like modulations of neural populations (Andersen et al. 1985). All positional processing stages apply the same mental rotation Rl, by which multiple error signals can be merged at the module. This enables orientation invariance given adequate adaptation of the module's biases. In interstage Ic, each (rotated) D-dimensional directional motion feature is convolved into a population of 3^D − 1 direction-responsive neurons.
Fig. 1 Overview of the three-stage neural modeling approach in a 3D example with 12 joint positions and 8 joint angles, resulting in n = 20 features. Boxes numbered with m indicate layers consisting of m neurons. Black arrows describe weighted forward connections, while circled arrowheads indicate modulations. Dashed lines denote recurrent connections. Red arrows indicate the flow of the error signals
The processing of each one-dimensional angular signal is done analogously, resulting in 2-dimensional population codes. A rotation mechanism (interstage Ib) is not necessary for angles and thus not applied. In summary, stage I provides a population of neurons for each feature of sensory processing, which is either sensitive to directional changes in a body-relative limb position (26 neurons for each 3D position) or sensitive to directional changes in angles between limbs (2 neurons for each angle).

Stage II: Normalization and Pooling
Stage II first implements individual activity normalizations in the direction-sensitive populations. Consequently, the magnitude of activity is generalized over, by which the model becomes scale- and velocity-invariant. Normalization of a layer's activity vector can be achieved by axo-axonic modulations, using a single, layer-specific normalizing neuron (shown as circles in Fig. 1). Next, all normalized direction-sensitive fields are merged by one-to-one connections to a pooling layer, which serves as the input to stage III. To also normalize the activity of the pooling layer, the connections are weighted by 1/√n, where n denotes the number of features being processed.

Stage III: Correlation Learning
Stage III realizes a clustering of the normalized and pooled information from stage II (indexed by i) over time by instar weights fully connected to a number of pattern-responsive neurons (indexed by j). Thus, each pattern neuron represents a unique constellation of positional and angular directional movements. For pattern learning, we use the Hebbian-inspired instar learning rule (Grossberg 1976). To avoid a "catastrophic forgetting" of patterns, we use winner-takes-all competitive learning in the sense that only the weights to the most active pattern neuron are adapted. We bootstrap the weights from scratch by adding neural noise to the input of each pattern neuron, which consequently activates Hebbian learning of novel input patterns. The relative influence of neural noise decreases while a pattern-sensitive neuron is learned (cf. Schrodt et al. 2014). In contrast to our previous, short-term prediction approach, here we apply a time-independent forecasting algorithm (replacing the attentional gain control mechanism). This is realized by feedback connections w_ji from the pattern layer to the pooling layer, which are trained to approximate the input net_i of the pooling layer neurons:

(1/η) · ∂w_ji(t)/∂t = Δw_ji(t) = net_i(t) − w_ji(t),   (1)
where neuron j is the last winner neuron that differed from the current winner in the pattern layer. In consequence, the outgoing weight vector of a pattern neuron forecasts the input to the pooling layer while the next pattern neuron is active. The forecasting error can be backpropagated through the network to adapt the mental transformation for error minimization (cf. red arrows in Fig. 1). Thus, perspective adaptation is driven by the difference between the forecasted and actually perceived motion. The difference d_i is directly fed into the pooling layer by the outstar weights:

d_i(t) = Δw_ji(t),   (2)

where j again refers to the preceding winner.
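As a concrete illustration of this update, the following is a minimal numerical sketch of the outstar forecasting step in a discrete-time reading of Eqs. (1) and (2); the representation as plain lists and the names outstar-forecast-step, w-j, net, and eta are our own choices, not taken from the original implementation:

;; One discrete outstar step: move the outgoing weights W-J of the previous
;; winner toward the current pooling-layer input NET (Eq. 1) and return the
;; forecast differences d_i (Eq. 2) that drive perspective adaptation.
(defun outstar-forecast-step (w-j net eta)
  (let ((delta (mapcar (lambda (n w) (* eta (- n w))) net w-j)))
    (values (mapcar #'+ w-j delta)   ; updated forecast weights
            delta)))                 ; d_i(t), backpropagated to the rotation biases

;; Example: (outstar-forecast-step '(0.0 0.5) '(1.0 0.0) 0.1)
;; => (0.1 0.45) and (0.1 -0.05)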
Fig. 2 Variants of the simulated walker
Experiments
In this section, we first introduce the 3D simulation we implemented to evaluate our model. We then show that, after training on the simulated movement, the learned angular and positional correlations can be exploited to take on the perspective of another person that currently executes a similar motion pattern. The reported results are averaged over 100 independent runs (training and evaluating the network starting with different random number generator seeds).

Simulation and Setup
We implemented a 3D simulation of a humanoid walker with 10 angular DOF. The movement is cyclic with a period of 200 time steps
(corresponding to one left and one right walking step). The simulation provides the 3D positions of all 12 limb endpoints relative to the body's center, x_1 … x_12, as well as 8 angles a_1 … a_8 between limbs (inner rotations of limbs are not considered). The view of the walker can be rotated arbitrarily before serving as visual input to the model. Furthermore, the simulation allows the definition of the appearance and postural control of the walker. Each of the implied parameters (body scale, torso height, width of shoulders/hips and length of arms/legs, as well as minimum/maximum amplitude of joint angles on movement) can be varied to log-normally distributed variants of an average walker, which exhibits either female or male proportions. Randomly sampled resulting walkers are shown in Fig. 2.

Perspective-Taking on Action Observation with Morphological Variance
We first trained the model on the egocentric perspective of the average male walker for 40,000 time steps. The rotation biases were kept fixed, since no mental rotation has to be applied during self-perception. In consequence, a cyclic series of 4 to 11 winner patterns evolved from noise in the pattern layer. Each represents (i) a sufficiently linear part of the walking via its instar vector and (ii) the next forecasted, sequential part of the movement via its outstar vector. After training, we fed the model with an arbitrarily rotated (uniform distribution in orientation space) view of a novel walker, which was either female or male with 50 % probability. Each default morphology parameter was varied by a log-normal distribution LN(0, σ²) with variance σ² = 0.1; postural control parameters were not varied. Instar/outstar learning was disabled from then on, but the mental rotation biases were allowed to adapt according to the backpropagated forecast error to derive the orientation of the shown walker. Figure 3 shows the mismatch of the model's derived walker orientation, which we term the orientation difference (OD), over time. We define the OD as the minimal amount of rotation needed to rotate the derived orientation into the egocentric orientation about the optimal axis of rotation. As a result, all trials converged to a negligible OD, which means that the given view of the walker was internally rotated to the previously learned, egocentric orientation. The median remaining OD converged to ≈0.15 with quartiles of ≈±0.03.
Fig. 3 The model aligns its perspective to the orientation of observed walkers with different morphological parameters (starting at t = 200). Blue quartiles, black median
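The abstract gives no explicit formula for the OD; read literally, the definition above corresponds to the geodesic angle between the two orientations on SO(3). Under that reading (our assumption), with R_der and R_ego denoting the derived and egocentric orientations as rotation matrices,

OD = arccos((tr(R_derᵀ · R_ego) − 1) / 2),

measured in radians, which is consistent with the reported magnitudes (≈0.15 and ≈0.67, with 1 as a reference threshold).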
Fig. 4 The model aligns its perspective to the orientation of observed walkers with different postural control parameters
The time for the median OD to fall short of 1 was 120 time steps. These results show that morphological differences between the self-perceived and observed walkers could be generalized over. This is because the model's scale-invariance applies to every positional relation perceived by the model.

Perspective-Taking on Action Observation with Postural Control Variance
In this experiment, we varied the postural control parameters of the simulation on action observation by a log-normal distribution with variance σ² = 0.1, instead of the morphological parameters. Again, female as well as male walkers were presented. The perspective of all shown walkers could be derived reliably, but with a higher remaining OD of ≈0.67 and more distal quartiles of ≈±0.32. The median OD took longer to fall short of 1, namely 154 time steps. This is because the directions of joint motion are influenced by the angular parameters. Still, variations in postural control could largely be generalized over (Fig. 4).

Conclusions and Future Work
The results have shown that the developed model is able to recognize novel perspectives on BM independent of morphological and largely independent of postural control variations. With the previous model, motion segments are also recognized if their input sequence is reordered, such that additional, implicitly learned attractors may exist for the perspective derivation. The introduced, explicit learning of pattern sequences forces the model to deduce the correct perspective by predicting the patterns of the next motion segment rather than the current one. It may well be the case, however, that the combination of both predictive mechanisms generates even more robust results. Future work needs to further evaluate the current model's capabilities and limitations as well as possible combinations of the prediction mechanisms. Currently, we are investigating how missing or incomplete data could be derived by our model during action observation. We believe that the introduced model may help to infer the current goals of an actor during action observation somewhat independently of the current perspective. Experimental psychological and further cognitive modeling studies may examine the influence of motor sequence learning on the recognition of BM and the inference of goals. Also, an additional, dynamics-based modulatory module could be incorporated, which could be used to deduce emotional properties of the derived motion—and could thus bootstrap capabilities related to empathy. These advancements could pave the way for the creation of a model of the development of a mirror neuron system that supports learning by imitation and is capable of inferring goals, intentions, and even emotions from observed BM patterns.

References
Andersen RA, Essick GK, Siegel RM (1985) Encoding of spatial location by posterior parietal neurons. Science 230(4724):456–458
Grossberg S (1976) On the development of feature detectors in the visual cortex with applications to learning and reaction–diffusion systems. Biological Cybernetics 21(3):145–159
Grossman E, Donnelly M, Price R, Pickens D, Morgan V, Neighbor G, Blake R (2000) Brain areas involved in perception of biological motion. Journal of Cognitive Neuroscience 12(5):711–720
Penhune VB, Steele CJ (2012) Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behavioural Brain Research 226(2):579–591
Schrodt F, Layher G, Neumann H, Butz MV (2014) Modeling perspective-taking by correlating visual and proprioceptive dynamics. In: 36th Annual Conference of the Cognitive Science Society, Conference Proceedings
Matching quantifiers or building models? Syllogistic reasoning with generalized quantifiers
Eva-Maria Steinlein, Marco Ragni
Center for Cognitive Science, University of Freiburg, Germany

Abstract
Assertions in the thoroughly investigated domain of classical syllogistic reasoning are formed using one of the four quantifiers: all, some, some not, or none. In everyday communication, meanwhile, set-based quantifiers like most and frequency-based quantifiers such as normally are used more often. However, little progress has been made in finding a psychological theory that considers such quantifiers. This article adapts two theories for reasoning with these quantifiers: the Matching-Hypothesis and a variant of the Mental Model Theory. Both theories are evaluated experimentally in a syllogistic reasoning task. Results indicate a superiority of the model-based approach. Semantic differences between the quantifiers most and normally are discussed.

Keywords
Reasoning, Syllogisms, Matching-Hypothesis, Mental Models, Minimal Models

Introduction
Consider the following example: All trains to Bayreuth are local trains. Normally local trains are on time. What follows? You might infer that, normally, trains to Bayreuth are on time—at least this is what participants in our experiments tend to do. And, in the absence of other information, it might even be sensible to do so. However, if you understand the second assertion as "Normally local trains in Germany are on time", then the local trains to Bayreuth could be an exception. So while different conclusions are possible, none of them is necessarily true. Hence no valid conclusion (NV) follows, but participants rarely give this logically correct answer. Problems like the example above, consisting of two quantified premises, are called syllogisms. The classical quantifiers all, some, some not, and none have been criticized for being too strict or uninformative, respectively (Pfeifer 2006), and thus infrequently used in natural language. Hence so-called generalized quantifiers like most and few have been introduced and investigated in this field (Chater, Oaksford 1999). In our study we additionally included the frequency-based term normally that is used in non-monotonic reasoning. Non-monotonic reasoning (Brewka, Niemelä and Truszczynski 2007) deals with rules that describe what is usually the case, but do not necessarily hold without exception. The terms of a syllogistic problem can be in one of four possible figures (Khemlani, Johnson-Laird 2012). We focus on two: Figure I,
the term order A–B and B–C (the example above is of this type with A = "trains to Bayreuth", B = "local trains", and C = "are on time"), and Figure IV, the term order B–A and B–C. While Figure I allows for a transitive rule to be applied, Figure IV does not. Additionally, conclusions can be drawn in two directions, relating A to C (A–C conclusion) or C to A (C–A conclusion). Several theories of classical syllogistic reasoning have been postulated, based on formal rules (e.g. Rips 1994), mental models (e.g. Bucciarelli, Johnson-Laird 1999), or heuristics (e.g. Chater, Oaksford 1999). However, none of them provides a satisfying account of naive participants' syllogistic reasoning behavior (Khemlani, Johnson-Laird 2012). While most theories only provide predictions for reasoning with the classical quantifiers, some theories apply equally to generalized quantifiers. One of the most important approaches in this field is the Probability Heuristics Model (PHM) introduced by Chater and Oaksford (1999). It states that reasoners solve syllogisms by simple heuristics, approximating a probabilistic procedure. Within this framework, generalized quantifiers like most are treated as probabilities of certain events or features. Another theory to explain human syllogistic reasoning is the Matching-Hypothesis (Wetherick, Gilhooly 1995), which states that the quantifier chosen for the conclusion matches the most conservative quantifier contained in the premises. Extending this approach with most and normally could result in the order:

All < Normally < Most < Some = Some not < None

Considering the example above from this perspective, normally is preferred over all; hence a reasoner would, incorrectly, respond that normally trains to Bayreuth are on time. Do people actually reason when confronted with syllogisms, or are responses the result of a superficial automatic process, as suggested by the Matching-Hypothesis? Mental Models are an approach that assumes individuals engage in a reasoning process, thus allowing for more sophisticated responses. Yet individual differences exist. Therefore, we suggest Minimal Models as a hybrid approach, combining mental models and heuristics. It is assumed that a deductive process based on an initial model is guided by the most conservative quantifier of the premises. Reasoners will try to verify this quantifier in the initial model, which is minimal with respect to the number of individuals represented, and tend to formulate a conclusion containing this quantifier. For example, for the syllogism "Most A are B", "Some B are C", some is more conservative and is tested in the following initial (minimal) model (left):
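The model diagrams referred to here are figures in the printed abstract and are not reproduced in this text; the following is our own illustrative reconstruction in the usual mental-model notation, where each row stands for one individual (left: initial minimal model; right: an alternative model in which no A is C, refuting "Some A are C"):

   initial model        alternative model
   A   B   C            A   B
   A   B                A   B
   A                    A
                            B   C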
Some holds in this model; thus the preferred conclusion is "Some A are C". While this answer is dominated by heuristics (corresponding
to what is known as System 1), some reasoners may engage in a more sophisticated reasoning process (System 2) consisting of the construction of alternative models in order to falsify the initial conclusion. An example of an alternative model of the syllogism is given above. With such an alternative model in mind, the reasoner will arrive at the valid response, i.e. in this case NV.

Empirical Investigation
Hypothesis. We tested whether our extension of the Matching-Hypothesis or Minimal Models provide a more accurate account of human syllogistic reasoning. The PHM was not included in our analysis, as it does not provide any predictions for reasoning with the quantifier normally. For our experiment, we assume that Minimal Models make better predictions for the response behavior of naive participants, because (1) they allow for effects of figure, i.e. responses may vary depending on the order of terms in the premises, and (2) they not only predict heuristic System 1 responses, but also System 2 responses, which are logically valid and often do not conform to System 1 responses. Therefore, we hypothesize that Minimal Models show a higher hit rate and more correct rejections than the Matching-Hypothesis. Furthermore, System 2 responses should occur as predicted by Minimal Models. In addition to this comparison of theories, we explored the semantics of the quantifiers most and normally empirically.
Participants. Fifty-eight native English speakers (21 m, 37 f; mean age = 35.5) participated in this online experiment. They were recruited via Amazon Mechanical Turk and come from a variety of educational and occupational backgrounds.
Design & Procedure. In this online experiment participants were asked to generate conclusions to 31 syllogisms in Figures I and IV, reflecting all combinations of the quantifiers all, some, most, and normally (the simple syllogism AA in Figure I was omitted, as it was used as an explanatory example in the instruction). Both premises were presented simultaneously, together with the question "What follows?" Participants could either fill in a quantifier and the terms (X, N, V), or write nothing (meaning NV) in the response field. After completing this production task, participants were asked about their understanding of the four quantifiers. For each quantifier they had to complete a statement of the following form: "If someone says [quantifier] it refers to a minimum of … out of 100 objects." Note that we asked for the minimum, as the lower bounds of the quantifiers are of greater importance to understanding the semantics of these specific quantifiers.

Results
Overall calculations show only a small but significant difference (Wilcoxon test, z = 2.11, p = .018) between Minimal Models (67.9 % of responses predicted) and the Matching-Hypothesis (64.5 %). This trend was confirmed by the more fine-grained analysis of hits and correct rejections following a method introduced by Khemlani and Johnson-Laird (2012): theoretical predictions are compared to significant choices (as shown in Tables 1 and 2), and hits (i.e. choices that are predicted by the respective theory) and correct rejections are counted. In this analysis, Minimal Models perform better in both categories, hits (90.1 vs. 78.1 %; Wilcoxon test, z = 2.99, p = .001) and correct rejections (92.1 vs. 87.5 %;
Table 1 Significant choices for Figure I and the percentage of participants who drew these conclusions (rows: first premise; columns: second premise)

                All [A]    Some [I]   Most [M]                       Normally [N]
All [A]         –          I (78 %)   M (72 %)                       N (60 %)
Some [I]        I (78 %)   I (79 %)   I (67 %)                       I (69 %)
Most [M]        M (74 %)   I (74 %)   M (67 %)                       M (33 %), I (28 %), N (26 %)
Normally [N]    N (69 %)   I (79 %)   I (33 %), M (26 %), N (19 %)   N (69 %)
Table 2 Significant choices for Figure IV and the percentage of participants who drew these conclusions (rows: first premise; columns: second premise)

                All [A]                Some [I]              Most [M]                                   Normally [N]
All [A]         A (74 %)               I (66 %)              M (50 %)                                   N (50 %)
Some [I]        I (55 %), I* (21 %)    I (57 %), NV (36 %)   I (48 %), NV (31 %)                        I (43 %), NV (31 %)
Most [M]        M (45 %), M* (17 %)    I (50 %), NV (24 %)   M (50 %), NV (22 %), I (21 %)              N (24 %), NV (24 %), M (22 %), I (21 %)
Normally [N]    N (41 %), N* (17 %)    I (57 %), NV (31 %)   M (26 %), I (22 %), N (17 %), NV (17 %)    N (59 %), NV (24 %)
Conclusions marked with * are conclusions in the C–A direction; all others are in the A–C direction. NV = no valid conclusion

Wilcoxon test, z = 2.65, p = .004). According to our Minimal Model approach, for 26 tasks System 2 leads to responses differing from the heuristic ones. In eight cases this prediction was confirmed by the data, i.e., in those cases a significant proportion of participants drew the respective System 2 conclusion. The quantitative interpretation of the quantifiers some, most, and normally is depicted in Fig. 1. Note that the values for the quantifier all are not illustrated, as, with one exception, all participants assigned a minimal value of 100 to it. For several participants normally is equivalent to all, i.e., no exceptions are possible—in contrast to most. The direct comparison of the quantifiers most and normally revealed that, as expected, normally (mean = 75.5) is attributed a significantly (Wilcoxon test, z = 2.39, p = .008) higher value than most (mean = 69.0).

Discussion
Our distinction between frequency-based quantifiers (e.g., normally) and set-based quantifiers (e.g., most) in reasoning is—to the best of our knowledge—new. Although both, in principle, allow for exceptions depending on the underlying semantics, four reasoners gave the same semantics for normally as for all. For most, all reasoners assumed the possibility of exceptions—possibly applying a principle similar to the Gricean Implicature (Newstead 1995). This principle assumes that whenever we use expressions that allow for exceptions, these can typically be assumed. So far the PHM (Chater, Oaksford 1999) does not provide any predictions for reasoning with the quantifier normally; however, given our quantitative evaluation of this quantifier, the PHM could be
Fig. 1 Individual values (points) and quartiles (lines) of participants’ understanding of the minimum of the quantifiers some, most, and normally
Furthermore, for the presented experimental results the theory predictions for the quantifier normally could also be examined using the more fine-grained method of Multinomial Processing Trees (cf. Ragni, Singmann and Steinlein 2014). Non-monotonic reasoning, i.e., reasoning about default assumptions, often uses quantifiers like normally to express knowledge. For instance, Schlechta (1995) characterizes such default assumptions in his formal investigation as generalized quantifiers. Although there are many theories of syllogistic reasoning (Khemlani, Johnson-Laird 2012), there are only a few that can be generically extended; among these we focused on the Matching-Hypothesis and the Mental Model Theory. In contrast to the Matching-Hypothesis, Mental Model Theory relies on mental representations that can be changed to search for counter-examples (as in relational reasoning, cf. Ragni, Knauff 2013) and generates additional predictions by the variation of the models. The findings indicate that an extension of the Matching-Hypothesis to include the set-based quantifier most and the frequency-based quantifier normally leads to an acceptable prediction of the experimental data. There are, however, some empirical findings it cannot explain, e.g., System 2 responses and figural effects in reasoning with generalized quantifiers. It seems that in this case reasoners construct models as representations instead of merely relying on superficial heuristics.
Acknowledgments The work has been partially supported by a grant to MR from the DFG within the SPP in project Non-monotonic Reasoning. The authors are grateful to Stephanie Schwenke for proof-reading.
References
Brewka G, Niemelä I, Truszczynski M (2007) Nonmonotonic reasoning. In: Handbook of Knowledge Representation, pp 239–284
Bucciarelli M, Johnson-Laird PN (1999) Strategies in syllogistic reasoning. Cogn Sci 23:247–303. doi:10.1016/S0364-0213(99)00008-7
Chater N, Oaksford M (1999) The probability heuristics model of syllogistic reasoning. Cogn Psychol 38:191–258. doi:10.1006/cogp.1998.0696
Khemlani S, Johnson-Laird PN (2012) Theories of the syllogism: a meta-analysis. Psychol Bull 138:427–457. doi:10.1037/a0026841
Newstead S (1995) Gricean implicatures and syllogistic reasoning. J Mem Lang 34(5):644–664
Pfeifer N (2006) Contemporary syllogistics: comparative and quantitative syllogisms. In: Kreuzbauer G, Dorn GJW (eds) Argumentation in Theorie und Praxis: Philosophie und Didaktik des Argumentierens. LIT, Wien, pp 57–71
Ragni M, Knauff M (2013) A theory and a computational model of spatial reasoning with preferred mental models. Psychol Rev 120(3):561–588
Ragni M, Singmann H, Steinlein E-M (2014) Theory comparison for generalized quantifiers. In: Bello P, Guarini M, McShane M, Scassellati B (eds) Proceedings of the 36th annual conference of the cognitive science society. Cognitive Science Society, Austin, pp 1984–1990
Rips LJ (1994) The psychology of proof: deductive reasoning in human thinking. The MIT Press, Cambridge
Schlechta K (1995) Defaults as generalized quantifiers. J Logic Comput 5(4):473–494
Wetherick NE, Gilhooly KJ (1995) 'Atmosphere', matching, and logic in syllogistic reasoning. Curr Psychol 14:169–178. doi:10.1007/BF02686906
What if you could build your own landmark? The influence of color, shape, and position on landmark salience
Marianne Strickrodt, Thomas Hinterecker, Florian Röser, Kai Hamburger
Experimental Psychology and Cognitive Science, Justus Liebig University Giessen, Germany
Abstract This study focused on participants' preferences for 'building' a landmark from eight colors, eight shapes, and four possible landmark positions for aiding the wayfinding of a nonlocal person. The results suggest that participants selected features not only according to their personal and aesthetic preference (e.g. blue, circle), but also according to a sense of common cognitive availability and utility for learning a route (e.g. red, triangle). Strong preferences for the position of a landmark, namely before the intersection and in direction of the turn, are in line with other studies investigating position preference from an allocentric view.
Keywords Salience, Landmarks, Position, Color, Shape, Feature preference
Introduction
When travelling in an unknown environment people can use objects such as buildings or trees for memorizing the paths they walk. These objects, also called landmarks, are considered to be important reference points for learning and finding one's way (Lynch 1960). Essentially, almost everything can become a landmark as long as it is salient (Presson, Montello 1988). Sorrows, Hirtle (1999) defined three different landmark saliencies, whose precise definitions change slightly in the literature (e.g. Klippel, Winter 2005; Röser et al. 2012) but, nevertheless, include the following aspects:
• visual/perceptual salience: physical aspects of an object (e.g. color, shape, size);
• semantic/cognitive salience: aspects of knowledge and experiences; refers to the mental accessibility of an object (i.e. the ease of labeling it);
• structural salience: the ease with which one can cognitively conceptualize the position of an object.
Speaking of perceptual salience, which is not absolute but contrast-defined, a high contrast of the landmark to its surroundings will lead to easy and fast identification and recognition (Presson, Montello 1988). Nevertheless, given a high contrast, color preference itself might also influence the visual salience of an object. In a non-spatial context, blue to purple colors were found to be preferred, whereas 'yellowish-green' colors were most disliked (Hurlbert, Ling 2007). The cause of these preferences is discussed in light of different theories and explanations, ranging from evolutionary adaptation of our
visual system (finding the red, ripe fruit) to aspects of ecological valence (experiences we associate with colored objects cause corresponding color preferences). In a spatial context, colored environments have been found to enhance the wayfinding behavior of children and adults when compared to non-colored environments (Jansen-Osmann, Wiedenbauer 2004). Also, on the level of single colors applied to landmarks along a virtual route, green led to the worst performance in a subsequent recognition task, while yellow, cyan, and red were recognized best (Wahl et al. 2008). Interestingly, even though it is not a preferred color per se, yellow seems to be easy to recognize and therefore probably helpful for remembering a path and learning the surroundings. Thus, it seems important to differentiate between a personal preference for a color and the memorability and utility of colors in a spatial context.
A shape, such as a square or an ellipse, comprises both visual and semantic salience: an appearance which is more or less easy to perceive and to reproduce, and a mental conceptualization, a label, a meaning. Shapes, compared to colors, yielded significantly higher recognition performance (Wahl et al. 2008). Nevertheless, no differences in selecting or recognizing differently shaped landmarks could be found (Röser et al. 2012; Wahl et al. 2008). Outside the spatial context, Bar, Neta (2006) found that angular objects are less preferred than curved objects. They speculated that sharp contours elicit a sense of threat and lead to a negative bias towards the object. Taken together, these findings again suggest that preference and utility of shapes are to be differentiated, whereby utility might play a more important role when selecting a feature in a wayfinding context.
Besides the low-level features color and shape, this research concentrates on the position of a landmark at an intersection, covering an important aspect of structural salience, namely how different positions are conceptualized. When instructed to select one out of four landmarks for a route description, each attached to one of the four corners of an intersection, participants show a clear position preference (Röser et al. 2012). From an egocentric perspective, the positions in direction of turn, either before or after the intersection, are chosen significantly more often than positions opposite to the direction of turn. With allocentric (map-like) material, the position before the intersection and in direction of turn is most preferred. Therefore, what most accounted for an object being chosen was its location dependent on the direction in which the route continued, not whether it was presented on the left or right. The two types of defining the position of an object at an intersection are visualized in Fig. 3.
This study addresses all saliencies with the help of a simple selection task. Participants chose from colors, shapes, and positions to create a landmark intended to aid another person's way through the same environment. We assume that the landmarks produced by the participants mirror their own implicit sense of a good, salient landmark, which everyone should be able to use. Results might in turn be an indicator of diverging scores of salience within the examined features.
By combining the findings of the aforementioned preference and navigation research, we hypothesize that red and blue and the position in front of the intersection, in direction of turn (D), are most frequently chosen. Since shapes seem to induce distinctive preferences, this might also be reflected in the construction of a landmark, but no clear predictions can be made at this point.
Material and Method
Participants The sample consisted of 56 students (46 females) from Giessen University (mean age 24 yrs, SD = 4.5), who received course credits for participation. Normal or corrected-to-normal vision was required.
Materials On an A4-sized paper an allocentric view of a small schematic city area was printed. The region of interest in this area consisted of four orthogonal intersections (Fig. 1). The route (two right and left turns) through this region was displayed by a magenta line. On the four corners of each intersection a quadratic white field indicated an optional location for a landmark.
Participants could choose from eight colors (violet, blue, green, yellow, orange, red) or luminances (white, black), respectively, and eight shapes (diamond, hexagon, square, rhomboid, ellipse, circle, cross, triangle).
Fig. 1 Schematic city area and range of colors and shapes participants could choose from, as presented to the participants. The route from start ("Start") to destination ("Ziel") is indicated by a dashed line. White quadratic fields are the optional locations for the created landmarks
Procedure Instructions were given on a separate paper. Only one landmark was to be 'built' for every intersection, and each color and shape could only be used once. The landmark was to be positioned at one of the four corners of an intersection. The shape, therefore, had to be drawn with the selected color in one of the four white corners of an intersection. The task was to use the subjectively preferred combinations to build the landmarks in order to facilitate wayfinding for a notional, nonlocal person. Participants were instructed to imagine giving a verbal route description to this nonlocal person, including their built landmarks.
Results
Overall, 224 decisions for shapes, colors, and positions, respectively (56 participants × 4 landmarks to build), were analyzed with nonparametric chi-square tests. Frequencies for the selection of shapes and colors can be seen in Fig. 2. When analyzing single colors (Bonferroni correction α = .006), red was significantly above the uniform distribution (χ²(1) = 21.592, p < .001), while black (χ²(1) = 16.327, p < .001) and white (χ²(1) = 27.592, p < .001) were below it. Regarding the shapes, results show that participants had a significant preference for the triangle (χ²(1) = 18, p < .001) and the circle (χ²(1) = 16.327, p < .001). On the other hand, ellipse (χ²(1) = 13.224, p = .001), hexagon (χ²(1) = 14.735, p < .001), and rhomboid (χ²(1) = 19.755, p < .001) were rarely chosen at all. Green, blue, yellow, violet, orange, and square, diamond, and cross did not deviate from uniform frequencies. Figure 3 and Table 1 comprise the findings comparing landmark positions. When focusing on landmark positions dependent on the direction of turn, position D, in front of the intersection and in direction of turn, was by far the most frequently selected (71.88 %), followed by the other position lying in direction of turn but behind the intersection, position B (25 %). Positions opposite to the direction of turn (A and C) lag far behind, suggesting that the significant difference between direction-independent positions in front of and behind the intersection (1 and 2 against 3 and 4) is solely driven by the popularity of D.
[Fig. 2: bar charts of relative assignment frequencies (%). Colors: red 22.77, green 17.86, blue 17.41, yellow 14.29, violet 12.95, orange 10.27, black 3.57, white 0.89. Shapes: triangle 21.88, circle 21.43, square 17.86, ellipse 4.46, hexagon 4.02, rhomboid 2.68, with diamond and cross at 14.73 and 12.95 (the assignment of these two values is not recoverable from the source). The dashed line marks the uniform distribution (12.5 %); asterisks mark significant deviations.]
Fig. 2 Relative frequency of selected colors and shapes and their deviation from uniform distribution (dashed line)
[Fig. 3: relative frequency of selected positions. a Independent of direction of turn: 1 (behind, left) 12.5 %; 2 (behind, right) 14.29 %; 3 (in front, left) 37.5 %; 4 (in front, right) 35.71 %. b Dependent on direction of turn: A (behind, opposite) 1.34 %; B (behind, in direction of turn) 25 %; C (in front, opposite) 1.79 %; D (in front, in direction of turn) 71.88 %. Note that panel b includes both right and left (transposed to right) directions of turn.]
Discussion
This study examined the selection of three different landmark features, namely color, shape, and location. Participants were instructed to select, according to their own persuasion, what kind of landmark is most qualified to aid a nonlocal person in finding her way following a route description. Most favored by the participants was the color red (followed by green and blue, which, due to the α-error correction, did not differ from chance). The least preferred 'colors' were the luminances black and white. As for shapes, triangle and circle were most frequently selected (followed by square, although without a significant difference from chance). Least preferred were ellipse, hexagon, and rhomboid. A significant prominence of position D was found.
Table 1 Multiple chi-square comparisons for the two types of definition of landmark location

Independ.   χ²(1)     p         Depend.   χ²(1)      p
1–2         0.267     .699      A–B       47.610     <.001*
1–3         28        <.001*    A–C       0.143      1.000
1–4         25.037    <.001*    A–D       152.220    <.001*
2–3         23.310    <.001*    B–C       45.067     <.001*
2–4         20.571    <.001*    B–D       50.806     <.001*
3–4         0.098     .815      C–D       149.388    <.001*

χ² value and significance are shown (Bonferroni correction α = .008)
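For illustration, the single-color goodness-of-fit tests reported in the Results can be reproduced as follows (a minimal sketch; the observed counts are back-computed from the reported percentages of the 224 decisions, and the loop reproduces, e.g., the reported χ²(1) = 21.592 for red):

```python
from scipy.stats import chisquare

N = 224  # 56 participants x 4 landmarks each

# Observed counts, back-computed from the reported percentages (approximate).
colors = {"red": 51, "green": 40, "blue": 39, "yellow": 32,
          "violet": 29, "orange": 23, "black": 8, "white": 2}

expected = N / len(colors)  # uniform distribution: 28 decisions per color

for name, obs in colors.items():
    # Each color is tested against the pooled remainder (1 df), mirroring
    # the single-color chi-square tests reported in the Results.
    stat, p = chisquare([obs, N - obs], f_exp=[expected, N - expected])
    flag = "*" if p < .006 else ""  # Bonferroni-corrected alpha = .006
    print(f"{name:>6}: chi2(1) = {stat:6.3f}, p = {p:.4f} {flag}")
```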
The neglect of the luminances black and white is in line with the assumptions concerning visual salience, namely that a low contrast to the grey and white background of the experimental material is not preferable in a wayfinding context. Results suggest that participants were aware of the positive impact of contrast. Interestingly, neither former results on color preferences (Hurlbert, Ling 2007) nor benefits in recognition (Wahl et al. 2008) are perfectly mirrored in our data, suggesting that the selection process was not based on either of these levels. Instead, it seems plausible to suggest a selection strategy preferring landmark features according to familiarity. As red, blue, and green constitute the three primary colors every western pupil is taught in school, and as they are probably the colors most used in street signs, they might also be best established and conceptualized in the knowledge of an average person, which may explain their selection. For the visual and semantic salience of shapes a similar explanation may be consulted. Shapes that are highly common and easy for everyone to identify are preferred: triangles and circles. Furthermore, the low complexity of these shapes compared to rhomboid or hexagon might have affected the selection as well. It seems that the sharpness of the contour of an object was immaterial in this task. The clearest and most reliable result is the preference for position D (allocentric), the position before the intersection and in direction of the turn. Also Waller, Lippa (2007) pointed out the advantages of landmarks in direction of turn, as they serve as beacons (as compared to associative cues). Merely recognizing these landmarks is sufficient to know where to go, since their position reveals the correct direction response at an intersection.
Overall, it seems that participants did not choose object properties according to a mere personal feature preference. Their selection process probably involved preference with respect to perceptibility, ease of memorization, and usability in terms of wayfinding ('this works fine as a landmark'). To what extent the selection was based on a conscious or unconscious process cannot be determined here. Whether the fact of guiding another person (compared to oneself) played an important role in creating a landmark also cannot be sufficiently answered at this point. Furthermore, whether these preferences really help people to learn a route faster or more easily is yet another question. For the task of building a landmark that shall aid other people in finding the same way, we found evidence that people show a clear preference for the best-known and most common colors and shapes. Moreover, the high frequency of the selection of the position before and in the direction of turn is striking. This study, with its task of creating a landmark, is a small contribution to the expanding research on the visual, semantic, and structural salience of landmarks.
References
Bar M, Neta M (2006) Humans prefer curved visual objects. Psychol Sci 17:645–648
Hurlbert AC, Ling Y (2007) Biological components of sex differences in color preference. Curr Biol 17:R623–R625
Jansen-Osmann P, Wiedenbauer G (2004) The representation of landmarks and routes in children and adults: a study in a virtual environment. J Environ Psychol 24:347–357
Klippel A, Winter S (2005) Structural salience of landmarks for route discrimination. In: Cohn AG, Mark D (eds) Spatial information theory. International Conference COSIT. Springer, Berlin
Lynch K (1960) The image of the city. MIT Press, Cambridge
Presson CC, Montello DR (1988) Points of reference in spatial cognition: stalking the elusive landmark. Br J Dev Psychol 6:378–381
Röser F, Krumnack A, Hamburger K, Knauff M (2012) A four factor model of landmark salience – a new approach. In: Russwinkel N, Drewitz U, van Rijn H (eds) Proceedings of the 11th International Conference on Cognitive Modeling (ICCM). Universitätsverlag TU Berlin, Berlin
Sorrows ME, Hirtle SC (1999) The nature of landmarks for real and electronic spaces. In: Freksa C, Mark DM (eds) Spatial information theory: cognitive and computational foundations of geographic information science, International Conference COSIT 1999. Springer, Stade
Wahl N, Hamburger K, Knauff M (2008) Which properties define the 'salience' of landmarks for navigation? – An empirical investigation of shape, color and intention. International Conference Spatial Cognition 2008, Freiburg
Waller D, Lippa Y (2007) Landmarks as beacons and associative cues: their role in route learning. Mem Cognit 35:910–924
Does language shape cognition?
Alex Tillas
Institut für Philosophie, Heinrich-Heine-Universität, Düsseldorf, Germany
Introduction
In this paper, I investigate the relation between language and thinking and offer an associationistic view of cognition. There are two main strands in the debate about the relation between language and cognition. On the one hand, there are those who ascribe a minimal role to language and argue that language merely communicates thoughts from the Language-of-Thought level to the conscious level (e.g. Grice 1957; Davidson 1975; Fodor 1978). On the other hand, there are those who argue for a constitution relation holding between the two (Carruthers 1998; Brandom 1994). Somewhere in the middle of these two extremes lie the supra-communicative views of language that go back to James (1890/1999) and Vygotsky (trans. 1962) and appear more recently in the work of Berk and Garvin (1984). Furthermore, Gauker (1990) argues that language is a tool for effecting changes in the subject's environment, while Jackendoff (1996) argues that linguistic formulation gives us a 'handle' for attention. Finally, Clark (1998) and Clark and Chalmers (1998) argue for the causal potencies of language and suggest that language complements our thoughts (see also Rumelhart et al. 1986). Building upon associationism, the view suggested here ascribes a significant role to language in cognition. This role is not limited to interfacing between the unconscious and conscious levels, but the relation between the two is not one of constitution either. More specifically, in the suggested view linguistic labels (or words) play a crucial role in thinking. Call this position the Labels and Associations in Thinking hypothesis (henceforth LASSO). LASSO is similar to Clark's view in that utilization of linguistic symbols plays a significant role. However, for Clark, language is important in reducing cognitive load, while in LASSO utilization of linguistic labels is responsible for the acquisition of endogenous control over thoughts. In particular, I start from the ability that human agents have to manipulate external objects in relationships of agency towards them, and
argue that we can piggyback on that ability to manipulate and direct our own thinking. Despite sharing with 'supra-communicative' views the claim that language does not merely serve to communicate thoughts to consciousness, my focus here is on a more general level. In particular, I focus on how language influences thinking, rather than on how specific cognitive tasks might be propped up by language. Finally, LASSO resembles Lupyan's (2012) Label Feedback Hypothesis, even though my agenda is more general than Lupyan's (non-linguistic aspects of cognition such as perceptual processing).
The LASSO Hypothesis
LASSO is based on a view of concepts as structured entities comprising a set of representations. Members of this set are mainly perceptual representations from experiences with instances of a given kind, as well as perceptual representations of the appropriate word. These representations become associated on the basis of co-occurrence. Crucially, they become reactivated when thinking about this object; to this extent thinking is analogous to perceiving (Barsalou 1999). To endogenously control the tokening of a given concept is to activate this concept in the absence of its referents. In turn, to endogenously control thinking is to token a thought on the basis of processes of thinking rather than of processes of perceiving the appropriate stimulus. Endogenously controlled thinking is merely associative thinking, i.e., current thinking caused by earlier thinking. The key claim here is that we have endogenous control over our production of linguistic items, given that we are able to produce linguistic utterances at will. It is this executive control over linguistic utterances that gives us endogenous control over our thoughts. Admittedly, there are alternative ways to acquire endogenous control over our thoughts, e.g. via associations with a goal-directed state over which we already have endogenous control. Once a certain degree of linguistic sophistication is acquired, the process of activating a concept in a top-down manner is achieved in virtue of activating associated words.
Language is not constitutive of (conscious) thinking
According to Carruthers (1998, 2005), accounting for our non-inferential access to our thoughts requires inner speech to be constitutively involved in propositional thinking. Contra Carruthers, I argue that this is not the only way in which non-inferential thinking can occur. One alternative is associative thinking. It might be that the transition from a word to the concept that has the very same content that the word expresses is an associationistic link. In the suggested view, perceptual representations and words are associated in memory. Note that this is not a case of language being constitutive of thoughts, but a case of co-activation of a concept's different subparts: perceptual representations of the appropriate word (A) and representations formed during perceptual experiences with instances of a given kind (B). This occurs in virtue of an instance of a word activating A, which in turn activates B, resulting in the activation of the concept as a whole. Nevertheless, and importantly, this kind of thinking is not interpretative, as Carruthers argues. It is not that an agent hears a word, say 'Cat', and then tries to guess or infer what the word means. Instead, on hearing the word 'Cat' the concept cat becomes activated. Access to thinking is neither interpretative nor constitutive.
Perceptual representations of objects and words are distinct from each other and are brought together during the process of concept formation. It is just that we only have conscious access at the level where representations of words and objects converge; consider this in terms of Damasio's (1989) well-known 'convergence zones' hypothesis. In this sense, an agent can only access representations of objects and words simultaneously and treat them as if they were constitutive parts of a concept/thought. The relationship between a thought and its representation in self-knowledge is brute causation. The transition between a first-order thought and a second-order thought is causally and not
constitutively related. Contra Carruthers, the relationship between a first-order and a second-order thought is not a constitutive but a causal, associative one. Thought and language are not constitutively connected.
Evidence for LASSO: language & perceptual categorization
The suggested view enjoys significant empirical support, e.g. from evidence showing that perceptual categorization depends on language. This evidence could in turn be used against the communicative conception of language. For instance, in a series of experiments, Davidoff, Roberson (2004) examined the abilities of LEW (a patient with language impairments, close to the profile of high-level Wernicke's aphasia) to categorize visually presented color stimuli, and found that color categories did not 'pop out' for LEW. Instead, he retreated to a comparison between pairs, which in turn resulted in his poor performance in the categorization tasks. From this, Davidoff, Roberson argue that color categorization is essentially a rule-governed process. And even though colors are assigned to a given category on the basis of similarity, it is similarity to a conventionally named color that underlies this assignment. LEW's inability to categorize simple perceptual stimuli arises because names are simply not available to him. With regard to his performance in the color and shape categorization tasks, they argue that it is not the case that LEW has simply lost color or shape names. He is rather unable to consciously allocate items to perceptual categories. To this extent, they argue that LEW's impairment is not related to a type-of-knowledge but rather to a type-of-thought story. Furthermore, they argue that there is a type of classification, independent of feature classification, which is unavailable to aphasics with naming disorders. This evidence does not suggest a constitutive relation between language and thinking. Instead it suggests a strong relation between naming and categorization impairments, which could be explained by appealing to a strong association between a linguistic label and a concept. This in turn lends support to LASSO.
Evidence against a constitutive relation between language & cognition
Evidence in favor of LASSO and against a constitutive relation between language and cognition can be found in results showing that grammar (a constitutive part of language) is neither necessary nor sufficient for thinking, and more specifically for Theory of Mind (ToM) reasoning. For instance, Siegal, Varley, Want (2001) show a double dissociation between grammar and ToM reasoning, which in turn indicates that reasoning can occur largely independently of grammatical language. Even though ToM understanding and categorization are not all there is to cognition, had there been a constitutive relation between language and (conscious) cognition, in the way Carruthers argues for instance, then a double dissociation between grammar and ToM reasoning would never have occurred. Focusing on the relation between grammar and cognition in aphasia, Varley and Siegal (2000) show that subjects with severe agrammatic aphasia and minimal access to propositional language performed well in different ToM tests and were capable of simple causal reasoning. On these grounds, Siegal, Varley, Want (2001) argue that reasoning about beliefs as well as other forms of sophisticated cognitive processing involve processes that are not dependent on grammar. By contrast to the previous evidence, Siegal et al.
report that non-aphasic subjects with right-hemisphere (non-language-dominant) lesions exhibited impaired ToM reasoning and had difficulties understanding sarcasm, jokes, and the conversational implications of questions (Siegal et al. 1996; Happé et al. 1999). This double dissociation between grammar on the one hand and causal reasoning and ToM on the other suggests a non-constitutive relation between language and cognition, and in turn favors LASSO.
Objections to LASSO
Qua inherently associationistic, LASSO might be subject to the objection that it cannot account for propositional thinking or for the compositionality of thought. For it might be that LASSO at best describes how inter-connected concepts become activated without explaining the propositional-syntactic properties that thoughts in the form of inner speech have. In reply, a single thought becomes propositional in structure and content by piggybacking on language. The conventional grammatical unity and structure of the sentence unifies these concepts and orders them in a certain way. Another challenge facing associationistic accounts of thinking is that it is unclear how they can account for the characteristic of concepts to combine compositionally. In reply, I appeal to Prinz's (2002) semantic account, according to which, in order for c to refer to x, the following two conditions have to be fulfilled:
(a) xs nomologically covary with tokens of c;
(b) an x was the (actual) incipient cause of c.
In the suggested view the concept petfish, like all concepts, is a folder that contains perceptual representations. The incipient causes of petfish can either be instances of pet fish or representations of pets and representations of fish. Crucially, in terms of semantics, petfish has to nomologically covary with pet fish rather than with a disjunction of pet and fish. The reason why petfish nomologically covaries with pet fish is that the concept's functional role is constrained by the constraints on the uses of the word that are set by the agent's locking into the conventions about conjunction formation. In this sense, agents participate in a convention, and it is via the association between the word and the concept that the functional role of the conjunctive concept is constrained. In terms of the constitutive representations of petfish, these can be representations of pets like cats and dogs as well as representations of fish. Crucially, these representations are idle in the functional role of the concept; the latter is more constrained by its link to the words.
Acknowledgments I am grateful to Finn Spicer, Anthony Everett and Jesse Prinz for comments on earlier drafts of this paper. Research for this paper has been partly funded by the Alexander S. Onassis Public Benefit Foundation (ZF 075) and partly by the Deutsche Forschungsgemeinschaft (DFG) (SFB 991, Project A03).
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22:577–609. doi:10.1017/s0140525x99002149
Berk L, Garvin R (1984) Development of private speech among low-income Appalachian children. Dev Psychol 20(2):271–286. doi:10.1037/0012-1649.20.2.271
Brandom R (1994) Making it explicit: reasoning, representing, and discursive commitment. Harvard University Press, Cambridge MA
Carruthers P (1998) Conscious thinking: language or elimination? Mind Lang 13(4):457–476. doi:10.1111/1468-0017.00087
Carruthers P (2005) Consciousness: essays from a higher order perspective. Clarendon Press, Oxford
Clark A (1998) Magic words: how language augments human computation. In: Carruthers P, Boucher J (eds) Language and thought: interdisciplinary themes. Cambridge University Press, Cambridge, pp 162–183
Clark A, Chalmers DJ (1998) The extended mind. Analysis 58(1):7–19. doi:10.1111/1467-8284.00096
Damasio AR (1989) Time-locked multiregional retroactivation: a systems-level proposal for the neural substrates of recall and recognition. Cognition 33:25–62. doi:10.1016/0010-0277(89)90005-X
Davidoff J, Roberson D (2004) Preserved thematic and impaired taxonomic categorization: a case study. Lang Cognitive Proc 19(1):137–174. doi:10.1080/01690960344000125
Davidson D (1975) Thought and talk. In his Inquiries into truth and interpretation, pp 155–170. Oxford University Press, Oxford
Dummett M (1975) Wang's paradox. Synthese 30:301–324
Elman JL, Bates EA, Johnson MH, Karmiloff-Smith A, Parisi D, Plunkett K (1996) Rethinking innateness: a connectionist perspective on development. MIT Press, Cambridge MA
Fodor J (1978) Representations: philosophical essays on the foundations of cognitive science. MIT Press, Cambridge MA
Gauker C (1990) How to learn a language like a chimpanzee. Phil Psych 3(1):31–53. doi:10.1080/09515089008572988
Grice P (1957) Meaning. Phil Review 66:377–388
Happé F et al. (1999) Acquired 'theory of mind' impairments following stroke. Cognition 70:211–240. doi:10.1016/S0010-0277(99)00005-0
Jackendoff R (1996) How language helps us think. P&C 4(1). doi:10.1075/pc.4.1.03jac
James W (1890/1999) The principles of psychology (2 vols). Henry Holt, New York (Reprinted Thoemmes Press, Bristol)
Lupyan G (2012) Linguistically modulated perception and cognition: the label feedback hypothesis. Front Psychol 3:54. doi:10.3389/fpsyg.2012.00054
Prinz J (2002) Furnishing the mind: concepts and their perceptual basis. MIT Press, Cambridge
Rumelhart DE, Smolensky P, McClelland JL, Hinton GE (1986) Parallel distributed models of schemata and sequential thought processes. In: McClelland JL, Rumelhart DE (eds) Parallel distributed processing: explorations in the microstructure of cognition. Volume 2: psychological and biological models, pp 7–57
Siegal M, Carrington J, Radel M (1996) Theory of mind and pragmatic understanding following right hemisphere damage. Brain Lang 53:40–50. doi:10.1006/brln.1996.0035
Siegal M, Varley M, Want SC (2001) Mind over grammar: reasoning in aphasia and development. Trends Cogn Sci 5(7). doi:10.1016/S1364-6613(00)01667-3
Varley R, Siegal M (2000) Evidence for cognition without grammar from causal reasoning and 'theory of mind' in an agrammatic aphasic patient. Curr Biol 10:723–726. doi:10.1016/S0960-9822(00)00538-8
Vygotsky LS (1962) Thought and language. MIT Press, Cambridge
Ten years of adaptive rewiring networks in cortical connectivity modeling. Progress and perspectives
Cees van Leeuwen
KU Leuven, Belgium; University of Kaiserslautern, Germany
Activity in cortical networks is generally considered to be governed by oscillatory dynamics, enabling the network components to synchronize their phase. Dynamics on networks are determined to a large extent by the network topology (Barahona and Pecora 2002; Steur et al. 2014). Cortical network topology, however, is subject to change as a result of development and plasticity. Adaptive network models enable the dynamics on networks to shape the dynamics of networks, i.e. the evolution of the network topology (Gross and Blasius 2008). Adaptive networks show a strong propensity to evolve complex topologies. In adaptive networks, the connections are selectively reinforced (Skyrms and Pemantle 2000) or rewired (Gong and van Leeuwen 2003, 2004; Zimmermann et al. 2004), in adaptation to the dynamical properties of the nodes. The latter are called adaptively rewiring networks.
Gong and van Leeuwen (2003, 2004) started using adaptive rewiring networks in order to understand the relation between large-scale brain structure and function. They applied a Hebbian-like algorithm, in which synchrony between pairs of network components (nodes) is the criterion for rewiring. The nodes exhibit oscillatory activity and, just as in the brain, where dynamic synchronization in spontaneous activity shows traveling and standing waves and transitions between them (Ito et al. 2005, 2007), the network nodes collectively move spontaneously in and out of patterns of partial synchrony. Meanwhile, adaptive rewiring takes place. When a pair of nodes is momentarily synchronized but not connected, from time to time a link from elsewhere is relayed in order to connect these nodes. This is the key principle of adaptive rewiring. Adaptively rewiring a network according to synchrony in spontaneous activity gave rise to the robust evolution of a certain class of complex network structures (Fig. 1). These share important characteristics with the large-scale connectivity structure of the brain. Adaptive rewiring models, therefore, became an integral part of the research program of the Laboratory for Perceptual Dynamics, which takes a complex-systems view of perceptual processes (for a sketch of the Laboratory while at the RIKEN Brain Science Institute, see van Leeuwen 2005; for its current incarnation as an FWO-funded laboratory at the KU Leuven, see its webpage at http://perceptualdynamics.be/). The original adaptive rewiring model (Gong and van Leeuwen 2003, 2004) was developed over the years in a number of studies (Jarman et al. 2014; Kwok et al. 2007; Rubinov et al. 2009a; van den Berg et al. 2012; van den Berg and van Leeuwen 2004). Here I review these developments and sketch some further perspectives.
In the original algorithm (Gong and van Leeuwen 2004; van den Berg and van Leeuwen 2004), the network initially consists of randomly coupled maps. Coupled maps are continuously valued maps connected by a diffusive coupling scheme (Kaneko 1993). We used coupled logistic maps; the return plots of these maps are generic and can be regarded as coarsely approximating those of a chaotic neural mass model (Rubinov et al. 2009a). Adaptively rewiring the couplings of the maps showed the following robust tendency: from the initially random architecture and random initial conditions, a small-world network gradually emerges as the effect of rewiring. Small worlds are complex networks that combine the advantage of a high degree of local clustering, as in a regular network, with the high degree of global connectedness observed in a random network (Watts and Strogatz 1998). They are, in other words, an optimal compromise between local and global signal transfer. Small-world networks have repeatedly been observed in the anatomical and functional connectivity of the human brain (He et al. 2007; Sporns 2011; Bullmore and Bassett 2011; Gallos et al. 2012).
Fig. 1 A random network prior to (left) and after (right) several iterations of adaptive rewiring (From van Leeuwen 2008). Note that this version of the model considers topology only; geographical proximity of nodes was heuristically optimized in order to provide a visualization of the propensity of the system to evolve a modular small-world network
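In outline, the rewiring loop can be sketched as follows (a minimal, illustrative reading of the synchrony-based scheme rather than the published implementation; the map parameter a, the coupling strength eps, the rewiring interval, and the use of |x_i - x_j| as the synchrony criterion are assumptions):

```python
import numpy as np

def adaptive_rewiring(n=100, n_edges=600, steps=20000, rewire_every=10,
                      a=1.7, eps=0.4, seed=0):
    """Coupled quadratic maps f(x) = 1 - a*x**2 with diffusive coupling;
    links are occasionally rerouted toward momentarily synchronized pairs."""
    rng = np.random.default_rng(seed)

    # Random symmetric adjacency without self-links.
    adj = np.zeros((n, n), dtype=bool)
    while adj.sum() // 2 < n_edges:
        i, j = rng.integers(n, size=2)
        if i != j:
            adj[i, j] = adj[j, i] = True
    x = rng.uniform(-1.0, 1.0, n)

    for t in range(steps):
        fx = 1.0 - a * x**2
        deg = adj.sum(axis=1)
        # Diffusive coupling: mix own update with the mean neighbor update.
        x = (1.0 - eps) * fx + eps * (adj.astype(float) @ fx) / np.maximum(deg, 1)

        if t % rewire_every == 0:
            i = rng.integers(n)
            d = np.abs(x - x[i])        # synchrony criterion (assumption)
            d[i] = np.inf               # never pick the node itself
            non_nbr = np.where(~adj[i])[0]
            nbr = np.where(adj[i])[0]
            if len(non_nbr) > 1 and len(nbr) > 0:
                j_new = non_nbr[np.argmin(d[non_nbr])]  # most synchronized non-neighbor
                j_old = nbr[np.argmax(d[nbr])]          # least synchronized neighbor
                adj[i, j_old] = adj[j_old, i] = False   # relay that link...
                adj[i, j_new] = adj[j_new, i] = True    # ...to the synchronized pair
    return adj
```

Run for sufficiently many steps, the adjacency matrix produced by such a loop tends toward the combination of high local clustering and short path lengths described above, which can be verified with standard graph metrics.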
The products of rewiring have an additional characteristic that is relevant to the brain: they are modular networks (Rubinov et al. 2009a). This means that they form community structures that interact via hubs. The hubs are specialized nodes to which network evolution has given the role of mediating connections between communities. They synchronize sometimes with one community and sometimes with another, and can be considered agents of change in the behavior of the regions to which they are connected. Several studies have explored, and helped extend, the notion that adaptive rewiring leads to modular small worlds. It was already shown early on (Gong and van Leeuwen 2003) that combining rewiring with network growth results in a modular network that is also scale-free in the distribution of its connectivity (Barabási and Albert 1999). Kwok et al. (2007) have shown that this behavior is not limited to coupled maps, but could also be obtained with more realistic, i.e. spiking, model neurons. Unlike the coupled maps, these have directed connections. As the system proceeds in its evolution, the activity in the nodes changes. Initial bursting activity (as observed in immature neurons, see e.g. Leinekugel et al. 2002; this activity is commonly assumed to be random but in fact, like that of the model, shows deterministic structure, see Nakatani et al. 2003) gives way to a mixture of regular and irregular activity characteristic of mature neurons. Van den Berg et al. (2012) lesioned the model and showed that there is a critical level of connectivity at which the growth of small-world structure can no longer be robustly sustained. Somewhat surprisingly, this results in a breakdown, not primarily in the connections between the clusters, but in the local clustering. In other words, the network shifts towards randomness. This corresponds to observations in patients diagnosed with schizophrenia (Rubinov et al. 2009b). The model, therefore, could suggest an explanation of the anomalies in large-scale connectivity structures found in schizophrenic patients.
Despite these promising results, a major obstacle to realistic application of the model has been the absence of any geometry. A spatial embedding for the model would allow us to consider the effect of biological constraints such as metabolic costs and wiring length. In a recent study, Jarman et al. (2014) studied networks endowed with a metric, i.e. a definition of distance between nodes, and observed its effect on adaptive rewiring. A cost function that penalizes rewiring between more distant nodes leads to a modular small-world structure with greater efficiency and robustness, compared to rewiring based on synchrony alone. The resulting network, moreover, consists of spatially segregated modules (Fig. 2, left part), in which within-module connections are predominantly of short range and inter-module connections are of long range (Fig. 2, right part). This implies that the topological principle of adaptive rewiring and the spatial principle of rewiring costs operate in synergy to achieve a brain-like architecture. Both principles are biologically plausible. The spatially biased rewiring process, therefore, may be considered a basic mechanism for how the large-scale architecture of the cortex is formed.
The models developed so far have been no more (and no less) than a proof of principle. To some extent, this is how it should be. Efforts at biological realism can sometimes obscure the cognitive, neurodynamical principles on which a model is based.
Some predictions, such as what happens when lesioning the model, could already be made with a purely topological version, with its extreme simplification of the neural dynamics. Yet, in order to be relevant, future model development will have to engage more with neurobiology. We are doing this step by step. Jarman et al. (2014) have overcome an important hurdle in applying the model by showing how spatial considerations can be taken into account. Yet, more is needed. First, we need to resume our work on realistic (spiking) neurons (Kwok et al. 2007): we will consider distinct (inhibitory and excitatory) neural populations, realistic neural transmission delays, spike-timing-dependent plasticity, and a more differentiated description of the mechanisms that guide synaptogenesis in the
Fig. 2 From Jarman et al. 2014: adaptive rewiring on a sphere. Left: differently colored units reveal the community structure (modularity) resulting from adaptive rewiring with a wiring cost function. Right: correlation between spatial distance of connections (x-axis) and their topological "betweenness centrality" (y-axis); from top to bottom, initial state and subsequent states during the evolution of the small-world network. The correlation as it emerges with the network evolution shows that links between modules tend to be of long range
transition from immature to mature systems. Second, and only after that, should we start preparing the system for information-processing functions.
References
Barabási A-L, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
Barahona M, Pecora LM (2002) Synchronization in small-world systems. Phys Rev Lett 89:054101
Bullmore ET, Bassett DS (2011) Brain graphs: graphical models of the human connectome. Annu Rev Clin Psychol 7:113–140
Gallos LA, Makse HA, Sigman M (2012) A small world of weak ties provides optimal global integration of self-similar modules in functional brain networks. PNAS 109:2825–2830
Gong P, van Leeuwen C (2004) Evolution to a small-world network with chaotic units. Europhys Lett 67:328–333
Gong P, van Leeuwen C (2003) Emergence of scale-free network with chaotic units. Physica A Stat Mech Appl 321:679–688
Gross T, Blasius B (2008) Adaptive coevolutionary networks: a review. J Roy Soc Interf 5:259–271
He Y, Chen ZJ, Evans AC (2007) Small-world anatomical networks in the human brain revealed by cortical thickness from MRI. Oxf J 17:2407–2419
Ito J, Nikolaev AR, van Leeuwen C (2005) Spatial and temporal structure of phase synchronization of spontaneous EEG alpha activity. Biol Cybern 92:54–60
Ito J, Nikolaev AR, van Leeuwen C (2007) Dynamics of spontaneous transitions between global brain states. Hum Brain Mapp 28:904–913
Jarman N, Trengove C, Steur E, Tyukin I, van Leeuwen C (2014) Spatially constrained adaptive rewiring in cortical networks creates spatially modular small world architectures. Cogn Neurodyn. doi:10.1007/s11571-014-9288-y
Kaneko K (ed) (1993) Theory and applications of coupled map lattices. Wiley, Chichester
Kwok HF, Jurica P, Raffone A, van Leeuwen C (2007) Robust emergence of small-world structure in networks of spiking neurons. Cogn Neurodyn 1:39–51
Leinekugel X, Khazipov R, Cannon R, Hirase H, Ben-Ari Y, Buzsaki G (2002) Correlated bursts of activity in the neonatal hippocampus in vivo. Science 296(5575):2049–2052
Nakatani H, Khalilov I, Gong P, van Leeuwen C (2003) Nonlinearity in giant depolarizing potentials. Phys Lett A 319:167–172
Rubinov M, Sporns O, van Leeuwen C, Breakspear M (2009a) Symbiotic relationship between brain structure and dynamics. BMC Neurosci 10:55. doi:10.1186/1471-2202-10-55
Rubinov M, Knock S, Stam C, Micheloyannis S, Harris A, Williams L, Breakspear M (2009b) Small-world properties of nonlinear brain activity in schizophrenia. Hum Brain Mapp 30(2):403–416
Skyrms B, Pemantle R (2000) A dynamic model of social network formation. Proc Natl Acad Sci USA 97:9340–9346
Sporns O (2011) The human connectome: a complex network. Ann N Y Acad Sci 1224(1):109–125
Steur E, Michiels W, Huijberts HJC, Nijmeijer H (2014) Networks of diffusively time-delay coupled systems: conditions for synchronization and its relation to the network topology. Physica D 277:22–39
van den Berg D, Gong P, Breakspear M, van Leeuwen C (2012) Fragmentation: loss of global coherence or breakdown of modularity in functional brain architecture? Front Syst Neurosci 6:20. doi:10.3389/fnsys.2012.00020
van den Berg D, van Leeuwen C (2004) Adaptive rewiring in chaotic networks renders small-world connectivity with consistent clusters. Europhys Lett 65:459–464
van Leeuwen C (2005) The Laboratory for Perceptual Dynamics at RIKEN BSI. Cogn Proc 6:208–215
van Leeuwen C (2008) Chaos breeds autonomy: connectionist design between bias and babysitting. Cogn Proc 9:83–92
Watts D, Strogatz S (1998) Collective dynamics of 'small-world' networks. Nature 393:440–442
Zimmermann MG, Eguíluz VM, San Miguel M (2004) Phys Rev E 69:065102
Bayesian mental models of conditionals
Momme von Sydow
Department of Psychology, University of Heidelberg, Germany
Conditionals play a crucial role in the psychology of thinking, whether one is concerned with truth table tasks, the Wason selection task, or syllogistic reasoning tasks. Likewise, there has been detailed discussion of normative models of conditionals in philosophy, in logic (including non-standard logics), in epistemology, as well as in philosophy of science. Here a probabilistic Bayesian account of the induction of conditionals based on categorical data is proposed that draws on different traditions and suggests a synthesis of several aspects of some earlier approaches.
Three Main Accounts of Conditionals
There is much controversy in philosophy and psychology over how indicative conditionals should be understood, and whether they relate to the material implication, to conditional probabilities, or to some other formalization (e.g. Anderson, Belnap 1975; Ali, Chater, Oaksford 2011; Byrne, Johnson-Laird 2009; Edgington 2003; Beller 2003; Evans, Over 2004; Kern-Isberner 2001; Krynski, Tenenbaum 2007; Pfeifer 2013; Johnson-Laird 2006; Leitgeb 2007; Oaksford, Chater 2007, cf. 2010; Oberauer 2006; Oberauer, Weidenfeld, Fischer 2007; Over, Hadjichristidis, Evans, Handley, Sloman 2007). Three main influential approaches, on which we will build, may be distinguished: One class of approaches is based on the material implication. A psychological variant replaces this interpretation (with a T F T T truth
table) by mental models akin either to complete truth tables or to only the first two cases of such a truth table (Johnson-Laird 2006; cf. Byrne, Johnson-Laird 2009). The present approach adopts the idea that a conditional 'if p then q' may be represented with reference either to a full 2 × 2 contingency table or simply with reference to the cells relating to the antecedent p (i.e., p & q, p & non-q). Another class uses a conditional probability interpretation, thus referring only to the first two cells of a contingency table (Stalnaker 1968; cf. Edgington 2003; Evans, Over 2004; Oberauer et al. 2007; Pfeifer 2013). This is often linked to assuming the hypothetical or counterfactual occurrence of the antecedent p (cf. Ramsey test). Here we take conditional probabilities as a starting point for a probabilistic understanding of conditionals, while adding advantages of the mental model approach. Moreover, an extended Bayesian version of this approach is advocated here, concerned not with a hypothetical frequentist (observed or imagined) relative frequency of q given p, but rather with an inference about an underlying generative probability of q given p that depends on priors and sample size. A subclass of the conditional probability approach additionally assumes a high probability criterion for the predication of logical propositions (cf. Foley 2009). This is essential to important classes of non-monotonic logic (e.g., System P) demanding a high probability threshold (a ratio of exceptions e) for the predication of a 'normic' conditional (Adams 1986; Schurz 2001, cf. 2005): P(q|p) > 1 - e. We here reformulate a high probability criterion in a Bayesian way using second-order probability distributions (cf. von Sydow 2014). Third, conditionals sometimes involve causal readings (cf. Hagmayer, Waldmann 2006; Oberauer et al. 2007) and methods of causal induction (Delta P, Power, and Causal Support; Cheng 1997; Griffiths, Tenenbaum 2005; cf. Ali et al. 2011) that make use of all four cells of a contingency table. Although conditionals have to be distinguished from causality ("if effect then cause"; "if effect E1 then effect E2"; "if cause C1 then cause C2"), conditional probabilities may not only form the basis for causality; conditionals may also be estimated based on causality. Moreover, determining the probability of conditionals may sometimes involve calculations similar to causal judgments. In any case, approaches linking conditionals and causality have not been fully developed for non-causal conditionals in situations without causal model information.
Bayesian Mental Model Approach of Conditionals (BMMC)
The Bayesian Mental Model Approach of Conditionals allows for complete and incomplete models of conditionals (here symbolized as p &> q vs. p *> q). It nonetheless models conditionals in a probabilistic way. It is claimed that the probability of fully represented conditionals (P(p &> q)) need not be equated with a single conditional probability (P(q|p)). In contrast, the probability of conditionals concerned with the antecedent p only, P(p *> q), is taken to be closely related to the relative frequency of the consequent given the antecedent (its extension). However, the model does not merely refer to the extensional probability Pe(q|p), but is concerned with subjective generative probabilities affected by priors and sample size. The postulates of the approach and the modelling steps will be sketched here (cf.
von Sydow 2014, for a related model): (1) Although BMMC relates to the truth values of conditionals and biconditionals, etc. (Step 6), it assigns probabilities to these propositions as a whole (cf. Foley 2009, von Sydow 2011). (2) BMMC distinguishes complete vs. incomplete conditionals. This idea is adopted from mental model theory (Johnson-Laird, Byrne 1991; cf. Byrne, Johnson-Laird 2002). It is likewise assumed that standard conditionals are incomplete. However, whereas mental model theory has focused on cognitive elaboration as the cause for fleshing out incomplete conditionals, the use of complete vs. incomplete conditionals is primarily linked here to the homogeneity or inhomogeneity of the occurrence of q in the negated subclasses of the antecedent
p (i.e. non-p) (cf. Beller's 2003 closed-world principle). Imagine homogeneity of non-p with P(q|p) = P(q|non-p) = .82 (e.g., "if one does p then one gets chocolate q", but for non-p cases one gets chocolate with the same probability as well). Here it seems inappropriate to assign the high probability of P(q|p) to P(p &> q) as well, since the antecedent does not make a difference. However, consider a similar case where non-p is heterogeneous. Take nine subclasses in which P(q|non-p) = .9 and one in which P(q|non-p) = .1 (this yields the same average of P(q|non-p) = .82). For such a heterogeneous contrast class, the conditional is indeed taken to single out only the specific subclass p (similar to the conditional probability approach), since there is at least one potential contrast in one subclass of non-p. For the homogeneous case, however, the probability of the conditional is claimed to reflect the overall situation, and a high probability here would involve a difference between P(q|non-p) and P(q|p).
(3) BMMC represents the simpler, antecedent-only models of conditionals not as extensional probabilities or relative frequencies of (observed or imagined) conditionals, but as subjective estimates of the generative probabilities that have produced them. Although similar to a conditional probability approach, i.e. Pe(q|p), this measure depends on priors and sample size. For flat priors, observing a [4; 1] input (f(p&q), f(p&non-q)) yields a lower P(p *> q) than a larger sample size, e.g. [40; 10]. Particularly for low sample sizes, priors may overrule likelihoods, reversing high and low conditional probability judgments. Formally, the model uses cases of q or non-q, conditional on p, as input (taken as Bernoulli trials with an unchanging generative probability θ). Given a value of θ, the Binomial distribution provides us with the likelihood of the data, P(D|θ), with input k = f(q|p) in n = f(q|p) + f(non-q|p) trials:

B(k \mid \theta, n) = \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k}

We obtain a likelihood density function for all θ (cf. middle panel of Fig. 1), resulting in a Beta distribution, now with the generative probability θ as an unknown parameter (with a - 1 = f(x = q|p) and b - 1 = f(x = non-q|p)):

\mathrm{Beta}(a, b) = P(\theta \mid a, b) = \mathrm{const} \cdot \theta^{a-1}(1-\theta)^{b-1}

As prior for θ we take the conjugate Beta distribution (e.g., Beta(1,1) as a flat prior), which allows us to easily calculate a Beta posterior probability
Fig. 1 Example of the prior for θ, the Binomial likelihood, and the Beta posterior distribution over θ
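To make Step 3 concrete, the following is a minimal sketch (not the authors' implementation; the function name and the flat Beta(1, 1) default prior are illustrative assumptions) of how such a posterior mean depends on sample size even when the relative frequency is held constant:

```python
# Sketch of Step 3 (illustrative, not the published model code):
# Beta-Binomial estimate of the generative probability theta = P(q|p).

def posterior_mean(k, n, a_prior=1.0, b_prior=1.0):
    """Mean of the Beta posterior over theta after observing k cases of q
    among n cases of p, starting from a Beta(a_prior, b_prior) prior."""
    a_post = a_prior + k          # a - 1 = f(q|p) for the flat prior
    b_post = b_prior + (n - k)    # b - 1 = f(non-q|p)
    return a_post / (a_post + b_post)

# Same relative frequency (.8), different sample sizes:
print(posterior_mean(4, 5))     # [4; 1] input   -> about .71
print(posterior_mean(40, 50))   # [40; 10] input -> about .79
```

With a flat prior the estimate is pulled toward .5, and the pull is stronger for the smaller sample, reproducing the sample-size effect described above.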
(4) In contrast, given fully represented conditionals (no heterogeneous contrast class), the probability of a conditional even more clearly differs from (extensional) conditional probabilities (cf. Leitgeb 2007). One option would be to apply a general probabilistic pattern logic (von Sydow 2011) to conditionals. In this case, however, conditionals would yield the same results as inclusive disjunctions: P(p &> q) = P(non-p ∨ q). Another option, albeit likewise concerned with all four cells of a logical truth table, is that conditionals have a direction even in non-causal settings. This assumption is pursued here: a hypothetical causal-sampling assumption asserts hypothetical antecedent-sampling for conditionals (Fiedler 2000), as if the antecedent had caused the data (cf. Stalnaker 1968; Evans, Over 2004). (In the presence of additional causal knowledge, one may correct for this, but this is not modelled here.) Based on the generative models of conditional probabilities (Step 3), generative versions of Delta P (Allan, Jenkins 1980) or causal power (Cheng 1997) are suggested as another possible formalization of a full conditional. Formally, the two conditional probability distributions (for q|p and q|non-p) are determined based on Step 3. To proceed from the two Beta posterior distributions on the interval [0, 1] to a distribution for Delta P, relating to P(q|p) − P(q|non-p) on the interval [−1, 1], one can use standard sampling techniques (e.g., the inversion or rejection method; Lynch 2007). For the sequential-learning measure of causal power one proceeds analogously. The means of the resulting probability distributions may be taken as point estimates. However, Delta P and causal power may not be flexible enough (see Step 6).
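A possible reading of Step 4 in code, as a sketch under stated assumptions: direct sampling from the two Beta posteriors replaces the inversion or rejection methods cited above, and the counts are hypothetical:

```python
# Sketch of Step 4 (assumptions as stated in the lead-in):
# Monte Carlo distribution of a generative Delta P on [-1, 1].
import numpy as np

rng = np.random.default_rng(0)

def delta_p_samples(k_p, n_p, k_np, n_np, size=100_000):
    """Posterior samples of Delta P = P(q|p) - P(q|non-p),
    with flat Beta(1, 1) priors on both conditional probabilities."""
    theta_p  = rng.beta(1 + k_p,  1 + n_p  - k_p,  size)   # theta for q|p
    theta_np = rng.beta(1 + k_np, 1 + n_np - k_np, size)   # theta for q|non-p
    return theta_p - theta_np

dp = delta_p_samples(k_p=8, n_p=10, k_np=2, n_np=10)  # hypothetical counts
print(dp.mean())  # point estimate of Delta P (about .5 here)
```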
(5) Let us first return to incomplete conditionals (Step 3). Even here the probability of a conditional, P(p ~> q), may have to be distinguished from the conditional probability, even if modelled as a generative conditional probability (Step 3). There are two other plausible options: one would be to model probabilities of conditionals along similar lines as other connectives have been modelled in von Sydow (2011). Here I propose another option, closely related to a proposal in von Sydow (2014). This builds on the general idea of high probability accounts (Adams 1986; Schurz 2001, cf. 2005; Foley 2009), here specifying acceptance intervals over θ. This seems particularly suitable when concerned with the alternative testing of the hypotheses p ~> q, p ~> non-q, and p ~> (q ∨ non-q) (e.g., 'if one does p then one either gets chocolate q or does not'). This links to the debate concerning conjunction fallacies and other inclusion fallacies (given p, 'q ∨ non-q' refers to the tautology and includes the affirmation q; cf. von Sydow 2011, 2014). Formally, we start with ideal generative probabilities on the θ scale (θ_q = 1; θ_non-q = 0; θ_q∨non-q = .5) (cf. von Sydow 2011). We then vary, for each of the three hypotheses H, the acceptance threshold e (over all, or all plausible, values). For e = .2, the closed acceptance interval for the consequent q would be [.8, 1]; for non-q, [0, .2]; and for 'q ∨ non-q', [.4, .6]. Based on Step 3 we calculate, for each tested hypothesis, the integral over θ in the specified interval [θ1, θ2] of the posterior probability distribution:

$$\int_{\theta_1}^{\theta_2} \mathrm{Posterior}(\theta; H)\,\mathrm{d}\theta$$
This specifies the subjective probability that, given the observed data, the generative probability θ lies within the acceptance interval of H (cf. von Sydow 2011). The probability of each hypothesis is determined by adding up the outcomes for H over the different levels of e and normalizing the results over the alternative hypotheses (e.g., alternative conditionals). This provides us with a kind of pattern probability Pp of the hypotheses, predicting systematic (conditional) inclusion fallacies (e.g., allowing for Pp(q ∨ non-q | p) < Pp(q | p)). (Additionally, such intervals over θ may help to model quantifiers: 'If x are p then most x are q'; cf. Bocklisch 2011.)
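The following sketch spells out one way to implement this acceptance-interval procedure; the epsilon grid, the interval rule, and the function names are my assumptions, with the interval rule chosen so that it reproduces the [.8, 1], [0, .2], and [.4, .6] examples above:

```python
# Sketch of Step 5 (assumed details, not the published code):
import numpy as np
from scipy.stats import beta as beta_dist

def interval(ideal_theta, eps):
    """Acceptance interval of total width eps containing the ideal theta,
    shifted to stay within [0, 1] (e.g. eps=.2: q -> [.8, 1],
    non-q -> [0, .2], q-or-non-q -> [.4, .6])."""
    lo, hi = ideal_theta - eps / 2, ideal_theta + eps / 2
    if lo < 0:
        lo, hi = 0.0, eps
    if hi > 1:
        lo, hi = 1.0 - eps, 1.0
    return lo, hi

def pattern_probabilities(k, n, ideals=(1.0, 0.0, 0.5), eps_grid=None):
    """Pp for p ~> q, p ~> non-q, p ~> (q or non-q), given k cases of q
    among n cases of p and a flat Beta(1, 1) prior."""
    if eps_grid is None:
        eps_grid = np.linspace(0.05, 0.45, 9)   # levels of the threshold e
    post = beta_dist(1 + k, 1 + n - k)          # Beta posterior over theta
    scores = []
    for ideal in ideals:
        # integral of the posterior over each acceptance interval,
        # added up over the different levels of e
        mass = sum(post.cdf(hi) - post.cdf(lo)
                   for lo, hi in (interval(ideal, e) for e in eps_grid))
        scores.append(mass)
    total = sum(scores)
    return [s / total for s in scores]          # normalize over alternatives

print(pattern_probabilities(k=9, n=10))  # the q hypothesis dominates here
```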
(6) In continuation of Step 4, and analogous to Step 5, we detail the alternative testing of p &> q, p &> non-q, and p &> (q ∨ non-q) for complete conditionals. Since this includes the representation of non-p as well, we can also model converse conditionals (<&, probabilistic necessary conditions) and biconditionals (<&>, probabilistic necessary and sufficient conditions) as alternatives to conditionals (&>, probabilistic sufficient conditions). First, to determine the homogeneity of the non-p subclasses (cf. Step 2), Step 5 is applied repeatedly, revealing whether each subclass is rather q, non-q, or q ∨ non-q. If the dominant results for all subclasses do not differ, we can determine the probability of a fully represented conditional. We make use of the results for the incomplete conditionals (for p or non-p; cf. Step 5). For conditionals, converse conditionals, and biconditionals (or their full mental models), we interpret ideal conditionals p &> q, at least in the presence of alternative biconditionals, as the combination of p ~> q and non-p ~> (q ∨ non-q); ideal biconditionals p <&> q as the combination of p ~> q and non-p ~> non-q; and ideal converse conditionals p <& q as the combination of p ~> (q ∨ non-q) and non-p ~> q. Sometimes a connective may refer to more than one truth table: in the absence of biconditionals, P(p &> q) is taken to be the mixture of a conditional and a biconditional. Likewise, the approach allows one to model, for instance, 'if p then q or non-q' (p &> (q ∨ non-q)) as the average of two truth-table instantiations (with non-p being q in one model and non-q in the other). Technically, it is suggested that one can obtain the pattern probabilities of the combination of the incomplete models by assuming their independence and multiplying their outcomes, e.g. Pp(p &> q) = Pp(p ~> q) × Pp(non-p ~> (q ∨ non-q)). If the hypothesis space is incomplete or if other logical hypotheses are added (von Sydow 2011, 2014), the results need to be normalized to obtain probabilities for the alternative logical hypotheses.
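As a toy illustration of this combination rule (reusing the hypothetical pattern_probabilities() from the Step 5 sketch above; the counts are again invented):

```python
# Sketch of Step 6 (builds on the Step 5 sketch above; illustrative only):
labels = ["q", "non_q", "q_or_non_q"]
Pp_p     = dict(zip(labels, pattern_probabilities(k=9, n=10)))  # cases of p
Pp_non_p = dict(zip(labels, pattern_probabilities(k=5, n=10)))  # cases of non-p

# Ideal complete connectives as independent combinations of incomplete models:
Pp_cond   = Pp_p["q"]          * Pp_non_p["q_or_non_q"]  # p &> q
Pp_bicond = Pp_p["q"]          * Pp_non_p["non_q"]       # p <&> q
Pp_conv   = Pp_p["q_or_non_q"] * Pp_non_p["q"]           # p <& q

# Normalize over the alternative logical hypotheses considered:
total = Pp_cond + Pp_bicond + Pp_conv
print([round(x / total, 3) for x in (Pp_cond, Pp_bicond, Pp_conv)])
```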
Conclusion Overall, the sketched model is suggested to provide an improved rational model for assessing generative probabilities of conditionals, biconditionals, etc. The model predicts differences between complete and incomplete mental models of conditionals, influences of priors, influences of sample size, probabilistic interpretations of converse conditionals and biconditionals, hypothesis-space dependence, and conditional inclusion fallacies. Although all these phenomena seem plausible in some situations, none of the previous models, each with their specific advantages, seems to cover all of these predictions. Throughout its steps, the present computational model may contribute to predicting a class of conditional probability judgments (perhaps complementing extensional conditionals) by potentially integrating some divergent findings and intuitions from other accounts into a Bayesian framework of generative probabilities of conditionals.
Acknowledgments This work was supported by the grant 'Sy 111/2-1' from the DFG as part of the priority program New Frameworks of Rationality (SPP 1516). I am grateful to Dennis Hebbelmann for an interesting discussion about modelling causal power in sequential learning scenarios (cf. Step 4). Parts of this manuscript build on von Sydow (2014), suggesting a similar model for other logical connectives.
References
Adams EW (1986) On the logic of high probability. J Philos Logic 15:255–279
Ali N, Chater N, Oaksford M (2011) The mental representation of causal conditional reasoning: Mental models or causal models. Cognition 119:403–418
Allan LG, Jenkins HM (1980) The judgment of contingency and the nature of the response alternative. Can J Psychol 34:1–11
Anderson AR, Belnap N (1975) Entailment: the logic of relevance and necessity, vol I. Princeton University Press, Princeton
Beller S (2003) The flexible use of deontic mental models. In: Alterman R, Kirsh D (eds) Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum, Mahwah, pp 127–132
Bocklisch F (2011) The vagueness of verbal probability and frequency expressions. Int J Adv Comput Sci 1(2):52–57
Byrne RMJ, Johnson-Laird PN (2009) 'If' and the problems of conditional reasoning. Trends Cogn Sci 13:282–286
Cheng PW (1997) From covariation to causation: A causal power theory. Psychol Rev 104:367–405
Edgington D (2003) What if? Questions about conditionals. Mind Lang 18:380–401
Evans JSBT, Over DE (2004) If. Oxford University Press, Oxford
Fiedler K (2000) Beware of samples! A cognitive-ecological sampling approach to judgment biases. Psychol Rev 107:659–676
Foley R (2009) Beliefs, degrees of belief, and the Lockean Thesis. In: Huber F, Schmidt-Petri C (eds) Degrees of belief. Synthese Library 342. Springer, Heidelberg
Griffiths TL, Tenenbaum JB (2005) Structure and strength in causal induction. Cogn Psychol 51:334–384
Hagmayer Y, Waldmann MR (2006) Kausales Denken. In: Funke J (ed) Enzyklopädie der Psychologie 'Denken und Problemlösen', Band C/II/8. Hogrefe Verlag, Göttingen, pp 87–166
Johnson-Laird PN (2006) How We Reason. Oxford University Press, Oxford
Johnson-Laird PN, Byrne RMJ (2002) Conditionals: A theory of meaning, pragmatics, and inference. Psychol Rev 109:646–678
Kern-Isberner G (2001) Conditionals in Nonmonotonic Reasoning and Belief Revision. Springer, Heidelberg
Krynski TR, Tenenbaum JB (2007) The role of causality in judgment under uncertainty. J Exp Psychol Gen 3:430–450
Leitgeb H (2007) Belief in conditionals vs. conditional beliefs. Topoi 26(1):115–132
Lynch SM (2007) Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. Springer, Berlin
Oaksford M, Chater N (eds) (2010) Cognition and Conditionals: Probability and Logic in Human Reasoning. Oxford University Press, Oxford
Oberauer K (2006) Reasoning with conditionals: A test of formal models of four theories. Cogn Psychol 53:238–283
Oberauer K, Weidenfeld A, Fischer K (2007) What makes us believe a conditional? The roles of covariation and causality. Think Reason 13:340–369
Over DE, Hadjichristidis C, Evans JSBT, Handley SJ, Sloman SA (2007) The probability of causal conditionals. Cogn Psychol 54:62–97
Pfeifer N (2013) The new psychology of reasoning: a mental probability logical perspective. Think Reason 19:329–345
Schurz G (2005) Non-monotonic reasoning from an evolutionary viewpoint. Synthese 146:37–51
von Sydow M (2011) The Bayesian logic of frequency-based conjunction fallacies. J Math Psychol 55(2):119–139
von Sydow M (2014) Is there a Monadic as well as a Dyadic Bayesian Logic? Two Logics Explaining Conjunction 'Fallacies'. In: Proceedings of the 36th annual conference of the Cognitive Science Society. Cognitive Science Society, Austin
Visualizer-verbalizer questionnaire: evaluation and revision of the German translation
Florian Wedell, Florian Röser, Kai Hamburger
Giessen, Germany
Abstract Many everyday abilities depend on various cognitive styles. We translated the Visualizer-Verbalizer Questionnaire (VVQ), a well-established inventory for distinguishing between verbalizers and visualizers, into German and evaluated it. In our experiment, 476 participants answered the VVQ in an online study. The results suggest that only eight items measure what they are supposed to measure. To find out whether these eight items are usable as a future screening tool, we are currently running further studies. The VVQ translation will be discussed with respect to the original VVQ.
Keywords Cognitive styles, Evaluation, Translation, Visualizer, Verbalizer, VVQ
Introduction 'When I learn or think about things, I imagine them very pictorially.' People often describe their way of learning or thinking in one of two directions: either they state that they are the 'vivid type', whose thoughts are full of colors and images, or they describe themselves as the 'word-based' person, who often seems a bit cold and more rational. In the nineteen-seventies, Baddeley and Hitch (1974) demonstrated how important working memory is for everyday life. The way we learn and describe things is more or less unconscious, but this fundamental ability is determined by individual preferences. Individual preferences and individual abilities are very important for various human skills, e.g. wayfinding and decision making. Therefore, they have to be taken into account throughout the whole domain of spatial cognition (e.g., Pazzaglia, Moè 2013). One way of dealing with the necessary interindividual differentiation in wayfinding performance is to distinguish between people's cognitive styles (Klein 1951) or, more precisely, the preferred components of their working memory. In their model, Baddeley and Hitch (1974) assumed that the central executive is a kind of attentive coordinator of verbal and visuo-spatial information. Riding (2001) stated that one of the main dimensions of cognitive styles is the visualizer-verbalizer dimension. It is therefore common in cognitive research to differentiate between preferences for visual (visualizer) and/or verbal (verbalizer) information (e.g. Richardson 1977; Pazzaglia, Moè 2013). Given this classification, it can be assumed that visualizers are people with high-imagery preferences, whereas verbalizers tend to have low-imagery preferences. These two styles are generally assessed with self-report instruments. As Jonassen and Grabowski (1993) concluded, the primary tool used to distinguish between visualizers and verbalizers is the Visualizer-Verbalizer Questionnaire (VVQ; Richardson 1977). The VVQ contains 15 items. Participants have to answer each item by judging whether it applies to their style of thinking (dichotomous: yes/no). Still, there is an unsolved problem concerning the VVQ: the verbal subscale indeed measures verbal abilities (e.g., Kirby et al. 1988), whereas the items of the visual subscale are only partly connected to visuo-spatial abilities (e.g., Edwards, Wilkins 1981; Kirby et al. 1988).
Another problem concerning the VVQ is that it is rather hard to find people who can clearly be assigned to one of the 'extremes' of the visualizer-verbalizer dimension, since most participants are located somewhere in between and may not be assigned
to one of the two dimension poles. Preliminary studies in our research group revealed that in some cases about 50 participants had to be assessed with the VVQ in order to clearly assign 2–3 people to one of the two groups, which is neither very useful nor very economical for further research. In the present study, our aim was to translate the VVQ into German. It seemed necessary to translate and evaluate this questionnaire, since no evaluated German version exists and no equivalent tool is freely available for research on the visualizer-verbalizer dimension in the German-speaking area.
Experiment
Method
Participants A total of 476 participants (377 female/99 male), ranging from 18 to 50 years (M = 24.14 years), were examined anonymously in an online study during the period from 12/16/2013 to 01/14/2014. Most participants' highest educational attainment was a high-school diploma (n = 278), followed by a university degree (n = 195) and other school graduation (n = 3). All participants were told that the study served to evaluate several translated questionnaires, which included the VVQ. Participation was voluntary and was not compensated in any way.
Materials The material was the VVQ in its translated form. Table 1 shows the translation of the whole inventory. The questionnaire was translated in three steps. In the first step, the VVQ was translated by the first author of this study; negatively formulated items were formulated negatively in German as well. In the second step, the translation was checked and corrected by the two co-authors. In the third step, a bilingual member (a native English and German speaker) of the research group of Experimental Psychology and Cognitive Science checked the translated items for colloquial subtleties. After the translation process, the online study was set up with LimeSurvey, a tool for creating and conducting online studies.
Procedure Participants were recruited with an e-mail containing basic information and a hyperlink to the study webpage. After following the hyperlink, participants first received a short introduction about the aim of the study, followed by three standard demographic questions (gender, age, and level of education; Fig. 1). A specific instruction marked the start of the VVQ. Participants were asked to answer each item with either yes or no; if they could answer an item with neither yes nor no, they were asked to choose the answer that most likely applied to them. The translated items of the VVQ were presented in the same order as in the original version of the questionnaire.
Results Before reporting the results of the VVQ, it should be noted that we were unable to compare our findings with the original data, due to the lack of statistical data in the original study by Richardson (1977). After reverse-coding the negatively formulated items, we analyzed the VVQ with a factor analysis with Varimax rotation; the number of factors was preset to the two assumed factors. Each of the two factors had an eigenvalue above two (2.32 and 2.42), and taken together these factors explained 31.59 % of the variance. Table 2 shows the results of the factor analysis in detail. Only eight items matched their predicted scale, four per scale; the other seven items could not clearly be assigned to one of these scales.
Figure 2 shows a diagram of the items in the rotated space, illustrating the assignment of each item to its underlying factor. Cronbach's alpha for the whole translated inventory is very weak (α = .04), but reaches at least a moderate level (α = .57) when items 06, 07, 08, 09, 10, 13, and 14 are eliminated.
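For readers who want to retrace the analysis pipeline, here is a hypothetical sketch in Python (the original analysis was presumably run in a standard statistics package; the placeholder data, the third-party factor_analyzer package, and the helper function are assumptions, not the authors' code):

```python
# Hypothetical replication sketch; `responses` stands in for the real
# (476 x 15) matrix of 0/1 answers, negatively worded items reverse-coded.
import numpy as np
from factor_analyzer import FactorAnalyzer  # third-party package

def cronbach_alpha(items):
    """Cronbach's alpha for an (N_participants x N_items) matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(476, 15)).astype(float)  # placeholder

fa = FactorAnalyzer(n_factors=2, rotation="varimax")  # two preset factors
fa.fit(responses)
print(fa.loadings_)                        # cf. Table 2

print(cronbach_alpha(responses))           # whole inventory
keep = [0, 1, 2, 3, 4, 10, 11, 14]         # items 01-05, 11, 12, 15 (0-based)
print(cronbach_alpha(responses[:, keep]))  # reduced eight-item version
```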
Table 1 VVQ items (Richardson 1977) and the German translation

VVQ_01 I enjoy doing work that requires the use of words / Mir machen Aufgaben Spaß, bei denen man mit Wörtern umgehen muss
VVQ_02 My daydreams are sometimes so vivid I feel as though I actually experience the scene / Meine Tagträume fühlen sich manchmal so lebendig an, dass ich meine, sie wirklich zu erleben
VVQ_03 I enjoy learning new words / Das Lernen neuer Wörter macht mir Spaß
VVQ_04 I easily think of synonyms for words / Es fällt mir leicht, Synonyme von Wörtern zu finden
VVQ_05 My powers of imagination are higher than average / Ich besitze eine überdurchschnittliche Vorstellungskraft
VVQ_06 I seldom dream / Ich träume selten
VVQ_07 I read rather slowly / Ich lese eher langsam
VVQ_08 I can't generate a mental picture of a friend's face when I close my eyes / Wenn ich meine Augen schließe, kann ich mir das Gesicht eines Freundes nicht bildhaft vorstellen
VVQ_09 I don't believe that anyone can think in terms of mental pictures / Ich glaube nicht, dass jemand in Form mentaler Bilder denken kann
VVQ_10 I prefer to read instructions about how to do something rather than have someone show me / Ich lese lieber eine Anleitung, als mir von jemand anderem ihren Inhalt vorführen zu lassen
VVQ_11 My dreams are extremely vivid / Meine Tagträume sind extrem lebhaft
VVQ_12 I have better than average fluency in using words / Meine Wortgewandtheit ist überdurchschnittlich
VVQ_13 My daydreams are rather indistinct and hazy / Meine Tagträume sind eher undeutlich und verschwommen
VVQ_14 I spend very little time attempting to increase my vocabulary / Ich verbringe wenig Zeit damit, meinen Wortschatz zu erweitern
VVQ_15 My thinking often consists of mental pictures or images / Ich denke sehr häufig in Form von Bildern
Discussion The investigation of the VVQ reveals a large deviation between the original VVQ and the translated version. The data suggest that the translated VVQ contains the two predicted main factors (visualizer and verbalizer). These two factors, i.e. the two extreme poles of the visualizer-verbalizer dimension, are covered by four items each: items 02, 05, 11, and 15 for the visualizer pole and items 01, 03, 04, and 12 for the verbalizer pole. The remaining seven items cannot clearly be attributed to one of the poles.
Fig. 1 Screenshot of the demographic items presented in LimeSurvey; first the dichotomous question for the participants' gender, second a free-text field for age, and third a drop-down box where participants choose their level of education

Table 2 Underlying factors of the translated version of the VVQ items after Varimax rotation

Item  Verbalizer  Visualizer
01       .754        .037
02       .006        .618
03       .684       -.034
04       .655        .068
05       .216        .423
06       .032       -.619
07      -.300       -.030
08      -.097       -.280
09      -.058       -.214
10       .127       -.151
11      -.087        .657
12       .587        .073
13      -.089       -.692
14      -.655       -.131
15       .011        .533
Fig. 2 Diagram of the components in the rotated space; the cluster with items VVQ_02, 05, 11, and 15 represents the visualizer-based items; the cluster with items VVQ_01, 03, 04, and 12 represents the verbalizer-based items
Work in progress: Revising the VVQ The analysis of the VVQ data shows that nearly half of the questionnaire does not measure whether a person is a visualizer or a verbalizer. This finding matches data from our research group suggesting that the two styles are not clearly separable and that only a small number of people can clearly be assigned to one of the two groups. This shows that the translated form of the VVQ cannot exactly distinguish between visualizers and verbalizers, which can also be assumed for the original version of the VVQ. The results could be explained by the intended item content having changed during the translation process. An aspect that supports this assumption is that in some cases participants answered in the wrong direction, as item 14 illustrates: 'I spend very little time attempting to increase my vocabulary' is translated into German as 'Ich verbringe wenig Zeit damit, meinen Wortschatz zu erweitern'. The problem is that the German translation permits two readings that both lead a participant to answer this item with 'yes', marking the participant either as a verbalizer or as a visualizer. The first reading, which marks the participant as a verbalizer, is that the participant wants to say 'yes, I spend little time, because there is no need for me to spend more time on learning that stuff, as I already am very good'. The second reading would clearly mark the participant as a visualizer, when he or she answers in the intended way with 'yes, I spend very little time on it, because I do not care much about that stuff'. To solve this problem, it seems necessary to change the phrasing of several items. But in doing so, one would have to change most parts of the inventory or even the whole inventory. We assume that one possible way to work with the translated form of the VVQ is to reduce it to the eight items that are clearly definable as belonging to the visualizer or verbalizer pole and to use the inventory as a screening test only. We are currently investigating this possibility in a second online study, in which participants answer this VVQ screening version. We want to investigate whether a strict distinction between visualizers and verbalizers is possible or whether there is only one cognitive style resulting from the combination of visual and verbal abilities. Our research group also plans to use the translated VVQ as a pretest in further investigations of the visual-impedance effect (Knauff, Johnson-Laird 2002). The visual-impedance effect arises with relations that elicit visual images containing details that are irrelevant to an inference problem and thereby (should) impede the reasoning process (Knauff, Johnson-Laird 2002). The VVQ might help to discover whether visualizers or verbalizers are more affected by the visual-impedance effect. We assume that (extreme) verbalizers might not be as strongly affected as (extreme) visualizers, because their preferred mode of representation is more word-based (or propositional), and therefore their reasoning process might not be disrupted as much as that of visualizers.
Further research and conclusion The VVQ is a widely used tool in the German research area. One reason for this is that it is freely available (in contrast to some other questionnaires, such as the OSIQ; Blajenkova et al. 2006).
Therefore, we here consider creating a completely new inventory that fits the German language better, with the eight definable items of the VVQ as a basis. There are two ways to fill the inventory with items. The first is to translate and evaluate the revised version of the VVQ by Kirby et al.