Learning in the bath-tub: the micro and macro dimensions of the causal relationship between learning and policy change

Abstract
Whilst the literature classifies policy learning in terms of types or ontological approaches (reflexive and social constructivist learning versus rational updating of priors), we offer a three-dimensional approach to explore the relationships between individual learning, learning in groups and the macro-dimension. Our contribution maps most (although not all) lively debates in the field on a multi-dimensional space and explores the logic of causality from micro to macro. To achieve these two aims, we draw on Coleman’s ‘bath-tub’. We map learning in the bath-tub by considering prominent studies on learning but also, in some cases, by exploring and drawing lessons from political science and behavioural sciences. By integrating findings in an eclectic way, we explain the logic of learning using a single template and suggest methods for empirical analysis. This is not a literature review but an original attempt to capture the causal architecture of the field, contributing to learning theory with findings from mainstream political and behavioural sciences.


Introduction and motivation
The claim that there is a causal relationship between learning and policy change is a classic feature of theoretical and empirical policy analysis. One popular textbook on Theories of the Policy Process edited by Sabatier and Weible (2014) refers to learning 80 times in relation to: the impact of collective learning (p. 13, see Heikkila & Gerlak, 2013); experiential learning clashing with preferred policy options (p. 31, citing Moynihan, 2006); organizational learning (p. 44, in relation to the multiple streams framework); policy-oriented learning affecting social constructions (p. 132, citing Montpetit 2007, p. 198 in the context of the advocacy coalitions framework); learning as mechanism of policy diffusion (pp. 310, 311, see Gilardi, 2010); learning and policy capacity (p. 401, see Howlett, 2009); and, learning as meso-theory adopted by the narrative policy framework (p. 243, see Shanahan, Jones, & McBeth, 2011).
We do not want to generalize from one book, although this particular one makes the claim to present the lenses on the policy process that have achieved the status of theories.
Other books contain chapters on learning, for example The Oxford Handbook of Public Policy (chapter by Freeman, 2006) and The Handbook of Public Policy Analysis (chapter by Grin & Loeber, 2007). Yet, when we go beyond references and empirical studies and look for the analytical foundations we do not find a systematic field (Dunlop & Radaelli, 2016). Consider these questions. Why would we expect an organization or individual to learn? On the basis of what model of the individual actor, can we theorize learning? Why should the process of learning lead to policy change? And, what are the effects of learning? Concepts and findings are not systematic, and often the studies on policy learning skirt around these questions. They document learning, but without building on the same understanding of where learning takes place, at what level and with what effects. Thus, the aim in this article is to provide some sort of conceptual architecture, and in this way to facilitate the cumulative progress in the field.
Speaking of concepts, we do have of course a strand of research on concept formation, mostly concerned with the types of learning, the difference between individual and collective learning and the relationship between the process of learning and the products of learning (Bennett & Howlett, 1992; Dunlop & Radaelli, 2013; Grin & Loeber, 2007; Hall, 1993; Heikkila & Gerlak, 2013; May, 1992). These conceptual efforts provide the stepping stones for building the architecture we seek. The reader should use these references for the definitions of learning in public policy, plus Freeman (2006, p. 369) on 'the essential issues that any account of learning must address'. But let us mention two important points.
So far theorists of learning have done a good job in two directions. First, they have classified learning in types, which adds to the precision of the concepts deployed in empirical research. Second, they have distinguished 'different ways of thinking about learning' (Freeman, 2006, p. 369), often using ontology as the dividing line. Thus, we have the organic, mechanistic, positivist ontology of learning versus the interpretivist and social constructivist ontology (Grin & Loeber, 2007). To be aware of a concept and its ontological presuppositions is fundamental. But it is not enough. To go forward, we offer a multi-dimensional approach to explore the relationships between individual learning, learning in groups, and the macro-dimension. Our contribution is to map some (although not all) lively debates in the field on a multi-dimensional space and to explore the logic of causality from micro to macro. To achieve our two aims, we draw on Coleman's 'bath-tub' (1986, 1990). We start with an outline of Coleman's bath-tub model and explication of how the bulk of policy learning research relates to it. Specifically, we show the inadequacy of the classic policy analysis literature that treats learning and policy change as associated at the macro level where the actual processes that underpin the relationship remain obscure. The three sections that follow address each of the main bath-tub components. First, the macro to micro level examines the micro-foundations of human action and policy learning. Drawing on political and behavioural sciences we demonstrate the importance of heuristics, emotions and surprises for policy learning. Next, we move to how individuals relate when in group settings. Studies examining this micro to micro setting uncover the disruptive potential of learning, the importance of individuals in sense-making and the socialization mechanisms that often determine what lessons are adopted or disregarded. 
Finally, we move to the analytical region of aggregation, from micro interactions to macro settings. Institutional analyses, studies emphasizing the framing effects of organizations and cultural identity and memories all demonstrate that learning at the social level cannot simply be 'read off' organizations and institutions with their own processes of learning. We conclude with reflections on the utility of the analysis and the methodological implications of the different analytical dimensions of the bath-tub. Note that, although we shed light on the different dimensions of the causal relationship, our word budget is such that we cannot explain how learning at one level influences learning at another level; see Witting and Moyson (2015) for this type of analysis.
Before we start, let us answer an important question raised by a reviewer of this article: what is ontologically implied by the bath-tub? Are you, our reader, buying into something when you see the literature mapped onto this analytical device, which is not without its critics? To paraphrase social constructivist language, we argue that the bath-tub is what a given group of social scientists makes of it. For some, this is the archetypical positivist template for explanation informed by Humean causality. It excludes holistic approaches that deny value to micro-foundations, and pins down the scientific enterprise to one particular type of causality and one approach to evidence. Thus (one could carry on) to embrace the bath-tub is to chain policy learning to the Procrustean bed of positivism.
Yet for us, in this project at least, the bath-tub is something else: it is a valuable heuristic to map the field. Our aim in this article, in fact, is not to address the ontological concerns and the debate on the positivist, interpretivist and social constructivist ontologies of policy learning (for this, see Grin & Loeber, 2007). For us, the bath-tub is a way to show how the literature populates the dimensions of analysis: the model of the individual, the group dimension, and the relationship between learning in one organization or sector, and finally the macro outcomes. This is a multi-dimensional way that is familiar to several theories of the policy process -indeed, in policy analysis it is common to refer to the micro, the meso and the macro level (where 'meso' means sectors or policy sub-systems rather than groups).
No doubt most of the authors we show on one dimension of the bath-tub would be surprised to see their work on the same conceptual map as the work of others, who start from different ontological assumptions and cover another dimension of the bath-tub. But this is exactly our point: the bath-tub allows us to map the field and show where studies informed by a certain ontology or assumptions or presuppositions fall in the analytical space. The advantage of using a multi-dimensional map is to go beyond the dichotomy of positivist ontology versus social constructivist ontology. It shows the field in its entirety rather than breaking it down into those who stay on one side of the ontological debate and those who endorse other ways of knowing (for these wider discussions about ways of constructing knowledge the reader can turn to Moses & Knutsen, 2012).

Why dip a toe in Coleman's bath-tub?
At its most general, policy learning is treated as the updating of beliefs based on experience, interactions, analysis or rules. Dunlop and Radaelli (2013) rehearse the main definitional features of learning: individual-organizational-social; intended, organic and un-intended; hard and soft; single, double-loop and complex; transformative of identities and preferences or more strategic and opportunistic. In terms of concept formation, learning can occur in five processes: improving policy to achieve governmental goals, society-wide processes of enduring ideational adaptation to exogenous circumstances and sense-making, evidence accumulation and analytical sophistication inside public organizations, changes of core and secondary beliefs within networks or advocacy coalitions, and finally processes in which governments use knowledge about the policies of other governments to re-design or innovate policy (Bennett & Howlett, 1992).
The actual learning process, however, is often left undefined, with authors preferring to focus on the outputs of learning (the 'lessons' that are drawn about policy instruments or their frames) and outcomes (changes in these). Bennett and Howlett (1992) indeed concentrate on organizational change, programme change and paradigm shift. Neglect of the learning processes means that we rarely deal with analytical issues about how different levels of the social environment contribute to belief change in policy. We address this challenge using a well-known way of exploring causal logic in the social sciences and then link it to learning studies.
Coleman's bath-tub (sometimes known as 'boat') models the processes that link relationships between social causes and outcomes as three elements (Figure 1). There are various interpretations of this model and we ought to stick to a simple one. For us, macro-to-micro concerns the micro-foundations of action. Micro-to-micro is the arrow that refers to how individuals behave when they interact in groups, for example a group of professionals in an organization. Finally, micro-to-macro is the analytical region of aggregation. This is somewhat in line with the notion that theories of the policy process consider individual, meso and macro features of the policy process.
As we said, this is simple and our aim is to map different studies, not to sell an ontological take on policy learning. There are more sophisticated ways to approach the bath-tub (see Mills, van de Bunt, & de Bruijn, 2006; see also Elster's discussion, Elster, 1989). But, this simple approach allows us to lay out some important arguments supporting the claim that learning causes change.
How can we apply the bath-tub to the causality of learning and change? Obviously not all learning explanations will be Coleman-compatible. Some (like exclusively holistic explanations) do not fit with the bath-tub, or require additional analytical frameworks. Like all models, Coleman's set of causal relationships may accommodate a specific study only on one dimension (say, the macro dimension) even if the author of that study would emphatically deny value to micro-foundational analysis. It is the group of studies that we place onto the map that shows how the three sets of relationships define learning, not the individual study.
Indeed, although this is not a review of the literature, it seems that the most famous studies of policy learning are empirically and theoretically engaged with macro-macro explanations. To illustrate, in Heclo's classic study of social policy change in Britain and Sweden the relationships about learning and policy change are at the level of government-led policies (Heclo, 1974). His study tracks down the dynamics of unemployment insurance and old-age pensions over the long term. Having described inputs, processing of policy issues, outputs and feedback, Heclo famously concludes that the changes in social policy are part of a process of collective social learning that transcends power relations. These claims point towards macro-macro relations: variation in learning in the two countries 'causes' variation in policy outcomes.
Yet, Modern Social Policy in Britain and Sweden hints at the deeper causal structure of learning. For Heclo, social learning is not something we observe only at the level of society. Quite the contrary, learning is generated only by individuals: 'alone and in interaction these individuals acquire and produce changed patterns of collective action' (Heclo, 1974, p. 306). Thus, social learning has its own micro-foundations (from macro to micro) and micro-micro properties (interaction among individuals). Heclo describes learning as involving both individual cognition and the characteristics of the social environment in which the individual operates. He approaches learning metaphorically: a maze in which the walls are re-patterned all the time; where individuals are bound in groups acting together; where the group disagrees on how to get out of the maze and more fundamentally on whether getting out is the best solution to the problem; and finally where there are many groups, not just one, inside the maze, and each group keeps getting in each other's way (Heclo, 1974, p. 308). This is not to say that learning is random. Instead, learning is shaped by individuals, organizations, the 'cobweb of interaction' (Heclo, 1974, p. 307) and the feedback effects of previous policy choices.
We can go further in portraying mechanisms of learning by turning to another classic author, Karl Deutsch (1966). For Deutsch, feedback goes well beyond the capacity to respond to the environment and re-create equilibrium. Learning capacity is generative of change. Learning is observed in actions that are produced in response to information. But, the information input 'includes the results of its own action in the new information by which it modifies its subsequent behaviour' (Deutsch, 1966, p. 88). Learning capacity is more advanced than the classic ('mechanistic' for Deutsch, 1966, p. 185) concept of equilibrium. In fact, a learning system is in principle equipped to pursue changing goals. Feedback in a learning system is a relationship between micro and macro -turning to Coleman's category. But, the macro context is constantly re-patterned, like the walls in Heclo's maze. Indeed, for Deutsch, feedback in a learning system generates a trajectory like the one of a zigzagging rabbit. The rabbit does not search for a homeostatic path, trying to get back to the old equilibrium. Instead the rabbit re-calculates the optimal trajectory in a cybernetic 'maze' and, because of this, is capable of creative, unexpected changes.
Thus, the components of the bath-tub we are trying to develop existed already in these classic studies: micro-foundations, the group or network dimension, and the complex interaction between cognition and the characteristics of the situation and the environment. But who learns, exactly, in public policy? The agents of social learning in Heclo are eminently individuals belonging to the administrative and ministerial elites, hence our micro-foundations should model the incentives and motivation of the bureaucrat. But, later in the history of comparative public policy, the notion of 'social learning' came to cover wider categories of policy choices that affect society in a broader sense. This is Peter Hall's (1993) paradigmatic or third-order change, a change that is socially embedded and therefore goes beyond bureaucratic elites. Learning is collective sense-making, when a society at a certain moment in history converges through discursive practices towards a new understanding of what 'good policy' is. This is yet another way of considering the micro-macro relationship, capturing the dynamics of public discourse and the 'social' widely defined. In this sense, micro-foundations direct us towards the citizen and how public discourse emerges in the interaction between elites and individuals.
We can then proceed and examine the components of the bath-tub systematically, drawing on the literature that appeared since the days of Deutsch and Heclo. When necessary, we move from policy learning to wider political and behavioural sciences, thus contributing to filling the gaps in the literature on learning.

Establishing the micro-foundations: from macro to micro
Micro-foundations concern explanations of individual behaviour that provide the rationale for what we observe at the macro level. This brings us into the world of cognition. The tradition of so-called rational policy analysis (Carley, 1980) and, with much less normative emphasis on the 'rational' side of things, Bayesian decision theory provide a simple way of thinking about the micro foundations. In a nutshell, (instrumental) learning is about using evidence to adjust (coherently) our priors to reality and behave consequently. Cognition is about reasoning on evidence. In turn, evidence comes from what we see or from the interaction we take part in -opening the door to the micro-micro analysis.
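The Bayesian baseline just described can be made concrete with a minimal sketch. It assumes, purely for illustration, that an actor's belief about whether a policy instrument 'works' is represented by a Beta distribution and that evidence arrives as counts of observed successes and failures. The class name `Belief`, the prior and the evidence figures are all hypothetical and are not drawn from the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Hypothetical Beta(a, b) belief about the probability that a policy works."""
    a: float  # pseudo-count of successes held before seeing new evidence
    b: float  # pseudo-count of failures held before seeing new evidence

    @property
    def mean(self) -> float:
        # Expected probability of success under the current belief.
        return self.a / (self.a + self.b)

    def update(self, successes: int, failures: int) -> "Belief":
        # Conjugate Beta-Binomial update: observed counts are added to the prior.
        return Belief(self.a + successes, self.b + failures)


# Assumed numbers, for illustration only.
prior = Belief(2.0, 2.0)        # weak prior centred on 0.5
posterior = prior.update(8, 2)  # evidence: 8 successes, 2 failures

print(round(prior.mean, 3), round(posterior.mean, 3))
```

In this stylized reading, instrumental learning is the move from `prior` to `posterior`. The heuristics, emotions and surprises discussed in the rest of this section can be read as departures from this coherent adjustment.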
There are two ways to build on this. One is to consider this Bayesian reasoning our baseline and see what happens in the real world. Another is to look at this reasoning property of learning with a sceptical eye and challenge the very claim that learning is about reasoning. Here we can follow Flyvbjerg's (2001) lead and take a wider Aristotelian view of knowledge: learning can be underpinned by epistemic updates, but it is also informed by matters of practical techne or value-driven phronesis. Starting with 'what happens in the real world' and deviations from the Bayesian cognitive baseline, the most robust evidence for micro-foundations comes from experimental political science and experimental psychology. Emotions, impulse and affect sometimes do not allow our brain to work slowly and follow reason. The mind follows heuristics instead (Kahneman, 2011).
In policy analysis, Simon (1955) and Lindblom (1959) set the early agenda on cognition, demonstrating that rational, deductive approaches clash with insurmountable computational limits. This story of bounded rationality is even more progressive today, when the problem is not lack of information, but its colossal size and availability. In policy theory, these arguments concerning heuristics have been chiefly developed by Schneider and Ingram in the late 1980s. They observe that the 'process of formulating policy ideas' and 'the logic through which policy intends to achieve its objectives' often follow heuristics (Schneider & Ingram, 1988, p. 61).
Consequently, our learning processes of drawing inferences from evidence cannot be systematic. At best, they are boundedly rational. But beyond that, experiments show that individuals learn following decision heuristics when engaging in problem-solving and design processes. Among these heuristics, the most important for the micro-foundations of policy learning are availability, simulation and anchoring. In consequence, the learning process is punctuated by heuristics that take us into the world of biases. It is 'reasoning by analogy, search through possible examples relying on decision heuristics, or indiscriminately copying policy based on prevailing fashion or limited knowledge and experiences' (Schneider & Ingram, 1988, p. 78). Learning often is a matter of 'pinching ideas' quickly, rather than reflecting on the best available evidence in slow mode.
An important implication is that not all learning is beneficial to policy. These micro-foundational biases explain how learning via heuristics causes policy change, but this change may become a policy fiasco. The government and the regulators in general can react to this state of play by using nudging techniques to rectify the effects of heuristics (Alemanno & Sibony, 2015; Thaler & Sunstein, 2008), although it is rare to see regulators that control for their own biases at the stage of policy design.
As mentioned, there is another way to build on the results of experiments. This alternative path rejects the normative implications of the language of bias and heuristics. How? By challenging the notion that learning is only reasoning on the basis of experience. From cognition we move into emotions, and some emotions are no longer seen as hindrances to learning, or, as Darwin (1872) said, leftovers from the process of evolution.
One can argue then that emotions are not necessarily this bad, fast 'beast' that destroys the logic of 'good' inferential learning. We know that affect and anxiety are not necessarily antithetic to the diligent search for evidence and information. They can also facilitate information search (Redlawsk, Civettini, & Emmerson, 2010). In cognitive psychology, the function of emotions is centred around information we need when we act on hunches rather than full understanding (as shown by the experiment carried out by Damasio, 1996). Also within cognitive psychology, there is a body of experiments showing that emotions mobilize psychological resources (see Yiend, Mackintosh, & Mathews, 2005, p. 478 on arousal, anxiety and performance) and impact on cognitive processes such as semantic interpretation and attention (Yiend et al., 2005). To illustrate, emotions often motivate us to go deeper in our analysis of the situation or to pay more attention to certain features of the situation depending on our trait emotion. Generally, we do not look for information in the world out there, unless we have a personal motivation to do so.
Further, at least empirically, political judgements 'biased' by emotions and heuristics do not differ much from the rational, cool, informed political judgement, as shown by studies of the informed and uninformed voter (Lodge, Steenbergen, & Brau, 1995). And again: psychology and cognitive sciences argue that the level of emotions operates within any type of cognitive process. Emotions and cognition are not each other's opposite. Rather, they are two complementary processes: LeDoux (1989) reports on cognitive-emotional interactions in the brain. Cognitive appraisals help us to distinguish one emotion from another according to Schachter and Singer (1962). Cognition can then influence emotion. If emotions are inherently adaptive and depend on cognitive performance (Izard, 2009), they can facilitate rather than hinder learning.
For us, the most compelling argument against inferential learning (defined here as reasoning about evidence and re-adjusting priors with coherence) comes from a set of experiments that report that cognition can be fast-paced, linked to quick associations between cues and responses or, in public policy language, between the stimulus coming from the environment and the behavioural response of the individual. These models of contingent learning are quite radical in their implications (see Kamkhaji & Radaelli, 2016, for a review and application to the EU). They show that often it is not learning that produces change. Rather, it is change that creates contingent fast responses via cue-outcome associations. Surprise trumps experience. Priors do not change under conditions of fast, extreme surprise.
But the individual forced to choose quickly still sees that choosing one action instead of another brings a better result. There is no time to reflect on 'what have I done'. There is only time to use the fast experience of doing something in response to a cue; this becomes the only probe. And the subject carries on with the next fast probing mechanism. It is only in a second moment that these fast-paced associative mechanisms (i.e. associating a stimulus with a 'correct' response) bed in and provide some feedback. This feedback in turn anchors cognition. And at this point the individual is in a position to draw inferences and learn via reason. We have come full circle: change causes contingent associations at the individual level, then feedback lays down the pre-conditions for learning. This does not happen every time of course and, thinking back to the seminal work of Argyris and Schön, when this learning is triggered it has a variety of depths (see Argyris & Schön, 1974 on single- and double-loop learning at the individual level). It is more likely to occur under conditions of crisis, emergency, 'do-or-die' situations of accidental heroes of policy change that only later understand what they have done. We have therefore scope conditions, especially surprise and extreme crisis conditions, for this micro-foundation of contingency.
In conclusion, our journey into micro-foundations reveals that individual learning processes are conditional on heuristics; that emotions are not necessarily distortions and actually may trigger more information search; and that under conditions of extreme surprise learning can follow change, instead of being its cause.

Micro to micro transitions
Micro-micro relations concern learning mechanisms affecting individual A and individual B, and relations between individuals and groups or social networks. There is a colossal literature on both, and as we have seen the problem of explaining these relationships within a causal theory of learning was already present in Heclo's insights on networked groups, within and across countries (Heclo, 1974, pp. 310, 311). Indeed, a year before Heclo published his study, Donald Schön (1973) talked about government as a learning system. This notion of government as learning system is suitable to explore the micro-micro relations because it points towards the evolution of policy ideas within networks of decision-makers.
Ideas can come from advocacy groups or metaphors. For example, political ethnographers have recently documented that a particular chart can show up meeting after meeting in Whitehall, and turn into the winning argument for a policy choice (Stevens, 2011). Ideas however do not necessarily move from the centre of the system to the periphery. Schön argues that innovation can already exist somewhere in the system, in many sources. Diffusion among groups is the process of bringing an idea from point A in the system to the centre and to other points in the system. This is obviously a lens that leads us to reject the notion of the stable state (Schön, 1973) and treat learning as underpinned by disruptive forces (Sabel, 2005). In theoretical approaches to the policy process, John Kingdon (1984) is the author who has theorized about the complex process of selection of policy ideas, pointing to a list of variables that mediate their acceptance or rejection in policy communities.
If we do not live under the conditions of the 'stable state', then the question arises: how do policy-makers get to know what they know, and act in consequence? Sense-making is a process of assembling partial elements of a puzzle and using networks to generate a set of shared or at least not contested meanings (Weick, 1995). For Freeman (2007) this micro-micro relationship is epistemological because it produces sense-making; but it is also bricolage because piecing together is an extraordinarily subtle and complex task. It does not resemble the scientific task of appraising evidence and drawing lessons directly from what works best. Instead, the bricoleur
in contrast to the scientist or engineer, acquires and assembles tools and materials as he or she goes, keeping them until they might be used … [N]ot only are tools selected according to the bricoleur's purpose, but that purpose itself is shaped in part by the tools and material available. (Freeman, 2007, p. 486)
These observations on the bricoleur lead to the categories of special individuals identified by the literature, something we can only acknowledge here but which has been extensively discussed in studies of policy entrepreneurship and norms entrepreneurs (for an extensive review of entrepreneurs see Cohen, 2016 and on the bricoleur see Deruelle, 2016).
Thinking about causality and what comes first in this set of relations, in another contribution Freeman observes that 'learning is the output of a series of communications, not its input; in this sense it is generated rather than disseminated' (Freeman, 2006, p. 379). Learning in public policy, for example in processes of policy diffusion involving networked actors across countries, 'is interpreted as much as explained' (Freeman, 2006, p. 380) and therefore the key question is about the point of acceptance where within a group there is a realization of what is known and why it matters for policy choice. Further, if there is a threshold or a point within a community of practice in a field of public policy (Lave & Wenger, 1991), this is not necessarily the same point for society at large; recall what we said about Peter Hall and the social dimension of learning. Framing effects and their implications for citizen competence, and ultimately social learning broadly conceived, are the territory of behavioural decision theory, mobilization of bias across groups and target populations (Schneider & Ingram, 1993), and heresthetics.
Methodologically, social network analysis (Scott, 2000) including relational approaches (McClurg & Young, 2011;Selg, 2016); diffusion (Gilardi, 2010); interpretive policy analysis (Freeman, 2007); and ethnography, (Stevens, 2011) have delivered interesting findings. Experiments also feature prominently in this field. This blaze was trailed in the post-war years by Leon Festinger (1957) in his seminal work on cognitive dissonance. Inspired by this, in the late 1960s, Serge Moscovici and colleagues reported the results of a color perception task where minorities and majorities interact in groups, producing convergence. Here the most interesting difference is between conversion and compliance: a consistent minority can have influence to the same degree as a consistent majority. But since the former will have to work against the tide of numbers (which are of course in favour of the majority), its influence will have a greater effect, on a deeper level. Following Moscovici, Lage, and Naffrechoux (1969) and Moscovici (1980) the minority creates a conversion behaviour and the majority creates a compliant behaviour. We can reason that conversion is a thicker form of learning because it implies a real change of beliefs. However, it is typically not displayed in public. Thus, conversion is learning at the individual level that does not necessarily lead to policy change. A minority is invariably the carrier of a somewhat deviant judgement. The members of the group may be converted in private. However they will be reluctant to express public acceptance of what they have learned from the minority to avoid the risk of losing face of acting in a sort of deviant faction in the presence of other individuals (Moscovici, 1980, p. 211). 
This theory of conversion was then redefined by convergent-divergent theory (see the review in Martin, Hewstone, Martin, & Gardikiotis, 2008), which argues that minority influence stimulates a greater consideration of alternatives than was contained in the minority's original argument. This shows that influence is not dictated by sheer numbers or prevailing norms (otherwise the majority would always win) and that minorities often spread the seeds of change by convincing and persuading. Additionally, greater message processing (that is, reflecting on the content of the message and greater thought elaboration) occurs when the situation is unexpected. When expectations about the messages supported by the majority or the minority are violated, individuals are surprised, and this encourages an in-depth examination of the message in order to reduce inconsistency. This factor adds to the contingency of learning processes described above. Others have studied cognitive dissonance (Stone & Fernandez, 2008), looking at its motivational engine. Again we find that emotions may not hinder learning. For example, in experiments dissonance arousal motivates more action to restore consistency (see findings in Stone & Fernandez, 2008). On methods: beyond the lab, field experiments are unlikely to be used for ethical reasons (Martin et al., 2008, p. 376). However, minority and majority influence can be studied in real-life politics, organizations and minority movements (Martin et al., 2008, p. 376, citing Smith & Diven, 2002). The debate on minority and majority effects reminds us of how individuals with particular characteristics (policy brokers and policy entrepreneurs in public policy, or figures like Aung San Suu Kyi in politics) are key to learning in communities of decision-makers and more generally communities of practice (Brown & Duguid, 2000; Wenger, 1998).
We already observed that often there is resistance to new ideas simply because they are uncommon, not because they are poor. By the same token, the policy broker is described (by Freeman [2006]) as a stranger because she calls into question taken-for-granted assumptions of policy (Schütz, 1964). Policy entrepreneurs may be pivotal in groups and tilt the minority-majority reactions towards an idea whose time has come (Rietig, 2014 on EU climate policy). Social network analysis documents interaction and reveals the position of pivotal actors in the network or between networks, but can also be used with ideas as the unit of analysis. In this form, it tracks how ideas co-occur in different policy documents and speeches, gain currency over time and eventually obtain agenda-setting status (Leifeld & Haunss, 2012).
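To make the idea-as-unit-of-analysis logic concrete, the following is a minimal sketch in Python of how co-occurrence between ideas could be counted across coded documents. It is a simplified illustration under our own assumptions, not the procedure used by Leifeld and Haunss (2012): the document names, the idea labels and the helper `idea_cooccurrence` are all hypothetical, and we assume each document has already been hand-coded into the set of ideas it contains.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coding: each document is reduced to the set of ideas it mentions.
documents = {
    "speech_2001": {"precaution", "innovation"},
    "report_2003": {"precaution", "innovation", "competitiveness"},
    "speech_2005": {"innovation", "competitiveness"},
}


def idea_cooccurrence(coded_documents):
    """Count, for every pair of ideas, the number of documents mentioning both."""
    counts = Counter()
    for ideas in coded_documents.values():
        # Sorting makes each pair appear in a single, canonical order.
        for pair in combinations(sorted(ideas), 2):
            counts[pair] += 1
    return counts


# Each key is an edge between two ideas; each value is the edge weight.
edges = idea_cooccurrence(documents)
for (idea_a, idea_b), weight in sorted(edges.items()):
    print(f"{idea_a} -- {idea_b}: {weight}")
```

The weighted pairs can be read as the edges of an idea network. Repeating the count for successive time windows would show, under these assumptions, which ideas gain currency together over time.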
The fact that ideas and beliefs are diffused within and among groups in various patterns, such as emulation, herding, epistemic authority and persuasion, brings us to socialization. The public management literature provides examples of policymakers' practice being influenced by 'experiential learning' (Kolb, 1974) on the job and by more formal means such as continuing professional development courses. But, in terms of policy learning theories, the most important dimension added by the micro-to-micro arrow is learning via social interaction.
For policy learning theorists, socialization is not limited to small groups. Indeed, socialization has often been studied with reference to learning mechanisms in international organizations (Checkel, 2005). Socialization is a driver of learning because it can change norms and attitudes and anchor new ones (see however Hooghe, 2005 on the limitations of this mechanism in the case of the European Commission's officials). Checkel talks about type I and type II socialization among public officers, the former referring to individuals learning the norms of behaviour associated with a given situation and the latter to the rightness of a norm associated with a certain role. Type II includes an emotional attachment to the job and is defined as 'internalization' (Checkel, 2005). This distinction is similar to the one found in social psychology between the cognitive and affective dimensions of attitudes. Daily work and experience of routines facilitate the development of new attitudes. Meyer-Sahling, Lowe, and van Stolk (2016) illustrate these points in their study of how Eastern European officers learn 'silently', in type I mode, via interaction with formal rules and the procedures of the European Union.
The findings on how public managers acquire competence and form new attitudes are mirrored by the debate on competence among the electorate. Here the mechanisms are different. An important mechanism is framing. The literature on framing effects draws on experiments in different forms, from changing questions in a survey to the classic experiment in the lab. Interestingly, in his review of framing effects, Druckman (2001) raises the question of the political conditions under which a framing effect causes competence or incompetence, bringing us to how the macro shapes the micro-micro relations.

Learning aggregation from micro to macro
This final transition 'moves back up from the individual level to the societal level' (Coleman, 1990, p. 8). Whatever the processes of cognition, emotions, attitude change and ultimately learning may be within the individual and the group, we need to scale up to the level of an organization (to demonstrate organizational learning), the meso-level (to prove instrumental learning in a given policy, for example) or the level of society (to make the argument for a change in policy paradigms and Hall's type III learning).
Obviously, this aggregation problem, from micro to macro, is not solved by simply assembling groups, organizations and institutions with their own processes of learning. There is a vast body of work on how organizations, institutions and culture filter, mediate and shape processes of learning in this final transition. Organizational and institutional theorists have demonstrated that institutions do not only shape learning: they can also enable or constrain it, the core argument being that institutions lock in certain forms of bias (Immergut, 1998). In turn, the institutional production of bias creates structural power for some agents or groups, and disempowers others. Organizations can also refuse to put some issues on the agenda, hindering processes of learning via non-decisions. With their rules on advisory committees and the role of science in decision-making processes, organizations like independent regulatory agencies confer epistemic authority on, or withdraw it from, certain types of evidence, thus influencing who learns what and from what type of science or evidence (Dunlop & James, 2007).
The organization is also the main location where individuals and groups experience the external environment. An interesting finding is that the boundary between the organization and the external environment is fluid and constantly 'enacted' by organizations, which frame and invent the external environment (Weick, 1979). Incidentally, the same can be said of the media: they provide the surrogate environment to people who are not there on the scene to observe the 'macro' reality (Lippmann, 1922). Thus, the environment acts on learning processes not just because it is something 'out there' but also through the particular ways in which it is framed or socially constructed within the organization.
At the wider level of society, learning is a dependent variable that generates long-term effects, and in this way it transforms into an independent variable, thus contributing to the historical, iterative evolution of institutions and political systems. To illustrate: learning is a dependent variable shaped by culture, memories, and the social construction of nations and identities. Some features of this learning occur every day, gradually but steadily, as in Billig's banal nationalism (1995). But then these features become sticky and hard to reverse, thus shaping attitudes in the long term. Putnam (1993) argues that repeated interactions in given historical settings teach lessons that are remembered for centuries, as shown by his analysis of social capital in Italy. Rothstein (2000) raises the question of how societies learn to move from an inefficient equilibrium of low social capital and distrust in institutions to an efficient one, avoiding the 'tragedy of the commons'. He adds the variable of collective memories to the game-theoretical explanations offered by Putnam and others. These collective memories are not simply 'given' by history but can be strategically created by political entrepreneurs and political leaders: a case of power meeting learning.
Given the asymmetries of power built into and constantly reproduced by institutions, the puzzle is whether it is possible to learn how to overcome social dilemmas and still establish cooperation (Ostrom, 1990). Sometimes individuals and groups abstain from exploiting the free-ride option within an institution or, conversely, accept an unfair outcome in a given arena. The explanation is arguably that society is made up of multiple interlinked institutional arenas (Kashwan, 2016). Institutions and nested games can reshuffle winners and losers across arenas, thus allowing groups to learn cooperation and solve collective action problems like the control of corruption and common-pool resource management.

So what? Discussion and conclusions
We have explored the claim that learning generates change in public policy by considering different levels of analysis. To generate an orderly discussion, we have separated the macro and the individual level. Following Coleman, we have investigated the transition from macro to micro first; then the relationships at the micro-micro level; and the final transition from micro to the macro organizational, institutional or society-wide level. By doing this, we have been able to focus on the mechanisms that operate at each level, and to situate a variety of claims about policy learning at the appropriate level. Our contribution is, therefore, an original way to explore the causal architecture of learning, to map the field in a multi-dimensional fashion and to gain explanatory leverage from findings that originated outside the literature on policy learning. We argue that the main advantage of using Coleman is to go beyond classifications on the basis of learning types and differentiations of the field in terms of ontological presuppositions. Instead, our approach offers a way to organize the literature. The caveat is that we have not presented a literature review; rather we have populated Coleman's dimensions with illustrative and exemplary studies, trying to bring into relief the causal architecture that lies within the literature and distil the lessons for the research agenda on learning in public policy.
Another result of our analysis is the illustration of different modes of analysis and methods that are appropriate depending on the dimension of the bath-tub we consider. To systematically explore micro-foundations, we need concept formation (what is learning? what is cognition? what is an emotion?) and, turning to methods, experiments. In political psychology and neuro-economics there is an increasing use of brain imaging techniques (Camerer, Loewenstein, & Prelec, 2005). In economics, 'introspection' is a method to find out more about learning within decision-making processes (Earl [2001] on subjective personal introspection).
On the micro-micro level, we have reported on the findings produced by experiments, but other suitable methods are political ethnography, interpretive policy analysis, public opinion analysis, and methods suitable for the analysis of diffusion. The micro to macro transition seems amenable to organizational analysis (including the empirical studies of sense-making in organizations, see for example Brunsson's Mechanisms of Hope, 2006). Historical institutionalism, process-tracing and methods of institutional analysis and development have already demonstrated their potential for the empirical analysis of collective memories and how groups and societies learn the solution to collective dilemmas.
Finally, a precise understanding of the levels of learning and the transitions we have discussed can inform the design of learning architectures. Governments and international organizations invest considerable resources in the design of architectures for policy convergence, fights against corruption and poverty, and administrative capacity. These architectures will not work unless they respect the basic logic of learning we seek to illustrate in this article.