Rewards and penalties in an evolutionary game theoretic model of international environmental agreements

Abstract The United Nations Framework Convention on Climate Change (UNFCCC) specifies that all nations share 'common but differentiated responsibilities' for the protection of the global environmental system. Recently, a growing literature has applied game theory to the formation and stability of international environmental agreements (IEA), with the aim of making climate agreements successful at the national and the international level. This study provides a novel evolutionary game theoretic model of a self-enforcing IEA to overcome the free rider problem. The fundamental difference between our paper and the existing literature is that we examine enforcement in a model where the IEA itself acts as a governing authority, whereas the typical model of enforcement involves a government enforcing a rule that it has imposed. For this purpose, we assign countries to different grades according to their pollution levels, consider a combination of rewards and penalties, use replicator dynamics to derive the conditions for the population steady state, and examine how the proposed regulatory mechanism fares in this steady state. This framework enables us to avoid the free rider and renegotiation problems as well as the rationality assumption. We establish the condition for evolutionary stability. The global environmental problem is managed effectively when the reward-punishment scheme and the monitoring frequency of the IEA fulfill this condition. Our results provide an allocation principle with stable conditions under which countries obtain higher benefits when the IEA monitors and the stability of the grand coalition holds.


Introduction
The global increase in greenhouse gases has put pressure on countries to design and redesign policies to mitigate climate change effectively. However, what is best individually may not be optimal at the global level. One reason is the free rider problem in abatement efforts, which is analogous to the problem of providing public goods. There have been attempts to achieve an effective aggregate emission reduction target via international cooperation. The international agenda for mitigation and sustainable development has become more specific with the Paris Accord and the 2030 Agenda for Sustainable Development, which reflect the recognition that the goals of sustainable development and climate mitigation can be achieved jointly (Ivanova et al. 2020). Climate justice addresses the disproportionate effect of climate change on population and community sovereignty (Tramel 2016), and researchers agree that the main source of climate change and global warming is energy consumption (Ahmad et al., 2019). Two dimensions of climate justice have been identified: 'distributional (i.e., who is affected by climate change and who benefits from, and pays for, adaptation and mitigation policy)' and 'procedural (i.e., whose voice is heard in decisions)'. Hence, the access and allocation problems in climate change mitigation are very complex. International Environmental Agreements (IEA) were recently modified and enacted globally for more effective collaboration among countries (Günther & Hellmann 2017). 1 However, the free rider problem still exists. In this study, we develop a dynamic evolutionary game theoretic model of pollution abatement with a global component. This model differs from those in the literature that rely on punishment strategies, where monitoring is incomplete and costly.
Therefore, this paper contributes a 'reward and penalty scenario' as a possible policy instrument in which countries decide on their emission levels and the IEA decides whether to monitor countries' emissions. The model can be used for the management of common environmental resources such as the global climate. We use the concept of repeated games for the stability analysis of the self-enforcing IEA. In our model, the IEA is a strategy profile that aims at controlling pollution levels in the dynamics of a repeated game to maximize the countries' joint payoff. We focus on the simple strategy where punishment is executed only if a country pollutes more than the agreed level. Conversely, if countries reduce their pollution levels in conformity with the environmental standard, they receive the rewards set by the IEA. Stability of the IEA means that no country has an incentive to deviate from cooperation or to renegotiate. The IEA is stable when the equilibrium conditions are satisfied.
In most international agreements, monitoring compliance is costly. 2 For example, the Montreal Protocol has a Committee that is responsible for gathering information on the compliance of countries suspected of having violated (or of violating) the Protocol (Benedick 1998). Similarly, under the Kyoto Protocol, a Compliance Committee is responsible for monitoring the compliance behavior of the signatory countries and reporting it to the Enforcement Branch, which has the power to impose sanctions (Depledge 2000). CITES makes a similar provision: it monitors the compliance of the signatories through a Standing Committee with its own managerial body, which is responsible for enforcing the specifications of the agreement. 3 The enforcement provisions included in 'CITES, Montreal and Kyoto Protocol' allow their respective managerial bodies to monitor the parties' behavior in the case of a violation of any of the treaties. Although the design and structure of enforcement institutions differ across IEA, they share a common characteristic: the cost of monitoring activities must be borne by the parties. In this sense, our work provides a new extension based on rewards that helps the institutions to observe and finance the monitoring activities effectively.
In the literature (Barrett 1994; Carraro & Siniscalco 1993), the Nash equilibrium is found in a one-shot game. There is a payoff advantage to IEA participation when participation is low and a payoff advantage to free riding when participation is high. IEA membership is determined by the internal and external stability conditions from the cartel literature (D'Aspremont et al. 1983). Barrett (1994) shows that, when the gains from cooperation are significant, joining an IEA is beneficial only when participation is very low. McGinty (2006) relaxes the symmetry assumptions in Barrett (1994) and finds greater participation when transfers are implemented in the form of tradable pollution permits. Barrett (1999, 2001) considers a linear IEA model as a stage game. In Barrett (1999), identical countries have either constant or increasing returns to abatement. Barrett (2001) introduces benefit asymmetry with constant returns to abatement, using a prisoner's dilemma in which polluting is the dominant strategy of the stage game. In the first stage, countries decide whether to participate in the IEA. In the second stage, signatory countries collectively choose to pollute or abate, and non-signatory countries individually choose to pollute or abate. However, it is assumed that signatory countries choose so as to maximize their payoff and comply with this decision. This result differs from Barrett (1994), in which one signatory's decision to pollute does not cause other countries to pollute. The question then arises of what type of IEA is obtained in the simple linear framework of Barrett (1999, 2001) when participation is partial, returns to abatement vary, externalities differ both within and across populations, and the timing assumptions of the stage game are relaxed. Barrett (2001) highlights how transfers among signatory countries can improve IEAs.
In the case where 'cooperation is for sale', an equilibrium is obtained with full participation by low-benefit countries and greater participation by high-benefit countries. This paper uses a linear evolutionary game that admits a stable equilibrium, models a specific IEA, and allows externalities to differ within and across populations.
A stable environmental agreement is crucial for the control of any kind of pollution at the global level. Although there is a large body of literature on emission mitigation strategies, the development of a global environmental governing mechanism remains underexplored. Barrett (1994) seems to be the first to apply the concept of 'self-enforcing' agreements to the analysis of IEA and to examine their relations. However, the terminology 'self-enforcing' seems to be misleading because it does not highlight the enforcement of compliance on the members of an agreement. Most researchers who apply this concept simply argue that parties to a 'self-enforcing' IEA comply fully with the terms of the agreement (e.g., Barrett, 1994, 2003; Carraro & Siniscalco, 1993; Hoel, 1992). A dynamic governing mechanism can be an effective tool that motivates countries to reduce their pollution levels. However, there is the issue of how to make the countries agree to adopt a governing mechanism of this sort. The reward-penalty scenario in the model incorporates the agreement on the governing mechanism, in which the IEA plays a leading role. Environmental regulations in our model refer to the IEA as a governing authority that imposes penalties on countries whose pollution exceeds the agreed levels. This kind of regulation relies on a simple rule and may lead to satisfactory results. Dividing the countries into two groups according to pollution standards can make the countries strive for the joint environmental objective and not for individual gains. However, an 'only penalty but no reward' scenario will make the countries see the environmental regulations as a burden. As a result, countries may try to evade supervision, and secret filming and cover-ups happen now and then. Our model considers two aspects. First, we divide the countries into different graded categories according to their pollution levels and develop an evolutionary game model for this population.
Second, we introduce the reward and punishment scheme such that reporting their pollution level truthfully in the first place makes the countries better off in equilibrium. This, we believe, is an important contribution of our model to the literature. The evolutionary perspective is adopted not because it seems suitable to capture the idea of survival in the face of environmental problems such as global warming, but because it enables us to avoid the rationality assumption of traditional game theory. To our knowledge, incorporating both aspects is new to the literature.
This work adopts a different method to study the rewards and penalties applied by the IEA on the basis of monitored activity, which is a part of established international agreements. At the same time, our work retains the self-enforcing agreement framework to find the equilibrium of the parties to an IEA. To reduce the incentive of a party to violate the terms and conditions of an IEA, the parties finance an authorized body that has the power to monitor the extent to which the behavior of any party complies with what has been agreed on. Parties to an IEA that are monitored and deemed in violation of the agreement incur a costly sanction. There is also a large literature on the economics of enforcing environmental policies. Contributions on enforcing emission standards, tradable emission permits, and emission taxes include Arguedas (2008) and Stranlund (2007). The fundamental difference between our work and the law enforcement literature is that we examine enforcement in a model where the IEA has ruling authority, while the typical model of enforcement involves a government enforcing a rule that it has imposed.
The results of our model have some key policy implications. The IEA must impose rules that compel countries to lower their pollution to the agreed level. Therefore, IEA rules must specify the penalty that applies when countries deviate from their emission reduction targets; countries that meet their targets receive a reward. This reward-penalty scenario appears to be the best path to control pollution at the global level. Our model can also serve as a benchmark that could be helpful in the face of different global environmental problems. Furthermore, our model can be used towards a better understanding of the free rider problem.
The rest of the paper is organized as follows: Sec. 2 discusses where our study stands in the literature. In Sec. 3, we introduce and develop the evolutionary game model that focuses on the stability analysis of the IEA and introduces the equilibrium point. Section 4 explains the results of this paper and allocation criteria are also discussed. Finally, Sec. 5 discusses policy implications and possible extensions to this study.

Literature review
Game theoretic models are widely used in the literature to analyze International Environmental Agreements (IEA) on air pollution. Starting with Barrett (1994), some scholars have studied the free-riding problem by applying the concept of one-shot or repeated games. Günther and Hellmann (2017) and Benchekroun and Long (2012) provide excellent reviews of recent game theoretic models in environmental economics. The formal mechanisms for climate change cooperation are discussed in Hovi et al. (2015). In their review of climate change cooperation models, Hovi et al. (2015) paint a bleak picture of global cooperation, but also imply that large and stable coalitions may be possible. Koo and Hong (2017) comment that various efforts have been made to reduce greenhouse gases, where carbon reduction and energy performance certificates can be achieved through voluntary national energy saving campaigns. They propose four incentive and four penalty programs by considering three criteria: the building, community and national levels. On the other hand, researchers like Pavone (2018) have documented that CO2 emissions are rising steadily and that efforts should be made to counter the problem. Our study aims to specify the conditions under which large and stable coalitions may occur.
Considering global pollution as a repeated game, many studies treat the IEA as the strategy of a coalition that maximizes utility for participants committed to reducing their pollution levels. In most of these studies, all member countries are involved in the punishment of the deviator. However, the non-cooperative equilibrium cannot be improved on when only a few countries cooperate to implement such punishment (Barrett 1994, 1997). To reduce the incentive to renegotiate, some scholars introduced punishment strategies. Asheim and Holtsmark (2009) and Froyn and Hovi (2008), for example, introduce 'penance' strategies that modify the 'penance-k' strategies in Farrell and Maskin (1989), where a subgroup of k players punishes a deviator for a finite number of periods until the countries revert to cooperation. Our model differentiates itself from the existing literature by incorporating both reward and penalty strategies. This is a novel structure to model the IEA.
An enforcement model for global pollution control was proposed in Heitzig et al. (2011). The linear compensation in their model dynamics does not provide efficient punishments; instead, a redistribution scheme of liabilities, based on the level of compliance with the previously committed levels, is required. This kind of setup allocates the targeted distribution and is an efficient solution for full cooperation with strong stability properties. They did not apply the concept of renegotiation to keep pollution levels steady, and they showed that any allocation of the optimal total payoff can be sustained as a subgame perfect equilibrium. They indicate a need for incorporating bounded rationality in future work. The evolutionary game theory approach enables us to avoid the rationality and the common knowledge of rationality assumptions. Some environmental models treat pollution as a flow variable. This may be acceptable for short-lived climate gases, but it is a strong simplification for greenhouse gases that remain in the atmosphere for a long time. Rubio and Casino (2005) and Rubio and Ulph (2007) were the first studies that applied IEA to a pollution stock. The literature that treats pollution as a stock is detailed in Calvo and Rubio (2013). The stability conditions for environmental agreements are studied by Kratzsch, Sieg, and Stegemann (2012) in a setting where the emission stock builds over time and the payoff depends on the pollution level. They showed that it is possible to enforce international cooperation via 'penance-k' strategies. Although we treat pollution as a flow variable in our model, we also discuss how our results fare when pollution is considered as a stock.
The application of evolutionary game theory to environmental pollution problems has been picked up recently in the literature. For example, Zhang et al. (2014) investigate optimal control strategies using a two-stage programming model of regional water pollution. Miao et al. (2014) applied an 'interval-fuzzy De Novo programming' model to analyze the optimal allocation of water resources. Similarly, optimal designs to manage regional energy problems are studied by Suo et al. (2013) and Zhang et al. (2014). We do not resort to this approach because optimal pollution control strategies derived from optimization methods are appropriate at the regional level. Instead, we combine the blame game of Ellingsen and Östling (2011) with a rewards-penalties scenario at the global level to examine IEA.
We find a few evolutionary game theoretic models that deal with the global pollution problem in a standardized manner for IEA. Breton et al. (2010) develop a dynamic evolutionary game to examine which countries join the IEA in equilibrium. In their model, only signatory countries punish non-signatory countries, at some cost, and a mechanism is introduced to describe how the countries reach a stable IEA. This model characterizes stable partial cooperation within the IEA over a period and captures the situation in which all countries participate in a stable agreement. McGinty (2010) introduces an evolutionary equilibrium in which no signatory country prefers to remain outside the IEA and the evolutionary equilibrium is robust to trembles. He finds a unique interior evolutionary equilibrium in two populations with decreasing returns to abatement and small asymmetries in externalities within the populations. He concludes that the IEA's predictions under the polluter pays and the ability to pay principles result in a Pareto inferior outcome. In the light of the literature on evolutionary game theoretic models for international environmental agreements (IEA), our model introduces an extension in the form of a reward and penalty scenario that overcomes the free rider problem and suggests a regulatory mechanism that can mitigate global pollution effectively. Bollen et al. (2009) introduce a cost-benefit analysis in which some pollution control policies generate extra benefits for climate change mitigation. Following their recommendation, we formulate a 'reward and punishment' scenario that allows controlling pollution at the global level. Our evolutionary game theoretic model presents a policy suggestion that makes participants better off.

Penalty-reward mechanism
In this paper, we investigate how the addition of a reward-penalty mechanism to IEAs can affect countries' decisions to participate. In this sense, a cost-benefit calculation drives countries' participation decisions (Bernauer et al. 2010, 2013; Roberts et al. 2004; Spilker & Koubi 2016). By joining an IEA, countries bind themselves through international law to honor an agreement, and such an obligation is costly (Downs et al. 1996). This cost includes not only the direct cost of implementing an agreed policy 4 , but also the indirect cost of the loss of sovereignty and autonomy. Joining an IEA also results in transaction costs due to cooperation and coordination with other treaties when a conflict arises (for example, Bernauer et al. 2013; Downs et al. 1996; Spilker & Koubi 2016). The literature also provides evidence that such costs decrease countries' willingness to join an IEA (Hathaway 2003; Roberts et al. 2004). Some studies show that treaty features (for example, the availability of assistance) can reduce such costs and thereby increase countries' participation (Spilker & Koubi 2016). To increase countries' participation, Asheim and Holtsmark (2009) and Froyn and Hovi (2008) introduce penalty strategies. For example, the 'penance' strategies that modify the 'penance-k' strategies in Farrell and Maskin (1989) let a subgroup of k players punish a deviator for a finite number of periods until the countries revert to cooperation. We add to this literature by examining whether the inclusion of a reward-penalty mechanism increases or reduces treaty participation and thus makes an IEA more successful. In order to encourage countries to control pollution, we assume that the IEA not only punishes countries based on their pollution level but also rewards those that meet the pollution emission standards under MPC.
Let us suppose that the IEA encourages the countries not only by means of penalties but also by providing rewards to those countries that reach the targeted pollution reduction level set by the IEA. Suppose that the IEA divides the countries into different ranks by their pollution levels. The IEA's dividing scheme is as follows: if a country's pollution level is lower than the target level, then this is denoted by $E_1$. A country's pollution level then increases by a fixed amount through the levels $E_2, \ldots, E_i, \ldots, E_K$, where $E_i \in \mathbb{N}^+$, and the pollution emission strategy set is written as $L = (E_1, E_2, \ldots, E_K)$. The IEA imposes a penalty of $\left(\frac{E_i}{E_1} - 1\right) e$ on those countries that take the strategy $E_i$, where $e$ is the punishment amount and $\frac{E_i}{E_1} - 1$ determines the extent of punishment. In our game, there are $N$ countries that are free to choose a level $E_i$. The strategy of the $j$th country is denoted $d_j$, where $d_j \in L$. The set $L$ corresponds to the countries' effort levels for pollution reduction, which means that the countries can freely choose their pollution emission strategies from Figure 1. Hence, the strategy profile of the $N$ countries is $S = (d_1, d_2, \ldots, d_N)$. Let $\underline{E} = \min(d_1, d_2, \ldots, d_N)$ and $\overline{E} = \max(d_1, d_2, \ldots, d_N)$. The reward-penalty mechanism can be described as follows: the IEA gives a reward $c$ to countries that meet environmental standards and imposes a penalty $c$ on the most serious polluters. The rewards and penalties for the countries that adopt the strategy $E_i$ are then given by $c \cdot R_i$, where $R_i$ is the reward-penalty indicator defined in Eq. (1).
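The grading and reward-penalty rules described above can be illustrated with a short Python sketch. It is not the authors' implementation. The graded penalty follows the text, $(E_i/E_1 - 1)\,e$. The form of the indicator $R_i$ is an assumption, because Eq. (1) is not reproduced here: we take $R_i = +1$ at the lowest chosen level, $-1$ at the highest chosen level, and $0$ otherwise (and $0$ for everyone if all countries choose the same level). All function names are hypothetical.

```python
def graded_penalty(E_i, E_1, e):
    """Graded penalty (E_i / E_1 - 1) * e; zero for a country at the target level E_1."""
    return (E_i / E_1 - 1.0) * e


def reward_indicator(E_i, chosen):
    """ASSUMED form of R_i: +1 at the lowest chosen level, -1 at the highest, 0 otherwise."""
    lowest, highest = min(chosen), max(chosen)
    if lowest == highest:
        # All countries chose the same level, so nobody is singled out (assumption).
        return 0
    if E_i == lowest:
        return 1
    if E_i == highest:
        return -1
    return 0


def net_transfer(E_i, chosen, E_1, e, c):
    """Reward-penalty compensation c * R_i minus the graded penalty for strategy E_i."""
    return c * reward_indicator(E_i, chosen) - graded_penalty(E_i, E_1, e)
```

For a hypothetical profile `chosen = [1, 2, 3, 3]` with `E_1 = 1`, `e = 2.0` and `c = 1.0`, a country at level 1 receives the reward only, a country at level 2 pays only the graded penalty, and a country at level 3 pays both the graded penalty and the extra penalty `c`.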

Evolutionary game modeling on IEA's monitoring for pollution control
To tackle the global pollution problem, suppose there are $N$ countries in the evolutionary game model, which is based on the reward-penalty scenario. The countries freely make their decisions according to the principle of maximizing their benefits. Let $X_i$ denote the proportion of the $N$ countries that take the strategy $E_i$ in period $t$; the proportion vector that depicts the countries' pollution emission situation can then be written as $X = \{X_1, X_2, \ldots, X_K\}$, where $\sum_{i=1}^{K} X_i = 1$. A country adopting the strategy $E_i$ obtains a payoff of $C(E_i)$. If the IEA applies the monitoring strategy, then the countries suffer the graded penalty $\left(\frac{E_i}{E_1} - 1\right) e$ and gain the reward-penalty compensation $c \cdot R_i$. The utility function $p_A(E_i)$ of a country that adopts the strategy $E_i$ and is monitored by the IEA combines these three terms, while $p_B(E_i)$ denotes the utility of a country that adopts the strategy $E_i$ and is not monitored by the IEA. It is straightforward to prove that all countries adopt the high effort level in the 'unique pure strategy equilibrium'. The proof of the pure strategy equilibrium is given in Propositions 1 and 2, and the dynamic equilibrium conditions of our game model can be found in the Appendix.
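The payoff structure and the imitation dynamics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Appendix derivation: the expected payoff is assumed to be $C(E_i) - P\,(E_i/E_1 - 1)\,e + P\,c\,R_i$, that is, the monitored and unmonitored utilities weighted by the monitoring probability $P$, which reproduces the two-grade payoffs used later in the paper. The dynamics use the standard replicator equation $\dot{X}_i = X_i\,(p_i - \bar{p})$ with a simple Euler step. Function names and the step size `dt` are hypothetical.

```python
def payoffs(levels, C, P, e, c, R):
    """Assumed expected payoff per strategy: C(E_i) - P * penalty_i + P * c * R_i."""
    E_1 = levels[0]
    return [
        C[i] - P * (levels[i] / E_1 - 1.0) * e + P * c * R[i]
        for i in range(len(levels))
    ]


def replicator_step(X, p, dt=0.01):
    """One Euler step of the standard replicator equation dX_i/dt = X_i * (p_i - mean payoff)."""
    mean_payoff = sum(x * p_i for x, p_i in zip(X, p))
    X_new = [x + dt * x * (p_i - mean_payoff) for x, p_i in zip(X, p)]
    total = sum(X_new)  # equals 1 up to rounding; renormalise to stay on the simplex
    return [x / total for x in X_new]
```

Strategies whose payoff exceeds the population average gain share at each step, which is the sense in which countries adopt better-performing strategies "by continuous imitation".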
Proposition 1. The 'unique pure strategy equilibrium' of an Evolutionary Game will be (K, K, … , K).

Proof.
Let $g < e/C$. For any strategy profile with $\underline{E} < K$, the payoff to a country for choosing $E_i$ is $\ldots \cdot e + c \cdot R_i$. For a strategy profile in which $\underline{E} = K$, the unique best response of all countries is to play $K$. Therefore, the 'unique pure strategy equilibrium' is $(K, K, \ldots, K)$. In discrete versions of the evolutionary game, the conditions for a 'unique equilibrium' are a bit more restrictive. On the other hand, if the conditions are met, we can relax the solution concept. This paper considers the particular finite strategy set $(1, 1 + k, \ldots, K - k, K)$ with $K > 0$.
Proposition 2. Suppose $e > Ck$. The only strategy profile that survives iterated elimination of strictly dominated strategies is (K, K, … , K).
Proof. The proof shows that the lowest effort level is strictly dominated by the second-lowest effort level. Once the lowest effort level is eliminated, the second-lowest effort level is strictly dominated by the third, and so on. To see this, suppose that $E < K$ is the lowest effort level that has not yet been eliminated. In this case, the payoff comparison shows that playing $E^*$ strictly dominates playing $E$ as long as $e > Ck$. This process of elimination continues until all effort levels except $E = K$ are eliminated.
The IEA sets the monitoring cost and probability as $C$ and $P$ at time $t$, respectively. The pollution control cost is denoted as $u(E_i)$ and the negative impact of not monitoring as $r(E_i)$. The utility function $u_A$, when the IEA monitors, is defined in Eq. (4), and the utility function $u_B$, when the IEA does not monitor, in Eq. (5).
The stability analyses are presented in Table 1, which shows the evolutionary stability of the equilibrium points as determined by strategic benefits. If a strategy $E_i$ receives higher benefits than the other strategies, then the countries will choose $E_i$ by continuous imitation as the equilibrium strategy in the long-run steady state of the repeated game. For other strategies $E_j$ ($i \neq j$) or mixed strategies, the equilibrium will be unstable. If the benefits of all strategies are equal, then the countries' evolutionary strategies are more complex, and bifurcation might occur. Therefore, the IEA should adopt affordable levels of rewards and penalties to stimulate strategies that yield higher benefits for countries, such that $p(E_1) > \max\{p(E_2), \ldots, p(E_K)\}$.

Simple example, allocation criteria for emission reduction
Consider multiple countries that are committed to reducing their pollution levels, and assume that the IEA divides the countries into two grades, $E_1$ and $E_2$, according to their pollution levels. We examine the interaction between the IEA's monitoring strategy and the countries' pollution strategies.
For the IEA, the expected benefits of adopting the monitoring and not-monitoring strategies are $u_A = -1 - X_2 + X_2 \cdot e$ and $u_B = X_2$, from Eqs. (4) and (5) respectively. If $u_A > u_B$, that is, $X_2 > \frac{1}{e - 2}$ (assuming $e > 2$), then the expected benefits of monitoring are higher than those of not monitoring. Our model argues that if a country limits its emission level at $\sum_{i=1}^{K} X_i = 1$, then that country can claim its reward when the IEA monitors, in which case the IEA's expected benefit is $u_A = -1 - X_2 + X_2 \cdot e$; when the IEA adopts the not-monitoring strategy, its expected benefit is $u_B = X_2$. Therefore, the IEA will decide to adopt the monitoring strategy after a long-term repeated game. At $P = 1$, the IEA becomes consistent and is in the equilibrium state at which it can enforce the laws independently. If $u_A < u_B$, that is, $X_2 < \frac{1}{e - 2}$, the expected benefits of not monitoring are higher than those of monitoring. Therefore, the IEA will decide to adopt the not-monitoring strategy in the long-run steady state of the population. Moreover, at $P = 0$, the IEA is not on an equilibrium path, which leads to failure of supervision and to environmental degradation. If $u_A = u_B$, that is, $X_2 = \frac{1}{e - 2}$, any $P \in [0, 1]$ is an equilibrium strategy, but not an evolutionarily stable strategy. We conclude that if the proportion $X_2$ of countries in the higher pollution grade exceeds the critical value $\frac{1}{e - 2}$, then the IEA will tend to monitor; otherwise it will not.
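The IEA's monitoring decision in the two-grade example can be checked numerically. The sketch below uses only the expressions stated above, $u_A = -1 - X_2 + X_2 e$ and $u_B = X_2$; the function names are hypothetical, and the restriction $e > 2$ is made explicit because the threshold $1/(e - 2)$ is only meaningful in that case.

```python
def iea_benefits(X2, e):
    """Expected IEA benefits (u_A, u_B) for monitoring and not monitoring."""
    u_A = -1.0 - X2 + X2 * e
    u_B = X2
    return u_A, u_B


def monitoring_threshold(e):
    """Critical share of E_2 countries, 1 / (e - 2); requires e > 2."""
    if e <= 2:
        raise ValueError("threshold is defined only for e > 2")
    return 1.0 / (e - 2.0)


def iea_prefers_monitoring(X2, e):
    """True when monitoring yields the strictly higher expected benefit."""
    u_A, u_B = iea_benefits(X2, e)
    return u_A > u_B
```

With `e = 4` the threshold is `0.5`, so the IEA prefers monitoring at `X2 = 0.6` but not at `X2 = 0.4`. Since $X_2 \le 1$, the threshold can only be reached when $1/(e - 2) \le 1$, that is, when $e \ge 3$; for $e \le 2$ the difference $u_A - u_B = X_2 (e - 2) - 1$ is negative for every $X_2$, so monitoring is never preferred under these expressions.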
For the countries, the expected benefits of adopting the strategies $E_1$ and $E_2$ are $p(E_1) = P \cdot c$ and $p(E_2) = 1 - P \cdot e - P \cdot c$, respectively. Two equilibrium points, $(1, 0)^T$ and $(0, 1)^T$, are obtained from Eq. (A4) in the Appendix. The stable strategies are as follows. If $p(E_1) > p(E_2)$ and $p(E_1) > 0$, i.e., $2P \cdot c + P \cdot e > 1$, the compensation incentives are higher than the punishments. In this case, the countries will control their pollution levels in the long run, $E_1$ will be the equilibrium strategy, and the IEA will effectively impose regulations to prevent environmental degradation. If $p(E_2) > p(E_1)$ and $p(E_2) > 0$, i.e., $2P \cdot c + P \cdot e < 1$ (which, since $P \cdot c \geq 0$, also implies $P \cdot c + P \cdot e < 1$), then the compensation incentives are lower than the punishments imposed by the IEA. Therefore, the countries' pollution levels will rise in the long run. If neither set of conditions is fulfilled, then there is no evolutionarily stable strategy. When this situation occurs, IEA monitoring will not be significant, and the countries will choose their strategies randomly. We conclude that the IEA must adjust the reward and penalty system and increase the monitoring frequency so that $2P \cdot c + P \cdot e > 1$ in order to encourage the countries to control their pollution levels effectively.
Table 1 source note: Authors' own calculations based on proposed model conditions.
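The countries' side of the two-grade example can be explored with a small simulation. The payoffs $p(E_1) = P c$ and $p(E_2) = 1 - P e - P c$ are taken from the text. The dynamics are an assumption: we use the standard two-strategy replicator equation $\dot{x}_1 = x_1 (1 - x_1)\,[p(E_1) - p(E_2)]$ with an Euler step, which may differ in detail from Eq. (A4) in the Appendix. Function names, the initial share and the step settings are hypothetical.

```python
def country_payoffs(P, c, e):
    """Expected benefits (p(E_1), p(E_2)) in the two-grade example."""
    return P * c, 1.0 - P * e - P * c


def low_pollution_is_favoured(P, c, e):
    """Condition 2*P*c + P*e > 1, equivalent to p(E_1) > p(E_2)."""
    return 2.0 * P * c + P * e > 1.0


def long_run_share_E1(P, c, e, x1=0.5, dt=0.01, steps=20000):
    """Share of countries playing E_1 after Euler integration of the replicator equation."""
    p1, p2 = country_payoffs(P, c, e)
    for _ in range(steps):
        x1 += dt * x1 * (1.0 - x1) * (p1 - p2)
    return x1
```

For example, with `P = 0.5`, `c = 1.0`, `e = 1.0` the condition holds ($2Pc + Pe = 1.5$) and the share of $E_1$ countries approaches one, whereas with `P = 0.2` the condition fails ($2Pc + Pe = 0.6$) and the share approaches zero. Raising the monitoring probability `P` is therefore one way to move the population from the second case to the first.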

Allocation principle
For pollution control, the second commitment period of the Kyoto Protocol is going to end in 2020 after the extension decided in the UNFCCC meeting on climate change. During this period, different suggestions were introduced for the future institutional setup, and most of them employ different perspectives. Some of the signatory states suggest a governance structure based on a global emission trading scheme (Bradford 2004). Some researchers put forward a framework of international technology development (Benedick 1998; Rose et al. 1998). Others propose an institutional framework in terms of political feasibility (Aldy et al., 2010; Bodansky, 2004; Ringius et al., 2002). The allocation of CO2 emission reductions among countries remained under discussion in all of the proposals during the Kyoto conference, and it has since been referred to as 'burden-sharing'. Some other proposals introduce ideas that go beyond the countries' institutional setup and provide for partnerships and other non-state mechanisms.
The most challenging task in the research on climate change is allocation. The allocation criteria for emission reduction can be grouped into three broad categories: responsibility, capability, and efficiency (den Elzen et al., 2007). Consequently, these criteria create a new allocation method under the same principles. Every nation is responsible for climate change according to this approach. Some proposals allocate emission limitations at the country level, such as a clean development mechanism (La Rovere et al., 2002) that is based on the polluter pays principle. However, according to the same criteria, earlier industrialized countries like the UK bear more burden in comparison to developed countries like the USA and Japan. Methodologically, some questions arise regarding this criterion, such as how to select the carbon cycle in a model, the starting year of the calculation, and how to deal with reasonable data over the past 100 years. Therefore, there was a need to develop a strategic model in which each emitter that is not concerned with pollution control is monitored through a strategic plan to sustain global cooperation.
Another responsibility criterion is based on per capita emissions. Although per capita emissions vary between developed and underdeveloped countries, under this criterion the final allocation should be equal. This approach is formulated in the literature as 'contraction and convergence' (Aslam, 2002; den Elzen & Meinshausen, 2006). Under this approach, countries commit to converging their per capita emissions by a certain year, such as 2050 or 2100. Some studies focus on the allocation criteria under responsibility (see, for example, Ott et al. 2004).
The ability to pay is also an allocation scheme, one that focuses on economic perspectives for pollution control. Under this approach, the parties are required to reduce their emissions only when they have exceeded a certain level of per capita welfare (Jacoby et al., 1999). Finally, the efficiency criteria emphasize the allocation of the benefits and burdens implied by climate mitigation. The allocation under this approach seems fair in terms of economic competitiveness among countries (Kanie et al. 2010). Under the allocation principle, a question arises: can a mechanism be designed that achieves a large IEA? Our findings provide a stable condition under which countries get more benefits when the IEA implements monitoring practices and the stability of the grand coalition holds. In this vein, the Vienna Convention and the Montreal Protocol are real-world examples: they were initially signed by 28 and 46 countries respectively and reached universal ratification in 2009. Our results differ in an important way from the literature that has considered the concept of MPC (Rubio & Casino 2005). In our case, the number of countries is endogenous, while it was exogenous in the reference case. Carraro et al. (2009) choose to treat MPC endogenously by adding a first stage in which countries set the minimal coalition size required for a treaty to be implemented.

How can IEA overcome free riding?
The free rider problem can be overcome by trade measures and issue linkage. Trade measures can counter the attractive incentive to free ride. The high free rider incentives that arise as participation increases can be mitigated by a trade measure, which is more effective under larger participation. Two kinds of trade measures can be considered: trade controls and trade sanctions. A 'trade control' is an instrument used to regulate a product that is covered by the agreement. A 'trade sanction' is a specific act intended to force government action in response to non-compliance or non-conformity with an international standard. Trade controls have been adopted in international environmental treaties like the Montreal Protocol. 5 Brack, Grubb, and Windram (2000) highlighted that such controls for greenhouse gases could be difficult to apply and could create severe limitations on trade as well as a higher welfare loss. However, they suggest that similar kinds of controls can be very effective and can be considered when developing a climate rule. 6 In the case of 'trade sanctions', by contrast, no environmental treaty uses them as an enforcement instrument in the way the WTO does. 7 Victor (2004) highlighted that the implementation of 'trade sanctions' in the climate change system can be successfully connected with the WTO. 8 However, studies on the role of economic sanctions in global organizations do not promise high efficiency. Because such implementation cannot be relied on, climate regimes must adopt alternative enforcement techniques 9 (Charnovitz 2003). Given various difficult problems, including compatibility with WTO rules, trade measures cannot readily be applied. Such measures might be considered a last option for guaranteeing a genuinely cooperative agreement.
The free rider incentive could also be dealt with through another strategic policy, 'issue linkage'. Cooperation that is undermined by free riding, such as cooperation on global climate change, could be linked with club or quasi-club goods. The idea is that the incentive to free ride on the non-excludable benefits of public goods could be offset by the incentive arising from the excludable benefits of club goods. Barrett (1994, 1997) proposes linking environmental protection to negotiations on trade liberalization. In this way, potential free riders are deterred by the threat of trade sanctions. In Carraro and Moriconi (1997), environmental cooperation is linked to 'cooperation in Research and Development'. If a country does not want to join an agreement to control the environment, then it loses the benefits of technological cooperation. Mohr (1995) and Mohr and Thomas (1998) propose linking climate negotiations to international debt swaps.

Conclusion and policy
In this paper, we provide a novel evolutionary game theoretic model of a self-enforcing IEA. For this purpose, we assign countries to different grade categories according to their pollution levels, consider a combination of rewards and penalties, use replicator dynamics to derive the conditions for the population steady state, and examine how the proposed regulatory mechanism fares in this steady state. This framework enables us to avoid the free rider and renegotiation problems of IEAs, as well as the rationality and common-knowledge-of-rationality assumptions of traditional game theory. We establish the condition for a steady state. The global environmental problem is managed effectively when the reward-punishment scheme and the monitoring frequency of the IEA fulfill this condition under MPC.
Our main findings indicate that in the steady state of the game, countries conform to their agreed pollution levels when the IEA's rewards and penalties and its monitoring frequency meet the equilibrium condition. This means that the IEA can fulfill its duties and prevent environmental degradation efficiently when the evolutionary game is in equilibrium. Otherwise, the benefits from deviation will be higher than the reward for compliance, leading to increased pollution levels. We conclude that, in order to keep global pollution levels under control, the IEA needs to continuously adjust the level of rewards and punishments as well as the monitoring frequency.
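The role of the equilibrium condition can be illustrated with a minimal numerical sketch. It is not the model derived in this paper: it assumes a hypothetical two-strategy population (comply or deviate), a monitoring probability `monitoring`, a reward `reward` paid to monitored compliers, a penalty `penalty` charged to monitored deviators, and an abatement cost `abatement_cost` borne by compliers. Under these assumptions the payoff gap between complying and deviating is monitoring * (reward + penalty) - abatement_cost, and the replicator dynamics move the compliant share x according to dx/dt = x(1 - x) times that gap. All names and parameter values are illustrative.

```python
def payoff_gap(reward, penalty, monitoring, abatement_cost):
    """Expected payoff of complying minus expected payoff of deviating.

    Assumed (hypothetical) payoffs, not taken from the paper:
      comply  -> monitoring * reward - abatement_cost
      deviate -> -monitoring * penalty
    """
    return monitoring * (reward + penalty) - abatement_cost


def simulate_compliance(x0, reward, penalty, monitoring, abatement_cost,
                        dt=0.01, steps=5000):
    """Euler integration of the two-strategy replicator equation.

    x is the share of compliant countries; dx/dt = x * (1 - x) * gap.
    """
    gap = payoff_gap(reward, penalty, monitoring, abatement_cost)
    x = x0
    for _ in range(steps):
        x += dt * x * (1.0 - x) * gap
        x = min(max(x, 0.0), 1.0)  # keep the share inside [0, 1]
    return x


if __name__ == "__main__":
    # Illustrative values only: same reward, penalty, and cost,
    # with two different monitoring frequencies.
    for m in (0.2, 0.5):
        share = simulate_compliance(0.2, reward=1.0, penalty=2.0,
                                    monitoring=m, abatement_cost=1.0)
        print(f"monitoring={m}: long-run compliant share = {share:.4f}")
```

With these assumed values, raising the monitoring frequency from 0.2 to 0.5 flips the sign of the payoff gap, so the compliant share moves toward one instead of toward zero. This mirrors, in a stylized way, the conclusion that rewards, penalties, and monitoring frequency must jointly satisfy a condition for compliance to be stable.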
We impose a restriction on the IEA such that countries have only one option to control their pollution levels. Therefore, in a repeated game, the penalty is imposed only if a country's pollution level increases. In reality, countries may not be flexible enough to adjust their pollution levels. However, other options exist to prevent deviation. One example is trade sanctions imposed on the signatories of the IEA. Barrett (1997) shows that imposing trade sanctions is the only possible way to provide countries with the necessary incentives to join an IEA and so sustain global cooperation.
Allowing for trade sanctions in an evolutionary game theoretic model could be a valuable extension of our model.
Because our model is general, our mechanism design could serve as a benchmark mechanism to be extended and refined in the future. We observe that the pollution targeting policies put forward in Bollen et al. (2009), which consider the effects of global and local pollution, could help improve global cooperation. Taking the local spillover of pollution into account, our reward-penalty mechanism could be designed accordingly, helping countries to avoid free riding without making the agreement vulnerable to renegotiation. In this way, some countries (US, UK, EU) that initially agreed to reduce pollution on the condition that others join them might achieve global cooperation by taking the pollution structure into account. Last but not least, our mechanism design is not limited to pollution reduction. It can easily be applied to other problems pertaining to the provision of global public goods.
A surprising policy implication emerges from the model. Both the 'polluter pays principle' and the 'ability-to-pay principle' dictate that rich countries are required to abate. This was a central force behind the Kyoto Protocol, in which reducing global inequality was a clear goal. If Annex-1 countries were allowed to purchase tradeable pollution permits from Non-Annex-1 countries to meet their abatement requirements, a transfer mechanism from rich to poor countries could be established for pollution control. But under decreasing returns to scale, all countries will be better off if poor countries make a transfer to rich ones. Rich countries bring more to the agreement but have a higher incentive to remain outside. Overcoming this free riding incentive is the reason that rich countries receive a positive transfer. An effective IEA must be Pareto-improving and increase global inequality. One possible response to this inequality is issue linkage: a multi-issue agreement has the potential to increase payoffs and reduce inequality.
Notes
1. Examples are the Oslo Protocol in 1994 for reducing sulfur, the Montreal Protocol in 1987 for the depletion of the ozone layer, and the Kyoto Protocol in 1997 for the reduction of greenhouse gas emissions.
2. This point is argued by Heister et al. (1997) (Barrett 2003).
5. In the Montreal Protocol, parties are required to ban trade with non-parties in ozone-depleting substances and products containing them.
6. More limited measures, such as the application of duties or taxes against various categories of imports from non-parties, could also be employed, according to Brack et al. (2000).
7. The only two international organizations that impose trade sanctions against non-compliance are the UN Security Council and the WTO.
8. Victor (2004) suggests a program of penalty tariffs and trade sanctions to counteract the economic advantage gained through non-compliance. Schram Stokke (2004) has also argued that trade measures could be an effective instrument against non-compliance. He predicts that such sanctions would work best if they were carried out multilaterally against the country at fault.
9. One possibility would be to enhance transparency and public participation in the international supervisory system in the hope of putting internal political pressure on governments to comply. The climate regime could also consider the use of monetary assessments against non-complying governments, a technique employed in the European Union and being tested in new free trade agreements, e.g., U.S.-Singapore.
10. Schlag (1998) shows that the dynamics of a population of randomly matched, boundedly rational players who change their strategies by imitating the successful strategies they observe can be approximated by replicator dynamics.