ECON 159 - Lecture 22 - Repeated Games: Cheating, Punishment ...

Game Theory

ECON 159 - Lecture 22 - Repeated Games: Cheating, Punishment, and Outsourcing

Chapter 1. Repeated Interaction: The Grim Trigger Strategy in the Prisoner’s Dilemma (Continued) [00:00:00]

Professor Ben Polak: So last time we were focusing on repeated interaction and that’s what we’re going to continue with today. There’s lots of things we could study under repeated interaction but the emphasis of this week is can we attain–can we achieve–cooperation in business or personal relationships without contracts, by use of the fact that these relationships go on over time? Our central intuition, where we started from last time, was perhaps the future of a relationship can provide incentives for good behavior today, can provide incentives for people not to cheat.

So specifically let’s just think of an example. We’ll go back to where we were last time. Specifically suppose I have a business relationship, an ongoing business relationship with Jake. And each period I’m supposed to supply Jake with some inputs for his business, let’s say some fruit. And each period he’s supposed to provide me with some input for my business, namely vegetables. Clearly there are opportunities here, in each period, for us to cheat. We could cheat both on the quality of the fruit that I provide or the quantity of the fruit that I provide to Jake, and he can cheat on the quantity or quality of the vegetables that he provides to me. Our central intuition is: perhaps what can give us good incentives is the idea that if Jake cooperates today, then I might cooperate tomorrow, I might not cheat tomorrow. Conversely, if he cheats and provides me with lousy vegetables today I’m going to provide him with lousy fruit tomorrow. Similarly for me, if I provide Jake with lousy fruit today he can provide me with lousy vegetables tomorrow.

So what do we need? We need the difference in the value of the promise of good behavior tomorrow and the threat of bad behavior tomorrow to outweigh the temptation to cheat today. I’m going to gain by providing him with the bad fruit or fewer fruit today–bad fruit because those I would otherwise have to throw away. So that temptation to cheat has to be outweighed by the promise of getting good vegetables in the future from Jake and vice versa. So here’s that idea on the board. What we need is the gain if I cheat today to be outweighed by the difference between the value of my relationship with Jake after cooperating and the value of my relationship with Jake after cheating tomorrow.

Now what we discovered last time–this was an idea I think we kind of knew, we have kind of known it since the first week–but we discovered last time, somewhat surprisingly, that life is not quite so simple. In particular, what we discovered was we need these to be credible, so there’s a problem here of credibility. So in particular, if we think of the value of the relationship after cooperating tomorrow as being a promise, and the value of the relationship after cheating as being a threat, we need these promises and threats to be credible. We need to actually believe that they’re going to happen.

And one very simple area where we saw that ran immediately into problems was if this repeated relationship, although repeated, had a known end. Why did known ends cause problems for us? Because in the last period, in the last period of the game we know that whatever we promise to do or whatever we threaten to do, in the last period, once we reached that last period, in that sub-game we’re going to play a Nash equilibrium. What we do has to be consistent with our incentives in the last period. So in particular, if there’s only one Nash equilibrium in that last period, then we know in that last period that’s what we’re going to do.

So if we look at the second to last period we might hope that we could promise to cooperate, if you cooperate today, tomorrow. Or you could promise to punish tomorrow if you cheat today, but those threats won’t be credible because we know that tomorrow you’re just going to play whatever that Nash equilibrium is. That lack of credibility means there’s no scope to provide incentives today for us to cooperate and we saw things unravel backwards. So the way in which we ensure that we’re really focusing on credible promises and credible threats here is by focusing on sub-game perfect equilibrium, the idea that we introduced just before the Thanksgiving break.

We know that sub-game perfect equilibria have the property that they have Nash behavior in every sub-game, so in particular in the last period of the game and so on. So what we want to be able to do here, is try to find scope for cooperation in relationships without contracts, without side payments, by focusing on sub-game perfect equilibria of these repeated games. Right at the end last time, we said okay, let’s move away from the setting where we know our game is going to end, and let’s look at a game which continues, or at least might continue.

So in particular, we looked at the problem of the Prisoner’s Dilemma which was repeated with the probability that we called δ each period, with the probability δ of continuing. So every period we’re going to play Prisoner’s Dilemma. However, with probability 1 - δ the game might just end every period. We already noticed last time some things about this. The first thing we noticed was that we can immediately get away from this unraveling argument because there’s no known end to the game. We don’t have to worry about that thread coming loose and unraveling all the way back. So at least there’s some hope here to be able to establish credible promises and credible threats later on in the game that will induce good behavior earlier on in the game.

So that’s where we were last time. And here is the Prisoner’s Dilemma we saw last time, and we actually focused on a particular strategy. But before I come back to this strategy that we focused on last time let’s just see some things that won’t work, just to sort of reinforce the idea. So here’s a possible strategy in the Prisoner’s Dilemma. A possible strategy in the Prisoner’s Dilemma would be cooperate now and go on cooperating regardless of what anyone does. So let’s just cooperate forever regardless of the history of the game.

Now if two players, if Jake and I are involved in this business relationship, which has the structure of a Prisoner’s Dilemma and both of us play this strategy of cooperate now and cooperate forever no matter what, clearly that will induce cooperation. That’s the good news. The problem is that isn’t an equilibrium, that’s not even a Nash equilibrium, let alone a sub-game perfect equilibrium. Why is it not a sub-game perfect equilibrium? Because in particular, if Jake is smart (and he is), Jake will look at this equilibrium and say: Ben is going to cooperate no matter what I do, so I may as well cheat, and in fact, I may as well go on cheating. So Jake has a very good deviation there which is simply to cheat forever.

So the strategy cooperate now and go on cooperating no matter what doesn’t contain incentives to support itself as an equilibrium. And we need to focus on strategies that contain subtle behavior that generates promises of rewards and threats of punishment that induce people to actually stick to that equilibrium behavior. So is everyone clear that cooperating no matter what–it sounds good–but it isn’t going to work. People aren’t going to stick with that. So instead what we focused on last time, and actually we had some players who seemed to actually–they’ve moved now–but they seemed actually to be playing this strategy.

We focused on what we called the grim trigger strategy. And the grim trigger strategy is what? It says in the first period cooperate and then go on playing cooperate as long as nobody has ever defected, nobody has ever cheated. But if anybody ever plays D, anybody ever plays the defect strategy, then we just play D forever. So this is a strategy, it tells us what to do at every possible information set. It also, if two players are playing the strategy, has the property that they will cooperate forever: that’s good news. And what we left ourselves last time was checking that this actually is an equilibrium, or more generally, under what conditions is this actually an equilibrium.
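As a rough illustration of the rule just described, here is a minimal Python sketch of the grim trigger. The function names (`grim_trigger`, `play`, `defect_in_period_three`) and the representation of a history as a list of (my move, other's move) pairs are assumptions made for this example; they are not from the lecture.

```python
def grim_trigger(history):
    """Cooperate until anyone has ever played D, then defect forever.

    history is a list of (my_move, other_move) pairs from earlier periods.
    """
    for mine, theirs in history:
        if mine == "D" or theirs == "D":
            return "D"
    return "C"


def defect_in_period_three(history):
    """Hypothetical deviator: plays grim trigger, except it cheats in period 3."""
    if len(history) == 2:
        return "D"
    return grim_trigger(history)


def play(strategy_a, strategy_b, periods):
    """Run two strategies against each other for a fixed number of periods.

    Returns the list of (a_move, b_move) pairs, seen from player A's side.
    """
    hist_a, hist_b = [], []
    for _ in range(periods):
        a = strategy_a(hist_a)
        b = strategy_b(hist_b)
        hist_a.append((a, b))
        hist_b.append((b, a))
    return hist_a


# Two grim-trigger players cooperate in every period.
print(play(grim_trigger, grim_trigger, 5))

# One cheat in period 3 triggers (D, D) in every later period.
print(play(defect_in_period_three, grim_trigger, 5))
```

The fixed number of periods is only there so the simulation stops; in the lecture's game the end is random, with continuation probability δ.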

So we got halfway through that calculation last time. So what we need to do is we need to make sure that the temptation of cheating today is less than the value of the promise minus the value of the threat tomorrow. We did parts of this already, let’s just do the easy parts. So the temptation today is: if I cheat today I get 3, whereas if I went on cooperating today I get 2. So the temptation is just 1. What’s the threat? The threat is playing D forever, so this is actually the value of (D, D) forever. You’ve got to be careful about for ever: when I say for ever, I mean until the game ends because eventually the game is going to end, but let’s use the code for ever to mean until the game ends.

What’s the promise? The promise is the value of continuing cooperation, so the value of (C,C) for ever. That’s what this bracket is, and it’s still tomorrow. So let’s go on working on this. So the value of cooperating for ever is actually–let’s be a bit more detailed–this is the value of getting 2 in every period, so it’s value of 2 for ever; and this is the value of 0 forever. So the value of 0 forever, that’s pretty easy to work out: I get 0 tomorrow, I get 0 the day after tomorrow, I get 0 the day after the day after tomorrow. Or more accurately: I get 0 tomorrow, I get 0 the day after tomorrow if we’re still playing, I get 0 the day after the day after tomorrow if we’re still playing and so on. But that isn’t a very hard calculation, this thing is going to equal 0. So this object here is just 0.

This object here is 3 - 2, I can do that one in my head, that’s 1. So I’m left with the value of getting 2 for ever, and that requires a little bit more thought. But let’s do that one bit of algebra because it’s going to be useful throughout today. So this thing here, the value of 2 for ever is what? Well I get 2, that’s tomorrow, and then, assuming I’m still playing the day after tomorrow–so I need to discount it–with probability of δ I’m still playing the day after tomorrow–and I get 2 again. And the day after the day after tomorrow I’m still playing with the probability that the game didn’t end tomorrow and didn’t end the next day so that’s with probability δ² and again I get 2. And then the day after, what is it? This is tomorrow, the day after tomorrow, the day after the day after tomorrow: this is the day after the day after the day after tomorrow which is δ³ 2 and so on.

Everyone happy with that? So starting from tomorrow, if we play (C, C) for ever, I’ll get 2 tomorrow, 2 the day after tomorrow, 2 the day after the day after tomorrow, and so on. And I just need to take account of the fact that the game may end between tomorrow and the next day, the game may end between the day after tomorrow and the day after the day after tomorrow and so on. Everyone happy with that? So what is the value, what is this thing? Let’s call this X for a second. So we’ve done this once before in the class but let’s do it again anyway.

This is a geometric sum; some of you may even remember from high school how to do a geometric sum, but let’s do it slowly. So to work out what X is what I’m going to do is I’m going to multiply X by δ, so what’s δX? So this 2 here will become a 2δ, and this δ2 here will become a δ²2, and this δ²2 will become a δ³2, and this δ³2 will become a δ⁴2, and so on. Now what I’m going to do is I’m going to subtract the second of those lines from the first of those lines. So what I’m going to do is, I’m going to work out X – δX. So I’m going to subtract the second line from the first line. And when I do that I’m going to notice, I hope, that this 2δ is going to cancel with this 2δ, and this δ²2 is going to cancel with this δ²2, and this δ³2 is going to cancel with this δ³2 and so on.

So what I’m going to get left with is what? Everything’s going to cancel except for what? Except for that first 2 there, so this is just equal to 2: X – δX = 2. Now this is a calculation I can do. So I’ve got X = 2 / [1-δ]. So just to summarize the algebra, getting 2 forever, that means 2 + δ2 + δ²2 + δ³2 etc. The value of that object is 2/[1-δ]. So we can put that in here as well. This object here 2/[1-δ] is the value of 2 forever. Now before I go onto a new board I want to do one other thing. On the left hand side I’ve got my temptation, that was 1, I’ve got the value of cooperating forever starting from tomorrow which is 2/[1-δ] and I’ve got the value of defecting forever starting from tomorrow which is 0.
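For reference, the board algebra for the value of 2 forever can be collected in one place. This is a sketch of the same subtraction trick, with X as in the lecture; the explicit range 0 ≤ δ < 1 is an added assumption, natural because δ is a continuation probability and the sum only converges when it is below 1.

```latex
\begin{align*}
X            &= 2 + 2\delta + 2\delta^{2} + 2\delta^{3} + \cdots \\
\delta X     &= \phantom{2 + {}} 2\delta + 2\delta^{2} + 2\delta^{3} + \cdots \\
X - \delta X &= 2 \\
(1 - \delta)\,X &= 2 \\
X            &= \frac{2}{1 - \delta}, \qquad 0 \le \delta < 1 .
\end{align*}
```

The same steps with 0 in place of 2 give a value of 0 for (D, D) forever, which matches the value used for the threat.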

However, all of these objects on the right hand side, they start tomorrow, whereas, the temptation today is today. Temptation today happens today. These differences in value start tomorrow. Since they start tomorrow I need to discount them because we don’t know that tomorrow is going to happen. The world may end, or more importantly the relationship may end, between today and tomorrow. So how much do I have to weight them by? By δ, I need to multiply all of these lines by δ and so on. Now this is now a mess so let’s go to a new board. Now let’s summarize what we now have.

What we’re doing here is asking is it the case that if people play the grim trigger strategy that that is in fact an equilibrium? That is a way of sustaining cooperation. The answer is we need 1, that’s our temptation, to be less than 2/[1-δ], that’s the value of cooperating for ever starting from tomorrow, minus 0, that’s the value of defecting forever starting tomorrow, and this whole thing is multiplied by δ because tomorrow may not happen. Everyone happy with that so far? I’m just kind of collecting up the terms that we did slowly just now.

So now what I want to do is–question mark here because we don’t know whether it is–I’m going to solve this for δ. So when I solve this for δ I’ll probably get it wrong, but let’s be careful. So this is equivalent to saying 1-δ < 2δ and it’s also equivalent to saying therefore that δ > 1/3. Everyone happy with that? Let me just turn my own page. So what have we shown so far? We’ve shown that if we’re playing the grim trigger strategy, and we want to deter people from doing what? From defecting from this strategy in the very first period, then we’re okay provided δ is bigger than 1/3.
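The condition can also be checked numerically. The sketch below plugs a few values of δ into the inequality "temptation < δ × (value of promise − value of threat)", using the lecture's numbers (temptation 1, cooperation worth 2 per period, punishment worth 0 per period). The function names and the sample values of δ are chosen only for illustration.

```python
def value_of_forever(per_period, delta):
    """Value of receiving per_period every period until the game ends,
    counted from the first period in which it is received."""
    return per_period / (1 - delta)


def grim_trigger_deters_cheating(delta, temptation=1, coop=2, punish=0):
    """Check: temptation today < delta * (value of promise - value of threat)."""
    promise = value_of_forever(coop, delta)
    threat = value_of_forever(punish, delta)
    return temptation < delta * (promise - threat)


# Values below 1/3 fail the check; values above 1/3 pass it.
for delta in (0.2, 0.3, 0.4, 0.9):
    print(delta, grim_trigger_deters_cheating(delta))
```

The sample values deliberately avoid δ = 1/3 itself, where floating-point rounding could make the strict comparison unreliable.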

But at this point some of you could say, yeah but that’s just one of the possible ways I could defect from this strategy. After all, the defection we just considered, the move away from equilibrium we just considered was what? We considered my cheating today, but thereafter, I reverted back to doing what I was supposed to do: I went along with playing D thereafter. So the particular defection we looked at just now was in Period 1, I’m going to defect, but thereafter, I’m actually going to do what the equilibrium strategy tells me to do. I’m going to go along with the punishment and play my part of (D,D) forever.

So you might want to ask, why would I do that? Why would I go along? I cheated the first time but now I’m doing what the strategy tells me to do. It tells me to play D. Why am I going along with that? You could consider going away from the equilibrium by defecting, for example in Period 1, and then in Period 2 do something completely different like cooperating. So we might want to worry, how about playing D now and then C in the next period, and then D forever. That’s just some other way of defecting. So far we’ve said I’m going to defect by playing D and then playing D forever, but now I’m saying let’s play D now and then play a period of C and then D forever.

Is that going to be a profitable deviation? Well let’s see what I’d get if I do that particular deviation. What play is that going to induce? Remember the other player is playing equilibrium, so that’s going to induce the following: in the first period, I’m playing D and Jake’s playing C. In the second period Jake’s going to start punishing me, so he’s going to play D and according to this deviation I’m going to play C. So in the second period I’ll play C and Jake will play D, and in the third period and thereafter, we’ll just play D, D, D, D, D, D. So this is just some other deviation other than the one we looked at. So what payoff do I get from this? Okay, I get 3 in the first period, just as I did for my original defection, that’s good news. But now in the second period discounted, I actually get -1, I’m actually doing even worse in the second period because I’m cooperating while Jake’s defecting, and then in the third period I get 0 and in the fourth period I get 0 and so on.

So the total payoff to this defection is 3 - δ. Now, that’s even worse than the defection we considered to start with. The defection we considered to start with, I got 3 in the first period and thereafter I got 0. Now I got 3 in the first period, -1 in the second period, and then 0 thereafter. So this defection in which I defect–this move away from equilibrium–in which I cheat in the first period and then don’t go along with the punishment, I don’t in fact play D forever is even worse. Is that right? It’s even worse. So what’s the lesson here? The lesson here is the reason that I’m prepared to go along with my own punishment and play D forever after a defection is what? It’s if Jake is going to play D forever I may as well play D forever. Is that right? So another way of saying this is the only way which I could possibly hope to have a profitable deviation, given that Jake’s going to revert to playing D forever is for me to defect on Jake once and then go along with playing D forever. There’s no point once he’s playing D, there’s no point me doing anything else, so this is worse, this is even worse. This defection is even worse.
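To make the comparison concrete, the two deviations can be priced with a small Python sketch. The payoff table uses the numbers from the lecture (2 for mutual cooperation, 3 for cheating on a cooperator, -1 for being cheated on, 0 for mutual defection). The helper name `discounted_payoff` and the value δ = 0.5 are illustrative assumptions.

```python
# Stage-game payoffs from the lecture: (my move, Jake's move) -> my payoff.
PAYOFF = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}


def discounted_payoff(path, delta):
    """Sum my payoffs along a path of (my_move, jake_move) pairs.

    The payoff in period t (counting from 0) is weighted by delta**t.
    Periods after the listed path are assumed to pay 0, which holds here
    because play has settled on (D, D) by then.
    """
    return sum(delta**t * PAYOFF[moves] for t, moves in enumerate(path))


delta = 0.5  # illustrative continuation probability

# Cheat once, then go along with the punishment.
go_along = [("D", "C"), ("D", "D"), ("D", "D")]
# Cheat once, then cooperate for a period while Jake punishes.
resist = [("D", "C"), ("C", "D"), ("D", "D")]

print(discounted_payoff(go_along, delta))  # 3.0
print(discounted_payoff(resist, delta))    # 2.5, that is 3 - delta
```

For any δ between 0 and 1 the second number is 3 − δ, which is below 3, so refusing to go along with the punishment only lowers the deviator's payoff.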

More generally, the reason this is even worse is because the punishment we looked at before, which was (D, D) for ever, the punishment (D,D) forever is itself an equilibrium. It’s credible because it’s itself an equilibrium. So unlike in the finitely repeated games we did last time, unlike in the two period or the five period repeated games, here the punishment really is a credible punishment, because what I’m doing in the punishment phase is playing an equilibrium. There’s no point considering any other deviation other than playing D once and then just going on playing D. So that’s one other possible deviation, but there are others you might want to consider.

So far all we’ve considered is what? We’ve considered the deviation where I, in the very first period, I cheat on Jake and then I just play D forever. But what about the second period? Another thing I could do is how about cheating not in the first period of the game but in the second. So according to this strategy what am I going to do? The first period of the game I’ll go along with Jake and cooperate, but in the second period I’ll cheat on him. Now how am I going to check whether that’s a good deviation or not? How do I know that’s not going to be a good deviation?

Well we already know that I’m not going to want to cheat in the first period of the game. I want to argue that exactly the same analysis tells me I’m not going to want to cheat in the second period of the game. Why? Because once we reach the second period of the game, it is the first period of the game. Once we reach the second period of the game, looking from period two onwards, it’s exactly the same as it was when we looked from period one initially. So to say it again, what we argued before was–on the board that I’ve now covered up–what we argued before was, I’m not going to want to cheat in the very first period of the game provided δ > 1/3. I want to claim that that same argument tells me I’m not going to want to cheat in the second period of the game provided δ > 1/3. I’m not going to want to cheat in the fifth period of the game provided δ > 1/3. Because this game from the fifth period on, or the five hundredth period on, or the thousandth period on looks exactly the same as it does from the beginning.

So what’s neat about this argument is the same analysis says, this is not profitable if δ > 1/3. So what have we learned here? I want to show you some nerdy lessons and then some actual sort of real world lessons. Let’s start with the nerdy lessons. The nerdy lesson is this grim strategy works because both–let’s put it up again so we can actually see it–this grim strategy, it works because both the play that it suggests if we both cooperate and the play that it suggests if we both defect are themselves equilibria. These are credible threats and credible promises because what you end up doing both in the promise and in the threat is itself equilibrium behavior. That’s good.

The second thing we’ve learned, however, is for this to work we need δ > 1/3, we need the probability of continuation to be bigger than 1/3. So leaving aside the nerdy stuff for a second–you have more practice on the nerdy stuff on the homework assignment–the lesson is we can get cooperation in the Prisoner’s Dilemma using the grim trigger. Remember the grim trigger strategy is cooperate until someone defects and then defect forever. So you get cooperation in the Prisoner’s Dilemma using the grim trigger as a sub-game perfect equilibrium. So this is an equilibrium strategy, that’s good news, provided the probability of continuation is bigger than 1/3.

Chapter 2. The Grim Trigger Strategy: Generalization and Real World Examples [00:29:21]

Let’s try and generalize that lesson away from the Prisoner’s Dilemma. So last time our lesson was about what in general could we hope for in ongoing relationships? So let’s put down a more general lesson that refines what we learned last time. So the more general lesson is, in an ongoing relationship–let me mimic exactly the words I used last time–so for an ongoing relationship to provide incentives for good behavior today, it helps–what we wrote last time was–it helps for that relationship to have a future. But now we can refine this, it helps for there to be a high probability that the relationship will continue.

So the specific lesson for Prisoner’s Dilemma and the grim trigger strategy is we need δ, the probability of continuation, to be bigger than 1/3. But the more general intuition is, if we want the ongoing business relationship between me and Jake to generate good behavior–so I’m going to provide him with good fruit and he’s going to provide me with good vegetables–we need the probability that that relationship will continue to be reasonably high. I claim this is a very natural intuition. Why? Because the probability that the relationship will continue is the weight that you put on the future. The probability that the relationship will continue, this thing, this is the weight you put on the future. The more weight I put on the future, the easier it is for the future to give me incentives to behave well today, the easier it is for those to overcome the temptations to cheat today.

That seems like a much more general lesson than just the Prisoner’s Dilemma example. Let’s try to push this to some examples and see if it rings true. So the lesson we’ve got here is to get cooperation in these relationships we need there to be a high probability, a reasonably high probability that they’re going to continue. We know exactly what that is for Prisoner’s Dilemma but the lesson seems more general. So here’s two examples.

How many of you are seniors? One or two, quite a few are seniors. Keep your hands up a second. All of those of you who are seniors–we can pan these guys. Let’s have a look at them. Actually, why don’t we get all the seniors to stand up: make you work a bit here. Now the tricky question, the tricky personal question. How many of you who are seniors are currently involved in personal relationships, you know: have a significant other? Stay standing up if you have a significant other. Look at this, it’s pathetic. What have I been saying about economics majors? All right, so let’s just think about, stay standing a second, let’s get these guys to think about it a second. So seniors who are involved in ongoing relationships with significant others, what do we have to worry about with those seniors?

Well these seniors are about to depart from the beautiful confines of New Haven and they’re going to take jobs in different parts of the world. And the problem is some of them are going to take jobs in New York while their significant other takes a job in San Francisco or Baghdad or whatever, let’s hope not Baghdad, London shall we say. Now if it’s the case that you are going to take a job in New York next year and your significant other is going to take a job in Baghdad or London, or anyway far away, in reality, being cynical a little bit, what does that do to the probability that your relationship is going to last? It makes it go down. It lowers the probability that your relationship’s going to continue.

So what is the prediction–let’s be mean here. These are the people with significant others who are seniors, how many of you are going to be separated by a long distance from your significant others next period? Well one of them at the back, okay one guy, at the back, two guys, honesty here, three, four of you right? So what’s our prediction here? What does this model predict as a social science experiment? What does it predict? It predicts that for those of you who just raised your hands, those seniors who just raised their hands who are about to be separated by large distances, each player in those relationships is going to put a lower value on the future. So during the rest of your senior year, during the spring of your senior year what’s the prediction of this model? They’re going to cheat.

So we could actually do a controlled experiment, what we should do here is we should keep track of the people here, the seniors who are going to be separated–you can sit down now, I’m sorry to embarrass you all. We could keep track of those seniors who are about to be separated and go into long distance relationships, and those that are not. The people who are not are our control group. And we should see if during the spring semester the people who are going to be separated cheat more often than the others. So it’s a very clear prediction of the model that’s relevant to some of your lives.

Let me give you another example that’s less exciting perhaps, but same sort of thing. Consider the relationship that I have with my garage mechanic. I should stress this is not a significant other relationship. So I have a garage mechanic in New Haven, and that garage mechanic fixes my car. And we have an ongoing business relationship. He knows that whenever my car needs fixing, even if it’s just a small thing like an oil change, I’m going to go to him and have him fix it, even though it might be cheaper for me to go to Jiffy Lube or something. So I’m going to take my car to him to be fixed, and he’s going to make some money off me on even the easy things. What do I want in return for that? I want him to be honest and if all I need is an oil change I want him to tell me that, and if what I actually need is a new engine, I want him to tell me I need a new engine.

So my cooperating with him, is always going to him, even if it’s something simple; and his cooperating with me, is his not cheating on fixing the car. He knows more about the car than I do. But now what happens if he knows either that I’m about to leave town (which is the example we just did), or, more realistically, he kind of knows that my car is a lemon and I’m about to get rid of it anyway. Once I get a new car I’m not going to go to him anymore because I have to go to the dealer to keep the warranty intact. So he knows that my car is about to break down anyway, and he knows that I know that the car is about to break anyway, so my lemon of a car is about to be passed on–probably to one of my graduate students–then what’s going to happen? So I’m going to have an incentive to cheat because I’m going to start taking my useless car to Jiffy Lube for the oil changes. And he’s going to have an incentive to cheat. He’s going to start telling me you know you really need a new engine or a new clutch–it’s a manual so I have a clutch: it’s a real car–so I’m going to need a new clutch rather than just tightening up a bolt.

So once again the probability of the continuation of the relationship, as it changes, it leads to incentives to cheat. It leads to that relationship breaking down. That’s the content, that’s the real world content of the math we just did.

Chapter 3. Cooperation in Repeated Interactions: The “One Period Punishment” Strategy [00:37:56]

Let’s try and push this a little further. Now what we’ve shown is that the grim trigger works provided δ > 1/3, and δ being bigger than 1/3 doesn’t seem like a very large continuation probability. So just having a probability of 1/3 that the relationship continues allows the grim trigger to work, so that seems good news for the grim trigger. However, in reality, in the real world, the grim trigger might have some disadvantages.

So let’s just think about what the grim trigger is telling us in the real world. It’s telling us that if even one of us cheats just a little bit–I just provide one item of rotten fruit to Jake or he gives me one too few branches of asparagus in his provisions to me–then we never do business with each other again ever. It’s completely the end. We just never cooperate again. That seems a little bit drastic. It’s a little bit draconian if you like. So in particular, in the real world, there’s a complication here: in the real world, every now and then one of us is going to “cheat” by accident. That day that I didn’t have my glasses on and I put a rotten apple in the fruit I supplied to Jake. Or he was counting out the asparagus and he lost count at 1,405 and he gave me one too few.

So we might want to worry about the fact that the grim trigger, it’s triggered by any amount of cheating and it’s very drastic: it says we never do business again. The grim trigger is the analog of the death penalty. It’s the business analog of the death penalty. It’s not that I’m going to kill Jake if he gives me one too few branches of asparagus, but I’m going to kill the relationship. For you seniors or otherwise, who are involved in personal relationships, it’s the equivalent of saying, if you even see your partner looking at someone else, let alone sitting next to them in the class, the relationship is over. It seems drastic.

So we might be interested because mistakes happen, because misperceptions happen, we might be interested in using punishments that are less draconian than the grim trigger, less draconian than the death penalty. Is that right? So what I want to do is I want to consider a different strategy, a strategy other than the grim trigger strategy, and see if that could work. So where shall I start? Let’s start here, so again what I’m going to revert to is the math and the nerdiness of our analysis of the Prisoner’s Dilemma but I want you to have in mind business relationships, your own personal relationships, your friendships and so on. More or less everything you do in life involves repeated interaction, so have that in the back of your mind, but let’s be nerdy now.

So what I want to consider is a one period punishment. So how are we going to write down a strategy that has cooperation but a one period punishment? So here’s the strategy. It says–it’s kind of a weird thing but it works–play C to start and then play C if–this is going to seem weird but trust me for a second–play C if either (C,C) or (D,D) were played last. So, if in the previous period either both people cooperated or both people defected, then we’ll play cooperation this period. And play D otherwise: play D if either (C, D) or (D, C) were played last.

Let’s just think about this strategy for a second. What does that strategy mean? So provided people start off cooperating and they go on cooperating–if both Jake and I play this strategy–in fact, we’ll cooperate forever. Is that right? So I claim this is a one period punishment strategy. Let’s just see how that works. So suppose Jake and I are playing this strategy. We’re supposed to play C every period. And suppose deliberately or otherwise, I play D. So now in that period in which I play D, the strategies played were D by me and C by Jake. So next period what does this strategy tell us both to play? So it was D by me and C by Jake, so this strategy tells us to play D. So next period both of us will play D. So both of us will be uncooperative precisely for that period, that next period.

Now, what about the period after that? The period after that, Jake will have played D, I will have played D. So this is what will have happened: we both played D, and now it tells us to cooperate again. Everyone happy with that? So this strategy I’ve written down–it seems kind of cumbersome–but what it actually induces is exactly a one period punishment. If Jake is the only cheat then we both defect for one period and go back to cooperation. If I’m the only person who cheats then we both defect for one period and go back to cooperation. It’s a one period punishment strategy. Of course the question is, the question you should be asking is, is this going to work? Is this an equilibrium?
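As a minimal sketch of the strategy just described, the Python below simulates two players who both follow the rule and lets one of them defect once. The function names and the five-period horizon are illustrative assumptions, not part of the lecture.

```python
# Sketch of the one-period punishment strategy described above.
# Function and variable names are illustrative, not from the lecture.

def one_period_punishment(last_profile):
    """Return this period's action given last period's action profile."""
    if last_profile is None:
        return "C"                      # first period: start by cooperating
    if last_profile in (("C", "C"), ("D", "D")):
        return "C"                      # actions matched last period: cooperate
    return "D"                          # (C, D) or (D, C) last period: punish


def simulate(periods, deviation_period=None):
    """Both players follow the rule; player 1 plays D once at deviation_period (0-indexed)."""
    history = []
    last = None
    for t in range(periods):
        a1 = one_period_punishment(last)
        a2 = one_period_punishment(last)
        if t == deviation_period:
            a1 = "D"                    # a one-off deviation, deliberate or accidental
        last = (a1, a2)
        history.append(last)
    return history


path = simulate(periods=5, deviation_period=1)
print(path)
```

The printed path shows the deviation period, exactly one period of mutual defection, and then a return to cooperation.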

So let’s just check. Is this an SPE? Is it an equilibrium? So what do we need to check? We need to check, as usual, that the temptation is less than or equal to the value of the promise–the value of the promise of continuing in cooperation–the value of the promise minus the value of the threat. And once again we have to be careful, because the temptation occurs today and this difference between values occurs tomorrow. Is that right? So this is nothing new, this is what we’ve always written down, this is what we have to check.

So the temptation for me to cheat today, that’s the same as it was before, it’s 3 - 2. The fact that it’s tomorrow is going to give me a δ here. Here’s our square bracket. So what’s the value of the promise? So provided we both go on cooperating, we’re going to go on cooperating forever, in which case we’re going to get 2 for ever. Is that right? So this is going to be the value of 2 forever starting tomorrow (and again for ever means until the game ends). The value of the threat is what? Be a bit careful now. It’s the value of–so what’s going to happen? If I cheat then tomorrow we’re both going to cheat, so tomorrow, what am I going to get tomorrow? 0. So it’s the value of 0 tomorrow: we’re both going to cheat, we’re both going to play D. And then the next period what’s going to happen?

We’re going to play C again, and from thereon we’re going to go on playing C. So it’s going to be the value of 0 tomorrow and then 2 forever starting the next day. That’s what we have to evaluate. So 3 - 2, I can do that one again, that’s 1. So what’s the value of 2 forever, well we did that already today, what was it? It’s in your notes. Actually it’s on the board, it’s the X up there, what is it? Here it is, 2 for ever: we figured out the value of it before and it was 2/[1–δ]. So the value of 2 forever is going to be 2/[1–δ]. How about the value of 0? So starting from tomorrow I’m going to get 0 and then with one period delay I’m going to get 2 for ever. Well 2 forever, we know what the value of that is, it’s 2/[1–δ], but now I get it with one period delay, so what do I have to multiply it by? By δ good.

So the value of 0 tomorrow and then 2 forever starting the next day is δ x 2/[1–δ]. And here’s the δ coming from here which just takes into account that all this analysis is starting tomorrow. So to summarize, this is my temptation today. This is what I’ll get starting tomorrow if I’m a good boy and cooperate. And this is the value of what I’ll get if I cheat today. Starting tomorrow I’ll get nothing, and then I’ll revert back to cooperation. And since all of these values in this square bracket start tomorrow I’ve discounted them by δ. Now this requires some math so bear with me while I probably get some algebra wrong–and please can I get the T.A.’s to stare at me a second because I’ll probably get this wrong. Okay so what I’m going to do is, I’m going to look at my notes, I’m going to cheat, that’s what I’m going to do.

Okay, so what I’m going to do is I’m going to have 1 is less than or equal to, I’m going to take a common factor of 2 / [1–δ] and δ, so I’m going to have 2δ/[1–δ], and that’s going to leave inside the square brackets: this is a 1 and this is a δ. So this δ here was that δ there, and then I took out a common factor of 2/[1–δ] from this bracket. Everyone okay with the algebra? Just algebra, nothing fancy going on there. So that’s good because now the 1-δ cancels, this cancels with this, so this tells us we’re okay provided 1/2 <= δ: it went up. So don’t worry too much about the algebra, trust me on the algebra a second, let’s just worry about the conclusion.
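Written out step by step, using the stage payoffs from the discussion above (3 for cheating on a cooperator, 2 for mutual cooperation, 0 for mutual defection), the algebra can be sketched as follows. The layout is a reconstruction of the board work, not a copy of it.

```latex
\begin{align*}
3 - 2 &\le \delta\left[\frac{2}{1-\delta} - \delta\,\frac{2}{1-\delta}\right] \\
1 &\le \frac{2\delta}{1-\delta}\,\bigl[1 - \delta\bigr] \\
1 &\le 2\delta \\
\tfrac{1}{2} &\le \delta
\end{align*}
```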

What’s the conclusion? The conclusion is that this one period punishment is an SPE, it will be enough, one period of punishment will be enough to sustain cooperation in my Prisoner’s Dilemma repeated business relationship with Jake, or in the seniors’ relationships with their significant others, provided δ > 1/2. What did δ need to be for the grim strategy? 1/3, so what have we learned here? We learned–nerdily–what we learned was that for the grim strategy we needed δ > 1/3. For the one period punishment we needed δ > 1/2, but what’s the more general lesson? The more general lesson is, if you use a softer punishment, a less draconian punishment, for that to work we’re going to need a higher δ. Is that right?
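The comparison between the two punishments can be checked numerically. The sketch below assumes that under the grim trigger a deviator gets 0 in every later period, which is consistent with the 1/3 threshold quoted above; the function name and the sample values of δ are illustrative.

```python
from fractions import Fraction

def sustains_cooperation(delta, punishment):
    """Check: temptation <= delta * (value of promise - value of threat)."""
    temptation = 3 - 2                        # gain from cheating today
    promise = 2 / (1 - delta)                 # value of 2 forever, starting tomorrow
    if punishment == "grim":
        threat = 0                            # assumed: 0 forever after a deviation
    elif punishment == "one_period":
        threat = delta * 2 / (1 - delta)      # 0 tomorrow, then 2 forever
    else:
        raise ValueError(f"unknown punishment: {punishment}")
    return temptation <= delta * (promise - threat)

# Exact fractions avoid rounding trouble right at the thresholds.
for delta in (Fraction(1, 4), Fraction(1, 3), Fraction(2, 5), Fraction(1, 2)):
    print(delta,
          sustains_cooperation(delta, "grim"),
          sustains_cooperation(delta, "one_period"))
```

For a δ between 1/3 and 1/2, such as 2/5, the grim trigger sustains cooperation but the one period punishment does not, which is the trade off in incentives.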

So what we’re learning here is there’s a trade off, there’s a trade off in incentives. And the trade off is if you use a shorter punishment, a less draconian punishment–instead of cutting people’s hands off or killing them, or never dealing with them again, you just don’t deal with them for one period–that’s okay provided there’s a slightly higher probability of the relationship continuing. So shorter punishments are okay but they need–the implication sign isn’t really necessary there–they need more weight δ on the future.

I claim that’s very intuitive. What it’s saying is, we’re always trading things off in the incentives. We’re trading off the ability to cheat and get some cookies today versus waiting and, we hope, getting cookies tomorrow. So if, in fact, the difference between the reward and the punishment isn’t such a big deal, isn’t so big–the punishment is just, I’m going to give you one fewer cookies tomorrow–then you better be pretty patient not to go for the cookies today. I was about to say, those of you who have children. I’m probably the only person in the room with children. That cookie example will resonate for the rest of you–wait until you get there–you’ll discover that, in fact, cookies are the right example. So shorter punishment, less draconian punishments, less reduction in your kid’s cookie rations tomorrow is only going to work, is only going to sustain good behavior provided those kids put a high weight on tomorrow. In that case, it isn’t that the kids will worry about the relationship breaking down, you’re stuck with your kids, it’s just that they’re impatient.

Chapter 4. Cooperation in Repeated Interactions: Repeated Moral Hazard [00:53:09]

Okay, so we’ve been doing a lot of formal stuff here and I want to go on doing formal stuff, but what I want to do now is spend the rest of today looking at an application. An application that is, I hope, going to convince you that repeated interaction really matters. So this is assuming that the one about the seniors and their boyfriends and girlfriends wasn’t enough. Okay, so the application is going to take us back a little bit because what I want to talk about is repeated moral hazard. Moral hazard is something we discussed the first class after the mid-term.

So what I want you to imagine is that you are running a business in the U.S. and you are considering making an investment in an emerging market, and again, so as not to offend anybody who watches this on the video, let’s just call that emerging market Freedonia, rather than give it a name like Kazakhstan, a name like something other than Freedonia. So Freedonia, for those of you who don’t know, is a republic in a Marx Brothers film. So you’re thinking of outsourcing some production of part of what your business is to Freedonia. The reason you’re thinking of doing this outsourcing, what makes it attractive is that wages are low in Freedonia. So you get this outsourced in Freedonia. You think you’re going to get it done cheaply.

The down side is because Freedonia is an emerging market, the court system, it doesn’t operate very well. And in particular, it’s going to be pretty hard to enforce contracts and to jail people and so on in Freedonia. So you’re considering outsourcing. The plus is, from your point of view, the plus is wages are cheap where you’re going to get this production done. The down side is it’s going to be hard to enforce contracts because this is an emerging market. So what you’re considering doing is employing an agent and you’re going to pay that agent W, so W is the wage if you employ them. I’ll put this up in a tree in a second.

Let’s assume that the “going wage” in Freedonia is 1: we’ll just normalize it. So the going wage in Freedonia is 1, and let’s assume that to get this outsourcing to work you’re going to have to send some resources to your agent, your employee in Freedonia. And let’s assume that the amount you’re going to have to send over there is equivalent to another 1. So the going wage in Freedonia is 1 and the amount you’re going to have to invest in giving this agent materials or machinery is another 1. Let’s assume that this project is a pretty profitable project. So if the project succeeds, if the project goes ahead and succeeds, it’s going to generate a gross revenue of 4. Of course you have to invest 1 so that’s a net revenue of 3 for you, but nonetheless there’s a big potential return here.

The bad news is that your agent in Freedonia can cheat on you. In particular, what he can do is he can simply take the 1 that you’ve sent to him, sell those materials on the market and then go away and just work in his normal job anyway. So he can get his normal wage of 1 for just going and doing his normal job, whatever that was, and he can steal the resources from you. So let’s put this up as a kind of tree. This is a slight cheat, this tree, but we’ll see why in a second. So your decision is to invest and set W. So if you invest in Freedonia, you’ll invest and set W, set the wage you’re going to pay him. The going wage is 1 but you can set a different wage or you could just not invest.

If you don’t invest you get nothing and your agent in Freedonia just gets the going wage of 1. If you do invest in Freedonia and set a wage of W, then your agent has a choice. Either he can be honest or he can cheat. If he cheats, what’s going to happen to you? You had to invest 1 in sending it over there, you’re going to get nothing back, so you’ll get -1. And he will go away and work his normal job and get 1, and, in addition, he’ll sell your materials so he’ll get a total of 1 + 1 is? 2, thank you. So he’ll get a total of 2. On the other hand, if he’s honest, then you’re going to get a return of 4 minus the 1 you had to invest minus whatever wage you paid to him. So your return will be 3 minus the wage you pay him. You’re only going to pay him once the job’s done, 3 - W, and he’s going to get W. He’s done his job–he hasn’t exercised his outside option, he hasn’t sold your materials–so he’ll just get W.

Now, I’m slightly cheating here because this isn’t really the way the tree looks because I could choose different levels of W. So this upper branch where I invest and set W is actually a continuum of such branches, one for each possible W, I could set. But for the purpose of today this is enough. This gives us what we needed to see. So let’s imagine that this is a one shot investment. What I want to learn is in this one shot investment, I invest in Freedonia. I hire my agent once, what I want to learn is how much do I have to pay that agent to actually get the job done? Remember the starting position. The starting position is it looks very attractive. It looks very attractive because the returns on this project are 4 or 4 - 1, so that the surplus available on this project is 3 minus the wage, and the going wage was just 1. So it looks like there’s lots of profit around to make this outsourcing profitable.

I mumbled that so let me try it again. So the reason this looks attractive is the going wage is just 1, so if I just pay him 1 and he does the project then I’ll get a gross return of 4 minus the 1 I invested minus the 1 that I had to pay him for a net return of 2. It seems like that’s a 100% profitable project, so it looks very attractive. What’s the problem? The problem is if I only set–this is going to give us backward induction–if I set the wage equal to the going wage, so if I set W = 1 what will my agent do? He’s going to cheat. The problem is if I set W = 1, which is the going wage, the going wage in Freedonia, the agent will cheat. If he cheats I just lose my investment.

So how much do I have to set the W to? Let’s look at this. So we have to set W. What I need is I need his wage to be big enough so that being honest and going on with my project outweighs his incentive to cheat. I need W to be bigger than 2. Is that right? I need W to be at least as big as 2. So in setting the wage, in equilibrium, what are we going to do? I’m going to set a wage, let’s call it W* = 2 (plus a penny), is that right? So this is an exercise which we visited the first day after the mid-term. This is about incentive design. In this one shot game, which we can easily solve by backward induction, I’m going to need to set a wage equal to 2, and then he’ll work.
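The backward induction in the one shot game can be sketched in a few lines of Python. The payoffs are the ones from the tree; the function names are illustrative, and the tie-breaking rule (an indifferent agent stays honest) is an assumption standing in for the "plus a penny".

```python
# Payoffs from the one shot tree in the lecture; names are illustrative.
GOING_WAGE = 1
INVESTMENT = 1
GROSS_REVENUE = 4

def agent_choice(wage):
    """Agent's best response to a wage offer, solved at the last node of the tree."""
    cheat_payoff = GOING_WAGE + INVESTMENT    # outside job plus selling the materials = 2
    honest_payoff = wage
    # Assumption: an indifferent agent stays honest (the lecture adds "a penny" instead).
    return "honest" if honest_payoff >= cheat_payoff else "cheat"

def investor_payoff(wage):
    """Investor's payoff from investing at this wage, anticipating the agent's choice."""
    if agent_choice(wage) == "honest":
        return GROSS_REVENUE - INVESTMENT - wage
    return -INVESTMENT                        # materials are stolen, nothing comes back

for wage in (1, 2):
    print(wage, agent_choice(wage), investor_payoff(wage))
```

At the going wage of 1 the agent cheats and the investor loses 1, which is worse than the 0 from not investing; at a wage of 2 the agent is honest and the investor nets 1.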

So in a minute, we’re going to look at the repeated version of this, but before we do let’s just sum up where we are so far. What is this telling us? It’s telling us that when you invest in an emerging market, where the courts don’t work so they aren’t going to be able to enforce this guy to work well–in particular, he can run off with your investment–even though wages are low, so it seems very attractive to do outsourcing, if you worry about getting incentives right you’re going to have pay an enormous wage premium to get the guy to work. So the going wage in Freedonia was 1, but you had to set a wage equal to 2, a 100% wage premium, to get the guy to work. So the wage premium in this emerging market is 100%, you’re paying 2 even though the going wage is 1.

By the way, this is not an unreasonable prediction. If you look at the wages paid by European and American companies in some of these emerging markets, which have very, very low going wages, and if you look at the wages that are actually being paid by the companies that are doing outsourcing you see enormous wage premiums. You see enormous premiums over and above the going wage. Now what I want to do is I want to revisit exactly the same situation, but now we’re going to introduce the wrinkle of the day. What’s the wrinkle of the day? The wrinkle of the day is you’re not only going to invest in Freedonia today, but if things go well you’ll invest tomorrow, and if things go well again you’ll invest the day after at least with some significant probability.

So the wage premium we just calculated was the one shot wage premium. It was getting this job–this single one shot job–outsourced to Freedonia. Now I want to consider how much you’re going to have to pay, what are wages going to be in Freedonia in the foreign investment sector, if instead of just having a one shot, one job investment, you’re investing for the long term. You’re going to be in Freedonia for a while. So consider repeated interaction with probability δ of continuing. So we don’t know that you’re going to go on in Freedonia. Things might break down in Freedonia because there’s a coup. It might break down in Freedonia because the American administration says you’re not allowed to do outsourcing anymore. All sorts of things might happen, but with some probability δ the relationship is going to continue. So repeated interaction with probability of δ.

Let’s redo the exercise we did before to see what wage you’ll have to pay. Our question is what wage–let’s call it W**–what wage will you pay? The way we’re going to solve this is exactly using the methods we’ve learned in this class. So what we’re going to compare is the temptation to cheat today–and we better make sure that that’s less than δ times the value of continuing the relationship minus the value of ending the relationship. Let’s call this tomorrow.

So what’s happening now is, once again, I’m employing my agent in Freedonia, and provided he does a good job, I’ll employ him again tomorrow, at least with probability δ. But if he doesn’t do a good job, if he runs off with my investment and doesn’t do my job, what am I going to do? What would you do? You’d fire him. So the punishment–it’s clear what the punishment’s going to be here–the punishment is, if he doesn’t do a good job, you fire him. The value of ending the relationship. This is firing and this is continuing. So let’s just work out what these things are. So his temptation to cheat today: if he cheats today, he doesn’t get my wage. But he does run off with my cash, and he does go and do his job at the going wage. So if he cheats today he gets 2, he stole all my cash, and he’s going off and working at the going wage, but he doesn’t get what I would have paid him W** if the job was well done. We need this to be less than the value of continuing the relationship.

Let’s do the easy bit first. What does he get if we end the relationship? He’s been fired, so he’ll just work at the going wage for ever. So this is the value of 1 for ever, or at least until the end of the world. This is the value of what? As long as he stayed employed by me what’s he going to get paid every period? What’s he going to get paid? W**. So the value of W** for ever. Let me cheat a little bit and assume that the probability of some coup happening that ends our relationship exogenously is the same probability of the coup happening and ending his ongoing wage exogenously, so we can use the same δ.

So let’s just do some math here, what’s the value of W** forever? So remember the value of 2 forever was what? 2/[1-δ]. So what’s the value of W** forever? So this is going to be W**/[1-δ]. What’s the value of 1 forever? 1/[1-δ]. The whole thing is multiplied by δ and this is 2-W**. Now I need to do some algebra to solve for W**. So let’s try and do that. So I claim that this is the same as [1–δ] 2 - [1–δ] W** <= W**δ - δ1. Everyone okay with that? One more line: let me just sort out some terms here. So taking these on the other side, I have [1–δ] 2 + δ1 <= W**δ + [1–δ] W** = W**. So someone should just check my algebra at home, but I think that’s right. So the last two steps were just algebra, nothing fancy.
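For anyone checking the algebra at home, the steps can be laid out as follows. This is a reconstruction from the quantities named above (temptation 2 - W**, value of continuing W**/[1-δ], value of being fired 1/[1-δ]); the final simplification to 2 - δ is an added step.

```latex
\begin{align*}
2 - W^{**} &\le \delta\left[\frac{W^{**}}{1-\delta} - \frac{1}{1-\delta}\right] \\
(1-\delta)\,2 - (1-\delta)\,W^{**} &\le \delta\,W^{**} - \delta \\
(1-\delta)\,2 + \delta &\le \delta\,W^{**} + (1-\delta)\,W^{**} = W^{**} \\
W^{**} &\ge 2 - \delta
\end{align*}
```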

What have we learned? We have learned that the wage I have to pay this guy, the wage I have to pay him lies somewhere between 2 and 1, but we can do a bit better than that. Let’s just delete everything here. So in particular, if δ = 0, what’s W**? If δ = 0, W** is equal to what? Somebody? Equal to 2 and that’s what we had before. In the one shot game, there it is up there, where there was no possibility of continuing the relationship tomorrow, I had to pay him a wage of 2, or if you like, a wage premium of 100%. If there’s no probability–if there’s no chance of continuing this relationship, if δ = 0–we find again that I’m paying 100% wage premium.

Let’s take the other extreme. If δ = 1, so I just know this relationship’s going to continue–if δ = 1, so there’s no probability of the world ending or there being a coup–then what’s W**? It’s equal to 1. What’s that? What’s 1? It’s the going wage. So this is the going wage. If I know for sure we’re going to continue forever I can get away with paying the guy the going wage, at least in the limit. If we know we’re not going to continue then I have to pay the one shot wage.

But let’s look at a more interesting intermediate case. Suppose δ = ½. There’s just a 1/2 probability–that’s pretty low–there’s 1/2 probability that your company, American Widgets, is going to stay in Freedonia: with probability 1/2 it’s going to be done next period, with probability 1/2 it’s going to stay. What does that do to the wage? What happens to the wage in this case in which there’s a probability of 1/2 of American Widgets staying in Freedonia? It’s a 1/2 between 2 and 1, which is therefore one and a half. Or another way of saying that is, the wage premium is now only 50%.
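The three cases can be tabulated from the bound [1–δ] 2 + δ1 <= W** derived above. A minimal sketch, with illustrative function names and exact fractions to keep the arithmetic clean:

```python
from fractions import Fraction

GOING_WAGE = 1
ONE_SHOT_WAGE = 2   # wage needed when there is no chance of continuing

def minimum_wage(delta):
    """Smallest W** satisfying (1 - delta) * 2 + delta * 1 <= W**."""
    return (1 - delta) * ONE_SHOT_WAGE + delta * GOING_WAGE

def wage_premium(delta):
    """Premium over the going wage, as a fraction of the going wage."""
    return (minimum_wage(delta) - GOING_WAGE) / GOING_WAGE

for delta in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(delta, minimum_wage(delta), f"{float(wage_premium(delta)):.0%}")
```

The required wage is a weighted average of the one shot wage and the going wage, with weight δ on the going wage, so the premium falls linearly as the probability of continuing rises.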

Chapter 5. Cooperation in Repeated Interactions: Conclusions [01:13:53]

What have we learned from this example? Just an example of using repeated games. Well the first thing we’ve learned is it’s going to be easy, once we get used to it, it’s easy to use this technology of comparing temptations to cheat, with values of continuing in a cooperative relationship versus the value of the punishment, which in this case was just firing the guy. But more specifically in this example we’ve learned that even a relatively small probability of this relationship continuing–so this is good news for those of you who are seniors and are about to move to San Francisco and your significant other is going to London–even a small probability of the relationship continuing drastically reduces the wage premium. The amount you have to “pay” your significant other not to cheat on you as they go off to London or San Francisco is drastically lower if there’s some probability, in this case just a ½, of continuing.

Before you leave, one more thought okay. So how did this all work? Just to summarize, to get good behavior in these continuing relationships there has to be some reward tomorrow. That reward needs to be higher, if the weight you put on tomorrow, if the probability of continuing tomorrow, is lower. The less likely tomorrow is to occur the bigger that reward has to be tomorrow. We’re going to have to charge wage premia to employ people in Freedonia but those premiums will come down once we realize that we’re in established relationships in Freedonia–once the American firms are established and not fly by night operations in Freedonia. Whether that’s good news or bad news for Freedonia we’ll leave there. On Monday, totally new topic.

[end of transcript]
