Posts with tag "evolution":
posted on: Sunday, February 17, 2013 (12:59 pm) by Chase Stevens
This is part three of a three-part series. For the first installment, see Part One (Overview). For the second, see Part Two (Design and Genome).
Genetic algorithm implementation details
Our genetic algorithm had an incredibly large search space to explore; given this, it was imperative that multiple instances of the algorithm be run in parallel. Several different populations were maintained across multiple computers, with the standard population consisting of 150 AIs. The most successful members of different populations were occasionally manually merged into a single population, so as to allow for cross-population competition. Particularly successful members were also manually archived for later inclusion in a sample pack of pre-evolved AIs (available now on StratLoc's homepage). Overall, while many hundreds of thousands of games of StratLoc were played as part of the genetic algorithm, it was possible only to go through 40 or 50 generations of evolution per population.
Contrary to most genetic algorithms, one of the easiest parts of evolving AIs for StratLoc was the creation of the fitness function. Given that our goal was to have AIs capable of being competent players of the game, it made sense to rank AI success based on the percentage of played games they were able to win when pitted against other (randomly selected) AIs in the same gene pool.
After each AI had participated in at least 5 games, all AIs with a below-average fitness rating were discarded from the population. The empty slots were refilled by children of various successful AIs. Three different methods were used to mate AIs. The first took the first half of the genome of one parent and combined it with the second half of the second parent's genome. The second mating method was similar, but instead of splitting the genomes evenly, it would take a random number of bytes from the genome of one parent, then fill in the rest of the genome using the remaining parent. The third and final mating method was a simple "zipper" operation whereby the bytes of the child genome alternated in their parental source. Mutation was performed after mating on 5% of all children. If selected for mutation, each byte of an AI's genome had a 6.67% chance of being replaced by a new, random byte.
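For the curious, the three mating operators and the mutation pass can be sketched in Python roughly as follows. This assumes genomes are simple lists of bytes; all names and the exact structure are illustrative, not taken from StratLoc's actual source.

```python
import random

MUTATION_RATE = 0.05    # 5% of children are selected for mutation
PER_BYTE_RATE = 1 / 15  # ~6.67% chance per byte once a child is selected

def mate_halves(a, b):
    """Method one: first half of parent a's genome, second half of parent b's."""
    mid = len(a) // 2
    return a[:mid] + b[mid:]

def mate_random_split(a, b):
    """Method two: a random number of bytes from a, the remainder from b."""
    cut = random.randrange(len(a) + 1)
    return a[:cut] + b[cut:]

def mate_zipper(a, b):
    """Method three: alternate bytes between the two parents."""
    return [a[i] if i % 2 == 0 else b[i] for i in range(len(a))]

def maybe_mutate(child):
    """A selected child has each byte replaced by a random byte with ~6.67% chance."""
    if random.random() < MUTATION_RATE:
        child = [random.randrange(256) if random.random() < PER_BYTE_RATE else byte
                 for byte in child]
    return child
```

The first two operators are the classic one-point crossover (fixed and random cut points); the "zipper" is uniform crossover with a strictly alternating mask.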
Issues during evolution process
As the AIs were being evolved, members of the development team would occasionally take several members of a population and begin a game using them. Several days into AI generation, we observed that, although the AIs seemed to be successfully winning games of StratLoc against each other, their play was severely uninteresting. AIs never seemed to make any cities or even build mines. Instead, the metagame of the populations seemed to favor the evolution of zerg-rush-like strategies. In the case of StratLoc, this meant spending the initial resources given to a player on a military unit instead of a worker (with which to cultivate an economy). The AIs would then be in a position of being unable to afford anything else, and were stuck with a single anti-air unit. However, the fast production of said unit would more often than not allow them to defeat other AIs who were "properly" playing the game. When several AIs deployed the same rush strategy, the game essentially became a matter of which AI could get their unit to opponents' undefended cities first.
That this was allowed to happen was a mixture of several failures on my part. The first was a misunderstanding of what we were really attempting to evolve: while the fitness function used by the genetic algorithm was successfully producing AIs that could win the game, what we wanted were AIs that were interesting and challenging for humans to play against. This is to say, while rushing may have technically been the best strategy for StratLoc, it certainly wasn't one that made for enjoyable and engrossing gameplay. Solving this involved awarding not only a single point to AIs for winning games, but also additional partial points for each city owned at the end of the game. Doing this meant that even AIs that didn't flat-out win games could still be rewarded for having built (or at least captured) a large number of cities; it also meant that, overall, games wherein more cities were produced ended up awarding higher fitness ratings to participant AIs.
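The adjusted scoring can be sketched like so. The per-city weight (`city_bonus`) is an assumed placeholder, since the exact value we used isn't given above; the culling rule is the below-average cutoff described earlier.

```python
def fitness(games, city_bonus=0.1):
    """games: one (won, cities_owned_at_end) pair per game the AI played.
    Awards a full point per win plus partial points per city owned at game
    end; city_bonus is an assumed weight, not the actual value used."""
    points = sum((1.0 if won else 0.0) + city_bonus * cities
                 for won, cities in games)
    return points / len(games)

def cull(population, fitnesses):
    """Discard every AI whose fitness rating falls below the population average."""
    avg = sum(fitnesses) / len(fitnesses)
    return [ai for ai, f in zip(population, fitnesses) if f >= avg]
```

Under this scheme, a losing AI that finished with several cities can still outrank a winner of a city-poor game, which is exactly what nudges the metagame away from the rush.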
The second issue highlighted by the evolution and dominance of the rush strategy was one of game balance, for which I was also responsible. To counteract the success of the rush, I adjusted the costs of units and initial starting resources for players such that it was impossible to build a military unit on the first turn. In this way, the only possible path to building military units was to first build a worker with which to develop an economy, then go from there. While this simple change would have eventually resulted in AIs which were more economically-minded during the early game, the prevalence of the rush strategy amongst the AIs would have meant that, for many generations, AIs would continue to try (unsuccessfully) to build military units at the game's start. It was therefore necessary (for the sake of time, at least) to hard-code actions for the AIs for the first few turns of the game, which would lead them to build workers and create mines. While this may have in some sense ruined the "purity" of the genetic algorithm, it ultimately made the game more interesting and playable for humans and prevented the AIs from having to re-evolve more sensible strategies for the beginning of the game.
This concludes my series on the evolution of StratLoc's AIs. The full source code of StratLoc, as well as sample AIs (and the game itself), can all be found here. If you find yourself with any questions, or perhaps would like to talk about your own experiences evolving game AIs, please feel free to contact me.
Tags: evolution, game design, genetic algorithms, programming, stratloc, video games
posted on: Sunday, January 6, 2013 (12:58 pm) by Chase Stevens
This is part two of a three-part series. For the first installment, see Part One (Overview).
As mentioned in this series' previous post, the aim of evolving AIs for StratLoc was not to generate systems capable of manipulating individual units but, instead, to have them issue overall goals or directives for their turns in response to environmental data fed to them from the game. These directives, in turn, would be implemented by hardcoded finite state machines. An immediately obvious failure of this design was that AIs could easily issue directives that were flat-out impossible, such as attempting to attack an enemy player without possessing any units. There exist two naïve solutions to this issue, neither of which work particularly well. The first would be to ignore unattainable goals or unimplementable directives. This has the distinct disadvantage of allowing AIs to accomplish nothing during a turn, simply because they have not evolved to have goals compatible with the current state of the game. Moreover, because of the extremely large search space we were already tasking our genetic algorithm with exploring (thanks to the number of environmental variables we were feeding it, as will be discussed below), even AIs that for the most part had evolved excellent strategies could find themselves left dead in the water when presented with a novel situation. The other solution would be to simply feed in more data to the AI, so that it could adapt to any situation, and could evolve in a manner such that, if it did not have military units, it would not attempt to attack anyone. However, as mentioned, the search space we were already operating in was quite large – it would have taken several years to fully explore it with our available hardware, according to my calculations. Each additional bit of information exposed to the genetic algorithm would have increased the search space by a factor of two, as well as doubling the size of the genome (which I had already calculated as being at least a sizable 6.4 megabytes).
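To make the scale concrete, here's the back-of-the-envelope arithmetic, using the genome layout detailed below (a 21-bit environmental index, with each entry holding five 5-bit action encodings):

```python
# One table entry per 21-bit environmental condition, each entry holding
# five 5-bit action encodings (per the genome layout described in this post).
conditions = 2 ** 21
bits_per_condition = 5 * 5
genome_bits = conditions * bits_per_condition
genome_megabytes = genome_bits / 8 / 1_000_000  # ~6.55 MB for the action table
# Adding one more environmental bit doubles `conditions`, and with it the genome.
```

This lands in the same ballpark as the "at least 6.4 megabytes" figure, and makes it clear why each additional environmental input was so costly: the table is exponential in the number of condition bits.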
My solution to this was to imbue the AI's directives with decompositionality. This is to say, for any action the AI might attempt that had circumstances under which it would fail, the state machine charged with executing that action would, under those circumstances, defer to an alternate state machine in an attempt to remedy the situation. As an example, suppose the AI attempted to build a tank but had insufficient resources to do so. The task of making a tank would then decompose into the task of seizing additional resources. Should this be impossible (due, perhaps, to a lack of workers with which to build mines), the task would decompose into building workers, and so on. Even when an entire chain of actions led to a dead end, my design attempted to let the AI do something by (in most circumstances) deferring to a "default action" specified by the genome. Only in extraordinary cases would the end result of an AI directive lead to nothing being done.
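A rough sketch of such a decomposition chain, in Python; the costs, state fields, and function names here are all hypothetical stand-ins, not StratLoc's actual implementation:

```python
from dataclasses import dataclass

# Illustrative costs only; the real game's values differ.
TANK_COST, WORKER_COST, MINE_COST = 30, 10, 5

@dataclass
class State:
    metal: int
    workers: int
    default_action: str = "fortify"  # the genome-specified "default action"

def try_build_mine(state):
    """Sub-task for seizing more resources; may itself decompose further."""
    if state.workers > 0 and state.metal >= MINE_COST:
        return "build_mine"
    if state.metal >= WORKER_COST:
        return "build_worker"        # no workers available: make one first
    return state.default_action      # dead end: fall back on the default

def try_build_tank(state):
    """If the tank is unaffordable, decompose into acquiring resources."""
    if state.metal >= TANK_COST:
        return "build_tank"
    return try_build_mine(state)
```

Each failed precondition defers to the sub-task that would fix it, so almost every directive bottoms out in some executable action rather than a wasted turn.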
This section will give a general layout of the genome. If you are interested in specifics, I recommend taking a look at the official structure specification I wrote up before StratLoc was implemented. Overall, the genome can be broken into two sections. The first contains a series of "threshold" values and takes up 26 bits in total. The threshold values encode variables referenced by the action-implementing finite state machines, such as the "combat win chance tolerance"; variables that are used in the generation of environmental inputs, such as the "[unit] queue length tolerance"; and the aforementioned "default" AI action. The second can be imagined as a 2D array, with the first index being a 21-bit value representing environmental conditions, and the second index pointing to any of five 5-bit AI action encodings, all of which are intended to be executed during a given turn. This allows for 32 possible actions. While most actions are straightforward orders to create units, attack enemy players, build structures or fortify locations, seven actions are reserved for manipulating the internal state of the AI. This allows it to alter its combat win chance tolerance (which will result in different combat behavior), alter its turns-left threshold (which is reflected in the environmental conditions), or clear the AI's unit queue. Specifics as to the execution of AI actions can be found here, but be forewarned: these underwent some changes in the final implementation. Environmental conditions encode what occurred last round (such as being attacked or building cities), the ranking of the AI (in several regards), which military unit type was most numerous, and whether any of the AI's thresholds had been triggered. While it might seem like these sparse details would provide insufficient information to the AI, any increase in the size of the environmental conditions would have (quite literally) exponentially increased the difficulty of the genetic algorithm's task. Overall, however, the AIs seemed to do quite well with the details they were provided and, regardless, the intelligent decomposition of AI actions ensured that even totally randomly generated AIs would have been moderately successful at playing StratLoc.
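A lookup against that layout might be sketched as follows, treating the genome as a flat bit array. The offsets and helper names here are my own reconstruction from the description above, not the actual StratLoc code:

```python
def read_bits(genome_bits, offset, n):
    """Read n bits (most significant first) from a flat 0/1 list."""
    value = 0
    for bit in genome_bits[offset:offset + n]:
        value = (value << 1) | bit
    return value

def actions_for(genome_bits, env):
    """Return the five 5-bit action codes for a 21-bit environment value.
    Assumed layout: 26 threshold bits, then 2**21 entries of 25 bits each."""
    THRESHOLD_BITS = 26
    base = THRESHOLD_BITS + env * 25
    return [read_bits(genome_bits, base + i * 5, 5) for i in range(5)]
```

Since each action code is 5 bits, exactly 32 distinct actions can be encoded, matching the action count given above.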
In my next post, I'll go over the implementation details of the genetic algorithm itself and detail some of the problems we ran into with the AI.
Tags: evolution, game design, genetic algorithms, programming, stratloc, video games
posted on: Friday, December 28, 2012 (3:05 am) by Chase Stevens
In this series of posts, I’ll be discussing my experience designing and helping to implement AI opponents for the turn-based, open-source strategy game StratLoc. The AIs were created using a genetic algorithm which was used to crudely simulate natural selection and evolution, allowing our AIs (over the generations) to become competent players of the game. Although the use of genetic algorithms in video games is not uncommon, the technique is often applied when something needs fine-tuning, such as the numeric value of some "aggressiveness" variable, or for the optimization of well-defined subtasks with limited scope, such as situational unit movement patterns or the micromanagement of a base. Our AI, on the other hand, was tasked with providing turn-by-turn, high-level strategies and actions in response to the overall climate of the game, the execution of which was then handled by a hardcoded system.
This first introductory post will give an overview of StratLoc as a game, so as to provide some context. StratLoc is a turn-based strategy game played on a hexagonal board, a la Civilization (albeit stripped of a great deal of complexity). The objective of the game is to eliminate your opponents by capturing their cities. The game is played on a randomly generated map featuring plains, hills, mountains and water tiles (the latter two being impassable), metal tiles, and four starting cities (one for each player). From cities, players can build military units and workers; players must also control at least one city to avoid losing the game. Cities are permanent fixtures which can be captured by opposing players.
The game has a single overt resource, "metal", which can be harvested from metal tiles by building mines. Mines provide a steady stream of metal per turn inversely proportional to the distance from the mine to its owner's nearest city. Mines, once built, cannot be destroyed, but can switch ownership if captured by another player. The player must also manage "power", which acts as a supply cost for units. Power is provided in small quantities by cities and in bulk by power stations. Power stations are not capturable, but may be destroyed by enemies. A final building, the fortification, can be built in order to provide a defensive bonus and gain vision around a tile. Fortifications are permanent but capturable. All buildings (including cities) must be built using a "worker" unit, which is destroyed in the process.
There are three military units in the game: anti-air, tanks, and air. Anti-air units have standard range and low base damage, but are cheap and gain a combat advantage against air units. Tanks are slightly more expensive, with standard base damage but high range. Tanks also suffer a penalty against air. Air units have a high metal cost and high power usage, as well as low HP. However, they make up for this by providing a huge line of sight and high mobility. In this way, combat strategy is more or less a game of rock-paper-scissors, with air countering tanks, tanks countering anti-air, and anti-air (obviously) countering air. Units can gain a defensive bonus by being located on a city, fortification, or hill, or by accumulating a fortification bonus by remaining in the same spot for several turns.
That concludes the general overview of the game and its rules. In my next post, I'll discuss the design of the AI from a high-level perspective, then explore the details of the AI genome.
Tags: evolution, game design, genetic algorithms, programming, stratloc, video games
posted on: Friday, November 25, 2011 (1:56 pm) by Chase Stevens
Human beings are social animals. We live and interact in societies, adopt social norms, and incorporate our society into our identity. I would even say that our sense of justice, morality, and fairness stems from the fact that we are constantly interacting with each other within (in the ancestral environment) a mostly closed community. Doing so means that we are in a constant, multiplayer iterated prisoner's dilemma, wherein we can earn reputations which will influence our interactions with others. For those not familiar with this concept, allow me to explain: the prisoner's dilemma can be explained as a two-player game. Each of the two players has but a single choice: either to cooperate or to defect (in the original scenario, there were two prisoners who could either choose to "rat out" their criminal accomplice or to remain silent). Both players cooperating produces a moderately good outcome for both, whereas both defecting produces a bad outcome for both. Should one player choose to cooperate while the other chooses to defect, the cooperating player gets the worst possible outcome and the defecting player the best. This can be represented in the below table:
|                     | Player 1 Cooperates      | Player 1 Defects         |
|---------------------|--------------------------|--------------------------|
| Player 2 Cooperates | Player 1: 3, Player 2: 3 | Player 1: 5, Player 2: 0 |
| Player 2 Defects    | Player 1: 0, Player 2: 5 | Player 1: 1, Player 2: 1 |
If two players choose to play only one round of this game, the optimal strategy is to defect: assuming your opponent is equally likely to cooperate or defect, defection nets an average of 3 points, whereas cooperation nets only 1.5. Moreover, if this game is viewed under the lens of the maximin principle (which states that you should opt for the decision that has the best worst-case scenario), the worst case for cooperating gets you 0 points, whereas defecting ensures you are rewarded with at least one point.
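Both claims can be checked directly against the payoff table with a few lines of Python:

```python
# My score, indexed by (my move, opponent's move); C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def average_payoff(my_move):
    """Expected score against an opponent who picks C or D with equal odds."""
    return sum(PAYOFF[(my_move, theirs)] for theirs in "CD") / 2

def worst_case(my_move):
    """Maximin view: my payoff under the opponent's most damaging reply."""
    return min(PAYOFF[(my_move, theirs)] for theirs in "CD")
```

Defection wins on both criteria: a higher average (3 vs. 1.5) and a better worst case (1 vs. 0).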
However, if two players choose to play this game for a number of rounds, the general best strategy is to behave in an altruistic but reciprocal manner (although for larger groups and more complex but similar games, other more refined strategies come into play). This is to say, initially assume in good faith that your partner is going to cooperate and do so yourself, but if they defect, defect yourself in turn. When this is applied to a larger group, this translates into looking into the reputation of the person you have been partnered with, seeing what their record is in terms of defection and cooperation, and responding accordingly.
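This reciprocal strategy is the classic "tit-for-tat," and it's simple to simulate; here's a small sketch of an iterated game (the function names and round count are mine, chosen only for illustration):

```python
def tit_for_tat(opponent_history):
    """Cooperate first, then mirror the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Run an iterated prisoner's dilemma; return both players' totals."""
    payoff = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)  # each sees the other's record
        pa, pb = payoff[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

Two tit-for-tat players settle into mutual cooperation (3 points a round each), while a tit-for-tat player loses only the first round to a habitual defector before punishing every defection thereafter.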
As has been a fairly recent focus of attention, this poses a problem on the internet. Online, being anonymous (or at least having the ability to operate under a discardable pseudonym) is the rule, not the exception, with the result that you either are incapable of gaining a reputation or can shed one easily if necessary. This breaks the connection between iterations that allows for cooperative strategies in the prisoner's dilemma, essentially turning the game into a series of one-off rounds, as neither party can ever truly know whom they're interacting with. This is only compounded by the fact that human empathy grossly decreases when we're not face-to-face with one another; an inability to observe the facial expressions, verbal tone, and body language of a person you're interacting with impairs your ability to create a model of that person, and thereby your ability to empathize with them. This leads to people on the internet often making incredibly callous, distasteful and offensive statements, the likes of which they would never normally make "in real life," the obvious example of this being the infamous 4chan. If given the opportunity, such as in the massively multiplayer EVE Online, people will lie, cheat, steal, betray, backstab, defraud and destroy each other with nary a second thought. However, when these people are encountered offline, they are typically soft-spoken, courteous and otherwise normal.
Of course, the internet isn't the only environment in which this can happen. Driving also grants us an unusually high level of anonymity, while cars themselves abstract our fellow humans into mere "drivers". This results in a casual level of rudeness that seems to avoid descending into complete sociopathic chaos only by virtue of everyone having some degree of self-preservation.
Where else are these environmental conditions manifested? The voting booth. Although obviously a tad more complex than a prisoner's dilemma, it stands to reason that voters quite often are put in a position where they could vote in a manner directly in accordance with their own self-interest, and not that of the community/country/what have you. Given what people do in other circumstances when granted anonymity, it does not seem overly bold to assume that at least a certain percentage of people vote selfishly. The natural question that therefore arises (and which I'd like to address) is that of the benefits of voting records becoming public.
Before delving into that, however, we should investigate whether or not votes being made public would really change anything. It could perhaps be the case that people would continue to vote the same way regardless. However, the Bradley effect would seem to claim otherwise. To be concise, the Bradley effect refers to a phenomenon wherein non-white candidates score higher in polls than they do once votes have actually been tallied, the presumption being that people being polled claim that they'll vote for the non-white candidate in order to avoid a potential accusation of racism. Historically, the effect seems to have been responsible for polling discrepancies of up to 12 points. A reverse Bradley effect has also been observed, as in the case of former Louisiana State Representative David Duke, a Nazi sympathizer and former KKK Grand Wizard, who won his position despite few people being willing to admit to pollsters that they supported him. So indeed, it seems that, at least in some cases, people will change their votes (or at least their admitted voting intentions) if the way they voted will be made public. Although perhaps the difference wouldn't be as great as observed in the above cases (more being at stake when actually voting, as opposed to telling someone how you'll vote), it would be reasonable to expect to see some change.
What is the advantage of having people's votes be influenced by societal pressure? Well, for one, it would seem as though we might be able to avoid having KKK members as state representatives, which most people would agree is a definite boon. We could also expect to see more effects like this, where people wouldn't vote in ways that others might find objectionable. One example might be gay marriage: those who privately oppose it could be far less willing to have their objections exposed publicly (this, of course, assumes that the legalization of gay marriage would be a "good thing"). Public voting outcomes could be as different from the anonymous voting outcomes we have now as behavior in public is from anonymous online behavior.
Of course, there are numerous objections to this course of action. What if society's standards are unconscionable? One might easily imagine an area of the country in which the pressure would be to vote for the KKK member, as opposed to against him. Could those who would otherwise oppose racism then be peer-pressured into voting against their morals? Moreover, in a less extreme case, could the Bradley effect cause people to vote for non-white candidates not because of political ideology, but instead so as not to appear racist? Such a system would also allow a great deal of coercion to come into play, enabling unscrupulous groups or individuals to influence votes through bribery or threat. Buying votes would be a simple matter if only one could verify how someone voted. Lastly, does one not have a right to vote selfishly? It could easily be argued that the voting booth is one of the few places in which one can express their opinions and influence the world around them in an honest way, free from repercussions or persecution.
While the suggestion of having one's votes made public might not be likely to be adopted, it's probably not as fanciful as we might be inclined to imagine. Although currently no one has access to how you voted, in the United States any interested party can find out whether or not you voted. In fact, in 2009 a group in Virginia called "The Know Campaign" sent out a letter to voters informing them not only of their past voting history, but also that of their neighbors, all in an effort to use societal pressure to get people to vote more (the implication being that the neighbors have also received your voting record). The campaign was directly described by executive director Debra Girvin as being not a "shame tactic", but instead a "peer pressure tactic." Although the efficacy of the campaign hasn't (to my knowledge) been disclosed, it did generate at least 1,000 letters of complaint.
Tags: anonymity, democracy, ethics, evolution, game theory, gaming, government, internet, morality, philosophy, probability, society, united states, voting