Neuroevolution


Aside from game tree search, another way to explore the consequences of quantitative realism is through the artificial intelligence technique of neuroevolution. This approach computationally mimics biological evolution as a way to train neural networks. Here, the goal would be to create networks that ingest a power structure and output an agent's optimal move. Networks are pitted against each other, and those that survive produce offspring, so that over the generations agents emerge with an instinct for navigating power struggles.

Pros and Cons

One attractive facet of the neuroevolutionary approach is that it reduces the number of fundamental assumptions that we must make. In particular, it eliminates Axiom 4 (the assumption that agents pursue a combination of absolute and relative power), along with its specific mathematical formulation. Instead, agent preferences would emerge from the evolutionary struggle. This reduction of assumptions promises to yield a more elemental understanding of what quantitative realism is.

On the other hand, neuroevolution can consume a significant amount of computation, memory, and time.

Implementation

General

Challenges and considerations when pursuing the neuroevolutionary approach include:

  1. How is it decided which members of the population should be eliminated after each round?
    1. One option is Malthusian elimination, in which disasters fall disproportionately upon the relatively powerless: agents are randomly killed off, with smaller agents having a higher chance of being killed (see the sketch after this list).
    2. Another option is to reward agents for not dying (i.e. make the likelihood of reproducing proportional to age).
  2. In addition to this "natural" death, agents can also use their power to kill each other by reducing another agent's power to 0.
  3. In order to avoid evolving agents who all cooperate with each other all the time, there may need to be a limit to the total power, or carrying capacity, that's available to the agents.
  4. Dead agents are replaced by cross-breeding randomly chosen survivors and applying some level of random mutation to the resulting offspring (see the sketch after this list).
  5. Ternary tactics should be used with a Hamming distance of 1.
  6. Tactics should be symmetrized and reciprocalized.
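
A minimal sketch of the elimination and reproduction steps above, assuming each agent is represented by a scalar power level and a flat genome of network weights. The function names, parameter values, and probability formulas here are illustrative, not part of the model itself:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def malthusian_elimination(powers, n_deaths):
    """Randomly kill n_deaths agents, with death probability skewed toward the powerless."""
    frailty = 1.0 / (powers + 1e-9)           # smaller agents are more at risk
    p = frailty / frailty.sum()
    return rng.choice(len(powers), size=n_deaths, replace=False, p=p)

def reproduce(parent_a, parent_b, mutation_rate=0.01, mutation_scale=0.1):
    """Uniform crossover of two parent genomes, followed by random mutation."""
    mask = rng.random(parent_a.shape) < 0.5
    child = np.where(mask, parent_a, parent_b)
    mutated = rng.random(child.shape) < mutation_rate
    return child + mutated * rng.normal(0.0, mutation_scale, size=child.shape)

# Example: 10 agents with random power levels and 100-weight genomes.
powers = rng.uniform(1.0, 10.0, size=10)
genomes = rng.normal(size=(10, 100))
dead = malthusian_elimination(powers, n_deaths=3)
survivors = np.delete(np.arange(10), dead)
for i in dead:                                 # refill the population with offspring
    a, b = rng.choice(survivors, size=2, replace=False)
    genomes[i] = reproduce(genomes[a], genomes[b])
</syntaxhighlight>

The age-based option above could be handled in the same way, by weighting the parent-selection probabilities by age rather than weighting the death probabilities by inverse power.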

Neural network

Input. The input vector has [math]\displaystyle{ n^2 }[/math] elements, representing:

  1. The sizes of the agents. The focal agent is always in the first position, and the other agents are placed in descending order by size. This makes it easier to train the network.
  2. The tactic matrix, reordered to be consistent with the ordering of the new size vector. The diagonal is then removed (it is redundant) and the remaining entries of the ternary tactic matrix are flattened.
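
A sketch of this encoding, assuming sizes is a length-n vector of agent sizes and tactics is an n × n ternary matrix with entries in {-1, 0, +1}; the helper name encode_input is illustrative:

<syntaxhighlight lang="python">
import numpy as np

def encode_input(sizes, tactics, focal):
    """Build the length-n^2 input vector: reordered sizes followed by the
    flattened tactic matrix with its diagonal removed."""
    sizes = np.asarray(sizes, dtype=float)
    tactics = np.asarray(tactics)
    n = len(sizes)
    # Focal agent first, then the other agents in descending order of size.
    others = sorted((i for i in range(n) if i != focal),
                    key=lambda i: sizes[i], reverse=True)
    order = [focal] + others
    reordered_tactics = tactics[np.ix_(order, order)]
    off_diagonal = reordered_tactics[~np.eye(n, dtype=bool)]   # drop the redundant diagonal
    return np.concatenate([sizes[order], off_diagonal])        # length n + n*(n-1) = n^2

# Example with n = 3 agents; the focal agent is agent 0.
sizes = [2.0, 5.0, 3.0]
tactics = [[ 0, 1, -1],
           [ 1, 0,  0],
           [-1, 0,  0]]
print(encode_input(sizes, tactics, focal=0))   # 9 elements
</syntaxhighlight>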

Network architecture. The network is a simple feedforward net with 5-10 hidden layers using Tanh activation functions. The final layer applies Softmax to normalize the outputs so that they sum to 1.
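
A minimal numpy sketch of such a network. The hidden width of 64, the choice of exactly 5 hidden layers, and the Gaussian weight initialization are illustrative assumptions; the output size of 3(n-1) follows the Output description below:

<syntaxhighlight lang="python">
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class PolicyNet:
    """Feedforward policy network: Tanh hidden layers, Softmax output."""
    def __init__(self, n_agents, hidden=64, n_hidden_layers=5, rng=None):
        rng = rng or np.random.default_rng()
        dims = [n_agents ** 2] + [hidden] * n_hidden_layers + [3 * (n_agents - 1)]
        self.weights = [rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_out))
                        for d_in, d_out in zip(dims[:-1], dims[1:])]
        self.biases = [np.zeros(d_out) for d_out in dims[1:]]

    def forward(self, x):
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.tanh(x @ W + b)
        return softmax(x @ self.weights[-1] + self.biases[-1])

net = PolicyNet(n_agents=3)
policy = net.forward(np.zeros(9))   # 3*(3-1) = 6 probabilities summing to 1
</syntaxhighlight>

In the neuroevolutionary setting these weight matrices and biases would serve as the genome that is cross-bred and mutated, rather than being trained by gradient descent.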

Output. The output "policy" vector has [math]\displaystyle{ 3(n-1) }[/math] elements that indicate the probability or strength of the available moves. Moves are assumed to be ternary with a Hamming distance of 1. Because there are only [math]\displaystyle{ 2n+1 }[/math] such moves available, some of the elements of the policy vector are redundant and should return 0.
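
The mapping from policy elements to concrete moves is not fixed above; one possible convention, sketched here, is that element 3j + k proposes setting the focal agent's tactic toward the j-th other agent to the k-th ternary value, so that the entries which merely reproduce the current tactic are the redundant ones:

<syntaxhighlight lang="python">
import numpy as np

TERNARY = (-1, 0, 1)   # possible tactic values

def decode_move(policy, current_row):
    """Pick the highest-scoring legal move, assuming policy[3*j + k] means
    'set the tactic toward the j-th other agent to TERNARY[k]'."""
    policy = np.array(policy, dtype=float)
    for j, current in enumerate(current_row):          # current tactics toward the other agents
        policy[3 * j + TERNARY.index(current)] = 0.0   # mask the redundant "no change" entries
    best = int(policy.argmax())
    j, k = divmod(best, 3)
    return j, TERNARY[k]

# Example with n = 3: the policy vector has 3*(3-1) = 6 elements.
current_row = [0, 1]   # focal agent's current tactics toward the two other agents
print(decode_move([0.05, 0.40, 0.10, 0.05, 0.30, 0.10], current_row))
# (1, 0): set the tactic toward the second other agent to 0
</syntaxhighlight>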


