Artificial Intelligence: Learning Machines

Vincent Johnson

"HAL: ‘I’m sorry Frank, I think you missed it: Queen to Bishop three, Bishop takes Queen, Knight takes Bishop. Mate.’

Frank: ‘Uh, huh. Yeah, looks like you’re right. I resign.’

HAL: ‘Thank you for a very enjoyable game.’

Frank: ‘Yeah. Thank You.’"

-From the movie ‘2001: A Space Odyssey’

Introduction:

As a child, the thought of artificial intelligence meant the capability of a machine (or computer) to think like a human would. I was biased by a world of science fiction which gave the typical computer the ability to think is foreseen as both humanity’s greatest achievement and often our greatest downfall. Science fiction brought us images, such as HAL from the film 2001, which gave people this notion that a computer could have a personality and make human decisions, as well as play a few games of chess. While not originally intended to play games, the computer is nonetheless considered a major source of gaming entertainment, from simple games of solitaire to complex 3-dimensional worlds, and all requiring some level of computer intelligence.

The Learning Machine and Games:

Often science has sought to bridge the gap between science fiction and science fact. The idea that computers, like HAL, could be capable of making decisions has driven scientists to make attempts at developing computer systems that mimic human thought. On May 3rd, 1997, IBM’s chess playing supercomputer, Deep Blue, demonstrated that we might have significantly come closer to bringing intelligent computers together with human intelligence when it tied Garry Kasparov, the then World Chess Champion [1]. But we are only part way toward the goal of giving a computer, or machine, the ability to think. Deep Blue’s decision-making is based on a fixed database that was changed by hand by its developers in between each game [2]. There is, however, a problem in Deep Blue’s game playing ability that stems from only being as good as the data it is given. Give it a inferior database or even no database and it will always make bad decisions and play poorly.

There has been significant effort in researching and implementing systems where the computer learns and develops its own skills. These learning machines are the next step beyond Deep Blue. Systems such as SAL and the Morph Project are discussed in this paper, however, they do not cover the extent of the research that has been done on the topic. The goal is to create a system that can both think effectively and is always learning to do better. In his paper "General Game-Playing and Reinforcement Learning," Robert Levinson identifies a series of twelve guiding principles for developing machine learning. Some of these primary goals and principles are:

The last of these listed principles is one of the primary goals of the Morph project, of which Levinson is leading. He recognized the importance of neural networks and genetic algorithms in machine learning. They both are based on "pre-classified input-output pairs that attempt to learn functions, [4]" but he also recognized that there is no link between the graph-theoretic understanding of machine learning and their structures. He also recognized that they rely on human-supplied or base state input, which is unacceptable to the earlier goals listed. Another important thing to note is that Levinson sought only to create a machine that could play with high performance and not "biological plausibility" which is what the developers of neural networks are trying to create.

Most of the projects in development are based on the computer’s ability to learn to play games, typically board games, where there is a notion of complete knowledge of the game to all players. Michael Gherrity, a Doctor of Philosophy in Computer Science, discusses in his dissertation the reasoning behind choosing these types of games:

Typical board games require a player to make a sequence of decisions in a limited amount of time. There is usually insufficient time to investigate all the possible consequences of these decisions. Once made, the decisions cannot be revoked. The outcome is uncertain since the actions of the opponent are not always predictable. [3]

Most often what makes these games interesting is the ability to exhaustively search through all possible actions in a game tree that represents every possible game state. As I have examined in my own experience, however, a game as simple as tic-tac-toe can have a very large tree, with even more complex games having ridiculously large game trees. Michael Gherrity examined several of these games with his own learning machine, SAL, including chess, tic-tac-toe, checkers and connect four while the Morph project has focused primarily on chess. And while other learning systems exist, such as NeuroChess or Hoyle, the remainder of this paper will focus on SAL and Morph only.

Graph Theory and Pattern Identification:

One of Levinson’s first observations state that many games may be examined in graph-theoretic terms. One such term he applied is a hypergraph with the following definition:

A hypergraph is a set of nodes S coupled with a set of hyperedges that are subsets of S. A hyperedge is directed if its nodes are ordered. A hypergraph is nested if its nodes themselves ma be hypergraphs. We will call a hypergraph basic if it is not nested. [4]

Levinson claims that this notion of hypergraphs applies to a large class of state-based games. These games have characteristics such as a specified initial state and players that perform operations on the game state to produce new game states. Levinson showed that all state-space search problems can be viewed as relation-based transformations over hypergraphs and tied logic, graph theory and state-space representations together [4].

Levinson’s research also shows that each domain, or game, has a finite set of domain objects and a finite set of piece objects. For example, in the case of chess or checkers, each square on the game board represents a single domain object and the chess pieces or checkers themselves were the pieces. Each game state is a hypergraph of each of these objects. These objects also have a set of relations applied to them as well as a set of operators to change these relations. The terminal conditions of a game are a set of predefined conditions that are applied to the operators with the added possibility of a reward condition. Levinson also has an added a static relation function called FLIP whose sole purpose is to take care of symmetries in a variety of game-states. The focus of this function is on multi-player games where the difference between two boards may only be by the owner of the pieces on the board [4]. These definitions applied to games create the pattern languages that were required in the development of SAL and Morph.

SAL and Morph also have two other distinguishing similarities between them in their design. They contain a set of search methods as well as evaluator functions. The search methods take the database that is in front of them and decide an appropriate next move. Evaluator functions are what assign weights to given patterns and evaluate (or re-evaluate) them for submission to the database for future use.

SAL:

Michael Gherrity, lead developer and researcher of SAL, criticizes the lack of ability in most chess playing machines in that they can only ever play chess (as is the case with Deep Blue). His focus of research is on "two-person, deterministic, zero-sum board games of perfect information" [3]. In a game of perfect information, all the necessary information for a player to decide their next course of action is presented before them making for a more simplified area of game play to study [3]. There are additional restrictions on his system such as the board being restricted to a rectangular shape and fixed types of pieces. The system requires a user supplied move generator (written in C only), as well as the constraints on the board size and the number and types of pieces used in the game [6].

It chooses its moves through a two-ply, full-width alpha-beta search, plus a consistency search for quiescence. Gherrity recognized the innefficiency of searching through all possible next game states and required several methods to eliminate unnecessary searching. The consistency search is a search developed by Don Beal that makes use of SAL’s two-ply search method by determining "consistency" based on similar values from a previous, backed-up value [7]. They also rely on the concept of a quiescence search, which stops the search if it recognizes a problem, for instance, a move that seems to benefit one player, but the following move is more detrimental [8]. Quiescence and consistency boil down to a type of pruning, not dissimilar to alpha-beta pruning in minimax algorithms.

SAL’s evaluation function is isolated from the game-playing portion of the program. It maintains a database of board position patterns and stores weight information for them. It also learns by use of backward propagating neural networks and it does not re-evaluate any game states until after a game is done being played [3].

SAL has some extreme limitations at this time, however. Various tests of SAL have shown that it is an incredibly slow learner. It took nearly 20,000 games before it learned to play perfectly [6]. Gherrity used a program he called TTT which contained very basic tic-tac-toe conditions of win if it can, block if necessary, otherwise take a random square [3]. Using gnuchess, a widely accepted chess-playing program, SAL played 4,200 games and only managed to tie eight of those games, losing every other. Gherrity’s data suggests, however, that SAL was learning to play better and better, but the rate at which it was learning was exceedingly slow.

The Morph Project:

The Morph Project’s initial goal was to create a learning chess program and later to spawn off two other projects directed at general game learning, Morph II and Morph III. Levinson, head of the Morph Project, makes note, as Gherrity had, that most of the games that can be applied also have an implied perfect public knowledge available to both players. While SAL focused primarily on two player games, Morph is extendable to an infinite number of players, or agents as Levinson calls them, or just as few as single player games like Towers of Hanoi, Nim and Magic Squares puzzles.

Morph is an application of Adaptive-Predictive Search (or APS) methods that search with "experience" [4]. As noted by Levinson, a typical game state in Morph can be evaluated on average with up to fifty other sub-patterns, keeping in mind that this particular part of the project only applied to chess [4]. Morph is an attempt at an informed learner, which is given simple, domain-independent heuristics, an abstract mathematical definition of the state space and experience [4]. Morph learns by creating a database of patterns and applying weights to them using "temporal differences, simulated annealing, a genetic algorithm-like method and generalization and specialization of patterns and explanation-based generalization" [5]. The database is initially empty and knowledge is stored as pattern-weighted pairs. Evaluations are made upon a state in relation to the most specific stored patterns that apply to that state using their weights to calculate a new value. In addition to the weight value of a state, any other important data can be stored in the database allowing us to derive additional information in calculating the value of a state. After weights have been calculated and a database is formed, Morph utilizes a one-ply search to find the best move.

Once Morph was implemented, ideas for expanding the pattern searching to other games became simple. Morph II was the result and had significant findings in relation to the graph theories and games. The primary find was that the existence of a sub-graph of a larger graph representing the current state carried predictive value over the outcome of the game. There is a certain potential in this to create another relational operator to apply to this value to find a relationship with the goals of the game. This information cannot be obtained out of a direct relation to a particular sub-graph, however, and can only be obtained statistically through experience. Morph II uses a scheme that allows for incremental updates to the database and combines scores of patterns hierarchically rather than globally as its predecessor does. Morph II is also distinguished by the fact that it makes use of "blind learning" [4]. Blind learning is a type of learning implementation that provides the system with no definition of the state-space or pre-supplied heuristics, only the rewards and legal states to choose from at each point.

Morph III is a further expansion of previous research, primarily extending upon Morph as a chess-playing program using better techniques developed from Morph II. Morph III is still under development.

There are a few faults in Morph, however, as well. First, it always plays like a beginner, though it learns to play the games it is presented faster than SAL [5]. The problem with its game play is primarily due to Morph’s reliance on combining pattern values without a known relationship between patterns [4]. Morph II seeks to eliminate this problem. Second, there is also a problem in the specific nature of chessboards where Morph generalizes them too much causing them to lose much of their value. This problem is prevalent in specific boards that are seen only a few times in the system’s career resulting in inaccurate values [4].

Conclusions:

After having researched both learning systems, both seem interesting to study, but their own systems still seem overly limiting. SAL, for instance, suffers from neural networks and the horrible simplicity of the back-propagation associated with them. The advantage of SAL seems to be in the power of its consistency search.

But as Deep Blue has shown us, we already have good searching methods. Greater importance should be placed on the methods of evaluation and re-evaluation of game states. This seems to be the great weaknesses of Morph, that it only performs a one-ply search. The addition of a second level of search is not as easily the answer for Morph as adding an additional level in games that have more than one player can significantly slow down the system for very little gain in game play. As is suggested, a more proper approach to fixing Morph’s problems lies in fixing its heuristic functions and its ability to search for a good game state. As pointed out by Michael Gherrity, the only reason that the second level of searching performs well is due to the consistency search, something that Morph does not take advantage [6].

Works Cited

 

[1] Krauthammer, Charles. "BE AFRAID. The Meaning of Deep Blue’s Victory." www.chebucto.ns.ca/~adw/AI.html.

[2] "Deep Blue FAQ." www.research.ibm.com/deepblue/meet/html/d.3.3a.html

[3] Gherrity, Michael. "A Game-Learning Machine." University of California, San Diego. ©1993.

[4] Levinson, Robert. "General Game-Playing and Reinforcement Learning." University of California. ©1995.

[5] Scott, Jay. "The Morph Project." forum.swarthmore.edu/jay/learn-game/projects/morph.html

[6] Scott, Jay. "SAL." satirist.org/learn-game/systems/sal.html

[7] Scott, Jay. "Consistency Searches." satirist.org/learn-game/methods/search/consistency.html

[8] Scott, Jay. "Quiescence Search." satirist.org/learn-game/methods/search/quiesce.html