Section 1

An Introduction to Inductive Learning

In Game Theory the most difficult aspect of creating a good computer player is deriving the board evaluation function that is needed in any tree-based algorithm. Determining the evaluation function often requires repeated examination and testing before even an adequate function is found. While there may be a wide range of games, each with unique solutions, they all contain the same two elements in their evaluation function:

Traditionally an evaluation function h will return a scalar value that estimates the goodness of a board for a given player. During the evaluation a feature will be detected, give forth a scalar value, and lastly a weight is multiplied to that value that compensates for importance of the feature.

What can be done to simplify the derivation of the h function is to calculate the weights associated with each feature by using the computer’s exceptional numerical ability, through a process known as Inductive Learning.

In Inductive Learning we are given a database of pairs (x, f(x)) that correspond to an initial value x and the "true" function’s output f(x). From these pairs we create a guess, or hypothesis function h, that attempts to match the true function f such that f(x)=h(x) for all x values given. It is also important to know that we may not be given all of the possible pairs (x,f(x)) and thus our h function will attempt to extrapolate from a set of given points the f(x) values that are not given, as shown by example in Figure 1.1.

 

Figure 1.1 – Four pairs are given on graph A from the true f function. Graphs B and C are two possible guesses for f. In both B and C we attempt to extrapolate on the four given pairs by defining the h(x) values for other x values not supplied.

The first step in applying Inductive Learning to gaming is to acquire pairs (x, f(x)) that our program will attempt to learn the evaluation function from. We will get these data pairs from the play of an expert human (or even expert machine) player. It is extremely difficult for a player to express his mental rationale for choosing a move in general, however we are only concerned in the results of the expert, the starting board and the follow board.

Assuming the expert is acting rationally, given any board x, he will attempt to discern what his best next move is. In essence the player is using an f function to determine the next move, where f is the process by which the player selects his next manuever. If we express x as a board, and f as the expert’s function we get the following pairs:

(xboard, f(xboard))

In other words, we will look at pairs consisting of a starting board and the following board the expert chose as the best.

Yet how do we know that the expert chose the best next move? The answer to this question is that we don’t know exactly, but we can assert some assumptions to help aid our learning process. Our h function will be derived from the expert’s board pairs, thusly it will only be as good as the pairs we use. Immediately we eliminate any board pairs in a game where the expert lost. If we assume that at some point in the game the expert could have avoided loss, then at least one pair of boards was not the best.

It can also be speculated that if an expert decisively defeats someone on his level, then these pairs will prove the most useful to us. In this manner we can search through records of games played for board pairs that we will attempt to learn from.

Once we have constructed a database of pairs we will repeatedly adjust the weights for each of our programmed board factors, in an attempt to create an h function that matches as many pairs as possible. In Figure 1.2 we first plot the starting and next boards given to us in our database, as denoted with a yellow box. We then create a set of weights that give us h function 1, and into this function we put those same starting board that are found in our database. As a result we get next boards chosen by h function 1 which are represented by purple triangles. The same process generates an alternate function h 2 with a new set of weights. We can then compare the two functions by how many matches to the database they achieved, and conclude that function 2 if preferred. This repeated adjusting and testing is how the process of Inductive Learning works.

 

 

 

 

Figure 1.2 – Database pairs and the output of two h functions

 

Section 2

Determining the Factors and Creating the Database

To illustrate the process of Inductive Learning and show results, we will look at an attempt to learn an h function in the game of Tic-Tac-Toe. Before any learning can take place we need to complete two initial steps. The first is to determine the factors in the board that we will look at, and second to create a database of pairs from a player.

Selecting the factors that we find important is a difficult problem, however because we are learning the importance of the factors we do not need to ask ourselves certain possibly complicated questions:

These questions do not speak on the features, but instead the weights associated with the factors. The first question is simply whether the weight is positive or negative, and the second is the magnitude of the weight. Since we will use Inductive Learning to determine the weights, we do not need to concern ourselves with these questions.

Additionally, we may overstate the number of features because any feature that is irrelevant will be given a weight of zero, and thusly does not contribute to the overall goodness of a board. Given unlimited time to run our learning algorithm we could easily create a very accurate h function that considers 50 different features in a Tic-Tac-Toe game. However, since we do not have unlimited time we must choose an adequate number of features that will allow us to learn an effective and simple h function.

(Note: The word "row" in this paper represents rows, columns, and diagonals. It is meant to signify 3 squares in a row, and not simply a horizontal row.)

Feature #1 A row with two of a mark and an open space

We know from playing the game of Tic-Tac-Toe that having a row with two of a mark causes your opponent to have to block. More advanced players tend to with when they have two rows with two of their mark, thereby making it impossible for the other player to block.

Feature #2 A row with three of a mark

It is obvious that a row with three of your mark is desirable because you win. However, the importance of this feature is unknown to our learning function and we hope that it will determine this feature to very important.

Feature #3 Center Square

Many people tend to start our in the center square, and among players it tends to be a prized square since it lies in nearly every row, column, and diagonal.

Feature #4 Corners

Corners tend to be taken because they can lead an opponent into unblockable situations.

Feature #5 A row with one of a mark

Perhaps it is important to have a chance to win in as many directions as possible, and thus the more rows you can fill with you mark the better you are doing.

(Note to self: Put in Appendix also)

Through inductive learning we will attempt to learn the integer (or decimal) weights for each feature in our h function:

h(board) = (Feature1*Weight1) + (Feature2*Weight2) + (Feature3*Weight3)

+ (Feature4*Weight4) + (Feature5*Weight5)

One thing to note is that features 1,2, and 5 all return the number of your rows matching the feature minus the number the opponent’s rows. Likewise, feature 3 returns 1, -1, or 0 depending if you, your opponent, or nobody has a mark in the center square. Feature 4 returns the number of your corners minus the number of your opponent’s corners. In this way we can cover a wide variety of possible importances for each feature, as shown in Figure 2.1.

Feature Value

Weight Value

Result (Goodness)

Explanation

Positive

Positive

Positive

You posses more of this beneficial feature

Positive

Zero

Zero

You posses more of this unimportant feature

Positive

Negative

Negative

You posses more of this detrimental feature

Negative

Positive

Negative

Your opponent has more of this beneficial feature

Negative

Zero

Zero

Your opponent has more of this unimportant feature

Negative

Negative

Positive

Your opponent has more of this detrimental feature

Figure 2.1 – A table describing the various combinations of Features and Weights with interpretation of the results

The final step is to define a database of pairs that we will check our potential h function against. Since Tic-Tac-Toe does not have masters or documented games, I instead created a program that would create database. (Program Section, #1) It displays boards and asks the user what his/her next move would be. The program generates a random number from zero to seven, which correspond to the number of marks on the board, and then the program randomly generates that board.

This method of generation provides some benefits. The first is that the database will contain a wider range of boards than could be achieved by playing a few games. Secondly, boards that have fewer marks tend to be both more critical and also more frequent, which the program compensates for by generating a random number of marks on the board. Had the program simply generated a random board, then the chances of ever being prompted with the starting board would be very slim.

With the determination of the board features and creation of the database finished, we can examine the technique of Inductive learning and the results that I obtained in my experiment.

Section 3

Inductive Learning Process and Results

In this section we will go into a more in-depth description of the process that I went through to implement the Inductive Learning algorithm, following which results of the process will be presented and explained.

The first step in the process was to create a database that the algorithm could attempt to learn from. The program generates random boards and asks the user his next move, to which the user responds with a number from the Numerical Keypad, and that is translated into the resultant position on the board. For each starting board generated and input given a line of information is written to a database file. The information is of the following format:

D#SSSSSSSSS:AAAAAAAAA,BBBBBBBBB,CCCCCCCCC,etc…//

To explain some of these details further, let us select an example from our sample output:

Board #:1

| |

-+-+-

| |X

-+-+-

| |

Placement of *O*8

4#000001000:010020000,000020010,//

The D value that is generated by the program is 4 because the board that is given really represents four different boards…..

| | | | | | |X|

-+-+- -+-+- -+-+- -+-+-

| |X --> | | X| | | |

-+-+- -+-+- -+-+- -+-+-

| | |X| | | | |

If we assume that the person would have selected the same relative spot on the board, no matter which of the 4 orientations shown above, then we can represent a larger amount of information in our database. While it may seem like the program defined only one board, in essence it defined four.

Similarly, it may seem like the user defined one next board, however it turns out to be two.

| | | | |O|

-+-+- -+-+- -+-+-

| |X --next--> | |X & | |X

-+-+- -+-+- -+-+-

| | |O| | |

Since the starting board has horizontal symmetry, then an O on the bottom and top are both equivalent boards. The general rule used by the program is that whenever horizontal, vertical, or diagonal symmetry is broken, then there are multiple boards that are equivalent to the one selected by the user. This aspect of the program is very important to us because we want to make it simple as possible for our algorithm to learn the h() function, and having several valid next boards allows us to do so. Instead of our algorithm having to work with pairs (Starting board, Follow Board) a 1-to-1 correspondence, we can instead work with (Starting Board, Multiple Follow Boards) a 1-to-many relationship. This increases our odds of finding a suitable h() function, while preserving the integrity of the data because each follow board is an equivalent move.

It is important to understand that both above-mentioned aspects of the program are only useful in games with a high occurrence of symmetry. Games such as Chess would best be represented straightforward, without analyzing aspects of symmetry in the board simply because there are very few opportunities were symmetry arises.

The human’s database is then passed to the Inductive Learning program to determine the best-fit h() function. (The code for this program is found in the Program Section, #2) The program first asks the user for the terminal level for a Minimax search, and then asks the user for the bounds on the five weights.

The algorithm asks for bounds on the range of possible weights, which is a way of helping the algorithm in the learning process. Thusly, my implementation of the Inductive Algorithm is a supervised learner because it is given a confine of values to look at.

This is suitable (as in most gaming examples) for two reasons. The first is that widening the ranges on the weights will just give us redundant answers. Consider that the best results for an h() function are the weights: 4,-2,3. Then widening the range will just give us alternate (yet equivalent) answers 8,-4,6 12,-6,9 etc….

Secondly, the time that the program takes to learn the h() function increases dramatically with small increases in the ranges of the weights. The overall loop structure for the program is as follows:

For each possible value for weight 5

For each possible value for weight 4

For each possible value for weight 3

For each possible value for weight 2

For each possible value for weight 1

For each starting board in the database

Run Minimax

Consider a computer than can process 250 runnings of Minimax per second (compared to 100 per second on my Pentium 400) and a database of size 50. Time grows much faster than the range of the values.

Figure 3.1 – Graph to depict how a small increase in the range of a weight will result in a large increase in time

A range of values is not only necessary, but minimizing it is necessary to get results in a reasonable amount of time. In all of my results I used the range –10 to 10 for each of the weights.

After giving the program the limits it goes through each possible set of values for the h() function. Given a particular set of weights the program will take the first board in the database and perform Minimax on it, resulting in a best board for our set of weights. If that best board is one of the boards specified in the database then we have a match, and can increment the number of matched pairs by one. This procedure is repeated for each pair in the database, ultimately leading to a percentage of matched boards for that particular set of weights. Likewise, the procedure is repeated for each set of weights and the one with the highest matching percentage is recorded. Upon completion of the program the best set of weights is outputted along with it’s percentage of pairs matched. These set of weights give us the h() function that best estimates the true function.

The data given in the results section is runs of the algorithm with a Minimax depth of 2 levels and ranges from –10 to 10. My data is recorded under the name Anthony, and I would consider myself a grandmaster Tic-Tac-Toe player (especially after studying aspects the game for months). The data recorded under the name Amanda is that of my girlfriend who is a player who has not played the game in a very long time.

Matches

No matter the skill level or intelligence of the player, we can never expect to match 100% of the boards in any large database. Inconsistency on the part of the human can be partly to blame, for example a player may pick the center square when presented with the starting board yet later pick a corner, thusly making it impossible for the algorithm to match both pairs. Additionally, we are trying to represent the actions of a human player through 5 simple variables. There is a mismatch in the complexity of a human’s decision and our h() function.

As demonstrated in the graph labeled Matches we can see that as we increase our database size we loose precision in our function. This is because of the above-discussed problems, inconsistencies and impossibilities become more and more frequent and lowers the overall number of matches.

Something can be said about the fact that Amanda’s percentage is below that of Anthony’s. Either Amanda’s play is more sporadic and inconsistent than Anthony’s, or Anthony’s rational for choosing a move is closely related to the five factors identified by the programmer. In conclusion, it is possible to use the matching percentages to determine if the factors your are using are correlating to the database your are using. If we discount Amanda’s sporadic game play then we can only assume that her rational is less related to the five factors than Anthony’s rational is.

Weights

The optimal weights determined by the algorithm can be verified to a large degree. The most obvious example of this is that for both players the factor describing the WIN/LOSS was the most important. This may seem obvious to a human, but it is important to understand that the algorithm was never told the importance or magnitude of this feature, and yet it was able to determine it always to be of the highest importance.

Additionally, the resultant values of the corner and center features are very accurate to the skill of both players. Amanda is the typical novice Tic-Tac-Toe player who believes that the center square is a very important square on the board. Given the starting board she continually chooses the center square, as do most players. However, Anthony being a studied player realizes that statistically his chances of winning are greater if he goes for the corners, which allows him to at times have a win in two directions. Over almost all database sizes, the algorithm determines the corner to have higher importance than the center for Anthony. Likewise for Amanda the center has a higher significance than the corner.

In summary, Inductive learning is a very useful method by which one can simplify the creation of the often difficult h() function. My research focused only on Tic-Tac-Toe, however the algorithm can be well suited for more complex games. In Chess we know that it is important to have as many Knights, Rooks, Bishops, as possible. However, we do not easily know how important each of the pieces is. Is a Knight worth two pawns? Is a Queen worth two Bishops? Inductive learning can be used to determine the weights for each of the pieces, thereby simplifying the generation of the h() function. Additionally, the algorithm has the benefit of being able to capitalize on the knowledge of other, more skilled, game players. It may take quite a while to calculate the values, however Inductive learning provides a simple way for an unstudied programmer to "learn" a skilled and accurate h() function that would have been impossible to derive based solely on the programmer’s knowledge of the game.