Image Recognition Techniques Applied to Intelligent Game Play
by Sam Baskinger
If analyzed, most, if not all games, have some elements in the which there are redundant elements. These elements can be game situations or they can be patterns of action in a game, such as turns taken, rounds, or simply cycles which are built up out of the nature of the game. An example of this in how most good chess players approach the opening of a chess game. They adopt an opening book move to start off the game seeking the known benefits of that strategy against their opponent. The opponent should, optimally, be able to pick a counter strategy which is proven effective against the opponent's tactic. This illustrates the best possible situation for a computer piece of intelligence. The opening moves do not change or vary much, if ever at all, but what about instances where the game offers very similar situations, but these situations appear in permutations. Again, an illustration can be taken from a chess game. There is, in chess, a situation called a skewer. This is when a player positions a piece in such a way that one of the opponent's pieces is in danger and should that endangered piece be moved it will make vulnerable another piece to the aggressor's attack. The attack is just like it sounds, the line of attack proceeds through two pieces making at least one of them a potential victim upon the next move. This situation, as described, can occur in many places in the game an in many forms. A method for recognizing and analyzing data pertinent to the problem at hand, if utilized, may offer great strides to increasing real time performance. These techniques are ones that are all common to image processing and recognition.
The problem of image recognition and the problem of game situation recognition can be mapped to one another very easily. First, to recognize an image, some basic bits of information are first needed. The image being dealt with should contain known instances. That is, this type of intelligence can not synthesize anything, but seeks to recognize known instances of situations. The image must also be able to be broken down into elementary components. The relations between all these components will determine the exact objects found in the image.
To more fully flush out what an image component, and subsequently the characteristics of a game component, might be, consider a satellite picture of a suburban neighborhood. It would look much like a group of discrete shapes, such as rectangles. The intelligence must first be able to label and group objects in this picture to be subject to identification. It does the computer no good to assume that the entire picture is one giant instance. Such an approach would lead to a huge database of instances and would leave little flexibility for varying instances.
The preferable method is to find small components which are relevant to an instance and its composition. In the case of our satellite neighborhood photo, recognizing rectangles would be very beneficial. These could be a multitude of objects such as cars, houses, or even mail boxes. Once these components are realized other computations can take place to find their relations to one another and from that recognize a known state.
Also important is the context in which this image is found. Context is a major component to correctly analyzing the image given to the program. It narrows the search field so that only valid assumptions are made making for shorter processing time. If not given a context, it is also conceivable that multiple correct answers can be arrived at. One is correct in reality, but in the perceptions of the program, both are correct. There is nothing to distinguish them.
An example is one of cars on a highway.[1] If a group of rectangles can all be recognized falling along a larger ribbon-type shape, these can be recognized as cars on a highway. If a string of rectangles are found to be in strait lines, they may be computed as houses because there is a distinct lack of the ÒribbonÓ object which in the case of the cars was a highway. Context in a game situation is equally important. Not only does it allow for an accurate analysis of a situation, but it also allows for some optimizations to be made in computation, which is critical in many games and real life applications of AI.
The next step in recognizing an image, once context and basic shapes have been determined, is to define the relationships between all the defined objects. Relationships can take many forms; Such is the case of the satellite picture or the neighborhood. If all the rectangles are oriented so that their longest sides are parallel most of the time and they line the sides of, what can be determined to be, a road, then the rectangles are most likely houses.
Other relationships in an image might also be relative motion, such as can be observed in traffic. It might also be seen in numeric correspondence. If there is on very small rectangle in front of large rectangles determined to be houses, then the small rectangle might be the mailbox.
Relationships take a bit of creativity and insight on the part of the programmer to develop. A missed relationship may deny the game a critical insight. Relationships are also set at an optimal level, but not always are optimal levels achieved. Consider a picture of a dog. If this picture was tilted slightly, most image recognition software could still determine the contents of the picture as being a dog even though the strict and optimal input was not given. This process is called relations relaxing. This technique is very important in giving a piece of AI the flexibility to perform well in similar situations and not store every instance of every situation.
The way that image relaxation applies to image recognition is also to add flexibility. There is a strict definitive model adhering the theoretical outcome, or the most plausible real outcome. What is then done to the model is that it has defined in it flexible relations and inflexible relations. The flexible relations also may have assigned to them a tension allowing for liberal flexing and maleateing of the relation, or denying much change.
I very good example of this would be software for recognizing a car viewed from the side so that only two tires are visible. This is a simplified image for the sake of this example. The software would have an inflexible relation saying that a car must have two tires which must a share a tangential line and the tires must be a distance apart greater than the radius of the tires. A flexible relation would be the distance between the tires. The tension of the tire distance would increase as the distance between the tires moved to far from what is an acceptable value for tires.
The process of relations relaxation[1] is a systematic of loosening the constraints on a prefect instance to an instance which fits within the bounds of an instance. This process seeks to find cases which are almost perfect, and as such simply permutations of a known state. The idea in image processing is that these are instances of the object rotated or viewed in a particular light.
In game AI, however, this process can not only be used to find disguised forms of a known state of the game, but also can be used in some cases to generalize about states which are almost the same. These states might be subject to the same or very similar rules of the more well known state. A good implementation of this might give a machine and edge in obscure cases. Again, perfect play is not the goal, but very good play.
Finally images may use some level of filtering to remove information which is not relevant to the processing and recognizing procedure. This filtering can be very complex and may require some information about how the data was achieved. For instance, in a very sunny day in the late afternoon, a picture can be expected to have very long shadows in it and be of a very high contrast. A filtering function can capitalize on this and make steps to isolate physical objects apart from what might be seen as object, but which in reality are only projected shadows of objects. A filtering function is very similar to a putting a picture in context in that data is being excluded, but rather than excluding data from the possible set of objects, data subject to analysis is being removed. Context might be thought, also, as a form of filtering in which possible results are filtered out by their relation to the situation involving the data collection. For instance, pictures of the lunar surface should not contain Oldsmobiles, so Oldsmobiles are not considered as a possibility.
Again, this can all be applied to a game to not only speed up the computation, but to also avoid duplicate outcomes. By filtering out data that is not relevant, what is focused on is the problem at hand and what should come out of all of this is a solution to the problem at hand.
What follows is a brief mapping of this information and perspective to the previously presented problem of the skewer. Going back to the beginning of this discussion we define the base objects of chess to be pieces. This might be a simplistic model for dealing with a complex game as chess, but this is only an example of the possible advantages of image processing and recognition on game play. The context will have to be determined one the problem is presented. Relations between elements in this game are board positions and what can be called attack vectors. An attack vector is the line along which a piece can take another piece. A simple example is a rook has an attack vector along the x axis and the y axis (if the board is seen as a cartesian plane). A bishop posses an attack vector along diagonals and so on. No much filtering should be done in chess as the entire board can conceivably come into play at some level.
Now, consider the instance of the skewer. Piece A is threatening piece B. Should B move, piece C will be threatened and may be taken in the next turn. If B does not move, B may be taken in the next turn. The relationship between A and B is that B is in A's attack vector. C is also in the attack vector, but is blocked by B. So, the basic model for a skewer can be constructed so that when two pieces are in one attack vector, but one is protected by another, it is a skewer. These are the inflexible relations. The flexible relations are if there are more than two pieces in the attack vector or if piece A is threatened by another piece. There are a host of other variations which can be played on a skewer rendering it to be a devastating situation, or a minor inconvenience.
The definition of context in this situation depends mostly on who's turn it is and whether there is anything more pressing occurring at another part of the board. Remember that the filtering function was noted as not being a good idea because all of the board is important. This is why as the skewer might be very interesting, but not if there is a possible check mate taking shape on the opposite side of the board. The same can not always be said for context.
Context in the very early stages of the game is very important, especially in chess. If the opponent commits to a certain opening book move, chances are they will pursue it to it's goal. If this opening book move is recognized early, at least one or more subsequent move should be analyzed in the context of the motivations of the opening book sequence which has, and very well might be played out. A clever chess player will allow the sequence to continue and try to take advantage of the known pattern taking place by setting up a means to foil the objective and position pieces to exploit the failed outcome.
Note that none of this actually deals with computing a next move. Rather, this recognition process it to give information to a piece of intelligence which will make the next move. The only thing that the discussed AI wants to do is to recognize a pattern which it knows something about.
Looking briefly at the relations relaxation possibilities, again, consider the skewer. The a fore mentioned model did not take into account orientation of pieces, such as distance or numbers involved in the attack vector above two. These details are left out because they do not affect the basic play involved in the scheme. Because of the exacting nature of chess, it is very difficult to apply relaxation techniques to board states. A more appropriate use of relation relaxation would be to generalize about the state of the board. Consider the placement of all the powerful pieces in the center of the board. Some generalization can be made about this situation. These generalizations might also help to choose an appropriate context. Is ownership of the center of the board a defensive or offensive act? Should it be taken as a sign of an aggressive player and thus should the computer assume a defensive strategy until it can secure a hold on its position?
It is clear that there are correlations between multiple branches of AI which can be applied to other branches. In this case it seems that the more general problem of recognizing information in a different but equivalent state is critical to many AI applications. A computer can benefit from recognizing where it has been before in an office and what it has done before in a game and even how a word is pronounced multiple times. The overlap is amazing and seems to point to a smaller set of generalized problems common to all AI fields.
BIBLIOGRAPHY
1. "Control Strategies for Two Player Games", Bruce Abramson, ACM Computing Survey, Vol 21, No. 2., June 1989.
2. "Computational Strategies for Object Recognition", Paul Suetens, ACM Computing Surveys, Vol 24, No. 1., March 1992.
3. "Model Based Recognition in Robot_ Vision", Roland T. Chin, Charles R. Dyer, ACM Computing Surveys, Vol. 18. No. 1., March 1987
4. http://www.dontveter.com/basisofai/basisofai.html, Donald Tveter, "The Pattern Recognition Basis of AI".
5. "Uncertainty in AI", Scott M Thomson, comp.ai.games, March 31, 1995.