Python - PGN Processing Specifications

	Chapter 42. Chess Game Notation

PGN Processing Specifications

A PGN processing program has three parts: parsing the PGN input file, resolving moves, and maintaining the game state. With suitable interfaces between them, each of these modules can be built and tested in isolation.

First, some preliminaries. In order to resolve moves, the game state must be kept. This is a dictionary of locations and pieces, plus the five other items of information that characterize the game state: active color (w or b), castling availability, en passant target, half-move draw count and turn number. The board has an interface that accepts a move and executes that move, updating the various elements of board state.
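The game state described above can be sketched as a small class. This is only an illustration under assumptions: the names `Board`, `initial_pieces` and `execute`, the square keys such as `"e2"`, and the uppercase-white, lowercase-black piece letters are hypothetical choices, not part of the specification. The sketch handles simple moves and captures only; castling, en passant captures, promotion and updates to castling availability are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def initial_pieces() -> Dict[str, str]:
    """Return the standard starting position as {square: piece}.

    Assumed convention: uppercase letters are white, lowercase are black.
    """
    pieces = {}
    for file, piece in zip("abcdefgh", "RNBQKBNR"):
        pieces[file + "1"] = piece
        pieces[file + "2"] = "P"
        pieces[file + "7"] = "p"
        pieces[file + "8"] = piece.lower()
    return pieces


@dataclass
class Board:
    """Locations and pieces, plus the five other items of game state."""
    pieces: Dict[str, str] = field(default_factory=initial_pieces)
    active_color: str = "w"
    castling: str = "KQkq"            # not updated by this sketch
    en_passant: Optional[str] = None
    half_move_count: int = 0
    turn: int = 1

    def execute(self, source: str, destination: str) -> None:
        """Apply a fully-specified simple move or capture."""
        piece = self.pieces.pop(source)
        is_capture = destination in self.pieces
        is_pawn = piece in "Pp"
        self.pieces[destination] = piece
        # The draw counter resets on any pawn move or capture.
        if is_pawn or is_capture:
            self.half_move_count = 0
        else:
            self.half_move_count += 1
        # A two-square pawn advance creates an en passant target square.
        from_rank, to_rank = int(source[1]), int(destination[1])
        if is_pawn and abs(from_rank - to_rank) == 2:
            self.en_passant = f"{source[0]}{(from_rank + to_rank) // 2}"
        else:
            self.en_passant = None
        # The turn number advances after black has moved.
        if self.active_color == "b":
            self.turn += 1
        self.active_color = "b" if self.active_color == "w" else "w"
```

Keeping all six elements in one object means a move either updates the whole state or none of it, which makes the board easy to test apart from the parser.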

Moves can use the Command design pattern to separate king-side castle, queen-side castle, moves, captures and promotions. The Board object will require a fully-specified move with source location and destination location. The source location is produced by the source resolution algorithm.
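One way to read the Command design pattern here is a small family of classes that share an `execute` method. The following is a minimal sketch, assuming the board is a plain dictionary from square names to piece letters (uppercase for white, lowercase for black); the class names and that representation are illustrative assumptions, and no legality checking is attempted.

```python
from typing import Dict

Pieces = Dict[str, str]  # hypothetical representation, e.g. {"e4": "P"}


class Move:
    """A simple move from a source square to a destination square."""

    def __init__(self, source: str, destination: str) -> None:
        self.source = source
        self.destination = destination

    def execute(self, pieces: Pieces) -> None:
        pieces[self.destination] = pieces.pop(self.source)


class Capture(Move):
    """A capture requires an occupied destination square."""

    def execute(self, pieces: Pieces) -> None:
        if self.destination not in pieces:
            raise ValueError(f"nothing to capture on {self.destination}")
        super().execute(pieces)


class Promotion(Move):
    """A pawn move that replaces the pawn with a new piece."""

    def __init__(self, source: str, destination: str, new_piece: str) -> None:
        super().__init__(source, destination)
        self.new_piece = new_piece

    def execute(self, pieces: Pieces) -> None:
        pawn = pieces.pop(self.source)
        # Keep the pawn's color: uppercase is white, lowercase is black.
        is_white = pawn.isupper()
        pieces[self.destination] = (
            self.new_piece.upper() if is_white else self.new_piece.lower()
        )


class KingSideCastle:
    """King moves e to g, rook moves h to f, on the player's back rank."""

    def __init__(self, color: str) -> None:
        self.rank = "1" if color == "w" else "8"

    def execute(self, pieces: Pieces) -> None:
        pieces["g" + self.rank] = pieces.pop("e" + self.rank)
        pieces["f" + self.rank] = pieces.pop("h" + self.rank)


class QueenSideCastle(KingSideCastle):
    """King moves e to c, rook moves a to d, on the player's back rank."""

    def execute(self, pieces: Pieces) -> None:
        pieces["c" + self.rank] = pieces.pop("e" + self.rank)
        pieces["d" + self.rank] = pieces.pop("a" + self.rank)
```

Because every command exposes the same `execute` method, the code that replays a game does not need to know which kind of move it is handling.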

A well-defined Board object could be used either for a single-player game (against the computer) or as part of a chess game server for two-player games.

Second, the hard part: resolution of short notation moves. Based on input in algebraic notation, a move can be transformed from a string into a 7-tuple of color, piece, fromHint, moveType, toPosition, checkIndicator and promotionIndicator.

The color is either w or b.
The piece is omitted for pawns, or one of RNBQK for the other pieces.
The fromHint is the starting position: a file and rank, a file alone, a rank alone, or (most often) nothing at all. The various search algorithms are required to resolve the starting piece and location from an incomplete hint.
The moveType is either omitted for a simple move or x for a capturing move.
The toPosition is the file and rank at which the piece arrives.
The checkIndicator is either nothing, + or #.
The promotionIndicator is either nothing or a new piece name from QRBN; a pawn cannot promote to a king.

This information is used by Algorithm G to resolve the full starting position information for the move, and then execute the move, updating the board position.
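The transformation from a short-notation string into the 7-tuple can be sketched with a regular expression. This is a hedged illustration: `ShortMove`, `SAN_PATTERN` and `parse_short` are hypothetical names, the color is passed in by the caller because it comes from the move's place in the movetext rather than from the string itself, and castling (`O-O`, `O-O-O`) is assumed to be handled elsewhere.

```python
import re
from typing import NamedTuple


class ShortMove(NamedTuple):
    color: str
    piece: str
    fromHint: str
    moveType: str
    toPosition: str
    checkIndicator: str
    promotionIndicator: str


SAN_PATTERN = re.compile(
    r"(?P<piece>[RNBQK])?"        # piece letter; omitted for pawns
    r"(?P<hint>[a-h]?[1-8]?)"     # file and rank, file alone, rank alone, or empty
    r"(?P<type>x)?"               # x marks a capture
    r"(?P<to>[a-h][1-8])"         # destination square
    r"(?:=?(?P<promo>[QRBN]))?"   # optional promotion piece
    r"(?P<check>[+#])?"           # optional check or checkmate marker
)


def parse_short(color: str, text: str) -> ShortMove:
    """Split one short-notation move into its seven components."""
    match = SAN_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a short-notation move: {text!r}")
    return ShortMove(
        color,
        match.group("piece") or "",
        match.group("hint"),
        match.group("type") or "",
        match.group("to"),
        match.group("check") or "",
        match.group("promo") or "",
    )
```

For example, `parse_short("b", "Nbd7")` yields a knight move with the file hint `b` and destination `d7`; the source square itself is still unknown and must be found by searching the board.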

Finally, input parsing and reporting. A PGN file contains a series of games. Each game begins with identification tags of the form [Label "value"]. The labels include names like Event, Site, Date, Round, White, Black, Result. Other labels may be present. After the identification tags comes a blank line, followed by the text of the moves, called the “movetext”. The movetext is supposed to be SAN (short notation), but some files use LAN (long notation). The moves should end with the result (1-0, 0-1, *, or 1/2-1/2), followed by one or more blank lines.
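The layout of a single game can be illustrated with a short function that separates the tags, the individual moves and the result. This is a sketch under assumptions: `split_game` is a hypothetical name, the input is the text of one game only, and comments or variations inside the movetext are not handled.

```python
import re
from typing import Dict, List, Optional, Tuple

TAG_PATTERN = re.compile(r'\[(\w+)\s+"(.*)"\]')
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_game(text: str) -> Tuple[Dict[str, str], List[str], Optional[str]]:
    """Return (tags, moves, result) for the text of a single game."""
    tags: Dict[str, str] = {}
    move_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        tag = TAG_PATTERN.fullmatch(line)
        if tag:
            tags[tag.group(1)] = tag.group(2)
        elif line:
            move_lines.append(line)
    movetext = " ".join(move_lines)
    # Remove move numbers such as "1." or "12...", with or without a space after.
    movetext = re.sub(r"\b\d+\.+", " ", movetext)
    tokens = movetext.split()
    # The last token should be the result; None signals that it is missing.
    result = tokens.pop() if tokens and tokens[-1] in RESULTS else None
    return tags, tokens, result
```

Each string in the returned move list can then be handed to a move parser, alternating white and black.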

In order to handle various forms for the movetext, there have to be two move parsing classes with identical interfaces. These polymorphic classes implement long-notation and short-notation parsing. In the event that a short-notation parser object fails, then the long-notation parser object can be used instead. If both fail, the file is invalid.
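The fallback between two parsers with identical interfaces might look like the following. This is a minimal sketch: the class names, the dictionary returned by `parse`, and the assumed long-notation shape (for example `e2-e4` or `Ng1-f3`) are illustrative assumptions, and castling is again left out.

```python
import re
from typing import Dict, Optional, Sequence


class MoveParseError(ValueError):
    """Raised when a parser cannot interpret a move string."""


class MoveParser:
    """Common interface: subclasses supply only the pattern."""

    pattern = re.compile(r"")

    def parse(self, text: str) -> Dict[str, Optional[str]]:
        match = self.pattern.fullmatch(text)
        if match is None:
            raise MoveParseError(text)
        return match.groupdict()


class ShortParser(MoveParser):
    # The source is an optional, possibly partial hint.
    pattern = re.compile(
        r"(?P<piece>[RNBQK])?(?P<source>[a-h]?[1-8]?)(?P<capture>x)?"
        r"(?P<destination>[a-h][1-8])(?:=?(?P<promotion>[QRBN]))?(?P<check>[+#])?"
    )


class LongParser(MoveParser):
    # The source is always a full square, joined by "-" or "x".
    pattern = re.compile(
        r"(?P<piece>[RNBQK])?(?P<source>[a-h][1-8])(?:-|(?P<capture>x))"
        r"(?P<destination>[a-h][1-8])(?:=?(?P<promotion>[QRBN]))?(?P<check>[+#])?"
    )


def parse_move(
    text: str,
    parsers: Sequence[MoveParser] = (ShortParser(), LongParser()),
) -> Dict[str, Optional[str]]:
    """Try each parser in turn; fail only if all of them fail."""
    for parser in parsers:
        try:
            return parser.parse(text)
        except MoveParseError:
            continue
    raise MoveParseError(f"no parser accepts {text!r}")
```

Because both classes return the same keys, the caller that executes moves does not need to know which notation the file used.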

A PGN processing program should be able to read in a file of games, execute the moves, print logs in various forms (SAN, LAN and Descriptive), and print board positions in various forms. The program should also be able to convert files from LAN or Descriptive to SAN. Additionally, the processor should be able to validate logs and produce error messages when the chess notation is invalid.

Once the basic PGN capabilities are in place, the program can be adapted to analyze games. For instance, it should be able to report only games that have specific openings, piece counts at the end, promotions to queen, castling, checks, etc.
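Some of these reports need nothing more than the list of short-notation move strings. The sketch below is only an illustration: `game_features` and `has_opening` are hypothetical helpers, castling is assumed to be written `O-O` or `O-O-O`, promotion is assumed to be written with `=`, and a checkmate marker is counted as a check. Piece counts at the end would require replaying the game on a board and are not attempted here.

```python
from typing import Dict, List, Sequence


def game_features(moves: Sequence[str]) -> Dict[str, int]:
    """Count castling moves, checks and queen promotions in one game."""
    return {
        "castling": sum(m.rstrip("+#") in ("O-O", "O-O-O") for m in moves),
        "checks": sum(m.endswith(("+", "#")) for m in moves),
        "queen_promotions": sum("=Q" in m for m in moves),
    }


def has_opening(moves: Sequence[str], opening: Sequence[str]) -> bool:
    """True when the game begins with exactly the given sequence of moves."""
    return list(moves[: len(opening)]) == list(opening)


def select_games(games: List[List[str]], opening: Sequence[str]) -> List[List[str]]:
    """Keep only the games that start with the given opening."""
    return [g for g in games if has_opening(g, opening)]
```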

