30 December 2017

Process Improvements

This post continues Engine Trouble (September 2017; 'Talk about a disastrous tournament!'), where I described the background in a few sentences:-
The tournament was the final section of the LSS 2015 Chess960 Championship, a three-stage elimination tournament. [...] The 2016 final tournament starts soon, but I probably won't participate. I've taken enough psychological punishment for one year.

I finally decided to participate in the 2016 final for two reasons:-

  • It's not so easy to reach the final and this might be my last opportunity.
  • I couldn't do any worse than in the 2015 event.

I also decided not to upgrade my engines for this event, but I did introduce a number of 'process' improvements in the way I use engines. Specifically, I tried four new techniques that I'll discuss separately:-

  1. Castling manually
  2. Using two engines with different qualities
  3. Making a null move first, then applying the opponent's expected move
  4. Using coarser granularity

1. Castling manually

In another follow-up post to 'Engine Trouble', The Seeds of Disaster, I noted, 'A chess960 opening can thus be logically divided into three phases': before either player has castled, after one player has castled, and after both players have castled. (The phrase 'player has castled' can also mean that a player has somehow lost the castling privilege by moving the King or by moving both Rooks.) In many games I've noticed some apparent inconsistencies in engine evaluation across these three phases. Was this another problem related to tapered evaluation as discussed in Chess Engines : Advanced Evaluation (September 2015)?

I decided to experiment by evaluating two lines that reach the same position. In one line I castled normally; in the other I castled artificially by moving the King and the Rook (in chess960, sometimes only one or the other moves) one square at a time, inserting null moves for the opponent, eventually reaching the castled position. I once used a similar technique in The Engines' Value of Castling (May 2015). I did indeed record some differences, although it's too early to report on the results because the games are still running.
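For anyone who would rather script the comparison than click through a GUI, here is a minimal sketch of the setup using the python-chess library. The engine path and the sample line are placeholders rather than positions from my games, and the familiar start position stands in for a real chess960 setup:-

import chess
import chess.engine

ENGINE_PATH = "stockfish"  # hypothetical path; substitute any UCI engine

def position_after(san_moves):
    """Play a line ('--' marks a null move) and return the final position as a FEN."""
    board = chess.Board()  # for a real chess960 game: chess.Board(start_fen, chess960=True)
    for san in san_moves:
        if san == "--":
            board.push(chess.Move.null())
        else:
            board.push_san(san)
    # Return a bare FEN so the engine never has to replay the null moves.
    return board.fen()

def eval_cp(engine, fen, depth=22):
    """Return the engine's evaluation in centipawns, from White's point of view."""
    info = engine.analyse(chess.Board(fen), chess.engine.Limit(depth=depth))
    return info["score"].white().score(mate_score=100000)

# 1.e4 e5 2.Nf3 Nc6 3.Bc4 Bc5, then castle kingside in a single move ...
common = ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5"]
castled = position_after(common + ["O-O"])
# ... or walk the King and Rook there, inserting null moves for the opponent.
walked = position_after(common + ["Kf1", "--", "Kg1", "--", "Rf1"])

with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
    print("Castled normally:", eval_cp(engine, castled))
    print("Castled manually:", eval_cp(engine, walked))

The two resulting FENs put the same pieces on the same squares with the same side to move; only the move counters differ, so any gap between the two numbers comes from the engine, not from the position.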

2. Using two engines with different qualities

The previous discussion is only relevant to chess960, but the remaining points are also relevant to traditional chess. I often use two (or more) different engines to evaluate the same position, then analyze any differences between them. This helps me understand unclear positions, especially where there is unbalanced material on the board. For example, I've often noticed that Houdini is better than Komodo at tactical play, but Komodo is better than Houdini at positional play.
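When I want to be systematic about the comparison, it is easy to automate. The sketch below, again using python-chess, runs one position through two engines and prints each evaluation with its principal variation; the engine paths and the test position are placeholders for whatever you actually use:-

import chess
import chess.engine

# Hypothetical engine paths; substitute the binaries you actually run.
ENGINES = {"Houdini": "./houdini", "Komodo": "./komodo"}

def compare(fen, depth=24):
    """Print each engine's evaluation and principal variation for one position."""
    board = chess.Board(fen)
    for name, path in ENGINES.items():
        with chess.engine.SimpleEngine.popen_uci(path) as engine:
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            cp = info["score"].white().score(mate_score=100000)
            pv = board.variation_san(info.get("pv", []))
            print(f"{name:8s} {cp:+5d} cp  {pv}")

# Placeholder position (a Sicilian after 1.e4 c5 2.Nf3 d6); in practice it would
# be an unclear or materially unbalanced position from one of my own games.
compare("rnbqkbnr/pp2pppp/3p4/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 0 3")

A large difference between the two lines of output is exactly the signal I'm looking for.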

I avoided using Stockfish for this sort of comparison because its search depths aren't comparable to those of the other engines. Roughly speaking, it takes Houdini and Komodo twice the time to calculate each successive ply, but Stockfish takes only 50% more time. I was reminded of this during the latest TCEC season, which I wrapped up with Houdini, Komodo, Stockfish, and AlphaZero. In a TCEC report from last month, Interview with Robert Houdart, Mark Lefler and GM Larry Kaufman (chessdom.com), the following exchange took place:-

Nelson [TCEC organizer]: What quality of your program do you think may be superior to your opponent in the Superfinal?

Larry [Team Komodo]: Basically, we have much more comparison with Stockfish because Stockfish is open source so we can easily compare our ideas and see what works better or worse. I don’t really know the inside workings of [Houdini], but what I can tell you is that my belief is that Komodo is better in most things than Stockfish. But there is something holding us back that has to do with search depth. We’ve been trying to figure it out for years, I don’t know what it is, but there is some reason we are not able to get the same search depth as Stockfish even if we tried to copy all their algorithms. We’ve tried experiments where we’ve tried to make Komodo act like Stockfish but it doesn’t work, and I don’t know why, but I feel that if we ever figure that out we’ll just be clearly #1. But almost every time we tried any idea from Stockfish in Komodo, nine times out of ten it makes Komodo weaker.

If the developers of a world class chess engine can't explain the phenomenon, what hope is there for the rest of us? It probably has something to do with pruning, as in Chess Engines : Pruning (September 2015). Long story short, I started comparing Houdini's lower-depth evaluations with Stockfish's higher-depth evaluations, but I'm not yet sure what I'm seeing.
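A back-of-the-envelope calculation shows why the depth gap is so stubborn. Assume, arbitrarily, that depth 20 takes one minute, then apply the rough growth factors mentioned above (about 2.0 per extra ply for Houdini and Komodo, about 1.5 for Stockfish):-

def depth_reached(budget_minutes, factor, base_depth=20, base_minutes=1.0):
    """Deepest ply whose iteration fits in the time budget, given a growth factor per ply.
    A rough model only: it ignores the cost of the earlier, cheaper iterations."""
    depth, minutes = base_depth, base_minutes
    while minutes * factor <= budget_minutes:
        minutes *= factor
        depth += 1
    return depth

for budget in (10, 60, 24 * 60):  # ten minutes, an hour, a full day
    print(f"{budget:5d} min -> factor 2.0: depth {depth_reached(budget, 2.0)}; "
          f"factor 1.5: depth {depth_reached(budget, 1.5)}")

Under those assumptions an hour of search reaches depth 25 at the steeper rate but depth 30 at the gentler one, and a full day stretches the gap to seven plies, which is why I treat the engines' reported depths as incommensurable.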

3. Making a null move first, then applying the opponent's expected move

It sometimes happens that no matter what move the engine proposes, the opponent's expected response is always the same. For those situations I started using a technique where I first make a null move, then apply the expected response, and then evaluate the resulting position. Any further analysis of the subsequent variations requires inserting a null move for the opponent. This should help me understand the tradeoffs for the immediate move.
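The mechanics are simple to script. In the sketch below (python-chess again), the position and the 'expected reply' are placeholders; the point is only to show the null move being inserted ahead of the opponent's move. The position is rebuilt from its FEN before analysis because not every engine accepts a null move in its move list:-

import chess
import chess.engine

ENGINE_PATH = "stockfish"  # hypothetical path; any UCI engine works

def eval_after_expected_reply(fen, expected_reply_san, depth=22):
    """Pass with a null move, play the opponent's expected reply, then evaluate."""
    board = chess.Board(fen)
    board.push(chess.Move.null())        # our side 'passes'
    board.push_san(expected_reply_san)   # the reply we expect no matter what we play
    probe = chess.Board(board.fen())     # bare FEN, so the null move never reaches the engine
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        info = engine.analyse(probe, chess.engine.Limit(depth=depth))
    return info["score"].white().score(mate_score=100000)

# Placeholder: Black to move after 1.e4 e5 2.Nf3 Nc6 3.Bb5, taking Bxc6 as the
# reply White is expected to play in any case.
baseline = eval_after_expected_reply(
    "r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3", "Bxc6")
print("Baseline after passing:", baseline, "cp")

The resulting number is what the position is worth if I do nothing at all; candidate moves can then be judged by how much they improve on that baseline.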

4. Using coarser granularity

I also started experimenting with coarser granularity. Engines normally return their (numerical) evaluations in units of centi-Pawns (0.01). I've often noted that it's shortsighted to favor one move over another simply because the first has a value of 0.02 and the second has a value of 0.01. This is even more shortsighted when one of the values is 0.00, which can mean all sorts of things. Engine developers relax strict numerical ordering in their own way through 'contempt' (see that Chessdom.com interview for more about the concept); I started applying a similar relaxation to the moves suggested by the engines. I'll try to cover this in another post.
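As a first experiment, I've been bucketing the centipawn scores into wider bands so that near-identical evaluations count as ties. The sketch below shows one way to do it with python-chess and MultiPV; the band width of a quarter of a Pawn is an arbitrary choice, and the engine path is once more a placeholder:-

import chess
import chess.engine

ENGINE_PATH = "stockfish"  # hypothetical path
BAND = 25                  # bucket width in centipawns (an arbitrary choice)

def banded_candidates(fen, depth=22, multipv=5):
    """Rank the engine's top moves by coarse evaluation bands instead of exact centipawns."""
    board = chess.Board(fen)
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=multipv)
    ranked = []
    for info in infos:
        cp = info["score"].white().score(mate_score=100000)
        move = board.san(info["pv"][0])
        ranked.append((round(cp / BAND), cp, move))
    # Sort by the coarse band only; moves in the same band count as equal.
    ranked.sort(key=lambda item: item[0], reverse=(board.turn == chess.WHITE))
    return ranked

for band, cp, move in banded_candidates(chess.STARTING_FEN):
    print(f"band {band:+d}  ({cp / 100:+.2f})  {move}")

Within a band the final choice falls back on other criteria, such as a second engine's opinion or my own judgment of the position.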
