20 May 2023

When Chess960 Reduces to Chess

Let's talk turkey. Once in a while I like to document my recent experience playing chess960 online. Earlier this year I posted The Fascinating World of Chess960 (January 2023), where I wrote,
Last month's post was also about switching to a different online service for playing chess960. [...] I continued playing on LSS until last year. I was playing chess960 in a couple of multi-stage events, where success in one stage promotes a player to the next stage. I decided to skip the next stages, essentially taking a year off from serious play. [...] I switched to Chess.com in May 2022, playing one or (maximum) two games of correspondence chess at a time.

A month later I wrote a post about a Chess.com service, Chess.com Reviews a Chess960 Opening (February 2023). Since then I've stopped playing Chess.com for reasons that I won't discuss in this post. I went back to LSS, partly with the intention of evaluating Chessify for chess960.

This is the first time I've mentioned Chessify on this chess960 blog, although I've discussed the service several times on my main blog. In Chessify Resources (March 2023), I wrote,

The main problem with chess960 in a traditional chess environment stems from the castling rules. Since chess960 games tend to become extremely tactical after a few moves have been played, there is nevertheless some value in trying to confirm the tactics with a traditional, non-chess960 engine. [...] I'll continue using Chessify to look at chess960 positions.

There are three phases of a chess960 opening (often overlapping with the early middle game):-

  • Both sides can castle.
  • One side loses the castling privilege.
  • The other side loses the castling privilege.

I say 'loses the castling privilege', because it can arise when castling, when the King moves, or when both Rooks move. The point where castling is no longer an option is exactly where chess960 starts to look and feel like chess starting from the traditional position (SP518 RNBQKBNR). This is the point where a chess service like Chessify becomes fully viable.
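As a rough illustration of the three phases, here is a minimal Python sketch that reads the castling field of a FEN string and reports which phase a position is in. It assumes the third FEN field lists the remaining castling rights, with uppercase letters for White and lowercase for Black (either 'KQkq' or file letters such as 'EBeb'), or '-' when nobody can castle. The function name `castling_phase` is made up for this example.

```python
def castling_phase(fen):
    """Classify a position by how many sides can still castle.

    Assumes the third FEN field lists castling rights, uppercase for
    White and lowercase for Black, or '-' when nobody can castle.
    """
    rights = fen.split()[2]
    if rights == "-":
        white = black = False
    else:
        white = any(ch.isupper() for ch in rights)
        black = any(ch.islower() for ch in rights)
    if white and black:
        return "both sides can castle"
    if white or black:
        return "one side has lost the castling privilege"
    return "neither side can castle"

# Start position of SP350 NRKQRBBN, with file-letter castling rights.
start = "nrkqrbbn/pppppppp/8/8/8/8/PPPPPPPP/NRKQRBBN w EBeb - 0 1"
print(castling_phase(start))
```

Only when the function reports that neither side can castle has the game reached the point where a traditional, non-chess960 tool sees the same rules as the players do.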

22 April 2023

Leela in TCEC FRC Events

My previous post Breaking the 4000 Barrier (April 2023) was about chess960 ratings for engines. I wrote,
The fourth [rating] list was based on chess960 games. Here are the top 25 engines from that list.

I received a question -- @nicbentulan: Is Leela on this list? (twitter.com) -- against the Twitter anchor for the post. A quick look at the list reveals that Leela is on the most recent list at no.42, rated 3008:-

no.42 • Lc0 0.29.0 CPU_744706 • 3008

That rating is nearly 1000 points lower than no.1 'Stockfish 15.1', currently rated 4005. The Lc0 rating looks dubious, given that Leela is currently one of the top three engines in the world. How has it done in TCEC FRC competitions? Following is a list of posts from this blog:-

  • 2022-07-30: TCEC C960 FRC5 • 'In the 'Final League', Stockfish and LCZero finished in a tie for 1st and 2nd to qualify for the 'Final Match'. [...] In the 'Final Match' Stockfish beat LCZero +17-13=20.'
  • 2022-01-22: TCEC C960 FRC4 • 'Stockfish and LCZero tied for 1st/2nd in the [TCEC] FRC4 'Final League', a point ahead of KomodoDragon. Stockfish beat LCZero +13-9=28 in the Final.'
  • 2021-03-27: TCEC C960 FRC3 • 'In the 'FRC 3' final, KomodoDragon beat Stockfish by a score of +2-1=47.'
  • 2020-12-26: TCEC FRC2 • 'In the FRC2 Final League, LCZero and Stockfish finished first and second to qualify for the 50-game final match. Stockfish beat LCZero +8-0=42.' [...] 'The first of the three posts above linked to Stockfish, the Strong (July 2014) on this blog, plus two other followup posts based on FRC1. FRC1 was held three and a half years before AlphaZero made waves with its revolutionary AI/NN technology, soon to be followed by Leela Chess Zero (aka LCZero / LC0).'
  • 2014-08-02: TCEC Season 6 - Chess960 • 'After posting Stockfish, the Strong, winner of this year's TCEC Season 6 Special Event (chess960), I started looking at the games from the event. I couldn't find a crosstable, so I made one myself, shown below.'

Why is Leela so far down on the CCRL rating list? I'll leave that for the CCRL to answer.
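The size of the discrepancy can be put into numbers with a small sketch. It assumes the usual logistic Elo model (a 400-point gap corresponds to 10-to-1 odds), which is not necessarily how the CCRL computes its list, and the function names are invented for this example. The ratings and match scores are the ones quoted above.

```python
import math

def expected_score(rating_gap):
    """Expected score of the lower-rated side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))

def implied_gap(wins, losses, draws):
    """Rating gap implied by a match score, seen from the winner's side."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# CCRL FRC list: Stockfish 15.1 at 4005, Lc0 0.29.0 at 3008.
print(expected_score(4005 - 3008))

# TCEC FRC5 final: Stockfish beat LCZero +17-13=20.
print(implied_gap(17, 13, 20))

# TCEC FRC2 final: Stockfish beat LCZero +8-0=42.
print(implied_gap(8, 0, 42))
```

Under these assumptions a gap of nearly 1000 points would have Lc0 scoring well under 1% against Stockfish, while the TCEC finals correspond to gaps of a few dozen points. The two sets of numbers come from different events, so the comparison is only indicative.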

15 April 2023

Breaking the 4000 Barrier

The title comes from a recent post on my main blog Breaking the 3600 Barrier (April 2023), where the '3600 Barrier' refers to a chess rating level. In that post I wrote,
I see that there are four CCRL rating lists. Shown below are the top five engines from three of the lists.

Why didn't I show all four lists? Because the fourth list was based on chess960 games. Here are the top 25 engines from that list.


CCRL 40/2 FRC [C960] - Index

The most striking feature of the list is that the top engine, Stockfish 15.1, is rated over 4000. This is more than 250 points higher than the top rating shown on the '3600 Barrier' chart. In fact the top four engines on the CCRL FRC list are higher than the top engine on the earlier list. I couldn't determine why this is.

[NB: The domain given here (ccrl.chessdom.com) is not the same as the domain in the right sidebar (computerchess.org.uk), although the two sites appear to be identical. The ratings are based on the same database of games used to compute the CCRL 'Opening statistics'.]

25 March 2023

Evolving Evaluations

The previous post Myth No.6 - 'Forced Wins for White' (March 2023) introduced 'the Molas study', a data scientist's effort 'to find if there’s a [chess960] *start position* that's better than the others'. One of the datasets used in the study was:-
Stockfish evaluation at depth ~40 for all the starting positions

This is also known as the 'Sesse' resource and I gave its URL in the post. The Molas study concluded,

Stockfish evaluations don’t predict actual winning rates for each variation

This didn't surprise me. If you consider that each start position (SP) leads to a mega-zillion possible games and that Sesse reduces each SP to a single two-digit number, much more surprising would be to find a meaningful correlation between an SP's W-L-D percentages and its Sesse value.

I discussed the Sesse numbers once before in A Stockfish Experiment (February 2019). That post mentioned another discussion, What's the Most Unbalanced Chess960 Position? (chess.com; Mike Klein; March 2018 / February 2020). Fun Master (FM) Mike observed,

Let's now take the most extreme case the other way -- the position where Sesse claims White enjoys the most sizable advantage. The lineup BBNNRKRQ delivers a whopping +0.57 plus for the first move. The advantaged is so marked that some chess960 events may even jettison this arrangement as a possible option (a total of four positions are +0.50 or better for White, but none are as lopsided as this one).

That position, also known as 'SP080 BBNNRKRQ', has received some notoriety thanks to Sesse, so I decided to investigate further. I downloaded the SP080 file from the CCRL (see link in the right sidebar), loaded it into SCID, and discovered that it contained 554 games. SCID gave me percentages for White's first moves, which I copied into the following chart.

There are 11 first moves for White listed in the top block of the chart. I then expanded the first two of those moves -- 1.g3 (65.7% overall score for White) and 1.Nd3 (59.7%) -- into the second and third blocks of the chart to see how Black has responded to those moves.

You might be wondering why I said there were 554 games in the file, but the SCID extract counts only 519 games. SCID was designed to handle the traditional start position (SP518 RNBQKBNR) and knows nothing about chess960 castling rules. SP080 allows 1.O-O on the first move, which SCID rejects. The 35 missing games (554 minus 519) are games that started 1.O-O. When I'm using SCID for a chess960 correspondence game, I have a technique to account for this anomaly, but I won't go into details here.

Similarly, the charts for 1.g3 and 1.Nd3 show '[end]' as one of the first moves for Black. These are games where Black played 1...O-O on the first move. The corresponding percentage scores are among the worst for Black, showing once again that early castling is a risky strategy.

If I were playing SP080 in a correspondence game, I would analyze both 1.g3 and 1.Nd3. A promising continuation for Black after 1.g3 is 1...c5, where the score of '43.9%' for White says, 'Favors Black'. Of course, I would have to look at White's second moves in this variation, where one move will appear to be superior to the others. And so on and so on.

To be useful, the SCID tool needs to be handled intelligently. I recently blundered into a wrong evaluation that I documented in The CCRL Is Unreliable (Not!) (December 2021). I'm hopeful that some day a tool will appear that rivals SCID functionality *and* that understands chess960 castling. For now, I make do with the software I have.

For a look at two more SPs where evaluations have shifted with experience, see SP864 - BBQRKRNN and SP868 - QBBRKRNN, which are both attachments to this blog. One lesson I've learned from playing chess960 for almost 15 years: nothing is fixed in stone.
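The SP numbers attached to these arrangements appear to follow the standard chess960 numbering scheme, in which the number encodes the bishops, then the queen, then the knights. Here is a minimal Python sketch of that decoding, offered as an illustration rather than as the method used by any tool mentioned here. The names `KRN_PATTERNS` and `start_position` are invented for the example.

```python
# The ten ways to arrange two knights around rook-king-rook.
KRN_PATTERNS = [
    "NNRKR", "NRNKR", "NRKNR", "NRKRN", "RNNKR",
    "RNKNR", "RNKRN", "RKNNR", "RKNRN", "RKRNN",
]

def start_position(sp):
    """Return the White back rank (files a to h) for start position 0..959."""
    if not 0 <= sp <= 959:
        raise ValueError("start position must be between 0 and 959")
    rank = [None] * 8
    sp, light = divmod(sp, 4)
    rank[2 * light + 1] = "B"      # light-squared bishop: b, d, f or h
    sp, dark = divmod(sp, 4)
    rank[2 * dark] = "B"           # dark-squared bishop: a, c, e or g
    sp, queen = divmod(sp, 6)
    free = [i for i, piece in enumerate(rank) if piece is None]
    rank[free[queen]] = "Q"        # queen on the n-th remaining square
    free = [i for i, piece in enumerate(rank) if piece is None]
    for square, piece in zip(free, KRN_PATTERNS[sp]):
        rank[square] = piece       # knights, rooks and king fill the rest
    return "".join(rank)

for sp in (80, 350, 518, 864, 868):
    print(sp, start_position(sp))
```

The loop prints the arrangements for the five SP numbers mentioned in these posts, so the labels can be checked against the text.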

18 March 2023

Myth No.6 - 'Forced Wins for White'

Upon encountering chess960 for the first time, one of the first questions a new player asks is 'Are all 960 positions fair?'. I included a statement of this concern in Top 10 Myths About Chess960 (May 2012), where one bullet said,
Some start positions are forced wins for White

Remembering that I wrote this more than 10 years ago, at a time when I wasn't absolutely 100% sure that such unfair positions didn't exist, my standard response to the statement was, 'Which positions are forced wins? Please provide a specific example'. I never received a single example. Ten years later I can say with more confidence -- although still not 'absolutely 100% sure' -- that while some positions are difficult for Black to play, none of the 960 positions is lost before a single move is made.

In January a new study titled, Analyzing Chess960 Data | Alex Molas | Towards Data Science (towardsdatascience.com), appeared. Its subtitle announced,

Using more than 14 million chess960 games to find if there’s a variation that's better than the others.

There is considerable knowledge presented in the study and I don't pretend to understand all of it. I might well need several posts to unravel its subtleties, so I'll start by summarizing its references; in the following discussion, '>>>' means a direct quote from 'Analyzing Chess960 Data'.

>>> 'The original post was published here...'

[NB: I'll come back to this reference later; see '(A)' below. First I need to point out that there's an important issue with terminology. When chess players use the term 'variation', they mean a sequence of play arising from a specific position; e.g. 'In this position I had two variations and I had to work out which variation was better for me.' • In the Molas study, I'm convinced that the word 'variation' refers to one of the well-defined 960 start positions that are legal for chess960. I read the subtitle of the towardsdatascience.com article as 'to find if there’s a *start position* that’s better than the others' and the title of the amolas.dev post as saying 'Discovering the best chess960 *start position*'. I won't repeat this caveat each time, but it's important and helps to understand the discussion.]

>>> 'Ryan Wiley wrote this blog post where he analyzes some data from lichess..'

>>> 'There’s also this repo with the statistics for 4.5 millions games (~4500 games per variation)...'

[NB: There's an issue with the word 'variant' here, but it's not as important as the previous 'NB'. Chess960 purists will know what I'm talking about.]

>>> 'In this spreadsheet there’s the Stockfish evaluation at depth ~40 for all the starting positions...'

>>> 'There’s also this database with Chess960 games between different computer engines. However, I’m currently only interested in analyzing human games, so I’ll not put a lot of attention to this type of games...'

>>> 'Lichess -- the greatest chess platform out -- maintains a database with all the games that have been played in their platform...'

>>> 'To do the analysis, I downloaded ALL the available Chess960 data (up until 31–12–2022). For all the games played I extracted the variation, the players Elo and the final result...'

>>> 'The scripts and notebooks to donwload [sic] and process the data are available on this repo...'

At this point the article launches into 'Mathematical framework; Bayesian A/B testing; [...]'. This, of course, is the essence of the study and I won't go any further in this current post. Let's get back to '(A)', where there's another key reference.

>>> 'This post got some attention in Reddit...'

I could end the post here, but I need to make an admittedly subjective observation. There are two examples of bias in the above references.

The first bias is 'I’m currently only interested in analyzing human games'. Huge caveat here. In my not-so-humble opinion, the CCRL is the best source of chess960 opening theory. Period. Full stop. The CCRL engines are rated at least 1000 points higher than most human players on Lichess. The engines don't make simple tactical errors and they calculate deeper into every position than any human can. If there is an unfair chess960 start position, the engines will find it, just like they find errors in most games played between humans.

I can understand ignoring the engines because humans grapple with different challenges in chess960 openings, but the purpose of the study was 'to find if there’s a *start position* that’s better than the others'. Ignoring the experience of the best players on the planet is severely limiting.

The second bias is 'Lichess -- the greatest chess platform out'. The main alternative here is Chess.com. Why ignore games played on the world's largest chess platform? Maybe there's a good reason, but I can't think of one. On a personal note, last year I investigated which of the two sites would be better to continue my own chess960 correspondence play. I determined that Chess.com was more serious about eliminating human players who cheat by using engines in games with other humans. Since my goal was playing no-engine games, I went with Chess.com. How much of the Lichess data involves concealed engine use?

Biases notwithstanding, the Molas study is an important step in evaluating the fairness of all 960 positions in chess960/FRC. I'm looking forward to understanding it in more depth.

25 February 2023

Chess.com Reviews a Chess960 Opening

In last week's post, Chess.com Pinpoints a Tactical Error (February 2023; see the post for a copy of the game's PGN), I used 'Chess.com's Game Review Tools' to find out where I had made the first mistake in losing a chess960 game. This week I'll use the same tools to extract comments on the opening moves.

The following diagram shows the start of the same game featured in the 'Tactical Error' post. There's more I can say about the look and feel of the 'Analysis' tool itself, but I'll save that for a series I'm doing on my main blog. The most recent post in that series was Chess.com's Game Review Tools PGN (February 2023).

Shown on the left is the start position for the game, 'SP350 NRKQRBBN'. On the right is a summary of the overall quality of the players' moves ('Brilliant', 'Great Move', ..., 'Blunder'). For example, the tool considers that both players made one 'Great Move'.


AV vs. bemweeks | Analysis (chess.com)

The following table shows the tool's comments on the first 12 moves of the game (24 ply deep). I stopped the analysis when I reached the move where I committed the 'Tactical Error'.

Move | Short Comment | Long Comment | Eval
1.e4 | is excellent | This prepares the bishop for development. This threatens to reveal an attack on a pawn. | +0.13
1...e5 | is good | This prepares your bishop for development. | +0.30
2.f4 | is excellent | This exposes an attack, threatening a pawn. | +0.27
2...Nb6 | is excellent | Your piece jumps in to protect a pawn! | +0.41
3.fxe5 | is best | Right on target. | +0.41
3...f6 | is an inaccuracy | You are threatening to attack a trapped rook. | +1.09
4.Nb3 | is a mistake | This loses a pawn. | +0.05
4...fxe5 | is best | That wins a free pawn! | +0.05
5.Ng3 | is excellent | One of the best moves. | -0.02
5...Qf6 | is best | You activate your queen by moving it off of its starting square. | -0.02
6.Be3 | is good | This moves the bishop to a better location, allowing it to control more squares. | -0.30
6...O-O-O | is good | Your rooks can see each other now, allowing them to provide mutual defense. | +0.02
7.d3 | is best | That's what I would have recommended. | +0.02
7...d5 | is best | You are threatening to kick a bishop. | +0.02
8.Qd2 | is an inaccuracy | This ignores a better way to develop a queen off its starting square. | -0.47
8...Qc6 | is good | You are threatening to kick a bishop. | -0.10
9.Bg5 | is good | This wins a tempo by threatening a rook and forcing it to move away. | -0.38
9...Be7 | is a mistake | You are threatening to win material. | +0.56
10.Bxe7 | is best | After all captures, this is an equal trade. | +0.56
10...Rxe7 | is best | You trade off equal material. | +0.56
11.Nf5 | is good | This attacks a rook, winning a tempo when it moves away. This threatens to fork pieces. | +0.27
11...Red7 | is good | You have now doubled your rooks, allowing them to team up to create threats. | +0.62
12.exd5 | is best | This exposes an attack, threatening a pawn. | +0.62
12...Qxd5 | is a mistake | You overlooked a better way to recapture a piece. | +1.63

There's much more I could say about the comments, but it would not be useful at this point. Here are a few comments that jumped off the screen at me.

  • 8...Qc6; 'You are threatening to kick a Bishop.' • The move defends against a nasty x-ray threat.

  • 9.Bg5; 'This wins a tempo by threatening a Rook and forcing it to move away.' • The move doesn't win a tempo, but it might lose a tempo by forcing the Rook to a better square.

  • 9...Be7; 'Is a mistake. You are threatening to win material.' • It's a mistake to win material? Something does not compute here.

And so on. The long comments are generally lame and show little understanding of chess960 opening objectives. What happened in the fight for the center?

The most valuable part of the exercise is to see the change in evaluation from one move to the next. It reaffirms the severity of my mistake on the 12th move.
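The move-to-move changes can be computed rather than eyeballed. Here is a minimal Python sketch using the evaluations copied from the table. It assumes the numbers are from White's point of view (positive favors White), which fits the way the tool's 'mistake' labels line up with the swings. The name `biggest_losses` is made up for this example.

```python
# (move, evaluation after the move), copied from the table above.
EVALS = [
    ("1.e4", 0.13), ("1...e5", 0.30), ("2.f4", 0.27), ("2...Nb6", 0.41),
    ("3.fxe5", 0.41), ("3...f6", 1.09), ("4.Nb3", 0.05), ("4...fxe5", 0.05),
    ("5.Ng3", -0.02), ("5...Qf6", -0.02), ("6.Be3", -0.30), ("6...O-O-O", 0.02),
    ("7.d3", 0.02), ("7...d5", 0.02), ("8.Qd2", -0.47), ("8...Qc6", -0.10),
    ("9.Bg5", -0.38), ("9...Be7", 0.56), ("10.Bxe7", 0.56), ("10...Rxe7", 0.56),
    ("11.Nf5", 0.27), ("11...Red7", 0.62), ("12.exd5", 0.62), ("12...Qxd5", 1.63),
]

def biggest_losses(evals, top=4):
    """Rank moves by how far the evaluation moved against the mover."""
    losses = []
    for (_, before), (move, after) in zip(evals, evals[1:]):
        swing = after - before
        # Positive evaluations are assumed to favor White, so a Black move
        # loses ground when the number rises and a White move when it falls.
        loss = swing if "..." in move else -swing
        losses.append((move, round(loss, 2)))
    return sorted(losses, key=lambda item: item[1], reverse=True)[:top]

for move, loss in biggest_losses(EVALS):
    print(move, loss)
```

On these numbers 12...Qxd5 costs Black about a full pawn, and the only swing of comparable size is White's 4.Nb3.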

18 February 2023

Chess.com Pinpoints a Tactical Error

In last month's post, The Fascinating World of Chess960 (January 2023), I discussed a shift in focus for my own chess960 games. In a nutshell, to play chess960 I switched from a site allowing engines to a site forbidding them. I finished the post saying,
That's the background for a series of posts that I plan to write for my games on Chess.com. There are several aspects to be covered, e.g. Game review tools

I introduced those tools on my main blog in Chess.com's Game Review Tools (February 2023), using an example chess960 game. I'll use the same game in this current post. After four wins, it was the first game I lost on Chess.com starting June 2022, when I adopted the no-engine approach.

The 'Game Review Tools' post used numbers to identify the different screen shots and I'll follow the same numbering scheme in this current post. There are three different review tools that I numbered '02', '03a', and '05a'. The '02' tool shows the moves of the game and the times used for each move. It also provides (1) an entry into the '03a' and '03b' tools, and (2) a PGN download of the moves of the game, without variations or comments.

At this point, I don't see much difference between the '03a' and '05a' tools. The differences seem mainly cosmetic, so I'll continue with the '05a' tool. It's accessed via a feature called 'Saved Analysis'. In the game I was outplayed tactically and didn't know where I had gone wrong.

I had Black in a game starting 'SP350 NRKQRBBN'. I was pleased with my position after the first few moves and thought that I had equalized, maybe even gained a slight advantage. Then suddenly I had an inferior game. Why?

The '03a' and '05a' tools offer commentary on each move played in the game. I won't discuss the early comments in this post, because I'm not yet convinced they are helpful. The critical position is shown in the following screenshot, Black to move.


AV vs. bemweeks | Analysis - Chess.com • After 12.e4-d5(xP)

White has just captured a Pawn on d5 with 12.exd5, and Black has four possible recaptures. The move 12...Rxd5 is a '??' blunder, allowing the family fork 13.Ne7+. That leaves three other moves. Since White's last move also discovered an attack on the e5-Pawn, I played 12...Qxd5, protecting that Pawn. That was a mistake, which the tool flags with the remark '(?) Qxd5 is a mistake'.
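The fork geometry is easy to confirm with a few lines of Python. This minimal sketch only checks which squares a Knight on e7 would attack; it does not check whether e7 is defended or whether the moves are legal. The Black piece placement after the hypothetical 12...Rxd5 is reconstructed from the game moves, and the name `knight_attacks` is invented for the example.

```python
def knight_attacks(square):
    """Squares a knight on `square` (e.g. 'e7') attacks on an empty board."""
    file, rank = ord(square[0]) - ord("a"), int(square[1]) - 1
    jumps = [(1, 2), (2, 1), (2, -1), (1, -2),
             (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
    targets = set()
    for df, dr in jumps:
        f, r = file + df, rank + dr
        if 0 <= f < 8 and 0 <= r < 8:
            targets.add("abcdefgh"[f] + str(r + 1))
    return targets

# Black pieces after the hypothetical 12...Rxd5 (reconstructed from the PGN).
black_targets = {"c8": "King", "c6": "Queen", "d5": "Rook"}

# The White Knight on f5 can reach e7 in one jump.
assert "e7" in knight_attacks("f5")

# Collect every listed Black piece that a Knight on e7 would attack.
forked = {sq: black_targets[sq]
          for sq in knight_attacks("e7") if sq in black_targets}
print(forked)
```

All three Black pieces land in `forked`, which is what makes 13.Ne7+ a family fork.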

What's better? The tool suggests 12...Nxd5. Now if 13.Rxe5, Black has 13...Qf6, when White is in trouble. I didn't see that possibility during the game and never recovered. Following is the PGN as provided by the '02' tool.

[Event "Let's Play! - Chess960"]
[Site "Chess.com"]
[Date "2022.07.05"]
[Round "?"]
[White "Andreasvinckier"]
[Black "bemweeks"]
[Result "1-0"]
[Variant "Chess960"]
[SetUp "1"]
[FEN "nrkqrbbn/pppppppp/8/8/8/8/PPPPPPPP/NRKQRBBN w EBeb - 0 1"]
[WhiteElo "1977"]
[BlackElo "1907"]
[TimeControl "1/86400"]
[EndDate "2022.07.23"]
[Termination "Andreasvinckier won by resignation"]
[initialSetup "nrkqrbbn/pppppppp/8/8/8/8/PPPPPPPP/NRKQRBBN w EBeb - 0 1"]

1. e4 e5 2. f4 Nb6 3. fxe5 f6 4. Nb3 fxe5 5. Ng3 Qf6 6. Be3 O-O-O 7. d3 d5 8. Qd2 Qc6 9. Bg5 Be7 10. Bxe7 Rxe7 11. Nf5 Red7 12. exd5 Qxd5 13. Ne3 Qc6 14. g3 Be6 15. Qa5 Bxb3 16. axb3 e4 17. Bh3 exd3 18. O-O-O dxc2 19. Bxd7+ Rxd7 20. Rxd7 Qxd7 21. Qxa7 Nf7 22. Qa5 Nd6 23. Nd5 1-0

The Chess.com tools offer several different PGN downloads. I'll discuss those in another post on my main blog.