22 January 2022

TCEC C960 FRC4

In a recent post on my main blog, Stockfish Wins Both TCEC FRC4 and CCC16 Bullet Events (January 2022), I made three observations related to chess960:-

(1) 'Stockfish and LCZero tied for 1st/2nd in the FRC4 'Final League', a point ahead of KomodoDragon. Stockfish beat LCZero +13-9=28 in the Final.'

(2) 'A [TCEC] note mentioned, "!bookfrc • Final League and the Final will use unbalanced books [...] On the edge between draw and white win." For more info, see TCEC FRC 4, under 'FRC Book Generation'.'

(3) 'After FRC4, the site ran an event called 'S22 - DFRC Sanity Check'. What's DFRC? "!dfrc • Double Fischer random chess: The same as Fischer random chess, except the White and Black starting positions do not mirror each other. Double FRC has 921,600 (960*960) possible starting positions."'

The 'more info' reference in (2) was for TCEC FRC 4 - TCEC wiki (wiki.chessdom.org), where the nuts and bolts of the tournament are explained. I covered the previous event in TCEC C960 FRC3 (March 2021). One welcome difference between FRC3 and FRC4 was the number of competitors in the 'First phase' of four leagues. In FRC3, there were four engines in each league; in FRC4, six engines per league were planned (24 in total), although only 23 engines started the event. After FRC3, I analyzed engine runtime data for the first time in a pair of posts:-

  • An Engine Iceberg (November 2021) • 'TCEC FRC3 was a 50 game match won by KomodoDragon over Stockfish on a final score of +2-1=47.'
  • The Engine Iceberg Looms Larger (ditto) • 'Final match of the CCC C960 Blitz Championship (October 2021).'

It might be useful to repeat the exercise for FRC4, although I should be clear on objectives for the exercise. What can be learned by looking at only a small subset of the 960 possible start positions?

The TCEC wiki page mentions 'unbalanced books'. It's an interesting concept, but perhaps too heavy on the human manipulation: 'following work is done by hand'; 'then (by hand and eye) I choose'; '[sequences] that don't look crazy to me'; 'eliminated lines that looked too drawish or too busted'; 'some looked too artificial, some looked a bit too similar to others'.

Traditional A/B engines have never been particularly good at evaluating the long term consequences of opening decisions. A comprehensive analysis extends well beyond their search horizons. Maybe the AI NNUE engines are better at this, but that hasn't been studied anywhere (that I know of).

A red flag goes up when I see a phrase like 'lines that looked ... too busted'. In the years of writing about and playing chess960, I haven't seen any start positions that were 'too busted'. To the contrary, Black always has resources to counteract White's various initiatives. Perhaps the researcher behind the analysis (Bastiaan) should make available his full analysis showing which positions were eliminated for which reasons.

Another phrase caught my attention: 'not a single position favours Black in my analysis'. This is what one would expect to see in a position between opponents having exactly the same resources, except one gets to move first. Otherwise we would talk about 'first move disadvantage' or 'zugzwang in the start position'.

As for DFRC (FRC squared? chess921.6K?), this is a new area for analysis. It's another example of Gene Milener's idea that I covered in Chess960 Phase Zero (November 2018). A first action might be to examine runtime data from the DFRC games.
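
The 921,600 figure quoted for DFRC can be checked by brute force. The following is a minimal sketch, not anything taken from the TCEC site: it enumerates every arrangement of the eight back-rank pieces and keeps those that satisfy the two chess960 rules (Bishops on opposite colors, King between the Rooks). The function name is my own.

```python
from itertools import permutations

def chess960_back_ranks():
    """Return all legal chess960 back ranks (White's first rank, a-file to h-file)."""
    ranks = set()
    for perm in set(permutations("RNBQKBNR")):
        rank = "".join(perm)
        b1, b2 = [i for i, p in enumerate(rank) if p == "B"]
        r1, r2 = [i for i, p in enumerate(rank) if p == "R"]
        k = rank.index("K")
        # Bishops on opposite colors: their file indexes differ in parity.
        # King between the Rooks: needed for castling on both wings.
        if (b1 + b2) % 2 == 1 and r1 < k < r2:
            ranks.add(rank)
    return sorted(ranks)

ranks = chess960_back_ranks()
print(len(ranks))       # FRC: White and Black mirror each other
print(len(ranks) ** 2)  # DFRC: White and Black chosen independently
```

The two printed values are 960 and 921600, matching the '960*960' in the TCEC note.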

25 December 2021

The CCRL Is Unreliable (Not!)

I currently have 21 posts in the category Showing posts with label CCRL. One of those posts lists six early posts from my main blog making a total of 26 posts (21-1+6) for that label. Only one of those posts was written in the five years since I restarted this chess960 blog, indicating that the CCRL (see the right navigation bar for a link) has declined in importance for me. In recent years I rarely consulted it for opening advice.

This month I started a new eight-game (four different start positions) event and decided to take another look at the CCRL data. One of the games, SP777 QRKBBNRN, is pictured below.


SP777 QRKBBNRN

For the game where I had white, I chose 1.Nhg3 as the first move, and the game continued...

1.Nhg3 [x89] b5 [x2] 2.e4 [x12]

...reaching the position in the diagram. The numbers in brackets ('[]') are the number of games in the CCRL file for SP777, out of 378 games total. After Black's first move, only two CCRL games remained, but the number increased after White's second move, thanks to a simple transposition of White's first and second moves.
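
The bracketed counts can be reproduced from a list of games, each held as a list of SAN moves. This is only a rough sketch with hypothetical function names. The transposition check is a heuristic: it compares each side's moves as an unordered collection, which is good enough for a simple swap of White's first and second moves, but can be fooled when move order changes the position (captures, for example).

```python
from collections import Counter

def matches_line(moves, line, allow_transpositions=False):
    """True if a game starts with `line` (a list of SAN moves)."""
    n = len(line)
    if len(moves) < n:
        return False
    head = moves[:n]
    if not allow_transpositions:
        return head == line
    # Heuristic: same White moves and same Black moves, in any order.
    return (Counter(head[0::2]) == Counter(line[0::2])
            and Counter(head[1::2]) == Counter(line[1::2]))

def count_line(games, line, allow_transpositions=False):
    """Count the games in `games` that start with `line`."""
    return sum(matches_line(g, line, allow_transpositions) for g in games)

# Toy data, not taken from the CCRL file.
games = [
    ["Nhg3", "b5", "e4", "c6"],
    ["e4", "b5", "Nhg3", "c6"],
    ["Nhg3", "e5", "e4"],
]
line = ["Nhg3", "b5", "e4"]
print(count_line(games, line))                             # exact move order
print(count_line(games, line, allow_transpositions=True))  # either move order
```

A chess-aware library that compares the resulting positions would be more reliable, provided it understands chess960 castling.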

The first block of text in the image, statistics calculated by SCID, shows Black's moves after 2.e4. The numbers are terrible for White -- in the 12 games, White scored only 25%. When I first saw the stats, I told myself, 'This line is unplayable!', but couldn't see any other reason why it shouldn't be played. It's the most logical move in the position.

The second block of text shows some basic info about the 12 games, listed in chronological order. The last two columns are the most interesting. The WLD scores total +2-8=2 (25%), but the last column shows the 'Length' (number of moves) for each game. Of the 12 games, only four lasted longer than 15 moves.
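
For anyone checking the arithmetic, the percentage follows from the usual convention of one point for a win and half a point for a draw. A small sketch, with helper names of my own choosing, that reads the '+W-L=D' notation used on this blog:

```python
import re

def parse_wld(score):
    """Split a '+W-L=D' string into (wins, losses, draws)."""
    m = re.fullmatch(r"\+(\d+)-(\d+)=(\d+)", score.strip())
    if m is None:
        raise ValueError(f"not a +W-L=D score: {score!r}")
    return tuple(int(g) for g in m.groups())

def score_pct(score):
    """Percentage score: a win counts 1 point, a draw half a point."""
    w, l, d = parse_wld(score)
    return 100.0 * (w + 0.5 * d) / (w + l + d)

def draw_pct(score):
    """Percentage of games that were drawn."""
    w, l, d = parse_wld(score)
    return 100.0 * d / (w + l + d)

print(score_pct("+2-8=2"))  # 25.0
```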

The first game, Hiarcs - Movei (played 2006), ended '1-0' after Black's 6th move, when the opponents are still developing their forces. The last game, Stockfish - Dragon (played this year), ended after White's 15th move with the comment 'Black wins by adjudication', although the position is at best unclear.

My conclusion? The CCRL statistics are completely misleading and can't be relied on. My own SP777 game hasn't yet reached the 10th move, so I can't say anything more while it's still in progress.

***

Later: Re 'Of the 12 games, only four lasted longer than 15 moves', after I wrote the post, I realized I had made a serious error. SCID doesn't know anything about chess960. When it encounters an illegal move, it stops processing the game. For chess960 games, most castling moves look like errors, so SCID stops. Thus the low move counts.
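
One way around the problem is to count plies straight from the PGN movetext without validating any moves, so that castling can't trip up the count. A minimal sketch, assuming ordinary PGN movetext; the function names are mine and nothing here is part of SCID:

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def ply_count(movetext):
    """Count half-moves in PGN movetext without checking move legality."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {...} comments
    text = re.sub(r";[^\n]*", " ", text)         # drop rest-of-line comments
    while re.search(r"\([^()]*\)", text):        # drop (possibly nested) variations
        text = re.sub(r"\([^()]*\)", " ", text)
    text = re.sub(r"\$\d+", " ", text)           # drop numeric annotation glyphs
    plies = 0
    for tok in text.split():
        tok = re.sub(r"^\d+\.+", "", tok)        # '12.' or '12...' prefixes
        if tok and tok not in RESULTS:
            plies += 1
    return plies

def move_pairs(plies):
    """Number of 'moves' (White move plus Black move), rounding up."""
    return (plies + 1) // 2

print(ply_count("1. Nhg3 b5 2. e4 {an O-O here is ignored} c6 3. O-O-O O-O 1/2-1/2"))
```

The sample movetext is invented for illustration; it prints 6.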

For the first three games in my list, the real plycounts were 99, 511, and 122, where the number of 'moves' in the game (more accurately called 'move pairs', i.e. a move for White plus a move for Black) would be half that. This reminded me of an equally serious error I made on my main blog in the first few months of chess960 blogging, when I miscalculated CCRL statistics:-

My conclusion now? (1) I'm the one that's unreliable, not the CCRL; (2) I shouldn't blog on Christmas day, the date of this post; (3) I should be careful when I find a simple answer to a complicated question.

Re the original observation, 'The numbers are terrible for White -- in the 12 games, White scored only 25%', I'll have to look at it again. The game that provoked the analysis has reached the 17th move, so it is still too early to comment.

18 December 2021

Chess940 in 'Chess Life'

Chess960 finally makes the cover (sort of) of a major chess magazine and what happens? They spell it wrong!

I knew this was going to happen. It was just a question of time. Not everyone is comfortable with Roman numerals, and of those who are, not everyone knows what 'L' stands for.

I already flagged one example in 2020 Champions Showdown, Lichess (September 2020), from the well-respected Leonard Barden of The Guardian. If the dean of chess journalism can blunder like this, anyone can.

What am I talking about? The built-in confusion between 9LX and 9XL, of course. The image above is from the table of contents (TOC) of the December 2021 Chess Life (CL). The small print says,

[Page 36] COVER STORY • MY AMERICAN TOUR • GM Maxime Vachier-Lagrave [MVL] on his victory at the 2021 Sinquefield Cup and his tie for second place at the 9XL Showdown.

That text was repeated in the introduction to the ten page article, making it a double blunder. To make matters worse, '9XL' sounds like a fast food menu item -- 'I'll have one no.9, Xtra Large, hold the mayo.'

I suppose I should stop waving my arms around and just be happy that chess960, aka FRC, aka 9LX, etc. etc., featured in two full pages of the MVL article. Earlier this year I covered the '9XL Showdown' event in two posts:-

There was one encouraging development in the MVL article. In that second post, 'Live', I noted,

Just as they did for past events, the 9LX organizers used a non-standard numbering system to identify the start positions (SPs).

This was fixed in the MVL article, where he discussed three of his chess960 games, vs. GMs Mamedyarov, Caruana, and Nakamura. MVL wrote,

At the Showdown, the players were given 15 minutes before the rounds to analyze the new starting positions with each other. I often looked at variations with Levon Aronian and Fabiano Caruana. To be completely honest, this kind of preparation is often just blindly groping around, but sometimes you find one or two ideas that work.

The Frenchman, currently rated world no.12, once posted a piece on his blog, My Chess960 Debut (mvlchess.com), about his debut at the 2018 Champions Showdown at St.Louis. I covered the event on this blog in Champions Showdown, St.Louis (September 2018). In his blog post, MVL observed,

The idea of "confining" players, without computer access, one hour before the games, was very interesting. In this friendly atmosphere, allowing the players to get acquainted with the position and analyse together, generally by pairs, was worthwhile. [...] Analysis of a brand new position was clearly adding some spice to our daily routine. First of all, by depriving us of the morning prep burden... But also because we rediscovered the pure analytic work, as it was done by the Ancients, before the computer era!

It's been almost two years since I last mentioned a CL article on this blog; see An Alternative to a 'Boring, Mind-Draining Process' (January 2020). Let's hope the next opportunity won't take another two years.

27 November 2021

The Engine Iceberg Looms Larger

Last week's post, An Engine Iceberg (November 2021), looked at runtime data from 'TCEC C960 FRC3', which I covered on this blog in March 2021. I wrote,

Since the data covers only the first move of 25 SPs (50 games) out of the full set of 960 SPs, it's obviously just scratching the surface. Suppose we had data for the first few moves of all 960 SPs from many different engines played over a long period of time. What might we learn from this?

For this current post I repeated the exercise on the final match of the CCC C960 Blitz Championship (October 2021). I wrote,

In the final match Stockfish beat Dragon +10-1=589. Yes, more than 98% of the final games were drawn.

I loaded the PGN for all 600 games into my database and ran a preliminary analysis. There were two small surprises.

The first surprise was that the data for individual moves was not the same for both the TCEC and the CCC. Here are examples for the first move of the first game in both events.

TCEC: 1. e4 {d=36, sd=36, mt=147236, tl=1657764, s=81363821, n=11979602252, pv=e4 Nb6 Nb3 e5 g3 g6 Ne3 c6 f4 exf4 gxf4 f5 exf5 gxf5 Bf2 Qf6 c3 Nd5 Bc2 Nxe3 Qg1 Ne6 Bxe3 Bc7 O-O-O O-O-O Nd4 Bf7 Rf1 Bh5 Rde1 Nxd4 Bxd4 Qf7 b3 Be2 Rf2 Bg4 Kb2 Rxe1 Qxe1 Rg8 Qf1 Bb6 Bxb6, tb=0, h=99.9, ph=0.0, wv=0.26, R50=50, Rd=-11, Rr=-1000, mb=+0+0+0+0+0,}

CCC: 1. d4 {+0.45/32 9.6s, ev=0.45, d=32, pd=g6, mt=00:00:09, tl=00:04:55, s=148396 kN/s, n=1415409674, pv=d4 g6 e3 d5 g4 c6 c4 dxc4 f4 g5 fxg5 Na6 Nf2 e5 Bxc4 Nb4 Na3 exd4 Qf3 Be7 exd4 Qxd4 O-O O-O Bb3 Ne6 h4 Qg7 Be3 Nd5 Ne4 Nxe3 Qxe3 h6 Nc4 hxg5 Ncd6, tb=0, R50=50, wv=0.45}

Fortunately, the important 'wv' and 'pv' fields are available for both events. Any other fields I decide to use might require some sort of conversion.
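
Both comment styles are comma-separated 'key=value' lists, so one small parser can handle them. This is a sketch based only on the two examples above, not on any published TCEC or CCC format description. The '_head' key for the CCC's leading '+0.45/32 9.6s' token is my own invention, and the sketch assumes 'wv' is always a plain number.

```python
def parse_move_comment(comment):
    """Turn a '{key=value, key=value, ...}' move comment into a dict of strings."""
    fields = {}
    body = comment.strip().strip("{}")
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue  # the TCEC example ends with a trailing comma
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
        else:
            fields.setdefault("_head", part)  # token without '=' (CCC style)
    return fields

def eval_and_pv(comment):
    """Return (wv as float, pv as a list of moves) from a move comment."""
    fields = parse_move_comment(comment)
    return float(fields["wv"]), fields["pv"].split()

# The CCC example from above, with the pv shortened for space.
ccc = ("{+0.45/32 9.6s, ev=0.45, d=32, pd=g6, mt=00:00:09, tl=00:04:55, "
       "s=148396 kN/s, n=1415409674, pv=d4 g6 e3 d5, tb=0, R50=50, wv=0.45}")
print(eval_and_pv(ccc))
```

Fields such as 'mt' and 'tl' are left as strings here, since the two sites clearly use different units for them.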

The second surprise was that the CCC start positions (SPs) were not repeated for a second game, colors switched, between the engines. Instead, a new SP was assigned to each game. The left table in the following chart shows that some SPs were nevertheless repeated up to five times.

In addition to the six SPs shown in the table, 24 SPs were repeated three times and 90 were repeated twice. I assume that the SPs are chosen randomly for both the TCEC and the CCC, perhaps with the exception of SP518 RNBQKBNR, but I know from past investigations that several bad algorithms are in use elsewhere; see Start by Placing the Bishops (September 2017) for examples.
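
For comparison, here is a minimal sketch of an unbiased method: draw a number from 0 to 959 with equal probability and decode it using the standard chess960 numbering scheme (Bishops first, then Queen, then the Knight/Rook/King pattern). Whether the TCEC or the CCC actually work this way is an assumption on my part.

```python
import random

# The ten ways to place two Knights around Rook-King-Rook, in standard order.
KRN = ["NNRKR", "NRNKR", "NRKNR", "NRKRN", "RNNKR",
       "RNKNR", "RNKRN", "RKNNR", "RKNRN", "RKRNN"]

def sp_to_back_rank(sp):
    """Decode an SP number (0-959) into White's back rank, a-file to h-file."""
    if not 0 <= sp <= 959:
        raise ValueError("SP number must be in 0..959")
    n, light = divmod(sp, 4)
    n, dark = divmod(n, 4)
    krn, queen = divmod(n, 6)
    rank = [None] * 8
    rank[2 * light + 1] = "B"   # light-square Bishop: b, d, f or h file
    rank[2 * dark] = "B"        # dark-square Bishop: a, c, e or g file
    free = [i for i in range(8) if rank[i] is None]
    rank[free[queen]] = "Q"     # Queen on one of the six free squares
    free = [i for i in range(8) if rank[i] is None]
    for i, piece in zip(free, KRN[krn]):
        rank[i] = piece
    return "".join(rank)

def random_sp(rng=random):
    """Pick an SP number uniformly at random."""
    return rng.randrange(960)

print(sp_to_back_rank(518))  # RNBQKBNR
print(sp_to_back_rank(777))  # QRKBBNRN
```

Because every number decodes to a different position, a uniform draw over the numbers is a uniform draw over the 960 SPs.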

The center table in the chart shows the number of times a certain first move was chosen across all 600 games. For example, the initial moves 1.a4 and 1.b3 were both chosen 19 times. Just as in SP518, advancing a center Pawn two squares (1.c4, 1.d4, ...) is the most popular opening strategy. Although any single SP has a maximum of four initial Knight moves, sometimes only two or three moves, all eight moves are possible across the 960 SPs.
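
A tally like the one in the center table needs only the first move of each game. The sketch below assumes each game's movetext is available as a string; the function names are hypothetical and this is not the database query I actually used.

```python
import re
from collections import Counter

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def first_move(movetext):
    """Return White's first move in SAN, or None if the game has no moves."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # drop {...} comments
    for tok in text.split():
        tok = re.sub(r"^\d+\.+", "", tok)        # strip a '1.' prefix
        if not tok:
            continue
        return None if tok in RESULTS else tok
    return None

def first_move_counts(movetexts):
    """Tally first moves across a collection of games."""
    moves = (first_move(m) for m in movetexts)
    return Counter(m for m in moves if m is not None)

# Toy data for illustration only.
sample = ["1. d4 {+0.45/32 9.6s} g6 2. e3", "1.d4 d5", "1. a4 b6", "1/2-1/2"]
print(first_move_counts(sample).most_common())
```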

There are a number of questions for further exploration. When is the advance of an edge Pawn -- 19 x 1.a4 or 5 x 1.h4 -- desirable? I suspect these are positions where the Queen starts in the corner behind the Pawn. Why the large difference between the counts on the two edge Pawns? Perhaps this is because of castling O-O/O-O-O considerations. Also worth noting is that O-O/O-O-O was never chosen for the first move.

The rightmost table in the chart gives a rough distribution of initial 'wv' values, i.e. what value did the engine calculate for its first move? These are truncated values, e.g. the CCC 'wv=0.45' shown above is counted in the table as 'wv=0.4'. I could have used roundoff and a bar chart to display the counts more accurately, but I ran out of time.
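
The truncation can be sketched as follows. I use Python's decimal module on the 'wv' text so that binary floating point doesn't nudge a value into the wrong bucket. Folding '-0.0' into '0.0' is my own choice for the sketch and may not match how the table was built.

```python
from collections import Counter
from decimal import Decimal, ROUND_DOWN

def wv_bucket(wv_text):
    """Truncate a 'wv' value toward zero to one decimal place."""
    bucket = Decimal(wv_text).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
    # Small negative values truncate to '-0.0'; count them with '0.0'.
    return bucket if bucket != 0 else Decimal("0.0")

def wv_histogram(wv_texts):
    """Count how many values fall in each truncated bucket."""
    return Counter(str(wv_bucket(t)) for t in wv_texts)

print(wv_bucket("0.45"))  # 0.4
print(wv_histogram(["0.45", "0.26", "0.41", "-0.05"]))
```

Swapping ROUND_DOWN for ROUND_HALF_UP would give the 'roundoff' version instead.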

One big question presents itself here. Why are there so many 'wv' values greater than 0.5, but so few decisive results during the match? I also need to determine if Stockfish and Dragon calculate values in the same statistical range. I doubt that they do.

The three tables in that chart lead to many questions and few answers. I'll take this up again some other time.

20 November 2021

An Engine Iceberg

In the previous post, CCC C960 Blitz Championship (October 2021), I wrote,

Given that engines' evaluations for every move are available in the event's PGN game scores, perhaps there is something to be learned about the 960 different start positions. That investigation would make a good follow-up post.

Make that two good follow-up posts. The first post was on my main blog, Evaluating the Evaluations (November 2021), where I concluded,

Now that I have a tool for rapidly evaluating the engine evaluations, what can I do with it? The first task will be to put it to work on the 960 start positions used in chess960.

The second post is this one. I had already downloaded a few PGN files from recent engine vs. engine events, so the first question was which one to use. I decided to continue with the games from an event that I covered earlier this year in another post on this blog, TCEC C960 FRC3 (March 2021). At that time I noted,

Except for an occasional CCRL game, I can't remember ever looking at an engine vs. engine chess960 game. Is there anything to be learned from such an exercise, or is the play of the engines beyond comprehension?

TCEC FRC3 was a 50 game match won by KomodoDragon over Stockfish on a final score of +2-1=47. The seven mandatory tags in the PGN header for the first game look like this:-

[Event "TCEC Season 20 - FRC3 Final"]
[Site "https://tcec-chess.com"]
[Date "2021.03.14"]
[Round "1.1"]
[White "KomodoDragon 2671.00"]
[Black "Stockfish 20210226"]
[Result "1/2-1/2"]
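
These seven tags are what the PGN standard calls the 'Seven Tag Roster'. Reading them takes only a few lines. A minimal sketch, with names of my own choosing, that parses tag pairs and reports any roster tag that is missing:

```python
import re

SEVEN_TAG_ROSTER = ["Event", "Site", "Date", "Round", "White", "Black", "Result"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')

def parse_tags(header_text):
    """Parse PGN tag-pair lines into a dict of tag name -> value."""
    tags = {}
    for line in header_text.splitlines():
        m = TAG_RE.match(line.strip())
        if m:
            tags[m.group(1)] = m.group(2)
    return tags

def missing_roster_tags(tags):
    """List the mandatory tags that are absent."""
    return [name for name in SEVEN_TAG_ROSTER if name not in tags]

header = '''[Event "TCEC Season 20 - FRC3 Final"]
[Site "https://tcec-chess.com"]
[Date "2021.03.14"]
[Round "1.1"]
[White "KomodoDragon 2671.00"]
[Black "Stockfish 20210226"]
[Result "1/2-1/2"]'''

tags = parse_tags(header)
print(tags["White"], "-", tags["Black"], tags["Result"])
print(missing_roster_tags(tags))  # []
```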

I loaded the file into my database, added the concept of SP, and produced the following chart. It covers the first 22 games of the match. Each start position (SP) was played twice, where KomodoDragon always had White in odd-numbered games. In a match between humans, this pattern would risk giving an advantage to one of the players, but in games between engines, it's harmless.
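
'Adding the concept of SP' amounts to turning a back rank into its SP number. Here is a sketch of one way to do it, the inverse of the standard numbering scheme. It assumes the start position is available as a FEN string, which I haven't confirmed for every PGN source, and the function names are hypothetical.

```python
# The ten ways to place two Knights around Rook-King-Rook, in standard order.
KRN = ["NNRKR", "NRNKR", "NRKNR", "NRKRN", "RNNKR",
       "RNKNR", "RNKRN", "RKNNR", "RKNRN", "RKRNN"]

def back_rank_to_sp(rank):
    """Convert White's back rank (a-file to h-file) to its SP number, 0-959."""
    rank = rank.upper()
    if sorted(rank) != sorted("RNBQKBNR"):
        raise ValueError(f"not a full set of back-rank pieces: {rank!r}")
    bishops = [i for i, p in enumerate(rank) if p == "B"]
    light = [i for i in bishops if i % 2 == 1]   # b, d, f, h files
    dark = [i for i in bishops if i % 2 == 0]    # a, c, e, g files
    if len(light) != 1 or len(dark) != 1:
        raise ValueError("Bishops must start on opposite colors")
    no_bishops = rank.replace("B", "")
    queen = no_bishops.index("Q")                # position among six squares
    krn = no_bishops.replace("Q", "")
    if krn not in KRN:
        raise ValueError("King must start between the Rooks")
    return light[0] // 2 + 4 * (dark[0] // 2 + 4 * (queen + 6 * KRN.index(krn)))

def fen_to_sp(fen):
    """Take the SP number from White's first rank in a start-position FEN."""
    placement = fen.split()[0]
    return back_rank_to_sp(placement.split("/")[-1])

print(back_rank_to_sp("RNBQKBNR"))  # 518
print(back_rank_to_sp("QRKBBNRN"))  # 777
```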

The last two columns show the first move, as chosen by White, and the value ('wv') calculated by the engine for that move. I could have also shown the principal variation ('pv') calculated by White, but that wouldn't add much to an initial understanding of the data. The same data is available from the PGN file for all moves by both sides in a game.

Since the data covers only the first move of 25 SPs (50 games) out of the full set of 960 SPs, it's obviously just scratching the surface. Suppose we had data for the first few moves of all 960 SPs from many different engines played over a long period of time. What might we learn from this? I would want an answer to that question before spending too much effort collecting more data.

30 October 2021

CCC C960 Blitz Championship

There's one more idea left from the recent post, Crossover Ideas from my Main Blog (October 2021):-

Review the recent CCC chess960 tournament • The semifinal finishes this weekend. Can we expect a final? Short answer: Probably.

Change that 'Probably' to 'Yes'. In the most recent post in the ongoing TCEC/CCC engine saga, TCEC Cup 9, CCC C960 Blitz Final : Both Underway (October 2021), I continued,

In the 'Chess960 Blitz Semifinals', Stockfish finished a point ahead of Dragon as both engines qualified for the final match. Only one game of their 40-game [semifinal] minimatch was decisive, with Stockfish winning. Lc0 lost three games to each of the two engines, winning none. The other three engines were far behind.

In the final match Stockfish beat Dragon +10-1=589. Yes, more than 98% of the final games were drawn. Earlier this year, in TCEC C960 FRC3 (March 2021), I reported,

In the 'FRC 3' final, KomodoDragon beat Stockfish by a score of +2-1=47. A 94% draw rate echoes the sort of result we expect from a traditional chess match (SP518 RNBQKBNR) between engines.

Note that the CCC's Dragon and the TCEC's KomodoDragon are the same engine. It's also worth noting that Stockfish switched to NNUE evaluation last year, while Dragon is also an NNUE engine, as I noted a year ago on my main blog in Komodo NNUE (November 2020). Is the high percentage of draws because they both use the same technology for evaluating positions?

The following chart shows the result of the CCC semifinal round. Stockfish and Dragon finished 1st and 2nd, ahead of 3rd place Lc0 and three other engines. I know the black background makes the chart hard to read, but the individual game results, especially the losses in red, are clearly discernible.

Stockfish didn't lose a single game during the event, while Dragon lost only one game, to Stockfish. As mentioned above, both engines beat Lc0 three times, which itself lost only a single game to the three engines in the bottom half of the crosstable. The bottom half is a sea of red.

Given that engines' evaluations for every move are available in the event's PGN game scores, perhaps there is something to be learned about the 960 different start positions. That investigation would make a good follow-up post.

23 October 2021

GM Carlsen's Online Chess960

Continuing with last week's post, Crossover Ideas from my Main Blog (October 2021), the second idea was to 'Review Carlsen's chess960 activity':-

So far I've identified two chess960 events for the [Carlsen] TMER. Were there others? Short answer: Yes.

The TMER is Magnus Carlsen's Tournament, Match, and Exhibition Record (2000-), currently up-to-date only through summer 2018. The two chess960 events were tournaments played on Lichess:-

GM Carlsen played using the handle DrNykterstein (lichess.org). From there we find three more events. The first was another 'Titled Arena' tournament, although he played only five games:-

The others ('Fischersjakk'!) were restricted events: 'Must be in team Offerspill Sjakklubb':-

Although the Lichess events are missing, the TMER records three other chess960 events, the last two currently marked 'In preparation':-

2018-02 Fischer Random Rapid/Blitz 2018; Baerum NOR
[...]
2019-10 World Fischer Random 2019; Hovikodden NOR
2020-09 Champ Showdown 9LX 2020; Lichess.org INT

All three have been covered on this blog. The last two 'In prep' events were:-

The 'Champions Showdown' is worth special mention because it was played on Lichess, but doesn't show up on the DrNykterstein account. If we follow a page for the three day event...

...we see that Carlsen played on the account of STL_Carlsen (lichess.org). The other players in the elite event also played under 'STL' (St.Louis) names. Before playing on Lichess as DrNykterstein, GM Carlsen had another account, DrDrunkenstein (lichess.org). There are no chess960 games recorded on that account.