Among the confirmed participants are last season’s champion Houdini and finalist Stockfish, as well as Komodo, Arasan, Gaviota, Hiarcs, Gull, Equinox, Hannibal, Shredder, Alfil, Exchess, Rybka, Tornado, Octochess, Critter, Naum, Crafty, Spike, and Junior, among others.
Stockfish is back in the hunt for the title, this time with the new release of Stockfish 4. Here is an interview with three key members of the Stockfish team: Marco Costalba, Gary Linscott, and Joona Kiiski.
Congratulations on the release of Stockfish 4. Let us know more about the people behind the engine. What is your role in the Stockfish team?
Joona: Developer. I used to be much more active than I am nowadays, though. In the early days it was easy to develop many big features and get huge ELO gains with a single patch. Currently it’s mostly just little tweaks here and there.
Gary: Marco and I developed the distributed testing framework, and I have had a few successful patches committed to SF as well.
Marco: I am a Stockfish developer and the one who, together with Tord Romstad and Joona Kiiski, started this Stockfish thing out of Tord’s Glaurung sources. I am also the maintainer, meaning the one who commits the changes that turn out to be good to the Stockfish master development branch.
Now that Stockfish has an open framework for testing, anyone who has a computer can lend you processing power. How significant do you think this has proved to be?
Joona: Very significant. Lots of computing power means that the error bars of each test are much lower than before and we can detect small improvements with great accuracy.
Gary: It has been successful beyond my initial hope! There have been so many people testing interesting ideas, and people contributing their CPU time to validate the tests. It’s a great collaborative effort, all happening out in the open, which makes it much more fun :). I would be remiss not to mention the role that Github plays in the testing framework as well. Their collaborative model is the foundation for the testing framework.
Marco: According to our internal tests we have gained about 55 ELO in just the last 4 months, so the impact of Gary’s distributed testing framework has been huge. But it is not only the hardware. It is the people. You said “anyone that has a computer can lend processing power to you”; I would add “anyone who has an idea and is able to write some C++ code can queue up a test and verify his idea in a rigorous and statistically sound way”. Currently there are 16 people registered as “developers”, that is, with the credentials to run tests. This is what will make the difference in the long term, IMHO even more than the raw computing power.

Processing power for tests is a must in modern chess engine development because each change requires tens of thousands of games to be validated, and the more subtle and small the change, the more games are needed. In today’s engines most improvements are subtle and small; there are no silver bullets, only a series of little steps that, summed together at the end of the day, make the difference. Extensive testing is needed to verify that each small step goes in the right direction, not backward. And this requires huge amounts of power that only a network of distributed machines can provide in an efficient and timely way.
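Marco’s point about sample size can be illustrated with a rough back-of-the-envelope sketch. This is my own simplification (a plain confidence-interval estimate, not the SPRT-style sequential test the actual framework uses), using the standard Elo logistic model and a worst-case per-game score deviation of 0.5:

```python
import math

def elo_to_expected_score(elo_diff):
    """Expected per-game score of the stronger side, standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, z=1.96):
    """Very rough number of games so that a 95% error bar on the measured
    score is smaller than the score edge implied by elo_diff.
    Assumes a worst-case per-game standard deviation of 0.5 (no draw model)."""
    edge = elo_to_expected_score(elo_diff) - 0.5  # score gained per game
    sigma = 0.5                                   # per-game std deviation
    return math.ceil((z * sigma / edge) ** 2)
```

With these assumptions, confirming a ~5 ELO patch already takes on the order of 18,000 games, while a 20 ELO jump needs only around 1,200 — which is exactly why subtle modern improvements demand a distributed testing network.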
Have any new ideas been implemented in the code compared to Stockfish 3, or does the expected strength increase come from bug fixes and tweaks of existing code?
Joona: I’d say it’s mostly small tweaks or small new features.
Gary: There have been some very interesting new ideas. Counter moves and pruning based on evaluation trends are two big search related ones off the top of my head. There have also been a bunch of great evaluation features added, along with good tweaks and bug fixes! So a mix of everything :).
Marco: Both. There have been some new ideas, and many tweaks and little improvements. Even the new ideas, at this level of engine maturity, are really refinements, for instance to move ordering or to pruning. There have also definitely been new people contributing to Stockfish for the first time.
In the previous Season of TCEC Stockfish 3 lost narrowly to Houdini in the Superfinal with a score of 23 to 25. Have you looked at, and learned from, any of these games – and what are your expectations for Stockfish 4 in this second Season of nTCEC?
Joona: I’ve looked at most of the games between SF and Houdini. Even though I can occasionally see that SF misevaluates some positions, trying to fix the issue usually makes the program weaker overall. Just out of my hat, I’d say that SF has around a 20% chance of winning the tournament.
Gary: Yes, definitely. Those high quality games give inspiration when searching for ideas on improving the engine. But, everything has to be validated, and the bar for a successful patch is really high, so it’s rare to have the improvement directly go into the code. Usually many iterations are required. My expectations for the upcoming season are high, but there is incredibly strong competition out there. It will be an exciting season :).
Marco: I am not a strong chess player, so for me it is very difficult to understand, looking at the games, what can be improved, especially at this level. For a given position a human and an engine look at very different things. Given a position, a GM will analyze it: pawn structure, mobility, etc. For the engine it is very different: it will largely ignore the given position and will instead analyze tens or hundreds of millions of different positions that are, say, 20 or 30 moves ahead of the current one, and the engine’s “best move” will be based on the best far-ahead position it finds. So saying, for instance, that “the engine misevaluated that this pawn is weak” is really comparing apples with oranges: the engine does evaluate the pawn structure, but at the end of thousands of variations, tens of moves long. This makes “learning from games” a very slippery art, sometimes more superstition than art :-)

In the previous TCEC season Stockfish reached the Superfinal, and this was a surprise for me. I think we took advantage of the fact that Komodo at the time was still single-threaded. Now that Don has finished the SMP version, I expect one of the two Superfinal engines will be Komodo… I don’t know about the other one; it should be Houdini according to the statistics. :-)
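Marco’s description of how an engine “sees” a position — scoring leaf positions many plies ahead and propagating the result back to the root — is the classic negamax scheme at the heart of minimax search. A toy sketch (the `Node` class and scores are invented for illustration; a real engine like Stockfish uses a far more elaborate alpha-beta search over actual chess positions):

```python
class Node:
    """Toy game-tree node: a static score plus optional child nodes."""
    def __init__(self, score, children=()):
        self.score = score
        self.children = list(children)

def negamax(node, depth):
    # The static evaluation (the "pawn structure" judgment) happens HERE,
    # at leaf positions far ahead of the root, never at the root itself.
    if depth == 0 or not node.children:
        return node.score
    best = -float("inf")
    for child in node.children:
        # Scores are from the side to move, so they flip sign each ply.
        best = max(best, -negamax(child, depth - 1))
    return best
```

The root’s “best move” is just whichever branch backs up the best leaf score, which is why a move choice can look mysterious when judged only by features of the current position.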