Online Barbu Ladder
How the rating system works
Summary of how it works
Each player has a Playing Strength. An average player would have a Playing Strength of zero.
Your Playing Strength is supposed to represent the score that you would expect to get if you played in a game with three average players. More generally, in any game, you would expect to score the difference between your playing strength and the average playing strength of the other three players.
For example, if you have a Playing Strength of +20, that means that:
· In a game with three average players (i.e. all three have Playing Strength zero) you would expect to score +20.
· In a game with a +10, a zero, and a -10, you would expect to score +20.
· In a game with a +20, a +80, and a -10, you would expect to score -10.
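The three cases above can be checked with a short sketch. The function name `expected_score` is illustrative only and is not part of the ladder software; it simply subtracts the opponents' average Playing Strength from your own.

```python
def expected_score(own_strength, opponent_strengths):
    """Expected game score: own strength minus the opponents' average strength."""
    average_opponent = sum(opponent_strengths) / len(opponent_strengths)
    return own_strength - average_opponent


# The three examples for a player with a Playing Strength of +20.
print(expected_score(20, [0, 0, 0]))      # 20.0
print(expected_score(20, [10, 0, -10]))   # 20.0
print(expected_score(20, [20, 80, -10]))  # -10.0
```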
Your Playing Strength is recalculated after every game. If you do well, your playing strength goes up; if you do badly, your playing strength goes down.
For Playing Strength calculations:
· Your recent results count more than your older results
· Every game you have ever played counts (but each time you play another game, the older results become less significant)
Each ladder covers a fixed period. All the games you play during that period count equally towards your ladder score.
After each game, we add your score from the game to the average Playing Strength of your three opponents. That gives us a score which is adjusted to take account of how strong your opponents were. For example, if you got +300, but your opponents had Playing Strengths of -60, -40 and +10 (an average of -30), we would give you an adjusted score of +270.
Then we take the average of all your adjusted scores for the period.
Finally, we adjust this average using a formula which reduces your score if you have only played a small number of games. The fewer games you have played, the more your score is reduced. This is because it's possible to get a very high average over a small number of games, but this isn't as great an achievement as getting a lower, but still large, average over a lot of games.
For Ladder Score calculations:
· All the games you have played during the ladder period count
· All those games count equally
· Games that you played before the ladder period don't count
· The opponents' playing strengths do affect your score
· Your own playing strength doesn't affect your score (but it does affect your opponents' scores)
Playing Strength Calculation
When a player first starts playing online Barbu, he is assigned a "playing strength" of 0.
After each game, a score is calculated for each player using this formula:
adjusted_score = score + average_playing_strength_of_opponents
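As a rough illustration of this formula, here is a minimal sketch. The function name is hypothetical, and the sketch ignores the special treatment of a new player's first few games. It uses a game score of +300 against opponents with playing strengths of -60, -40 and +10.

```python
def adjusted_score(score, opponent_strengths):
    """Game score plus the average playing strength of the three opponents."""
    average_opponent = sum(opponent_strengths) / len(opponent_strengths)
    return score + average_opponent


# Opponents average -30, so a raw +300 is worth +270 after adjustment.
print(adjusted_score(300, [-60, -40, 10]))  # 270.0
```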
Each player's playing strength is then calculated like this:
$$R_n = \frac{S_n + K S_{n-1} + K^2 S_{n-2} + \dots + K^{n-1} S_1}{1 + K + K^2 + \dots + K^{n-1}}$$
where:
n = Number of games played
R_n = Playing strength after n games
S_n = Adjusted score for the nth game
The idea is to make every result count, but to make results progressively less significant the less recently they occurred. The value of K gives a result a "half life" of 100 games.
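The weighted average can be sketched as follows. The value of K is not stated on this page; the sketch assumes that a "half life" of 100 games means a result's weight halves every 100 games, i.e. K to the power 100 equals 0.5, which gives K of about 0.9931. The function name and the chronological list input are illustrative choices.

```python
HALF_LIFE = 100
# Assumption: "half life of 100 games" means K ** 100 == 0.5.
K = 0.5 ** (1 / HALF_LIFE)


def playing_strength(adjusted_scores):
    """Weighted average of adjusted scores, oldest first, newest last.

    The newest score has weight 1, the one before it K, then K**2, and so on.
    """
    n = len(adjusted_scores)
    if n == 0:
        return 0.0  # a new player starts with a playing strength of 0
    weights = [K ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * s for w, s in zip(weights, adjusted_scores))
    return weighted_sum / sum(weights)


# A recent good result counts slightly more than an older poor one,
# so this is a little above the plain average of 50.
print(playing_strength([0, 100]))
```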
The new playing strength can also be expressed as a function of its previous value:
$$R_n = \frac{K R_{n-1} (1 - K^{n-1}) + (1 - K) S_n}{1 - K^n}$$
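The incremental form means only the previous playing strength and the game count need to be stored. The sketch below applies it game by game and compares the result with the full weighted average. The value of K is again an assumption (weight halving every 100 games), and the function name is hypothetical.

```python
# Assumption: the half life of 100 games means K ** 100 == 0.5.
K = 0.5 ** (1 / 100)


def update_strength(previous_strength, adjusted_score, n):
    """Playing strength after game n, given the strength after game n - 1."""
    numerator = (K * previous_strength * (1 - K ** (n - 1))
                 + (1 - K) * adjusted_score)
    return numerator / (1 - K ** n)


scores = [40, -20, 100]  # adjusted scores, oldest first
strength = 0.0
for n, s in enumerate(scores, start=1):
    strength = update_strength(strength, s, n)

# The same value computed directly from the weighted-average definition.
direct = (100 + K * -20 + K ** 2 * 40) / (1 + K + K ** 2)
print(abs(strength - direct) < 1e-9)  # True
```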
The playing strength calculation uses all the data we have for the player, including results from previous ladders.
For a player's first four games, his playing strength, for the purpose of calculating the adjusted_score for each other player, is converted to
adjusted_playing_strength = playing_strength x games_played / 5
The reason for this is that new players often start with extreme results, so until we have more reliable information about their ability we assume that their playing strength is closer to average than their early results suggest.
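A minimal sketch of this damping follows. It assumes that games_played is the count used in the formula above and that the reduction applies while that count is at most four; the function name is illustrative.

```python
def strength_for_opponents(playing_strength, games_played):
    """Strength used when computing the other players' adjusted scores.

    Assumption: the reduction applies while games_played is 4 or fewer.
    """
    if games_played <= 4:
        return playing_strength * games_played / 5
    return playing_strength


# A newcomer at +100 after two games counts as only +40 for opponents.
print(strength_for_opponents(100, 2))  # 40.0
print(strength_for_opponents(100, 5))  # 100
```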
Adjustment of playing strengths
The playing strengths calculated by this method are not zero-sum, because the total change in playing strengths for a given result depends on the initial playing strengths of the four players. Hence, for each result, we make a small adjustment to the playing strengths of each of the four players so as to make the net change in the total playing strength be zero. (Until January 2013, we did this by adjusting the playing strength of every active online player, not just the four involved in the game, but this had undesirable consequences.)
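The page does not say how the small adjustment is shared among the four players. One possible approach, shown here purely as an assumption, is to subtract the same amount (the mean of the four changes) from each player so that the total playing strength is unchanged. The function name and the sample numbers are made up for illustration.

```python
def zero_sum_adjust(old_strengths, new_strengths):
    """Shift the four new strengths equally so the total change is zero.

    Assumption: the correction is shared equally; the real method may differ.
    """
    changes = [new - old for old, new in zip(old_strengths, new_strengths)]
    correction = sum(changes) / len(changes)
    return [new - correction for new in new_strengths]


old = [10, 0, -5, 20]
new = [14, 3, -6, 21]              # net change of +7 before adjustment
adjusted = zero_sum_adjust(old, new)
print(sum(adjusted) == sum(old))   # True: total strength is preserved
```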
Ladder Score Calculation
At the beginning of the period covered by the ladder (or when he first starts playing, if he starts part way through a ladder period) each player is assigned a "rating" of 0.
After each game, each player's rating is recalculated. First an adjusted score is calculated like this:
adjusted_score = score + average_playing_strength_of_opponents
This is the same figure as is used in the playing strength calculation.
Then the new rating is calculated as:
new_rating = (((games_played - 1) * existing_rating) + adjusted_score) / games_played
These calculations use the data just for the period of the current ladder (but, as explained above, the playing strength figures are based on historical data as well).
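The rating update can be sketched as a running average. This assumes the new rating is the mean of all adjusted scores in the ladder period, so the expression above is divided by games_played, where games_played includes the game just finished. The function name is hypothetical.

```python
def update_rating(existing_rating, adjusted_score, games_played):
    """Running mean of adjusted scores; games_played includes this game."""
    return ((games_played - 1) * existing_rating + adjusted_score) / games_played


rating = 0.0
for games_played, score in enumerate([270, -30, 60], start=1):
    rating = update_rating(rating, score, games_played)

print(rating)  # 100.0, the plain average of 270, -30 and 60
```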
Adjustment for number of games played
Next, the ratings are adjusted according to the number of games played using the formula:
adjusted_rating = new_rating x erf(games_played / 20) + 1000
where "erf" is the Error Function
The purpose of this formula is to make a very high average over a small number of games equivalent to a lower, but still high, average over a larger number of games.
The rationale for this is that a high average over a small number of games may occur through good luck, but a high average over a large number of games is much less likely to occur. With many players playing between 10 and 15 games a month, one of them is likely to have a good run, so without this adjustment a player who plays a large number of games would have almost no chance of winning. Looking at it another way, scoring +50 over 40 games is harder than scoring +80 over 10 games.
The multiplier starts at zero, climbs steeply to about 20 games, levels off between 20 and 30 games, and becomes almost a constant of 1 at around 40 games (see graph below).
The formula expresses the degree of confidence one can have in a rating for a given number of results. A rating derived from only ten results might be very lucky, so it is reduced by about 50%. A rating from 20 games is more reliable, but not completely so, and so is reduced by about 15%. 40 games is enough to make a rating very reliable, so almost no reduction occurs.
In the above formulae, existing_rating is the previous value of new_rating, rather than of adjusted_rating. (That is, the adjustment is made only to the final rating.) The figure shown on the ladder page, and used for determining the winner, is adjusted_rating.
The addition of 1000 makes every rating positive, in order to provide consistency with earlier methods of calculating ratings.
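The adjustment can be reproduced with the error function from Python's standard library. This is only a sketch of the formula as stated above; the function name mirrors the variable in the formula and is not taken from the ladder software.

```python
import math


def adjusted_rating(new_rating, games_played):
    """Scale the average by erf(games_played / 20), then add 1000."""
    return new_rating * math.erf(games_played / 20) + 1000


# Multipliers of roughly 0.52, 0.84 and 1.00 for 10, 20 and 40 games.
for games in (10, 20, 40):
    print(games, round(math.erf(games / 20), 3))

# An average of +80 over 10 games ranks below +50 over 40 games.
print(adjusted_rating(80, 10) < adjusted_rating(50, 40))  # True
```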
Previous methods of calculating playing strengths and ratings
April 2005 - May 2005
The adjustment for the number of games played was calculated as:
adjusted_rating = + 1000
This formula excessively favoured players who had played a very large number of games, and was replaced by the current system after two quite absurd monthly ladder results.
June 2004 - March 2005
There were four differences:
1. For scores used in playing strength calculations, and for positive scores used in rating calculations, the adjusted score was calculated as
adjusted_score = (400 * erf(score / 400)) + average_playing_strength_of_opponents
"erf" is the Error Function and takes a value between -1 and 1. Its effect is to reduce large scores, in a similar way to the IMP scale at bridge. The disadvantage is that it causes the playing strengths to converge, so that in a game between two players of disparate playing strength it predicts a much lower difference in their scores than would be expected.
2. The half-life for playing strength calculations was 20 rather than 100.
3. There was no adjustment of the playing strengths to make them zero-sum.
4. There was no adjustment of the ratings to take account of the difficulty of obtaining a high average with a large number of games.
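The compression described in item 1 can be illustrated with a short sketch. It covers only the case where the erf scaling applies, and the function name is hypothetical.

```python
import math


def old_adjusted_score(score, average_opponent_strength):
    """Earlier method: squash the raw score with erf before adjusting."""
    return 400 * math.erf(score / 400) + average_opponent_strength


# The squashed score can never exceed 400 in size, however large the
# raw score, which is why playing strengths tended to converge.
for raw in (400, 800, 2000):
    print(raw, round(old_adjusted_score(raw, 0), 1))
```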
January 2004 - May 2004
The playing strength was calculated as
$$R_n = \frac{(n - 1) R_{n-1} + J S_n}{n + J - 1}$$
where J = 1.04
This method produces reasonable results for a player who has played a small number of games. However, the decay in the contribution of a score is non-linear, and for a player with a large number of games this decay becomes tiny. Hence in May 2004 the current method was introduced, and the ratings for May-June 2004 were recalculated using the new method. The scores for previous competitions were left unchanged.