Here is a proposal for a new performance statistic based on win rate. Bear with me: after a couple of paragraphs of preamble, it will get to the point.
First of all, why have a new statistic? Currently there is win rate (WR) and other statistics (WN7, WN8, EFF). Each of these can be used per tank or per group of tanks (overall, by tank type, etc.). Consider unadjusted win rate: it is especially useful per tank, where it gives the overall expected player contribution to win probability, given the various advantages the player chose to use (premium ammo, platoons, mods, etc.). Averaged over groups of tanks it is much less useful, as it depends on the specific tanks and their rates of play. The other statistics are intended as measures of inherent player skill, compensating for the average performance of the various tanks so as to control for a player's tank mix and relative rates of play. While these may offer good measures of skill in some ways, they are hampered by complicated formulas, and they can incentivize play that maximizes their value rather than team wins.
In the abstract, to measure skill you might assign players the same tank, same crew, and same map, or perhaps randomly assign them one of a number of pre-configured tanks. In reality, in addition to choosing their tank, the player outfits the tank, chooses crew, and may arrive in a platoon of good (or bad) players. To measure skill, the goal is then to control for the effects of these player choices, after having identified likely influences. Unfortunately, WOT does not provide data for most of these (tank/crew configuration, platoons, etc.), although there is some information available through the post-game files and VBADDICT. This can be used to improve on the basic win percentage (as per WN7, WN8, EFF) as a measure of skill.
The new statistic can be called a relative win rate (RWR, following relative risk in epidemiology). For a specific tank it is the player's average win percentage divided by the tank's expected win rate, as available through VBADDICT. RWR > 1 indicates better-than-average performance; RWR < 1 indicates worse.
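The per-tank calculation is just a ratio; a minimal sketch (function name and example numbers are mine, not part of the proposal):

```python
# RWR for a single tank: the player's observed win rate divided by the
# tank's expected (server-wide) win rate. Above 1 is better than average.

def rwr(wins, games, expected_win_rate):
    """expected_win_rate as a fraction, e.g. 0.56 for 56%."""
    return (wins / games) / expected_win_rate

# Example: 60 wins in 100 games in a tank with a 56% expected win rate.
print(round(rwr(60, 100, 0.56), 2))  # 1.07
```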
A difficulty with RWR (shared with WR) is that for small samples it will be highly variable, as it takes many 0s and 1s to measure a percentage with precision. The situation is worse for RWR, which is about twice as variable as the percentage it is based on (since 100 divided by a typical expected win rate is roughly 2). There are a number of ways to deal with this, including the simplest approach (as with WR right now) of doing nothing. Preferable would be to report it as an interval estimate, so that the variability is clear (it still lacks precision, even after 100 battles).
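One simple way to get such an interval (my suggestion, not part of the proposal) is the delta method on the log scale: the standard error of log(p̂) for a binomial proportion is approximately sqrt((1 − p̂)/(n·p̂)), and the interval back-transforms around the point estimate.

```python
import math

def rwr_interval(wins, games, expected, z=1.96):
    """Approximate 95% CI for a single-tank RWR.

    Uses the delta-method standard error of log(p_hat) for a
    binomial proportion: sqrt((1 - p_hat) / (games * p_hat)).
    """
    p = wins / games
    point = p / expected
    se = math.sqrt((1 - p) / (games * p))
    return (point * math.exp(-z * se), point * math.exp(z * se))

# Even after 100 battles the interval is wide:
lo, hi = rwr_interval(60, 100, 0.56)
print(round(lo, 2), round(hi, 2))  # 0.91 1.26
```

The width of this interval, roughly 0.91 to 1.26 around a point estimate of 1.07, illustrates the lack of precision mentioned above.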
RWRs can be combined across different tanks by averaging. Like relative risk, you would use a geometric mean: log transform, then average, then back-transform. The small-sample problem is an issue here, since win rates of exactly 0% or 100% (whose logs are undefined) occur in small samples. Simple approaches would be to wait a certain number of battles (30?), or to wait for the first win and first loss before including a tank in the average. Also simple (and preferable, I think) would be to calculate the win percentage as (y+1)/(n+2)*100%, where y is the win count and n is the number of games in the tank (this has justifications in statistics; it stabilizes the estimate and performs better than the direct proportion in small samples).
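The smoothing and the geometric-mean combination can be sketched as follows (unweighted here; the weighted version follows below — function names are mine):

```python
import math

def smoothed_win_rate(wins, games):
    # (y + 1) / (n + 2): keeps small-sample rates strictly between
    # 0 and 1, so the log transform below is always defined.
    return (wins + 1) / (games + 2)

def combined_rwr(tanks):
    """tanks: iterable of (wins, games, expected_win_rate) tuples.
    Geometric mean: log each per-tank RWR, average, back-transform."""
    logs = [math.log(smoothed_win_rate(w, g) / e) for w, g, e in tanks]
    return math.exp(sum(logs) / len(logs))

# A 0-for-5 start no longer produces log(0): the smoothed rate is 1/7.
print(round(smoothed_win_rate(0, 5), 3))  # 0.143
```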
A weighted average can be used to adjust for the different frequencies with which tanks are played on an account. On the log scale, term i in the sum is multiplied by n_i/N, where n_i is the number of games in tank i and N is the total number of games in the group (groups such as all tanks, just SPGs, etc.).
As an example calculation, suppose a player has two tanks, Rudy and KV-5. They win 60 out of 100 in the Rudy and 45 out of 100 in the KV-5. From VBADDICT, the expected win rate of the Rudy is 56% and that of the KV-5 is 52%. Their RWR in the Rudy is 1.07 and in the KV-5 is 0.87 (no small-sample adjustment here, for clarity). Combined over both it is 0.96. If instead there were only 20 games in the Rudy with 12 wins, the individual tank RWRs are the same, but the combined value is now 0.90, down-weighting the fewer Rudy games. The expression for this is RWR = exp((N1/N)*log(RWR1) + (N2/N)*log(RWR2)), where N = N1 + N2.
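The worked example above can be reproduced directly (a sketch; the function name is mine):

```python
import math

def weighted_rwr(tanks):
    """tanks: list of (wins, games, expected_win_rate) tuples.
    Weighted geometric mean with weights n_i / N on the log scale."""
    total_games = sum(g for _, g, _ in tanks)
    log_sum = 0.0
    for wins, games, expected in tanks:
        log_sum += (games / total_games) * math.log((wins / games) / expected)
    return math.exp(log_sum)

# 100 games in each tank: equal weights give 0.96.
print(round(weighted_rwr([(60, 100, 0.56), (45, 100, 0.52)]), 2))  # 0.96
# Only 20 Rudy games (12 wins): same per-tank RWRs, less Rudy weight.
print(round(weighted_rwr([(12, 20, 0.56), (45, 100, 0.52)]), 2))   # 0.9
```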
I ran a little simulation to check its performance on a larger sample. Suppose the player has 40 tanks and plays 10, 20, …, 400 games in them (total = 8200 games), where the tanks have expected win rates (from VBADDICT, say) spread evenly from 40% to 60%. The player is more skilled than average and has a relative win rate of 1.04 in each tank. The simulation gave an estimated combined relative win rate of 1.035. An approximate 95% confidence interval for this is (1.014, 1.055).
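A simulation of this kind can be sketched as below. The exact grid of expected win rates and the random seed are my assumptions, so the result will differ slightly from the figures quoted above, but it should land near 1.04:

```python
import math
import random

random.seed(1)  # make this sketch reproducible

TRUE_RWR = 1.04
games = [10 * (i + 1) for i in range(40)]               # 10, 20, ..., 400
# Assumed grid: expected win rates spread evenly from 40% to 60%.
expected = [0.40 + 0.20 * i / 39 for i in range(40)]

total_games = sum(games)                                # 8200
log_sum = 0.0
for n, e in zip(games, expected):
    p = min(TRUE_RWR * e, 1.0)          # player's true win probability
    wins = sum(random.random() < p for _ in range(n))
    rate = (wins + 1) / (n + 2)         # smoothed win percentage
    log_sum += (n / total_games) * math.log(rate / e)

combined = math.exp(log_sum)
print(total_games, round(combined, 3))  # 8200 games, estimate near 1.04
```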
Here is a plot of the RWRs of the individual tanks in this simulation, showing the decreased variability as the number of games per tank increases (the red line is 1.04).
Running the simulation 100 times gave a distribution of the combined (over all 40 tanks) RWR. The point estimates are pretty close, with means and medians where expected.
For reporting purposes, since the first few digits are often the same, it may be more interesting to report it as a percentage. For example, RWR 1.038 is +3.8% and 0.95 is -5%.
In conclusion, its strengths are that it is simple to understand, is an improvement over WR for measuring skill (as opposed to predicting match outcomes), and only goes up when the player wins games (rather than with damage, spotting, etc.), thus focusing directly on team play. Measures of uncertainty such as confidence intervals can also be calculated, either per tank or for groups of tanks. Some weaknesses are that it cannot control for unmeasured effects such as platoons, and it does not control for skill differences among players across tank types (although this can probably be done using existing data). That said, given what it does do, it can act as a complement to the various stats already available, with their own strengths and weaknesses.