"But the stats should be telling me what would happen in an equal match up, where both players have access to the same weapons and are at the same health."
The rank is no longer valid because it does not reflect Goldeneye Source As Played On This Server, but rather some fantasy land where everyone stands back-to-back before walking ten paces. You're focusing on one element of gameplay (the all-things-even duel situation) at the cost of ignoring the whole of the game.
"it's easy to only join the server when completely new players are on."
My formula does compare the players' current stats. A high-ranked player defeating a low-ranked player is under-weighted. That's part of getting away from the HLstatsX points system; if your rank is high and you beat a first timer, you don't get much/anything from it because that player has little/no record to prove that you actually overcame a challenge.
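As a rough illustration of that weighting idea (this is not the formula actually used here, which isn't spelled out in this post; `kill_weight`, `victim_rating`, `victim_events`, and `min_events` are made-up names for the sketch), a kill could be credited in proportion to the victim's current rating, scaled down when the victim has little record behind that rating:

```python
def kill_weight(victim_rating: float, victim_events: int, min_events: int = 20) -> float:
    """Credit awarded to the killer for a single kill (hypothetical sketch).

    victim_rating: the victim's current rating, assumed to lie in [0, 1].
    victim_events: how many recorded kills/deaths back up that rating.
    min_events:    assumed number of events needed before a rating is fully trusted.
    """
    # A first-timer has no record, so confidence (and therefore credit) is zero.
    confidence = min(victim_events / min_events, 1.0)
    return victim_rating * confidence


# A veteran with a solid record is worth far more than a newcomer.
veteran_credit = kill_weight(victim_rating=0.8, victim_events=40)
newbie_credit = kill_weight(victim_rating=0.8, victim_events=0)
```

Under these assumptions, farming first-timers earns close to nothing per kill, while beating someone with an established record earns credit that scales with how good that record is.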
Let's look at the very first graph again, but at 10:1 responsiveness, which shows real-time hot and cold streaks instead of long-term performance.
My line (purple) quickly approaches Perfect at first, because the map was Basement and I was simply not dying against newbie players. (I got my no-deaths achievement and two 6-Awards in this session.)
Someone (red) showed up who knew how to fight, and my inflated-by-newbies rank was quickly normalized until I adapted my strategy and started winning my fights again. Then a third player (olive) entered and started losing against me but winning against Red until the end when he adapted to Red.
Beating up newbies to inflate rank is impossible to avoid: if the population of gameplay events is one guy beating up newbies, then he's the best player in the pool by a wide margin. You seem to want to force all players to compete against a hypothetical metric: an opponent perfectly matched HP/AP/arsenal-wise, with flawless gameplay as the standard. But that opponent does not exist and is never seen in the actual game.
Anyway, I'm tired of arguing this. Your vision of how players should be ranked and my vision of a way that they can be ranked practically and usefully differ.
If you wish to participate further, you could throw me some server logs. If you have a large number of regulars with whom you are well-familiar, you could look over my results and provide feedback to tell if my evaluation agrees with how you would sort them, despite not taking AC-10 and spawnkilling into account.