An introduction to fancy hockey stats

Eric at the Flyers blog Broad Street Hockey has recently published a couple of nice advanced stats primers as well: Team | Individual

As many here have seen, hockey is starting to warm up to fancier statistics. Over the last 5-8 years, a lot of smart people have done a ton of advanced statistical analysis, and the result is a set of useful new statistics (Corsi, OZone%, PDO) that are replacing more traditional ones (±) in effective player evaluation. I've had a number of people ask me lately to explain some of these stats, and when a Wild fan finally broke down and asked, I figured it was time to put it all in a post instead of individual e-mails. I'll keep the Wild fan's name anonymous so she doesn't get ostracized from her collective. We'll call her E. Wiener for anonymity's sake. Wait, that opens the door to way too many childish jokes. I'll call her Emilie W instead.

One of the key points that really gets lost is that the only math you need to know to understand these statistics is: +, -, x, ÷. Yes, a lot of advanced statistical mathematics went into developing and checking these stats, but that math is completely unnecessary for putting them to use in day-to-day player and team evaluation. I will put links at the bottom to some posts that show the math, so you can check it out if you really want to, but knowing the math behind a stat isn't necessary for knowing what the stat measures.

And that's a key point: a statistic is a measurement. The better the statistic (and the more events that go into it), the better the measurement. Many have heard me say ± is useless, and it's because using ± is like using a sundial to measure a second, or an unmarked yardstick to mark off an inch. The resolution of ± is so coarse that the measurement tells you nothing. It's worthless. Luckily for us, some people have developed statistics that provide a ton more fidelity.

This is not a post where I go out and prove the things I say below. There's a ton of data and thought and math that has gone into a lot of it. I am mentioning conclusions based on that work (a TON of which I have read and examined over the last 4-5 years, by the way). So if I say something like "It turns out that teams aren't able to affect their shooting % all that much," it's not just some opinion I'm throwing out willy-nilly. It's based on a lot of math and stats. That's not to say there can't be disagreement, but I'm not saying anything lightly here.

So here's a list of the most important fancy stats out there, in my opinion.

Corsi and Fenwick

What it measures: Puck possession

By far the most used and most talked-about stats among the advanced stats crowd, Corsi and Fenwick are like fraternal twins. Corsi is total shot attempts: [shots on goal (including goals) + missed shots + blocked shots]. (Notice there are only + signs there.) Fenwick is the same thing without the blocked shots counted.

Both stats are normally displayed either as a ± differential (such as on the Behind the Net website) or as a ratio expressed as a % (such as on the Hockey Analysis website).
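To make the arithmetic concrete, here is a minimal sketch of both stats and both display styles. The function names and the shot counts are made up for illustration; they don't come from any of the sites mentioned above.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """All shot attempts: shots on goal (goals included) + misses + blocks."""
    return shots_on_goal + missed_shots + blocked_shots


def fenwick(shots_on_goal, missed_shots):
    """Same as Corsi, but blocked shots are left out."""
    return shots_on_goal + missed_shots


def as_differential(attempts_for, attempts_against):
    """The +/- style display: attempts for minus attempts against."""
    return attempts_for - attempts_against


def as_percentage(attempts_for, attempts_against):
    """The % style display: share of all attempts that went our way."""
    total = attempts_for + attempts_against
    if total == 0:
        return None  # no events, nothing to measure
    return 100.0 * attempts_for / total


# Hypothetical single-game totals for one team and its opponent.
corsi_for = corsi(shots_on_goal=30, missed_shots=12, blocked_shots=14)
corsi_against = corsi(shots_on_goal=25, missed_shots=10, blocked_shots=11)

print(as_differential(corsi_for, corsi_against))          # 10
print(round(as_percentage(corsi_for, corsi_against), 1))  # 54.9
```

Either display carries the same information; the % version is just easier to compare across players with different amounts of ice time.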

Why it's useful: Both of these stats correlate heavily with puck possession. It also turns out that puck possession correlates with winning, pretty well actually.
Team Corsi is extremely useful for measuring how well a team is playing (given a large enough sample size, by the way). Individual Corsi is good too, but it needs to be put into context a little more (at the team level, things like competition level and zone starts even out a lot more than they do at the individual level).

So if it's similar to ±, why is it useful when ± is useless? Sample size, sample size, sample size. Sample size is the key to any good statistic, and there just aren't enough goals scored while a player is on the ice in any given season for ± to tell you anything. Sample size is hugely important, and yet many in the advanced stats world can forget it from time to time (myself included, as I'll explain in a later post). You need hundreds of individual events to form a significant sample size.
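One rough way to see why the number of events matters: treat each event as an independent coin flip and ask how much the observed share can wobble from chance alone. That coin-flip model is my simplifying assumption for illustration, not something taken from the analyses linked below, and the event counts are made up.

```python
import math


def share_standard_error(true_share, n_events):
    """Standard error of an observed share, under a simple binomial model."""
    return math.sqrt(true_share * (1.0 - true_share) / n_events)


# Hypothetical event counts: a goal-sized sample vs. a shot-attempt-sized one.
for n_events in (50, 2000):
    wobble = 100.0 * share_standard_error(0.5, n_events)
    print(n_events, "events: about +/-", round(wobble, 1), "percentage points")
```

With 50 events the noise is around 7 percentage points, which swamps the differences between most players. With 2000 events it drops to about 1.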

ESSV%

What it measures: Even-strength save % of a goalie/goalies

Why it's useful: Wins and GAA are too dependent on team play to give an accurate judgment of goalies. But Save % is something a goalie has almost, almost, exclusive control over. It turns out that teams aren't able to affect their shooting % all that much. For example last season. The ES part comes in handy because teams' PKs vary, mainly in # of instances (and a goalie's Sv% does go down on the PK). (Yes, that means players do play a hand in shot quality, but it turns out that most players are about the same on defense, so it pretty much normalizes out, with little variation in the quality of chances being given up at the NHL level.)

So ESSV% is the best evaluator of goaltender play. It's why astute Hurricanes fans weren't that worried about Cam Ward being hurt. His ESSV% was only .917, which is pretty pedestrian, so subbing in a decent backup wasn't as big a downgrade as some feared. Dan Ellis has one recent season where his ESSV% was lower than Cam Ward's current ESSV%. Carolina's play hasn't suffered.
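The calculation itself is just saves divided by shots, counting only even-strength play. A minimal sketch, with a made-up function name and made-up shot totals chosen to land on a .917:

```python
def es_save_pct(es_shots_against, es_goals_against):
    """Even-strength save %: ES saves divided by ES shots on goal faced."""
    if es_shots_against == 0:
        return None  # goalie has not faced an even-strength shot yet
    saves = es_shots_against - es_goals_against
    return saves / es_shots_against


# Hypothetical season: 1000 even-strength shots faced, 83 goals allowed.
print(f"{es_save_pct(1000, 83):.3f}")  # 0.917
```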


OZone%

What it measures: Number of times a player starts (takes a faceoff) in the offensive zone vs. starting in the defensive zone

Why it's useful: It helps build a picture of how a coach uses a player. In general a player used in the Offensive zone more often is going to score more points than one used in the defensive zone more often. The Sedins took a massive jump in points produced around the same time their Ozone usage went up. In 07-08 they were started in the Ozone in the mid 50% range.

Then the Canucks started shifting them heavily into the offensive zone around 09-10 (check out 10-11 for sure), and their points jumped from the mid-70-to-low-80 point range to the mid-90-to-100 point range.

Conversely, players in the low %s tend to play a more defensive game and are being used as defensive players. It's why I have defended Stastny's point production, this year especially: he's being used very heavily in his own zone this season, the way a 3rd line center would/should be used.

As you can imagine, zone starts have a large effect on individual Corsi ratings, and should almost always be considered when looking at the individual Corsi of players.
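As a quick sketch of the arithmetic: the zone start % is offensive-zone faceoffs as a share of offensive plus defensive zone faceoffs. Leaving neutral-zone faceoffs out follows the "offensive vs. defensive" description above; the function name and the counts are made up for illustration.

```python
def ozone_pct(offensive_zone_starts, defensive_zone_starts):
    """Share of a player's end-zone faceoff starts taken in the offensive zone."""
    total = offensive_zone_starts + defensive_zone_starts
    if total == 0:
        return None  # no end-zone starts recorded
    return 100.0 * offensive_zone_starts / total


# Hypothetical scoring-line usage vs. hypothetical checking-line usage.
print(round(ozone_pct(330, 270), 1))  # 55.0
print(round(ozone_pct(200, 300), 1))  # 40.0
```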


Quality of Competition

What it measures: Corsi of the on-ice competition

Why it's useful: This is, by far, the fuzziest of the stats I have listed here. I don't really trust it as a number, per se, so I usually compare rank on the team instead: Player X plays the 2nd toughest competition on the team, Player Y plays the softest. I don't really look at the number itself unless a major discrepancy shows up.
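Turning the raw numbers into a rank on the team is simple enough to sketch. The player labels and values below are invented, and I'm assuming a higher number means tougher competition.

```python
def rank_by_competition(qoc_by_player):
    """Map each player to a team rank, where 1 = toughest competition faced.

    Assumes a higher competition number means tougher opponents.
    """
    ordered = sorted(qoc_by_player, key=qoc_by_player.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical competition numbers for four teammates.
team_qoc = {"Player X": 0.85, "Player Y": -0.40, "Player Z": 1.10, "Player W": 0.10}
ranks = rank_by_competition(team_qoc)

print(ranks["Player X"])  # 2 (2nd toughest competition on the team)
print(ranks["Player Y"])  # 4 (softest competition on the team)
```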

That said, it's still useful to see who guys are playing against. John Mitchell and Paul Stastny have similar point totals (15 and 17 respectively), but Stastny is playing, by far, the harder competition. Is 17 points playing against the Toews, Datsyuks, Sedins and Parises of the world more impressive than 15 points against whoever is on those teams' third lines? Yeah, it is. Those points aren't created in a vacuum, and Stastny regularly goes against the other team's toughest competition.

PDO

What it measures: Fortune

Why it's useful: PDO is actually two other useful stats added together: ESSV% + ES shooting percentage (ESsh%). ESSV% we covered, and ESsh% is a cousin of it: the on-ice shooting percentage of a team. For individuals, it's the team's ESsh% and ESSV% while that player is on the ice. It turns out that teams don't have much control over how well they shoot, and have a lot more control over how much they shoot. Shooting %s tend to be in the 6-9% range for teams, and those %s aren't repeatable from year to year. So a team with a giant shooting percentage will regress to the mean, and a team with a terrible one will climb back toward it.

A player or team with a PDO far away from 1000 is typically having a streak of luck (good or bad, depending on the direction away from 1000). And teams with shooting percentages much higher than 9.5% are said to be shooting well above their true talent level.
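Here is a small sketch of the PDO arithmetic on the 1000 scale used above. The function name and the even-strength totals are made up for illustration.

```python
def pdo(es_goals_for, es_shots_for, es_goals_against, es_shots_against):
    """PDO on the 1000 scale: ES shooting % plus ES save %, times 1000."""
    shooting_pct = es_goals_for / es_shots_for
    save_pct = (es_shots_against - es_goals_against) / es_shots_against
    return 1000.0 * (shooting_pct + save_pct)


# Hypothetical team shooting 9.5% with a .930 ES save %.
print(round(pdo(95, 1000, 70, 1000)))  # 1025

# Hypothetical team shooting 7.0% with a .910 ES save %.
print(round(pdo(70, 1000, 90, 1000)))  # 980
```

The first team is sitting well above 1000 and would be a candidate to cool off; the second is below 1000 and would be a candidate to bounce back.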

The exception here is that really good goalies can maintain a very high ESSV%. This is why teams like Vancouver, and Boston in the Thomas years, routinely have PDOs over 1000: their goaltenders are head and shoulders better than average.

These 5-6 stats are a really good place to start when it comes to advanced stats. There are a lot more, and a lot of ways to represent these, but if you start to get a handle on these 5-6, they will really add a lot of insight to your hockey experience.


If you want to look more in depth into these stats, here are some articles I found interesting and illuminating:

Using Adv Stats to evaluate a player's performance - Cam Charron: Canucks Army, Leafs Army, The Province, others
Randomness of ± - N Greenburg: Japers' Rink
Shot Quality correlation to Corsi - Eric T: NHL Numbers/Broad Street Hockey
Shots, Fenwick, & Corsi - JLikens: Objective NHL
Forest vs Trees - Vic Ferrari: Irreverent Oiler Fans

Saving the best for last:

Zone Time - Vic Ferrari: Irreverent Oiler Fans