Explanation of Statistics
### Sabermetrics

### Normalization

### Glossary / Formulas

By Larry Macdonald

As everyone reading this is aware, baseball has lots and lots of statistics. It seems like there are statistics for everything. People like to joke that there is a statistic for hitting the most home runs in afternoon games on Tuesdays against left-handed relief pitchers on Astroturf. Somebody must hold that distinction, and it may even be possible to find out who, because baseball does a great job of collecting statistics. It’s up to the serious students of the game to sift through everything and figure out what is meaningful and what isn’t, and that’s what we are going to start with today.

There are many ways of categorizing statistics. The first is to classify a statistic as either a counting statistic or a rate statistic. Counting statistics are those that take on the values 0, 1, 2, 3, 4, etc. You start at 0 and each time you accomplish it, you increase by 1 (usually). Examples are the number of hits that you have, or runs scored, or wins for a pitcher. Rate statistics are usually figured out by dividing one counting statistic by another. People who do well in counting statistics have done something a lot, while those who do well in rate statistics have done it frequently given their opportunities. That is where the distinction lies: a player with fewer opportunities can rate as well as players who have played more.

You need both counting and rate statistics to measure the abilities of players. Perhaps the best example is when voters choose the best pitcher in each league for the Cy Young Award. The two statistics that they give the most weight to are a pitcher’s wins (a counting statistic) and his earned run average (a rate statistic, calculated by dividing the number of earned runs allowed by innings pitched, then multiplying by 9). When a pitcher leads the league in both categories, it is rare that there is a close vote, but if two different pitchers lead those categories, there could be very interesting balloting.

Another important consideration in evaluating the statistics of a player is his breakdowns. Games are played in different circumstances, and some players are better at taking advantage of some situations than others. The classic example is taking advantage of the platoon situation. In general, right-handed batters hit better against left-handed pitchers, while left-handed hitters have the advantage against right-handed pitchers. This doesn’t mean that every left-handed hitter is better than every right-handed hitter whenever a right-handed pitcher is on the mound, but if you have two players of relatively equal quality, you may get an edge by using the one that gives you the platoon advantage against the starting pitcher that day.

Another important breakdown is how well a player does in a ballpark. A player who hits a lot of fly balls may lose a number of home runs when he plays in a large stadium, and may pick up a number of cheap home runs in a bandbox. When you are setting your line-up, you may want to take advantage of that. There are also some parks where there is an advantage based on the handedness of your players. If there is a small left field, for example, you may prefer a right-handed hitter, even when there is a right-handed pitcher, to take aim at the left-field wall. In managing your pitching staff, you may want to have mostly right-handed pitchers going in that park, knowing that most of the hitters that you will face will be right-handed, giving the platoon advantage back to the pitcher, and also forcing the switch-hitters to bat lefty.

It’s also good to understand the park breakdowns when evaluating players: those who accumulate their statistics in parks that are advantageous to the offense are not as valuable as those who accumulate similar statistics in pitchers’ parks. Parks like Coors Field increase offense by so much that each hit or home run has less impact on winning a game, so each one has less value. Similarly, since it is easier to achieve a high batting average in those parks, players who play there a lot are not as good as their statistics appear to be. A player who hits .300 in an average park might hit .330 in Coors; he might hit only .270 in Comerica.

Not all of the possible breakdowns are important. A player who hit .500 in afternoon games on Tuesdays against left-handed relief pitchers on Astroturf might have only two at bats in those situations, and there isn’t a lot of specific skill to many of those breakdowns. Hitting on specific days of the week is not generally a skill, but a result, and which players do well in these breakdowns is a matter of trivia rather than something that you can use to take advantage of situations.

In the late 1970’s and early 1980’s, baseball writer Bill James popularized the use of statistics for the analysis of baseball topics. One definition that he used was, “*Sabermetrics is the field of knowledge which is drawn from attempts to figure out whether or not those things people say about baseball are true.*” Along the way, James developed formulas that he used to evaluate players. Many of those formulas are so useful that we have decided to include them for you to use in your research.

The first is called Runs Created. Each player accumulates his counting statistics over the course of a season or career. At any point, we can see the statistics of each member of a baseball team, and we know how many runs the team scored, but we do not know how many runs scored as a result of the performance of each player. What Runs Created tries to do is answer this question: If a team accumulated the same batting statistics as this player over the course of a number of games, how many runs would that team score during those games? The Runs Created formula estimates that, and says that the player has created so many runs. One of the nice things about Runs Created is that most baseball fans don’t need a lot of exposure to it in order to understand what is a good number to have, since it is on the same scale as runs scored or RBI. A player will often have a Runs Created figure close to the number of runs he has scored or driven in, but Runs Created may be a better estimate than the other two since it is influenced less by the accomplishments of his teammates.

Three other offensive formulas that we include are OPS, ISO, and Secondary Average. OPS stands for “On Base Plus Slugging”, and adds together a player’s on base and slugging percentages. Both On Base Percentage and Slugging Percentage tell more about a player than his batting average, and by adding together these two measures, we get a better idea of which players are more effective offensively. ISO is “Isolated Power” and is the total number of extra bases per at bat. It’s a truer measure of power than slugging percentage, since a player who has little power but gets a lot of singles can still have a reasonably high slugging percentage. Secondary Average is meant to combine everything except what is counted in his batting average: it is extra bases on hits, plus walks, plus stolen bases, all divided by his at bats.

For pitchers, we have only one sabermetrics formula, ERC, or Component ERA. ERC tries to estimate the pitcher’s ERA given the number of hits, walks, hit batsmen, and home runs that he has allowed in his innings pitched.

The complete formulas that we use follow in the Glossary section.

When you are evaluating and choosing your players for your team at *Imagine Sports*, you are choosing between players who have played baseball over the course of more than a hundred years of history. As with players who have played in different stadiums, players who accumulated their statistics in different eras can only be compared by taking into account the offensive levels in the seasons when they played.

Changes in rules, equipment, stadiums, styles of play, and the evolution of strategies have been such that the offensive and defensive statistics of players in different decades are dramatically different. Better gloves, stadiums, and groundskeeping (plus more lenient official scorers today) have led to higher fielding percentages; this may be the most dramatic and consistent evolution over time. For offense, there were three dramatic sudden increases in scoring. The first was in 1893, when the pitching mound was moved back to 60’6”. The second was in 1920, responding to some combination of (a) the emergence of Babe Ruth and the excitement that his hitting brought to the game, (b) the need for more excitement to take people’s attention away from the Black Sox scandal, and (c) the emphasis on using new, white baseballs after Ray Chapman was killed by a dirty grey ball that he probably couldn’t see. The third was in 1969, when owners over-reacted to the 1968 season: Denny McLain’s 31 wins, Bob Gibson’s 1.12 ERA, and the American League finishing with only one .300 hitter, Carl Yastrzemski, who only reached that mark late in the season and won the title at .301. The 1990’s also saw an offensive explosion, though it was more of a gradual ramp-up that perhaps started in 1987. It has been attributed to various causes, including expansion, steroids, new ballparks, and a simple lack of enough good pitchers.

Imagine Sports codes players so that accomplishments in different eras are treated fairly. When Carl Yastrzemski batted .301 to lead the American League in 1968, that was a difficult task and he deserves a lot of credit for that. In 1930, the National League batted .303, so Bob O’Farrell, who batted .301 for the Giants, was a below average hitter that year.

We provide career normalized batting, fielding, and pitching statistics for each major league player. A value of 100 is league average, while a value of 105 means that the player is 5% better than average in that category. Normalization (example for batting average) is done by comparing the number of hits a player had in each season to the number that the average hitter in the league had in the same number of opportunities. It is easiest to understand this by use of an example. Let’s use Carl Yastrzemski in the American League 1968, since we have already discussed him.

There were 12,359 hits in 53,709 AB in the league that year, a batting average of .230. Carl Yastrzemski had 539 AB in 1968, so the average AL hitter would have had about 124 hits. Yaz had 162 hits. His AVG+ (listed as BA+ in the Glossary) is 162/124*100, or 130.6. We calculate this for every season in his career: we find the number of hits that the average hitter would have had in Yastrzemski’s number of at bats each season, then add them all up and compare to the total that Yaz had. Yaz had 3,419 hits, while the league would have had 3,026, giving him an AVG+ figure of 113.
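For readers who want to check the arithmetic, the calculation above can be sketched in a few lines of Python. This is only an illustration: the function name and the layout of each season tuple are assumptions made for this sketch, not the code Imagine Sports actually runs.

```python
def normalized_batting_average(seasons):
    """Career AVG+ on a scale where 100 is league average.

    `seasons` holds one (player_hits, player_ab, league_hits, league_ab)
    tuple per season. This tuple layout is a hypothetical choice.
    """
    actual_hits = 0.0
    expected_hits = 0.0
    for player_hits, player_ab, league_hits, league_ab in seasons:
        actual_hits += player_hits
        # Hits an average league hitter would collect in the same at bats.
        expected_hits += player_ab * league_hits / league_ab
    return 100.0 * actual_hits / expected_hits


# Yastrzemski in the 1968 American League, using the figures quoted above.
yaz_1968 = [(162, 539, 12359, 53709)]
print(round(normalized_batting_average(yaz_1968), 1))
```

Passing one tuple per season gives the career figure, because the expected hits are summed across seasons before the single division at the end.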

**For batters:**

G – Games Played. Total number of games played during his career.

AB – At Bats.

R – Runs. Total runs scored.

H – Hits.

2B – Doubles.

3B – Triples.

HR – Home Runs.

RBI – Runs Batted In.

BB – Bases on Balls. Also known as walks.

K – Strike outs.

SB – Stolen Bases.

CS – Caught Stealing.

HBP – Hit by Pitch.

SH – Sacrifice Hits. This is sacrifice bunts, but the SB abbreviation is taken.

SF – Sacrifice Flies.

GIDP – Grounded into Double Plays. Note that double play line-outs and fly outs are not counted here.

RC – Runs Created. Our formula is:

* Let A = (H + BB + HBP – CS – GIDP)

* Let B = .24x(BB – IBB + HBP) + .62xSB + .50x(SH + SF) + TB - .03xK

* Let C = AB + BB + HBP + SH + SF

Then

* RC = (2.4xC + A) x (3xC + B) / (9xC) – 0.9xC

Where IBB is intentional walks and TB is total bases (H + 2B + 2x3B + 3xHR)

Runs Created are calculated each season and then added up over the player’s career to get career runs created, so the career total may not equal the figure that you’d get by applying the formula to his career line. For a player who played on more than one team in a season, runs created were calculated for each team before summing.
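As a rough illustration, the Runs Created formula above translates directly into Python. The function name and argument names are assumptions made for this sketch, and the sample season line is invented purely to exercise the function; it is not a real player.

```python
def runs_created(ab, h, doubles, triples, hr, bb, ibb, hbp, sb, cs, gidp, sh, sf, k):
    """Runs Created for one season (or one team stint), per the A/B/C formula above."""
    tb = h + doubles + 2 * triples + 3 * hr          # total bases
    a = h + bb + hbp - cs - gidp
    b = 0.24 * (bb - ibb + hbp) + 0.62 * sb + 0.50 * (sh + sf) + tb - 0.03 * k
    c = ab + bb + hbp + sh + sf
    if c == 0:
        # No plate appearances: nothing to estimate (a guard added for this sketch).
        return 0.0
    return (2.4 * c + a) * (3 * c + b) / (9 * c) - 0.9 * c


# Hypothetical season line, not taken from any real player.
season = dict(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, ibb=5,
              hbp=5, sb=10, cs=5, gidp=10, sh=2, sf=5, k=80)
print(round(runs_created(**season), 1))

# Career RC is the sum of the per-season (and per-team) figures.
career_rc = sum(runs_created(**line) for line in [season, season])
```

Summing season by season, as in the last line, follows the note above that the career total is not computed from the career line.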

PA – Plate Appearances. Total of AB, BB, HBP, SH, and SF.

OUT – Outs. Calculated as (AB – H) plus GIDP, CS, SH and SF.

AVG – Batting Average. Calculated as H divided by AB.

SLG – Slugging Percentage. Calculated as (H + 2B + 2x3B + 3xHR) divided by AB.

OBP – On Base Percentage. Calculated as (H + BB + HBP) / (AB + BB + HBP + SF). Note that the inclusion of SF in the denominator means that a player can have a lower OBP than AVG if he has no walks or HBP.

ISO – Isolated Power. Calculated as SLG – AVG.

SEC – Secondary Average. Calculated as (2B + 2x3B + 3xHR + BB + SB) divided by AB.

RC/650 – Runs Created per 650 Plate Appearances. Calculated as RC/PA x 650.

HRF – Home Run Factor. Calculated as AB/HR. A low HR factor implies more power.

BBF – Base on Balls Factor. Calculated as PA/BB. As with HRF, low implies a player who walks more often.

KF – Strikeout Factor. Calculated as PA/K. Players who strike out often have a low KF.

RC27 – Runs Created Per 27 Outs. Calculated as RC divided by OUT then multiplied by 27.
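The rate statistics defined above can be computed from a batting line as in the following sketch. The function name, argument names, and the dictionary it returns are assumptions for illustration, the OPS entry follows the description in the Sabermetrics section, and the code assumes the player has at least one at bat and one out.

```python
def batting_rates(ab, h, doubles, triples, hr, bb, hbp, sf, sh, sb, cs, gidp, rc):
    """Derived batting rates, following the glossary definitions above."""
    extra_bases = doubles + 2 * triples + 3 * hr
    pa = ab + bb + hbp + sh + sf
    outs = (ab - h) + gidp + cs + sh + sf
    avg = h / ab
    slg = (h + extra_bases) / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    return {
        "PA": pa,
        "OUT": outs,
        "AVG": avg,
        "SLG": slg,
        "OBP": obp,
        "OPS": obp + slg,
        "ISO": slg - avg,
        "SEC": (extra_bases + bb + sb) / ab,
        "RC/650": rc / pa * 650,
        "RC27": rc / outs * 27,
    }


# Hypothetical batting line, with an assumed Runs Created total of 92.5.
rates = batting_rates(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60,
                      hbp=5, sf=5, sh=2, sb=10, cs=5, gidp=10, rc=92.5)
for name, value in rates.items():
    print(name, round(value, 3))
```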

BA+ – Normalized Batting Average. This is the player’s H divided by the league average hits in his at bats each season, added up over his career, multiplied by 100.

SLG+ – Normalized Slugging Percentage. This is the player’s total bases divided by the league average total bases in his at bats each season, added up over his career, multiplied by 100.

OBP+ – Normalized On Base Percentage. This is the player’s (H + BB + HBP) divided by the league average total of (H + BB + HBP) in his (AB + BB + HBP + SF) each season, added up over his career, multiplied by 100.

2B+ – Normalized Doubles. This is the player’s 2B divided by the league average doubles in his at bats each season, added up over his career, multiplied by 100.

3B+ – Normalized Triples. This is the player’s 3B divided by the league average triples in his at bats each season, added up over his career, multiplied by 100.

HR+ – Normalized Home Runs. This is the player’s HR divided by the league average home runs in his at bats each season, added up over his career, multiplied by 100.

BB+ – Normalized Walks. This is the player’s BB divided by the league average walks in his plate appearances each season, added up over his career, multiplied by 100.

SB+ – Normalized Stolen Bases. This is the player’s SB divided by the league average stolen bases in his estimated number of times on first base (H – 2B – 3B – HR + BB + HBP) each season, added up over his career, multiplied by 100.

CS+ – Normalized Caught Stealing. This is the player’s CS divided by the league average caught stealing in his estimated number of times on first base (H – 2B – 3B – HR + BB + HBP) each season, added up over his career, multiplied by 100.

K+ – Normalized Strikeouts. This is the player’s K divided by the league average strike outs in his plate appearances each season, added up over his career, multiplied by 100.

GIDP+ – Normalized GIDP. This is the player’s GIDP divided by the league average GIDP in his at bats each season, added up over his career, multiplied by 100.

XB+ – Normalized Extra Base Hits. This is the player’s (2B+3B) divided by the league average (2B+3B) in his at bats each season, added up over his career, multiplied by 100.

ISO+ – Normalized Isolated Power. This is the player’s (2B + 2x3B + 3xHR) divided by the league average (2B + 2x3B + 3xHR) in his at bats each season, added up over his career, multiplied by 100.

SEC+ – Normalized Secondary Average. This is the player’s (2B + 2x3B + 3xHR + BB + SB) divided by the league average (2B + 2x3B + 3xHR + BB + SB) in his at bats each season, added up over his career, multiplied by 100.

RC+ – Normalized Runs Created. This is the player’s RC divided by the league average RC in his plate appearances each season, added up over his career, multiplied by 100.

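OPS+ – Normalized OPS. Calculated as OBP+ plus SLG+, minus 100. Both terms are on a scale where 100 is league average, so subtracting 100 keeps an average player at 100.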

RC27+ – Normalized Runs Created Per 27 Outs. This is the player’s RC divided by the league average RC in his outs each season, added up over his career, multiplied by 100. A player will have a higher RC27+ than RC+ if he had fewer outs than average in his plate appearances – in other words, if he had an above average OBP.

**For fielders (by position):**

G – Games.

INN – Defensive Innings. Where defensive innings were not available, we used INN = 9xG.

PO – Put Outs.

A – Assists.

E – Errors.

FA – Fielding average. Calculated as (PO + A) divided by (PO + A + E).

PO9 – Put Outs per 9 Innings. Calculated as PO / INN x 9.

A9 – Assists per 9 Innings. Calculated as A / INN x 9.

RF – Range Factor. Calculated as (PO + A) / INN x 9.

PO+ – Normalized Put Outs. This is the player’s PO divided by the league average PO in his innings each season, added up over his career, multiplied by 100.

A+ – Normalized Assists. This is the player’s A divided by the league average A in his innings each season, added up over his career, multiplied by 100.

E+ – Normalized Errors. This is the league average number of errors in the player’s innings each season, added up over his career, divided by his career errors, multiplied by 100. Note that a high value means that the league makes more errors than he does, so a high number implies a low error rate.

RF+ – Normalized Range Factor. This is the player’s (PO + A) divided by the league average (PO + A) in his innings each season, added up over his career, multiplied by 100.

FA+ – Normalized Fielding Average. This is the player’s ((PO + A) / (PO + A + E)) divided by the league averages of the same, in his innings each season, added up over his career, multiplied by 100.

DER – Defensive Efficiency Record. The percentage of batted balls, not including home runs, that the team’s fielders turn into outs. The formula we use is (BF-H-K-BB-HBP-0.6*E)/(BF-HR-K-BB-HBP), where BF is batters faced.

PK – For pitchers and catchers, the number of times the player made a pickoff throw that successfully retired a runner.
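The fielding formulas above are simple enough to sketch directly. The function and argument names below are assumptions for illustration, and the sample numbers are invented rather than taken from any real player or team.

```python
def fielding_average(po, a, e):
    """FA: share of chances (PO + A + E) handled without an error."""
    return (po + a) / (po + a + e)


def range_factor(po, a, inn):
    """RF: put outs plus assists per 9 defensive innings."""
    return (po + a) / inn * 9


def defensive_efficiency(bf, h, k, bb, hbp, e, hr):
    """DER: share of batted balls (excluding home runs) turned into outs."""
    return (bf - h - k - bb - hbp - 0.6 * e) / (bf - hr - k - bb - hbp)


# Hypothetical figures, for illustration only.
print(round(fielding_average(po=300, a=400, e=20), 3))
print(round(range_factor(po=300, a=400, inn=1260), 2))
print(round(defensive_efficiency(bf=6000, h=1400, k=1000, bb=500,
                                 hbp=50, e=100, hr=150), 3))
```

Where defensive innings are missing, the glossary's fallback of INN = 9 x G can be passed in as `inn`.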

**For pitchers:**

W – Wins.

L – Losses.

Win% – Winning Percentage, calculated as W / (W + L).

G – Games Pitched.

GS – Games Started.

CG – Complete Games.

Saves – Saves.

IP – Innings Pitched.

H – Hits Allowed.

R – Runs Allowed.

ER – Earned Runs Allowed.

BB – Walks Allowed.

K – Strikeouts.

HR – Home Runs Allowed.

HBP – Hit Batsmen.

WP – Wild Pitches.

Balk – Balks.

ERA – Earned Run Average. Calculated as ER divided by IP multiplied by 9.

ERC – Component ERA. For each season, we calculate:

* If BFP (Batters Facing Pitcher) is not available, then BFP = 2.9xIP + H + BB + HBP.

* If IBB (Intentional Walks) is available, then PTB (pitcher’s total bases) = .89x (1.255x(H – HR) + 4xHR) + .56x (BB + HBP – IBB)

* If IBB (Intentional Walks) is not available, then PTB (pitcher’s total bases) = .89x (1.255x(H – HR) + 4xHR) + .475x (BB + HBP)

Then

* ERC = (H + BB + HBP) x PTB / (BFP x IP) x 9 - .56.

If this is less than 2.24, then

* ERC = (H + BB + HBP) x PTB / (BFP x IP) x 9 x .75.

CER – Composite Earned Runs. The number of runs needed to produce the pitcher’s calculated ERC each season. It is ERC x (IP/9). *Note that a pitcher’s career ERC is calculated by adding up the CER over his career and dividing by his career IP/9.*
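The ERC and CER steps above can be sketched as follows. The function names, the use of `None` to mean "not available", and the treatment of innings as a decimal number (for example 200.333 for 200 1/3 innings) are assumptions made for this illustration.

```python
def component_era(ip, h, hr, bb, hbp, ibb=None, bfp=None):
    """ERC for one season, following the steps listed above."""
    if bfp is None:
        # Estimate batters faced when the actual figure is missing.
        bfp = 2.9 * ip + h + bb + hbp
    hit_bases = 0.89 * (1.255 * (h - hr) + 4 * hr)
    if ibb is not None:
        ptb = hit_bases + 0.56 * (bb + hbp - ibb)
    else:
        ptb = hit_bases + 0.475 * (bb + hbp)
    raw = (h + bb + hbp) * ptb / (bfp * ip) * 9
    erc = raw - 0.56
    if erc < 2.24:
        # Low-ERC adjustment: scale instead of subtracting.
        erc = raw * 0.75
    return erc


def composite_earned_runs(erc, ip):
    """CER: the runs implied by a season's ERC over the innings pitched."""
    return erc * ip / 9


def career_erc(seasons):
    """Career ERC from (erc, ip) pairs: total CER divided by career IP/9."""
    total_cer = sum(composite_earned_runs(erc, ip) for erc, ip in seasons)
    total_ip = sum(ip for _, ip in seasons)
    return total_cer / (total_ip / 9)


# Hypothetical season, for illustration only.
erc = component_era(ip=200, h=180, hr=20, bb=60, hbp=5, ibb=5)
print(round(erc, 2), round(composite_earned_runs(erc, 200), 1))
```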

H9 – Hits per 9 Innings. Hits divided by IP/9.

BB9 – Walks per 9 Innings. Walks divided by IP/9.

BR9 – Base Runners per 9 Innings. Calculated as (H + BB + HBP) / (IP/9).

K9 – Strikeouts per 9 Innings. Calculated as K divided by (IP/9).

CG+ – Normalized Complete Games. This is the player’s complete games divided by the product of the player’s games started by the league complete game percentage. Then multiply by 100.

H+ – Normalized Hits. We take the league average number of hits per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of hits he allowed in his career, and multiply by 100.

R+ – Normalized Runs. We take the league average number of runs per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of runs he allowed in his career, and multiply by 100.

ERA+ – Normalized ERA. We take the league average number of earned runs per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of earned runs he allowed in his career, and multiply by 100.

ERC+ – Normalized ERC. We take the league average number of composite earned runs per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of composite earned runs he allowed in his career, and multiply by 100.

BB+ – Normalized Walks. We take the league average number of walks per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of walks he allowed in his career, and multiply by 100.

K+ – Normalized Strikeouts. We take the league average number of strikeouts per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide his career strikeouts by this, and multiply by 100.

HR+ – Normalized Home Runs. We take the league average number of home runs per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of home runs he allowed in his career, and multiply by 100.

HBP+ – Normalized Hit by Pitch. We take the league average number of HBP per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of HBP he allowed in his career, and multiply by 100.

WP+ – Normalized Wild Pitch. We take the league average number of WP per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of WP he allowed in his career, and multiply by 100.

Balk+ – Normalized Balks. We take the league average number of balks per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of balks he allowed in his career, and multiply by 100.

BR+ – Normalized Base Runners. We take the league average number of (H + BB + HBP) per inning, multiplied by the pitcher’s innings, and add that up over his career. Then we divide by the number of (H + BB + HBP) he allowed in his career, and multiply by 100.

OAVG – Opponents Batting Average. Opponents AB is estimated as BFP less BB and HBP. Then OAVG is H divided by Opponents AB.

OSLG – Opponents Slugging Percentage. Opponents AB is estimated as BFP less BB and HBP. Opponents Total Bases is estimated using the PTB formula given above in the ERC calculation. Then OSLG is Opponents Total Bases divided by Opponents AB.

OOBP – Opponents On Base Percentage. Calculated as (H + BB + HBP) divided by BFP.

OAVG+ – Normalized Opponents Batting Average. We take the league total number of hits divided by the league total of the estimated opponents at bats, multiplied by the pitcher’s estimated opponents at bats, and add that up over his career. Then we divide by the number of hits he allowed in his career, and multiply by 100.

OSLG+ – Normalized Opponents Slugging Percentage. We take the league total number of estimated opponents total bases divided by the league total of the estimated opponents at bats, multiplied by the pitcher’s estimated opponents at bats, and add that up over his career. Then we divide by the number of estimated total bases he allowed in his career, and multiply by 100.

OOBP+ – Normalized Opponents On Base Percentage. We take the league total number of hits plus walks plus hit batsmen divided by the league total of the BFP, multiplied by the pitcher’s BFP, and add that up over his career. Then we divide by the number of (hits plus walks plus hit batsmen) he allowed in his career, and multiply by 100.
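Most of the normalized pitching entries above (H+, R+, ERA+, BB+, and so on) share one pattern: the league expectation goes on top and the pitcher's own total goes on the bottom, so that a higher number means he allowed less than the league. A minimal Python sketch of that pattern follows; the function name and the tuple layout are assumptions for illustration, and K+ is the exception, since it divides the pitcher's strikeouts by the league expectation instead.

```python
def normalized_allowed(seasons):
    """Career index for a stat the pitcher wants to keep low (hits, runs, walks...).

    `seasons` holds one (allowed, ip, league_allowed, league_ip) tuple per
    season. This layout is hypothetical. The pitcher must have allowed at
    least one of the events over his career, or the ratio is undefined.
    """
    expected = 0.0
    actual = 0.0
    for allowed, ip, league_allowed, league_ip in seasons:
        # What an average league pitcher would allow in the same innings.
        expected += ip * league_allowed / league_ip
        actual += allowed
    return 100.0 * expected / actual


# Two hypothetical seasons, for illustration only.
seasons = [(60, 200, 9000, 18000), (80, 200, 8000, 20000)]
print(round(normalized_allowed(seasons), 1))
```

For the opponents' statistics (OAVG+, OSLG+, OOBP+), the same function applies if the innings are replaced by the opportunity measure named in each glossary entry.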

**For injuries:**

INJ:DAYS – The total number of days a player has been out injured.

INJ:IP/DAYS – The number of innings a player has played or pitched per day out injured (so the higher the number, the less frequent have been the player’s days out injured).

INJ:IP/NUM – The number of innings a player has played or pitched per injury (regardless of length) he has had (so the higher the number, the less frequent have been the player’s injuries).

INJ:MAXDAYS – The longest injury (in days) a player has had.

INJ:NUM – The total number of injuries (regardless of length) that the player has had.

©2024 Imagine Sports