Presenting the Model, Part II: Scoring and a First Look at Results
In the previous post I presented the five data inputs to a model of 2020 Democratic presidential candidate ratings: national polls, early state polls, political pundit power rankings (a lot of alliteration...), candidate endorsements, and political betting markets. Now I'll show how I scale the data and average them for an overall rating. It's pretty simple, but I'll walk through the details for transparency.
To start with, I put each of the five inputs on a 10-point scale, with 10 being the theoretical best score a candidate could achieve on that metric. In some cases 10 represents more of a practical/feasible best score than a theoretical one, as I'll explain below.
National Polls: poll numbers are scored as the candidate's polling average (as calculated by RealClearPolitics) divided by 60, then multiplied by 10. I chose 60 because any candidate who reaches 60% in national polling has the nomination pretty well sewn up; we aren't likely to see a number higher than that until the end game in summer 2020. If I'm proven wrong, it's easy enough to adjust that limit. Example: a candidate at 20% in the national polling average would score (20/60)*10 = 3.33.
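As a sketch, that calculation is a one-liner. The function name is my own, and I've also clamped the result at the cap so a score can never exceed 10, which the post implies but doesn't state:

```python
def national_poll_score(rcp_average, cap=60.0):
    """Rescale a national polling average (0-100) to a 0-10 score.

    The cap of 60% is the practical ceiling described above; a candidate
    at or above it earns the full 10.
    """
    return min(rcp_average, cap) / cap * 10

# A candidate at 20% in the RCP average:
# national_poll_score(20) -> 3.33 (rounded)
```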
Early State Polls (IA, NH, SC, NV): the latest poll from each state (if less than about a month old) is included in a simple average across the polls. The average number is then divided by 60 and multiplied by 10, as with national polling. Example: a candidate with recent poll numbers of 18% in Iowa, 14 in New Hampshire, 21 in South Carolina, and 11 in Nevada, would have an Early State Average of 16.0%. Their Early State Score would be (16/60)*10 = 2.67.
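The early state score adds one wrinkle: a state's latest poll only counts if it's reasonably fresh. Here's a minimal sketch under my own naming, using roughly 31 days as the "less than about a month old" cutoff (the post doesn't pin down an exact number):

```python
from datetime import date, timedelta

def early_state_score(polls, today, max_age_days=31, cap=60.0):
    """Average the freshest poll from each early state, then rescale to 0-10.

    `polls` maps a state to (candidate's number, poll date); polls older
    than about a month are dropped, per the rule above.
    """
    fresh = [pct for pct, polled in polls.values()
             if today - polled <= timedelta(days=max_age_days)]
    if not fresh:
        return 0.0
    average = sum(fresh) / len(fresh)
    return min(average, cap) / cap * 10

# The worked example: 18 in IA, 14 in NH, 21 in SC, 11 in NV,
# all recent (poll dates below are made up for illustration).
polls = {"IA": (18, date(2019, 3, 20)), "NH": (14, date(2019, 3, 25)),
         "SC": (21, date(2019, 3, 10)), "NV": (11, date(2019, 3, 15))}
# early_state_score(polls, date(2019, 3, 29)) -> 2.67 (rounded)
```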
Pundit Power Rankings: CNN releases only a Top 10 list, while the Washington Post gives us a Top 15, so that complicates matters a bit. My method (which isn't perfect) is to give a candidate 15 points for a first place ranking, 14 points for second, 13 for third, etc. A candidate gets 0 points from any list they don't appear on (so, because CNN's list stops at 10, the 11th-15th spots earn points only on the Post's list). The points from each list are combined for a maximum of 30 points possible (for two first place rankings). These "pundit points" are then divided by 30 and multiplied by 10. Example: a candidate is ranked 12th by WaPo and not ranked by CNN. They get 4 pundit points, so (4/30)*10 = 1.33.
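Since the two lists have different lengths, this one is easiest to see in code. A minimal sketch (function and parameter names are mine):

```python
def pundit_score(cnn_rank=None, wapo_rank=None):
    """Combine CNN (Top 10) and WaPo (Top 15) rankings into a 0-10 score.

    Each list awards 15 points for 1st place down to 1 point for 15th;
    an unranked candidate gets 0 from that list. Two firsts = 30 points.
    """
    def points(rank, list_size):
        if rank is None or rank > list_size:
            return 0
        return 16 - rank  # 1st -> 15 points, 15th -> 1 point

    total = points(cnn_rank, 10) + points(wapo_rank, 15)
    return total / 30 * 10

# Ranked 12th by WaPo, unranked by CNN: 4 pundit points.
# pundit_score(wapo_rank=12) -> 1.33 (rounded)
```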
Candidate Endorsements: 538 estimates that there are 917 potential endorsers they're tracking, for an estimated total of 2,256 possible endorsement points. No candidate is going to hit that limit, however. Many potential endorsers may not end up officially endorsing any candidate, and others will stick with a losing candidate until the end. My guess is that a candidate who approaches even half that number will have the nomination in hand, so I'm using 1,100 points as a practical maximum (again, time will tell). Example: a candidate has 55 endorsement points according to 538, so (55/1100)*10 = 0.5.
PredictIt Betting Market: PredictIt reports candidate values as the purchase price of a share, on a scale of 0-100 cents, effectively a percentage likelihood. Unlike the polls, however, the maximum here really is 100, as bettors are likely to coalesce around the leader much more than voters will (at least during primary season). Example: a candidate's share price is 21 cents, so (21/100)*10 = 2.1.
Final Score: real complex math here ... for the final score I'm simply averaging the five input scores. So again a score of 10 is the maximum, and would represent a candidate who has the nomination in the bag.

Why a simple average? One could argue that some of these inputs are more significant, or more reliable, or more predictive, than others. But frankly, at this early stage I just don't see any clear rationale for weighting one more than the others. The value of averaging the five is that the "crowd" of inputs is likely to be wiser than any single input alone. We all know polls, while informative, are suspect and far from predictive at this early stage. The Establishment (in the form of endorsements) may very well miss the mark, as it did with Trump in 2016. And so on. The average balances the five disparate inputs and gives what I feel is a nice round score that seems to have face validity, at least so far. So, a candidate with the example scores we've outlined above for each of the five inputs would have an average of (3.33+2.67+1.33+0.5+2.1)/5 = 1.99, meaning they have a long road ahead to win the nomination. But mainly that number is useful in comparison with the other candidates' scores.
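Putting it all together, every input is the same capped rescaling with a different ceiling, and the final score is the plain average. A sketch of the whole pipeline using the worked example's raw numbers (the `scale` helper is my own shorthand, not something from the post):

```python
def scale(value, cap):
    """Rescale a raw input to 0-10, clamped at its practical maximum."""
    return min(value, cap) / cap * 10

# The raw numbers used in the examples throughout this post:
scores = [
    scale(20, 60),    # national polling average of 20%
    scale(16, 60),    # early state average of 16%
    scale(4, 30),     # 4 pundit points (12th on WaPo's list)
    scale(55, 1100),  # 55 endorsement points per 538
    scale(21, 100),   # PredictIt share price of 21 cents
]
final = sum(scores) / len(scores)
# round(final, 2) -> 1.99
```

An unweighted average also means any future reweighting is a one-line change: swap `sum(scores) / len(scores)` for a weighted sum once there's evidence one input deserves more pull.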
So enough methods talk, how do the results look? Let's look at the scores for the major/serious Democratic candidates, as of today, Friday, March 29.
For now I'm just going to let the table speak for itself; I'll save the analysis for a future post. But I think it looks pretty good, and as I'll show in my next post, having a score now allows us to start watching trends as they happen. Stay tuned!