Previous post: Serving Series 9: Implementing an Approach
Analytics-forward people are often accused of failing to understand the human element of sports. We are also often accused of being too rigid in our approach and unwilling to deviate. In this blog, I'll talk about some times when an adjustment to one's approach is warranted, and explain why some conventional wisdom about when to change your approach is wrong.
“We can’t miss 3 in a row,” “you can’t miss a serve in that situation,” “she’s already missed 2 this match, she can’t afford to miss another.” You or someone you have played or coached with has probably said these words or something like them on a volleyball court. These are all claims about changing your approach to fit a situation. Changing your approach to fit a situation is important. I talked a little about it last post. But, I think many volleyballers implement changes to their approach in situations that do not warrant it.
When choosing an approach, you’re making a decision by combining the probabilities of certain outcomes with the values of those outcomes. Whether you use the method I talked about in the last post or you’re eyeballing it, that’s what you’re doing. The only difference between a data-driven approach and a feel approach is the data you’re using to generate those expectations. This means that if something doesn’t change one of those two things (the probabilities of outcomes and the values of outcomes), it doesn’t merit a change in approach. In essence, ask yourself: "Do I expect the path to winning this point to be different than the path to winning other points?" and "Do I have new data that suggests I should expect something different than the data I used to create my approach would lead me to expect?" Let’s apply this standard to a few situations:
The eye color of your opponent’s outside hitter:
I include this as an example of information we all intuit is irrelevant. We know that this doesn’t matter so we don’t let it affect our decision making.
How many serves we’ve already missed
“We can’t miss 3 in a row”
This is a common one. Many volleyballers seem to think that we should modify our approach to be more conservative after a certain number of consecutive misses. If you have an intentionally developed serving approach that you trust, small samples of in-match data should not affect your expectations. Unless you’re serving in close to 100% of the time, some missed serves are bound to cluster; such is the nature of semi-random events. A cluster like that is not meaningful data for updating expectations of how often your athletes will miss with a given approach.
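To put a rough number on how often misses cluster by chance alone, here is a minimal sketch. It assumes every serve is independent and has the same miss rate, which is a simplification, and the function name `prob_miss_streak` and the 100-serve, 15% example are made up for illustration rather than taken from real match data.

```python
def prob_miss_streak(n_serves, miss_rate, streak_len):
    """Chance of at least one run of `streak_len` straight misses in `n_serves` serves."""
    # state[j] = probability that no full streak has happened yet and the
    # current run of consecutive misses has length j.
    state = [1.0] + [0.0] * (streak_len - 1)
    for _ in range(n_serves):
        nxt = [0.0] * streak_len
        for run, prob in enumerate(state):
            nxt[0] += prob * (1 - miss_rate)      # serve goes in, the run resets
            if run + 1 < streak_len:
                nxt[run + 1] += prob * miss_rate  # another miss, streak not complete yet
            # otherwise the streak is complete and that probability leaves `state`
        state = nxt
    return 1 - sum(state)

# Hypothetical match: 100 team serves at an assumed 15% miss rate.
print(f"{prob_miss_streak(100, 0.15, 3):.2f}")
```

Changing `miss_rate` and `n_serves` to match your own team gives a feel for how ordinary a three-miss cluster is before reading anything into it.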
But, missing a bunch of serves in a row is data that we can use nonetheless. Maybe it’s an indication that athletes have strayed from the approach that they've practiced. Maybe they're struggling to focus at the service line. Or, if we’re at practice and taking a lot of serving reps, it might mean that athletes are struggling to be intentional. Maybe it's nothing and some missed serves just happened to cluster. Knowing your athletes and judging when to re-focus them is different than changing your approach and is a reasonable response to data.
Leverage
“You can’t miss a serve in that situation”
Leverage is a measure of how pivotal a situation is. If a situation is high-leverage, the win probability of each team can shift wildly point-to-point. For example, in a 23-23 game, the receiving team is slightly favored to win. But whoever wins that point to serve at 24-23 is significantly favored. If we do some math based on the (admittedly fraught) assumption that on every point the serving team is .4275 to win the point and the receiving team is .5725 to win the point, we find that at 23-23, the serving team is .4662 to win the game and the receiving team is .5338 to win the game. In that same game, whoever wins that point to go up 24-23 and start serving is .7331 to win and the team down and receiving is .2669 to win. That’s a huge jump.
Contrast the 23-23 situation to a 0-0 situation. The receiving team on 0-0 is slightly favored in this game but the proportions are likely very close to .5, and there is certainly not a 20 percentage point jump between 0-0 and 1-0. That difference is what leverage is.
(I’ve attached a math appendix to this post to explain how I got these numbers)
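For readers who want to check the 23-23 numbers directly, here is a sketch that takes a different route than the appendix: instead of repeated multiplication, it writes each state's win probability in terms of the others and solves the resulting equations. The function name `late_game_win_probs` is hypothetical, and the state labels (ES, ENS, 1US, 1DNS) are the ones defined in the appendix.

```python
def late_game_win_probs(hold):
    """Win probability for 'our' team in each late-game state.

    Solves these four equations directly:
      ES   = hold * 1US + sideout * 1DNS
      ENS  = sideout * 1US + hold * 1DNS
      1US  = hold + sideout * ENS
      1DNS = sideout * ES
    """
    sideout = 1 - hold
    # Substituting the last two equations into the first two leaves a 2x2 system:
    #   (1 - sideout^2) * ES - hold*sideout * ENS = hold^2
    #   -hold*sideout * ES + (1 - sideout^2) * ENS = hold*sideout
    a11 = a22 = 1 - sideout ** 2
    a12 = a21 = -hold * sideout
    b1, b2 = hold ** 2, hold * sideout
    det = a11 * a22 - a12 * a21
    es = (b1 * a22 - a12 * b2) / det   # Cramer's rule
    ens = (a11 * b2 - a21 * b1) / det
    return {"ES": es, "ENS": ens, "1US": hold + sideout * ens, "1DNS": sideout * es}

probs = late_game_win_probs(0.4275)
print({state: round(p, 4) for state, p in probs.items()})
# {'ES': 0.4662, 'ENS': 0.5338, '1US': 0.7331, '1DNS': 0.2669}
```

The gap between the `1US` and `1DNS` values is one way to see the leverage of the 23-23 point: it is the swing in win probability that rides on that single rally.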
I hear a lot that a serving approach needs to change in high-leverage situations. In particular, I often hear the sentiment “you can’t miss a serve in that situation, they just gave the game away.” But there’s nothing about a high-leverage situation that inherently changes the probabilities of each outcome, nor is there anything that changes the value of those outcomes relative to each other. Presumably you believe that how you serve in lower-leverage situations gives you the best chance to win the point, so you should continue to do that in high-leverage situations.
An individual athlete’s performance on the day
“She’s already missed 2 this match, she can’t afford to miss another”
When an athlete misses their first serve on two serving turns in a row, oftentimes people say that they need to change their approach on the next one to get it in. This one is perhaps a more interesting claim. If someone misses their first two serves, is it time to update our expectations for the probabilities of each outcome? I think if the missed serves are your only data, probably not. It’s not that out of the ordinary for an event with a 10% likelihood of occurring to happen twice in a row. Rolling a 1 on a 10-sided die twice in a row is not a strong indication that the die is unfair. If you’ve rolled that die hundreds of times and the results have been fair, you shouldn’t change your mind just because the two most recent rolls were costly.
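One way to see why two misses barely move the needle is to pool them with the history you already have. The sketch below treats past serves as counts and recomputes the miss rate after adding the new serves (the posterior mean of a Beta prior, if you want the formal name). The 300-serve history is a made-up assumption, and `updated_miss_rate` is a hypothetical helper.

```python
def updated_miss_rate(prior_misses, prior_makes, new_misses, new_makes):
    """Expected miss rate after pooling past serves with newly observed ones."""
    misses = prior_misses + new_misses
    total = prior_misses + prior_makes + new_misses + new_makes
    return misses / total

# Hypothetical history: 300 tracked serves with 30 misses (10%).
before = 30 / 300
after = updated_miss_rate(30, 270, new_misses=2, new_makes=0)
print(f"before: {before:.3f}, after two straight misses: {after:.3f}")
# before: 0.100, after two straight misses: 0.106
```

With a thinner history (say 20 tracked serves instead of 300) the same two misses would shift the estimate much more, which is the sense in which the size of your prior data matters.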
But, coaches are allowed to talk to athletes during matches so observing the serving outcomes is not the only data we have available. In your own playing days, you probably had days where you had trouble performing at your best at a skill or where you were hot at a certain skill and couldn’t miss. Your athletes can probably tell when they’re having one of these days. If they’re saying “coach, I’m having trouble doing X today” or “coach, I’m hot today, I can’t miss,” and you trust your athletes, that’s meaningful data that should change how often you expect a certain outcome to occur, and therefore how you serve.
Today's craiyon prompt: "markov chain of volleyballs"
Math appendix:
In this section I’ll talk about how I arrived at those win probability numbers in the section on leverage. Special thanks to Dr. Erin Ellefsen at Earlham College for finding this solution and teaching me my first math class in close to 6 years.
She solved the puzzle of the win probabilities in the game by using a Markov chain. A Markov chain models a process that moves between states with fixed probabilities. In practice, you make a matrix of those probabilities, multiply it by a vector, multiply the original matrix by the resulting vector, and then repeat this process. You can approximate the win probability by repeating the multiplication a large number of times. The way that looks is this:
We start by listing the possible states across the top and down the side:
ES = even and serving, ENS = even and not serving, 1US = 1 up and serving, 1DNS = 1 down and not serving, W = win, L = loss.
These are all of the possible states once a game has reached 23-23. From there, we fill in the table with the probability of moving from the state on the top to the state on the side. I’ll describe the cells using ordered pairs. For example, (ES, ES) is the top left cell, (1US, ENS) is the cell in the 1US column and the ENS row, etc.
(ES, ES) gets a 0 because if a game is tied and your team is serving, it can’t still be tied with your team serving after the next point.
(ES, ENS) also gets a 0 because the game can’t stay tied point-to-point.
(ES, 1US) gets a .4275 because that’s the probability of holding serve. When you hold serve you go up 1 point and continue serving.
(ES, 1DNS) gets a .5725 because that’s the probability of your opponent siding out. If they side out, you’d be down 1 point and not serving.
(ES, W) gets a 0 because you can’t go from even to winning in one point.
(ES, L) gets a 0 because you can’t go from even to losing in one point.
This is what the matrix looks like with one column filled in:
You can follow this procedure to fill in the rest of the columns and the matrix looks like this:
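In code, the same table can be built from a list of transitions. This is a sketch: the ES column comes straight from the cells described above, while the other columns are filled in by the same reasoning (for example, from 1US you either hold serve and win, or get sided out and land in ENS), so treat them as an assumption about how the full table looks. The names `TRANSITIONS` and `matrix` are illustrative.

```python
HOLD = 0.4275     # serving team wins the point (the post's assumption)
SIDEOUT = 0.5725  # receiving team wins the point

STATES = ["ES", "ENS", "1US", "1DNS", "W", "L"]

# TRANSITIONS[from_state] = {to_state: probability}; anything not listed is 0.
TRANSITIONS = {
    "ES":   {"1US": HOLD, "1DNS": SIDEOUT},
    "ENS":  {"1US": SIDEOUT, "1DNS": HOLD},
    "1US":  {"W": HOLD, "ENS": SIDEOUT},
    "1DNS": {"ES": SIDEOUT, "L": HOLD},
    "W":    {"W": 1.0},  # finished games stay finished
    "L":    {"L": 1.0},
}

# Columns are the state we move from (top), rows the state we move to (side).
matrix = [[TRANSITIONS[src].get(dst, 0.0) for src in STATES] for dst in STATES]

print(" " * 6 + "".join(f"{s:>8}" for s in STATES))
for dst, row in zip(STATES, matrix):
    print(f"{dst:>6}" + "".join(f"{x:8.4f}" for x in row))
```

A quick sanity check on any table like this is that every column adds up to 1, since a game in a given state has to go somewhere on the next point.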
From there, you write a vector that represents your starting position:
Let's imagine that we're going to have a computer play a bunch of games with the stipulation I gave, that on every point the serving team has a 42.75% chance to win the point. Each vector represents a time-period. This vector represents the initial time period t=0, at which all the games are starting at 23-23.
Then, you multiply the matrix by this vector to get the breakdown for the next time period. Or, in volleyball terms, all the games will play one point. I used the MMULT() function in Excel. You can also do this in MATLAB or Python, though I don’t know how.
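For readers who would rather use Python, a minimal sketch of the same multiplication might look like this. It uses plain lists so nothing needs to be installed. The matrix entries beyond the ES column are filled in by the same reasoning as that column, and the names `M`, `step`, and `v` are illustrative.

```python
p, q = 0.4275, 0.5725  # hold-serve and side-out probabilities from the post

# Columns: from ES, ENS, 1US, 1DNS, W, L. Rows: to the same states, same order.
M = [
    [0, 0, 0, q, 0, 0],  # to ES
    [0, 0, q, 0, 0, 0],  # to ENS
    [p, q, 0, 0, 0, 0],  # to 1US
    [q, p, 0, 0, 0, 0],  # to 1DNS
    [0, 0, p, 0, 1, 0],  # to W
    [0, 0, 0, p, 0, 1],  # to L
]

def step(matrix, vector):
    """One matrix-vector product: every unfinished game plays one more point."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

v = [1, 0, 0, 0, 0, 0]  # t=0: every game is tied 23-23 with our team serving
for t in range(1, 5):
    v = step(M, v)
    print(f"t={t}:", [round(x, 4) for x in v])
```

At t=1 this puts .4275 of the games in 1US and .5725 in 1DNS. At t=2 it gives roughly .3278 in ES, .2447 in ENS, .1828 in W, and .2447 in L.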
The resulting vector for time-period t=1 is this:
Now, there is uncertainty. In some of the games, the serving team held serve, in some games the receiving team sided out. The numbers in each cell represent the proportion of games that are in that situation after playing one point. From there, we multiply the initial matrix by the vector for the distribution in time period t=1 to get the distribution in t=2.
After playing 2 points, some games have ended. The W column represents the probability that the initial serving team held serve twice in a row and the game ended 25-23. The L column represents the probability that the initial receiving team sided out and then held serve and the game ended 25-23. Some games moved back to tied 24-24. We multiply this vector back in to get to t=3. I’m going to skip t=3 because it’s not as interesting and look at t=4.
By t=4, more games have ended. From the set of games that were tied in t=2, some number had a team win 2 in a row and end the game. As we progress through time, all the games that are finished stay finished, while more unfinished games find a winner and loser and an ever-smaller share keeps going. Eventually, the proportion of games still going in late time periods becomes small enough to be a rounding error, and we have an approximation of the win percentage of each team at 23-23.
This example stabilized at the serving team’s win percentage being .4662 around t=60.
As you can see, some number of games will still be going in this time period but it’s a very small number.
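To watch the numbers settle, the sketch below runs the chain out to t=60 and records, at each step, the share of games our team has won and the share still unfinished. It writes the six update rules out by hand instead of using a matrix, which is only a presentation choice. The variable names are illustrative.

```python
p, q = 0.4275, 0.5725  # hold-serve and side-out probabilities from the post

# Share of games in each state, starting with every game at 23-23 on our serve.
es, ens, up, down, win, loss = 1.0, 0.0, 0.0, 0.0, 0.0, 0.0
history = []  # (t, share of games won, share still unfinished)
for t in range(1, 61):
    es, ens, up, down, win, loss = (
        q * down,          # 1DNS sides out: tied, we serve
        q * up,            # 1US gets sided out: tied, they serve
        p * es + q * ens,  # we win the point from a tie: 1 up and serving
        q * es + p * ens,  # we lose the point from a tie: 1 down, not serving
        win + p * up,      # 1US holds serve: game over, we win
        loss + p * down,   # 1DNS fails to side out: game over, we lose
    )
    history.append((t, win, 1 - win - loss))

for t, w, unfinished in history:
    if t in (2, 4, 20, 40, 60):
        print(f"t={t:2d}  W={w:.4f}  unfinished={unfinished:.2e}")
```

Under these assumptions the unfinished share shrinks by a factor of .5725 every two points, which is why the W value stops changing at four decimal places well before the last step.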
You can use this method to model other situations where there is uncertainty. You can change the side out and hold serve percentages by changing the matrix and you can change the starting situation by changing what vector you use to represent the state in t=0.
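As a sketch of what changing those inputs might look like, the function below takes the hold-serve probability and the starting state as parameters. The name `chain_win_probability` and the `points` cutoff are illustrative choices, not part of the original spreadsheet.

```python
STATES = ("ES", "ENS", "1US", "1DNS", "W", "L")

def chain_win_probability(hold, start="ES", points=200):
    """Share of games our team has won after `points` more points.

    `hold` is the serving team's chance to win any single point and `start`
    is the state every game begins in. 200 points is plenty for realistic
    hold rates; raise it if `hold` is very small.
    """
    sideout = 1 - hold
    v = dict.fromkeys(STATES, 0.0)
    v[start] = 1.0  # the t=0 vector: all games in the starting state
    for _ in range(points):
        v = {
            "ES": sideout * v["1DNS"],
            "ENS": sideout * v["1US"],
            "1US": hold * v["ES"] + sideout * v["ENS"],
            "1DNS": sideout * v["ES"] + hold * v["ENS"],
            "W": v["W"] + hold * v["1US"],
            "L": v["L"] + hold * v["1DNS"],
        }
    return v["W"]

for start in ("ES", "ENS", "1US", "1DNS"):
    print(start, round(chain_win_probability(0.4275, start), 4))
# ES 0.4662
# ENS 0.5338
# 1US 0.7331
# 1DNS 0.2669
```

Passing a different `hold` value, for example one measured for your own level of play, shows how sensitive the late-game win probabilities are to the side-out rate.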