Research article - (2006)05, 503 - 508
Stochastic Dominance and Analysis of ODI Batting Performance: the Indian Cricket Team, 1989-2005
Uday Damodaran
Key words: Bayesian, utility function, batting average, conditional average, geometric distribution
Key Points
|
|
|
As a game, cricket is a statistician’s delight. Each game of cricket throws up a huge amount of performance-related statistics. As other games have evolved and developed, they too have become richer in the use of performance statistics. For example, statistics like ‘unforced errors’ in lawn tennis or ‘assists’ in basketball are becoming increasingly popular. In cricket, however, these statistics have always been part and parcel of the game. Cricket is one of the few games in which a ‘scorer’ is required to continuously maintain statistical data on key game/player-specific performance statistics. It is one of the few games that have detailed ‘scoring sheets’. These scoring sheets were maintained manually in the pre-digital age and are maintained electronically today.

In spite of this legacy and long history of maintaining statistical data, two aspects associated with cricketing data are striking. The first is the idiosyncrasy that has persisted in the treatment of the ‘not out’ scores of a player. The second is the lack of effort in exploiting the richness of data to improve the representation of player performance.

The batting average of player i, Ri, is computed as the total runs scored by the player divided by the number of innings in which the player was dismissed; runs scored in ‘not out’ innings are counted, but those innings are excluded from the divisor.

The second aspect of cricketing data is the scant attention that researchers have paid to certain aspects of cricket. A substantial portion of the work has focused on devising optimal playing strategies. The strategies studied have either focused on batting strategies (Clarke, 1988; Clarke and Norman, 1999; Preston and Thomas,

The third stream of work, on the understanding and development of player-specific performance statistics, (Kimber and Hansford,

This paper seeks to develop methods to assess the performance of batsmen in cricket that (i) make use of more information than current methods do and (ii) can be converted into visually appealing graphics for the television medium.
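The ‘not out’ idiosyncrasy in the conventional batting average can be seen in a small sketch. The function name and the sample scores below are hypothetical and serve only as an illustration; the rule itself (total runs divided by the number of dismissals) is the standard definition.

```python
def batting_average(scores, not_out):
    """Conventional batting average: total runs divided by number of dismissals.

    scores  -- runs scored in each innings
    not_out -- parallel list of booleans, True where the batsman was not dismissed
    """
    if len(scores) != len(not_out):
        raise ValueError("scores and not_out must have the same length")
    dismissals = sum(1 for flag in not_out if not flag)
    if dismissals == 0:
        return None  # the average is undefined if the batsman was never dismissed
    return sum(scores) / dismissals


# Hypothetical data: four innings, the last one unbeaten.
scores = [10, 40, 0, 30]
not_out = [False, False, False, True]
average = batting_average(scores, not_out)  # 80 runs over 3 dismissals
```

The unbeaten 30 adds to the runs total without adding to the divisor, so the average here exceeds the simple mean of the four scores.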
The method is demonstrated using player statistics for some of the key members of the Indian One Day International (ODI) cricket team between 1989 and 2005. The names of the players included in the study are listed in
Adjusting the raw data
The raw data used in developing any method for representing a batsman’s performance are the innings-by-innings runs scored by the player. However, using these raw data poses a problem. In some innings the batsman would not have been dismissed, and in such cases the score does not reflect the number of runs the player could potentially have gone on to score. The scores for these innings (the ‘not out’ situations) therefore have to be replaced by a good estimate of the number of runs the player would have scored had he batted on.

In an early work Wood, Kimber and Hansford,

Assume that in his

The estimate of the number of runs that the ‘not-out’ batsman would have gone on to score is then given by:

In other words, the estimator used for the runs that the ‘not out’ batsman would have gone on to score is the conditional average of the batsman at that point of time, given that he has already scored a certain number of runs. In every instance of a ‘not out’, the batsman’s score in that innings j is replaced by the estimate Eij. This approach has the advantage of handling deviations from the geometric distribution assumption. It is also information-efficient, with the posterior values of the conditional average incorporating more information on the batsman’s performance.
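One possible reading of the conditional-average adjustment is sketched below. It is an illustration under stated assumptions, not the paper's exact estimator: the estimate for a ‘not out’ innings is taken to be the mean of the batsman's earlier scores (themselves already adjusted) that are at least as large as the unbeaten score, and the unbeaten score is kept unchanged when no such earlier score exists. The function name and the data are hypothetical.

```python
def adjust_not_outs(scores, not_out):
    """Replace each 'not out' score with a conditional average (assumed rule).

    Assumption: the estimate is the mean of all earlier adjusted scores that
    are greater than or equal to the unbeaten score. If there is no such
    earlier score, the unbeaten score is left as it is.
    """
    adjusted = []
    for runs, unbeaten in zip(scores, not_out):
        if unbeaten:
            # Earlier innings in which the batsman reached at least this score.
            candidates = [s for s in adjusted if s >= runs]
            if candidates:
                runs = sum(candidates) / len(candidates)
        adjusted.append(runs)
    return adjusted


# Hypothetical innings-by-innings data; innings 3 and 5 are 'not out'.
scores = [12, 45, 30, 80, 25]
not_out = [False, False, True, False, True]
adjusted = adjust_not_outs(scores, not_out)
```

Because each estimate uses only the innings played before it, the adjustment draws on more information as a career progresses, which is in the spirit of the posterior updating mentioned above.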
Stochastic Dominance
The adjusted raw data are now used to arrive at an analytical representation of the player’s batting performance. The approach adopted draws from methods normally used for the analysis of securities and portfolios in investment management. The focus in investment management is on wealth creation. The problem of portfolio choice is that of selecting a portfolio that maximizes the utility for the investor. The utility function for the investor attaches a utility to various levels of wealth. The utility function can be constrained to have certain properties like non-satiation (more wealth is always preferred to less) or risk aversion (diminishing marginal utility for incremental units of wealth). In mathematical terms, the first constraint requires the first derivative of the utility function to be positive, and the second requires the second derivative to be negative.

Consistent with some of the above-listed features of utility functions, the traditional approach to the portfolio selection problem has been the mean-variance approach. Amongst the alternative approaches to the portfolio selection problem suggested in the investment management literature is the set of stochastic dominance rules (Ali,

Analogous to the portfolio selection problem, a similar approach is adopted in this paper to represent the batting performance of cricketers. Using this approach we can say that batsman A’s performance is better than batsman B’s if, for any level of score, the probability of batsman A getting a score greater than the given score is never lower, and sometimes higher, than the probability of batsman B getting a score greater than that given score. This rule corresponds to the first-order stochastic dominance rule and assumes that more runs are always preferred to fewer.
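The conditions described in words above can be written compactly. The notation is assumed for illustration and does not come from the source: U is the utility function, W is wealth, and S_A and S_B are the scores of batsmen A and B.

```latex
% Assumed notation: U = utility, W = wealth, S_A, S_B = scores of batsmen A and B.
\begin{align*}
  U'(W)  &> 0 && \text{(non-satiation)} \\
  U''(W) &< 0 && \text{(risk aversion)} \\
  P(S_A > x) &\ge P(S_B > x) \quad \text{for all } x \ge 0,
      && \text{with strict inequality for at least one } x
\end{align*}
```

The third line states first-order stochastic dominance of A over B; it relies only on the non-satiation condition and does not require risk aversion.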
The cumulative probability charts of various batsmen can now be plotted with runs on the X-axis (with the origin at zero) and the probability of scoring more runs than the X-axis value (that is, one minus the cumulative probability of scoring no more than the X-axis value) on the Y-axis. Visually, a batsman whose stochastic dominance curve envelops another’s curve stochastically dominates the other batsman.
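The chart construction and the envelopment test described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function names and the sample scores are hypothetical, and the curves are compared at every score observed for either batsman, which is sufficient because an empirical curve only changes at observed scores.

```python
def survival_probability(scores, x):
    """Empirical probability of scoring strictly more than x runs."""
    return sum(1 for s in scores if s > x) / len(scores)


def first_order_dominates(scores_a, scores_b):
    """True if A's curve is never below B's and is above it at least once."""
    # The empirical curves are step functions, so it is enough to compare
    # them at every score that appears in either sample.
    thresholds = sorted(set(scores_a) | set(scores_b))
    curve_a = [survival_probability(scores_a, x) for x in thresholds]
    curve_b = [survival_probability(scores_b, x) for x in thresholds]
    never_lower = all(a >= b for a, b in zip(curve_a, curve_b))
    sometimes_higher = any(a > b for a, b in zip(curve_a, curve_b))
    return never_lower and sometimes_higher


# Hypothetical adjusted scores for three batsmen.
batsman_a = [5, 20, 40, 60]
batsman_b = [0, 10, 30, 50]
batsman_c = [0, 0, 100, 100]

a_over_b = first_order_dominates(batsman_a, batsman_b)
b_over_a = first_order_dominates(batsman_b, batsman_a)
a_over_c = first_order_dominates(batsman_a, batsman_c)
c_over_a = first_order_dominates(batsman_c, batsman_a)
```

In this sample A dominates B, while the curves of A and C cross, so neither dominates the other. The (threshold, probability) pairs can be passed to any plotting library as a step chart to obtain the kind of curve described above.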
|
|
The method is demonstrated using data for the Indian ODI cricket team spanning the years 1989 (the year one of India’s most highly rated players, Sachin Tendulkar, made his debut) to 2005. This period was chosen because the composition of the Indian ODI team changed very little during it. A sample batting performance stochastic dominance chart output for five Indian players is given in

Four of the five players represented are essentially specialist batsmen (Tendulkar, Dravid, Sehwag and Laxman) and one is a specialist bowler (Khan). The results are interesting and have intuitive appeal. They are consistent with popular notions regarding the batsmen whose performances were studied. For example, the curve for Sachin Tendulkar, who is considered an icon of Indian cricket, almost completely envelops the curves for other players. The curve for Rahul Dravid, who is referred to as ‘the wall’ because of his perceived consistency, does indeed dominate the curves for other players up to the 20-run point. In other words, the chances of Rahul Dravid getting a score less than 20 are lower than the chances of any other player in the Indian team getting a score less than 20. Finally, the curves for the specialist batsmen very clearly dominate the curves for the specialist bowlers, as should be the case.
|
|
The method that has been developed only provides an alternative approach to representing the batting performance of cricket players. This alternative approach is visually and intuitively appealing. The attempt in this paper is not to arrive at a model to rank the utility of players. Nor is the goal to develop a model to assist in team selection. The utility of a player goes far beyond the runs scored by him. Factors like tactical skills, passive support to the partner batsmen, etc. cannot be gauged by looking at the runs scored.

Even if we use runs scored as the sole measure of utility, first-order stochastic dominance rules alone cannot be used to rank players in terms of their utility. And if we go on to second-order stochastic dominance rules, the utility function might not have the negative second derivative that those rules assume. For instance, there could be potentially match-winning situations in which a batsman who is batting on a very high score (say, 108) has to score one more run in order for the team to win the match. In this situation the incremental run (from 108 to 109) might be much more valuable than an incremental run the batsman scored while he was on a lower score (say, 23) during the same innings.
Conclusions
Within the limits of this study, the paper seeks to highlight the tremendous scope that exists to improve on and develop the measures currently used to describe the performances of cricket players in general, and batsmen in particular. The measures used today do not adequately capture the richness of the underlying data. Similar approaches can be adopted to represent the performances of bowlers too.
AUTHOR BIOGRAPHY

REFERENCES
|