Stability of a performance variable refers to the repeatability of that variable across repeated trials (observed performances) over time and can be evaluated using test-retest reliability methods (Portney and Watkins, 2000). The stability of a variable across trials influences the stability of the mean value of the group of trials. When the mean value is not stable, both the reliability of the mean and its ability to represent a more generalized performance (validity) are limited. The number of trials obtained from an individual in an experiment is thought to influence stability (Bates et al., 1983; Salo et al., 1997) and thus is an important methodological consideration in the design of landing experiments. Except in unique circumstances (e.g., when a single trial is itself the subject of interest), several trials are thought to provide a more stable and representative mean value (Bates et al., 1983). Because variability is present in all human movement, a mean based on too few trials may not represent the individual's long-term performance. A single trial protocol has been suggested to be both invalid and unreliable (Bates et al., 1992) because of the potential inability of the single trial to represent the generalized performance. By chance, the single trial could represent an average performance, but it also might be atypical. Greater movement variability results in less stable data and a greater likelihood of sampling an atypical performance from the population of all possible performances. Stability may be particularly important when trials are obtained in non-continuous activities (e.g., a discrete movement such as a jump or landing) or in a nonconsecutive manner in continuous activities (e.g., nonconsecutive strides in running). While increasing the number of trials is thought to increase performance stability (Bates et al., 1983; Salo et al., 1997), how many trials are necessary to provide stable data?
Although a few studies have examined this issue for nonconsecutive trials during the activities of running (Bates et al., 1983; 1992), walking (Hamill and McNiven, 1990), hurdling (Salo et al., 1997), and vertical jumping (Rodano and Squadrone, 2002), little information is available about the number of trials necessary to achieve performance stability for nonconsecutive trials during landing. Moreover, different studies have used either different arbitrary criteria or different methods for determining stability, making comparisons among studies difficult. Running (Bates et al., 1983), walking (Hamill and McNiven, 1990), and vertical jumping (Rodano and Squadrone, 2002) have all been examined for performance stability of nonconsecutive trials using a sequential averaging estimation technique (see Methods). For running, results of the sequential averaging technique (using 10 reference trials and a 0.25 standard deviation criterion value) demonstrated that eight nonconsecutive steps (trials) were necessary to obtain stable data in 43 ground reaction force variables (Bates et al., 1983). Similar results were found when the number of reference trials was increased from 10 to 20 (Bates et al., 1983). For walking, the sequential averaging technique (using 20 reference trials and a 0.25 standard deviation criterion value) was used to determine that 10 nonconsecutive trials were necessary to reach performance stability of selected ground reaction force variables (Hamill and McNiven, 1990). For vertical jumping, the sequential averaging technique (using 25 reference trials and a 0.30 standard deviation criterion value) was used to determine that 12 trials were needed to establish performance stability of selected joint kinetic variables (Rodano and Squadrone, 2002). A limitation of the sequential averaging technique is that the number of reference trials and the standard deviation criterion value both influence the results, yet the values selected are arbitrary.
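As an illustration only, the general idea behind a sequential averaging estimate can be sketched in Python. This sketch assumes one common formulation: the mean and standard deviation of a set of reference trials define a band around the reference mean, the cumulative mean is recomputed as each trial is added, and stability is taken as the smallest number of trials from which every later cumulative mean stays inside the band. The function name `trials_to_stability`, its parameters, and the sample data are hypothetical and are not taken from the cited studies or from the Methods of this paper.

```python
from statistics import mean, stdev


def trials_to_stability(values, n_reference=10, criterion_sd=0.25):
    """Estimate how many trials are needed before the cumulative mean is stable.

    Assumed rule (a sketch, not the cited authors' exact procedure):
    the cumulative mean is 'stable' at trial n if it, and every cumulative
    mean after it, lies within criterion_sd * SD of the reference mean.
    """
    if n_reference < 2 or len(values) < n_reference:
        raise ValueError("need at least n_reference (>= 2) trials")

    reference = values[:n_reference]
    ref_mean = mean(reference)
    band = criterion_sd * stdev(reference)  # half-width of the stability band

    # Walk backwards from the second-to-last reference trial and find the
    # last trial count whose cumulative mean falls outside the band.
    for n in range(n_reference - 1, 0, -1):
        deviation = abs(mean(reference[:n]) - ref_mean)
        if deviation > band:
            return n + 1  # first trial count from which all means are inside
    return 1  # every cumulative mean was already inside the band


# Hypothetical peak force values (arbitrary units) from 10 nonconsecutive trials.
sample = [10, 12, 11, 9, 10, 11, 10, 9, 11, 10]
stable_at = trials_to_stability(sample, n_reference=10, criterion_sd=0.25)
print(stable_at)
```

Changing `n_reference` or `criterion_sd` in this sketch changes the returned trial count, which is the dependence on arbitrary settings noted above.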
Other investigators have used a variety of methods for examining the reliability, stability, and variability of gait variables both within and between days (Belli et al., 1995; Kadaba et al., 1989; Owings and Grabiner, 2003; Winter, 1984) and for consecutive (Belli et al., 1995; Owings and Grabiner, 2003) and nonconsecutive (Kadaba et al., 1989; Winter, 1984) trials. For example, Kadaba and colleagues calculated the coefficient of variation (CV) both within and between days to estimate the repeatability of spatiotemporal gait parameters, while the repeatability of kinematic, kinetic, and electromyographic waveforms was examined using an adjusted coefficient of multiple determination method (Kadaba et al., 1989). They suggested that data obtained from nonconsecutive trials from subjects walking at their preferred speeds were sufficiently repeatable (Kadaba et al., 1989). However, a limitation of their method was that the number of trials used to calculate repeatability was selected arbitrarily (three per session and nine per day). In contrast, Owings and Grabiner used running mean and standard deviation functions similar to the sequential averaging technique to examine the stability of selected gait variables over consecutive trials during treadmill walking for the purpose of calculating step variability (Owings and Grabiner, 2003). They suggested that at least 400 steps were required for accurate estimation of step kinematics (Owings and Grabiner, 2003). However, a limitation of their method was that many of the criteria used to establish stability across multiple steps of data were also selected arbitrarily. Belli and colleagues examined the absolute variability of total body vertical displacement and step time for consecutive trials during treadmill running at different velocities (Belli et al., 1995). They demonstrated that variability was relatively low at sub-maximal velocities but increased at higher velocities (Belli et al., 1995).
In that study, the absolute variability of each parameter was calculated as the standard deviation of each mean value and was expressed as a percentage of the mean. The authors suggested that 32 to 64 steps were required to obtain better than 1% accuracy on the mean value (Belli et al., 1995). However, a limitation of their method was that the percentage value used to represent a desired accuracy was selected arbitrarily. Using a more traditional statistical method for examining performance stability, Salo and colleagues utilized the intra-class correlation coefficient (ICC) to examine the stability of selected kinematic variables in nonconsecutive trials during sprint hurdling (Salo et al., 1997). They predicted that from as few as one to as many as 78 trials were necessary to reach a reliability of 0.90, depending on the specific kinematic variable examined. However, a limitation of this study was that only eight trials were actually collected from subjects and evaluated for reliability. Moreover, the value eight (i.e., eight trials) was selected arbitrarily. Additionally, the number of trials predicted to reach pre-determined reliability values was determined using the Spearman-Brown prophecy formula, which likely overestimated the reliability values for large numbers of trials. While the number of trials necessary to achieve performance stability has been examined for a number of different locomotor tasks, the activity of landing has not been evaluated. Landing is an activity that has recently received much attention in the literature because of its implicit link to many lower extremity injuries, especially in female athletes (Griffin et al., 2000). Because the number of trials necessary to achieve performance stability during landing has not been established, the reliability (and consequently validity) of many landing studies that have used too few trials could be in question.
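For readers unfamiliar with the Spearman-Brown prophecy formula mentioned above, a minimal Python sketch follows. It uses the standard form of the formula, in which a reliability r obtained from one unit of measurement predicts a reliability of k·r / (1 + (k − 1)·r) when the measurement is lengthened by a factor k. Treating r as a single-trial reliability, and the function names `predicted_reliability` and `trials_needed`, are assumptions made for illustration; they are not a reconstruction of the calculations in Salo et al. (1997).

```python
import math


def predicted_reliability(r_single, k):
    """Spearman-Brown prediction of reliability for the mean of k trials,
    given an assumed single-trial reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


def trials_needed(r_single, target=0.90):
    """Smallest whole number of trials predicted to reach the target reliability.

    Obtained by solving the Spearman-Brown formula for k:
        k = target * (1 - r_single) / (r_single * (1 - target))
    """
    if not (0 < r_single < 1) or not (0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    k = target * (1 - r_single) / (r_single * (1 - target))
    # Subtract a tiny tolerance so floating-point noise does not round
    # an exact whole number (e.g., 9.000000000000002) up to the next integer.
    return max(1, math.ceil(k - 1e-9))


# Hypothetical example: a variable with a single-trial reliability of 0.50.
n_trials = trials_needed(0.50, target=0.90)
print(n_trials, predicted_reliability(0.50, n_trials))
```

Because the prediction is an extrapolation from the trials actually collected, a predicted trial count far beyond the number of observed trials should be read with caution.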
Moreover, the method for establishing performance stability should be objective and not based on arbitrary criteria. While several statistical methods have been used to determine stability during gait and other activities, there appear to have been no comparisons among methods. Therefore, the purpose of the current study was to answer the following questions: (1) How many trials are necessary to achieve performance stability during landing? (2) How do the results obtained from different methods of calculating performance stability compare to one another? It was hypothesized that, similar to other locomotor tasks, several trials would be necessary to achieve performance stability during landing. Additionally, it was hypothesized that different methods for determining stability would provide dissimilar results.