The results of this investigation partially support our a priori hypotheses. More specifically, the results appear to indicate that the commonly used 3 familiarization trials produce stable BSS scores over 3 subsequent test trials within the same test session in both dual and single limb stance on resistance level 1. However, the test-retest reliability of level 1 BSS scores over a 10-week period was inconsistent between stances and across outcomes (OSI, APSI, MLSI), whether based on the best or the mean score. Further, all significant differences observed between the first and second test sessions indicate continued improvement at the second test session, regardless of whether best or mean scores were used. This trend was more prominent in dual limb stance, but improvements, although not always statistically significant, were also observed in the single limb stance data.
Measurement reliability is the level of consistency displayed by a device and/or outcome when a measurement is repeated under identical conditions (Emery, 2003). High test-retest reliability is needed to determine whether changes (preferably improvements) in outcome scores were caused by a therapeutic intervention or were the result of high variability within the outcome score (Gribble and Hertel, 2003). However, it should be noted that an acceptable test-retest reliability score does not guarantee that learning effects are absent from the testing protocol. Indeed, learning effects have been found within a test session for the Star Excursion Balance Test (Gribble and Hertel, 2003) and among test sessions for the Sensory Organization Test (Wrisley et al., 2007). Specific to the Star Excursion Balance Test, Hertel et al. (2000) found consistent improvement with practice until a plateau appeared during trials 7 through 9. Therefore, Hertel et al. (2000) recommended having participants perform 6 familiarization trials in each direction before recording test scores for further analysis.
There are several plausible explanations for the generally poor test-retest reliability and the high MDC scores observed in the current investigation. One possible explanation is the extreme instability of a level 1 resistance on the BSS. Given the difficulty of the task and the high MDC scores observed, three practice trials per stance may be insufficient to allow participants to generate adequate motor programs that persist over long periods of time (e.g., 10 weeks). The significant improvements observed between the first and second test sessions in multiple outcomes support this hypothesis. The literature clearly indicates that balance is not only an innate ability but also a learned and acquired skill (Tjenstrom et al., 2002; Ruiza and Richardson, 2005). The more novel and challenging the task, the greater the time needed to overcome the associated learning effect (Valovich et al., 2003). In addition, Hansen (2000) has shown that learning a dynamic balance task takes more practice time than learning a static one, due to the inconsistent proprioceptive input and the resulting difficulty in coordinating correctly timed movements. Indeed, practice has a profound effect on the development of efficient postural control strategies (e.g., increasing the stiffness in the ankles and knees) (Tjenstrom et al., 2002; Wrisley et al., 2007). Similar to the results of the current investigation, a recent investigation (Pickerill and Harter, 2011) demonstrated low to moderate reliability of BSS limits of stability (LOS) scores. Based on their findings, those authors did not recommend using the LOS measures from the BSS as the gold standard. Regardless of the reliability estimates, the very high MDC scores strongly suggest that clinicians should not use level 1 BSS scores as an objective tool to monitor rehabilitation progress or intervention effectiveness.
Indeed, the high MDC scores indicate that a substantial, and impractical, change in level 1 BSS scores is needed to exceed the error of the measurement. For example, all of the outcomes (best and mean score for OSI, APSI, and MLSI) had MDC scores larger than the mean score for the first test session. Further, some outcomes suggest that a change of up to 150% of the recorded mean is needed to be confident that an inter-session change (Table 2) is due to the intervention delivered and not to measurement error.
We are confident that our sample is representative of the larger population based on the favorable comparison between our current data and those published previously. For example, Sherafat et al. (2013), using a dual limb stance on stability level 3, observed slightly higher (worse) OSI (3.33°), APSI (2.56°), and MLSI (2.24°) scores than those observed in the current study (Table 2). However, participants in the current study were given real-time visual feedback during 20-second trials, while Sherafat et al. (2013) denied visual feedback to their participants during 30-second trials. Our single limb stance data (Table 2) are consistent with those recorded during 20-second trials [OSI (1.28°), APSI (0.98°), and MLSI (0.66°)] (Arifin et al., 2013). However, Malliou et al. (2004) reported extremely high mean OSI (~7.9°), APSI (~6.7°), and MLSI (~3.9°) scores during a single limb stance on level 1 in young soccer players with eyes open, but provided no information about visual feedback. The large discrepancy between our current data and those reported by Malliou et al. (2004) cannot be easily explained, but Cachupe et al. (2001) have reported that OSI scores range from 2.2° to 17.7° on level 2 of the BSS. Our reliability estimates are also similar to those observed in the literature. For example, Sherafat et al. (2013) found good reliability with OSI scores despite significant improvements from pre to post test.
We also observed significant improvements from the first to the second test session while recording poor ICC values in dual limb stance when using a mean score. The single limb stance ICC values reported by Cachupe et al. (2001) (OSI: 0.90, APSI: 0.86, MLSI: 0.76) are very similar to those observed in the current study for single limb stance (Table 2). Given the cumulative evidence from the current study and the literature, it appears that lower stability levels on the BSS may not be appropriate as an objective marker of progression or consistency over time. However, it is important to note that the current results do not condemn the use of a level 1 resistance of the BSS as a training tool.
A limitation of the current investigation was the relatively small sample of young, sedentary but otherwise healthy adults, which may affect the generalizability of the findings. Another limitation was the fixed test order that participants underwent (dual limb followed by single limb stance). This test order may explain the higher reliability of single limb stance scores relative to dual limb stance scores. While speculative, this pattern could suggest that as many as 9 familiarization trials (i.e., 6 dual limb trials and 3 practice single limb trials conducted before the 3 single limb test trials during the first test session) are needed to become proficient at maintaining single limb stance on a level 1 resistance of the BSS over prolonged periods of time (i.e., 10 weeks or longer). Finally, participants were given real-time feedback regarding their center of pressure on the BSS computer interface and were allowed to see the balance score associated with each trial. These factors may have artificially shortened the learning curve associated with lower stability levels on the BSS. In other words, without this feedback, additional familiarization trials may have been needed.
Future research should attempt to address these limitations in a larger and more diverse sample to better capture the true reliability, precision, and MDC score associated with level 1 BSS scores in multiple stances. Future research should also determine the optimal number of practice trials needed to achieve acceptable test-retest reliability and acceptable MDC scores, as well as the amount of retention that results from different numbers of familiarization trials.
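To make the MDC argument in this discussion concrete, the following minimal sketch shows how an MDC and its size relative to the session mean can be computed. It assumes the commonly used formulas SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; this section does not state which formula was actually applied, so the choice is an assumption. The function names and the input values are hypothetical and are not data from Table 2.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sd: float, icc: float) -> float:
    """Minimal detectable change at 95% confidence (assumed formula).

    The sqrt(2) term reflects error in both test sessions.
    """
    return 1.96 * math.sqrt(2.0) * sem(sd, icc)


def mdc_percent(mdc: float, mean_score: float) -> float:
    """MDC expressed as a percentage of the session mean score."""
    return 100.0 * mdc / mean_score


# Hypothetical stability index values in degrees (NOT taken from Table 2).
example_sd = 1.0
example_icc = 0.50
example_mean = 2.0

example_mdc = mdc95(example_sd, example_icc)
print(f"SEM   = {sem(example_sd, example_icc):.2f} deg")
print(f"MDC95 = {example_mdc:.2f} deg")
print(f"MDC95 = {mdc_percent(example_mdc, example_mean):.0f}% of the mean")
```

Under these assumed inputs, a moderate ICC combined with a between-subject SD that is half of the mean already yields an MDC close to the mean itself, which illustrates why a change of that size would be impractical to demand of a patient.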