The primary objective of this investigation was to assess the validity of heart rate measurements from two commercially available, wrist-worn activity trackers incorporating proprietary reflective photoplethysmographic (PPG) biosensors, i.e. the Basis Peak™ (BPk) and Fitbit Charge HR™ (FB). The accuracy of the BPk and FB was evaluated against the criterion measure, ECG. When examining the data in aggregate (n = 87,340), the BPk met the proposed validity criteria for heart rate detecting and monitoring devices (r ≥ 0.90 and mean bias < 3 bpm) (Table 1, Figure 1). We observed a strong correlation (r = 0.92) between the BPk and ECG with an acceptable mean bias of -3 bpm and an absolute difference from criterion measurements of approximately 5%. On the basis of the Bland-Altman analysis and the 95% limits of agreement (+19 to -24 bpm), the bias of the BPk may be reasonably described as systematic (Figure 1); the BPk may therefore be used interchangeably with ECG for accurate HR measurements. In comparison, the BPk performed similarly to the Apple Watch, which has been previously validated by data indicating a strong correlation (r = 0.95) with ECG and a small mean bias of -1 bpm (Wallen et al., 2016). Interestingly, the accuracy of the BPk was slightly compromised with increasing physical effort (> 116 bpm per ECG). This was observed in the analysis of data pairs associated with an ECG HR above 116 bpm, the mean ECG HR (n = 41,315). Specifically, the correlation between BPk and ECG weakened to r = 0.77, and the mean bias increased slightly to -5 bpm. Nonetheless, our data set above the ECG mean showed strong agreement between the heart rates derived from the BPk and the criterion measure of the ECG. When considering resting or recovery conditions and each individual mode of exercise, the BPk performed relatively accurately, with very strong agreement with ECG (especially during rest/recovery) (Table 3).
Only during intense cycling and the isometric plank did the BPk demonstrate a minor decrease in performance. These outcomes are consistent with previous results for other PPG-based devices (Mio Alpha and Scosche myRhythm) that revealed task-specific variation in device performance (Parak and Korhonen, 2014). For instance, the Mio Alpha showed its greatest error during cycling (-4.8%), whereas the Scosche myRhythm showed its greatest error during the walking exercise (-3.1%).

The FB presented a weaker correlation (r = 0.83) and less agreement (mean bias = -9 bpm; 95% LoA +24 to -42 bpm) with ECG than the BPk when examining the aggregated data set (n = 87,340). Despite a moderately strong correlation, the resulting r-value and mean bias failed to meet the proposed validity criteria for accurate heart rate measurements. Furthermore, the Bland-Altman analysis for the aggregated data set reflects a strong tendency for HR underestimation by the FB, especially at the higher end of the mean HR spectrum. Thus, the FB may not be considered interchangeable with ECG for accurate measurement of HR. These findings corroborate previous FB results by Wallen et al. (2016), which demonstrated a -9 bpm bias, 95% LoA of +7 to -26 bpm, and a correlation coefficient of 0.81. Moreover, we observed severely diminished performance during physical activities eliciting higher ECG heart rates (i.e. > 116 bpm). The very weak correlation (r = 0.58), together with the large mean bias (-13 bpm) and high standard error (20.1), strongly suggests that the FB is an inaccurate means of monitoring HR with increasing physical exertion. The FB appeared to perform with improved accuracy during conditions corresponding to lower ECG heart rates, based on a smaller average bias of -5.3 bpm. However, even during lighter physical exertion (ECG HR below the mean), the substandard correlation strength (r = 0.73) may not suffice to substantiate accuracy.
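The agreement statistics reported above (Pearson r, Bland-Altman mean bias, and 95% limits of agreement, computed for the aggregate data and for data pairs above or below the mean ECG HR) can be illustrated with a minimal sketch. The function names, the toy data, and the reading of the bias criterion as an absolute value are assumptions made for illustration only; this is not the analysis code used in the study.

```python
import math
from statistics import mean, stdev


def agreement_stats(device_hr, ecg_hr):
    """Pearson r, mean bias, and 95% limits of agreement for paired HR samples."""
    if len(device_hr) != len(ecg_hr) or len(device_hr) < 2:
        raise ValueError("need at least two paired samples of equal length")

    # Bland-Altman differences: negative values mean the device underestimates ECG.
    diffs = [d - e for d, e in zip(device_hr, ecg_hr)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences

    # Pearson correlation computed from sums of squares and cross-products.
    md, me = mean(device_hr), mean(ecg_hr)
    cross = sum((d - md) * (e - me) for d, e in zip(device_hr, ecg_hr))
    ss_d = sum((d - md) ** 2 for d in device_hr)
    ss_e = sum((e - me) ** 2 for e in ecg_hr)
    if ss_d == 0 or ss_e == 0:
        raise ValueError("correlation is undefined for constant input")
    r = cross / math.sqrt(ss_d * ss_e)

    return {
        "r": r,
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


def meets_criteria(stats, min_r=0.90, max_abs_bias=3.0):
    """Check r >= 0.90 and |mean bias| < 3 bpm (absolute value is an assumption)."""
    return stats["r"] >= min_r and abs(stats["bias"]) < max_abs_bias


def split_at_ecg_mean(device_hr, ecg_hr):
    """Split data pairs into those at or below and those above the mean ECG HR."""
    threshold = mean(ecg_hr)
    pairs = list(zip(device_hr, ecg_hr))
    below = [(d, e) for d, e in pairs if e <= threshold]
    above = [(d, e) for d, e in pairs if e > threshold]
    return threshold, below, above


# Toy data for illustration only (not study data).
device = [60, 70, 80, 90]
ecg = [62, 71, 83, 92]
stats = agreement_stats(device, ecg)
threshold, below, above = split_at_ecg_mean(device, ecg)
```

Running `agreement_stats` separately on the `below` and `above` subsets mirrors the stratified comparison described in the text, where agreement is reported on either side of the mean ECG HR.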
Specifically during resting or recovery situations, the FB demonstrated a moderately strong correlation with ECG (r = 0.83) and underestimated HR by only 3.7 bpm on average (Table 3). The FB performed with moderate accuracy and agreement with ECG during the walk, jog, and run, as reflected by strong correlations and relatively low mean bias scores (approximately -4 bpm). During the isometric plank, resisted lunges, and cycling, however, the FB demonstrated very weak to moderately weak correlations and large mean bias scores. Thus, in addition to activities of higher physical exertion, specific exercise tasks also appear to dramatically reduce the accuracy of the FB for heart rate tracking.

It is evident from our data as well as others’ (Parak and Korhonen, 2014) that PPG-based HR monitors experience reduced accuracy during elevated physical exertion and specific exercise tasks, two plausibly interrelated factors. Moreover, the degree of detriment and the exercise types associated with poor performance vary among the different PPG devices. To speculate on the potential factors impeding accurate measurements during exercise, especially for the FB, we examine the commonalities of the exercise tasks eliciting the worst device performance, i.e. cycling, resisted arm raises, resisted lunges, and isometric plank. Each of these exercises involves sustained or repetitive contractions of forearm skeletal muscles, which may influence how effectively the optical sensors acquire photoplethysmographic signals sufficient for accurate HR computation. Previous evidence suggests that the contact or compression force between the sensor and the measurement site (i.e. skin surface of the wrist) significantly affects the waveform and thereby the quality of the photoplethysmographic signal (Allen, 2007; Rafolt and Gallasch, 2004; Teng and Zhang, 2004).
Moreover, increased compression of the wrist against the PPG sensor may exacerbate contact-related noise artifact, ultimately disrupting signal quality. This in turn would impede heart rate detection and preclude an accurate measurement (Teng and Zhang, 2004). Accordingly, manufacturer instructions for wrist-worn PPG devices often recommend that the user refrain from overtightening the strap so as to avoid excessive sensor-to-skin contact force or compression. During specific exercises involving sustained or repeated forearm muscle contractions, as noted above, the contact force between the device and skin is likely increased. Elevated contact force between the FB and the skin may be a plausible explanation for the poor performance evident during these specific exercises. However, this contention is merely speculative, as the effects of contact force were beyond the scope of the present investigation. Nonetheless, each device was worn by the subjects of the present investigation according to manufacturer recommendations. Thus, the activity trackers under investigation, particularly the FB, may require further scrutiny regarding 1) whether contact force between the skin and sensor varies across different exercises, and 2) if so, whether the change in contact force alters PPG signal quality and device accuracy.

Skin color has also been previously suggested as a factor affecting characteristics of PPG signals and thus algorithm performance (Allen, 2007; Butler et al., 2016). Although it remains uncertain what level of technical control these devices incorporate to address skin color-related artifact, prior evidence suggests that PPG-based devices may detect pulsation across all skin types and that greater signal resolution is obtained using a green light wavelength at rest and during exercise (Fallow et al., 2013). Regardless, the current data may be limited, as skin color was not accounted for within our methodology.
This, however, may be balanced by the ample size of our data pool resulting from the comparatively high sampling rate. Future validation work should incorporate separate analyses specific to subject skin color as determined systematically by, for example, the Fitzpatrick Scale (Fitzpatrick, 1988).

Overall, heart rate monitoring on the dorsal wrist using reflective PPG sensors has evident limitations (Rafolt and Gallasch, 2004; Teng and Zhang, 2004). Traditional methods of personal heart rate tracking via electrocardiac detection using chest strap sensors with external monitors (e.g. Polar technology) have consistently demonstrated accuracy and high agreement with ECG (Terbizan et al., 2002). Nonetheless, industry continues to focus on streamlining personal biometric and activity monitoring into a single wrist-worn device to enhance practicality and versatility. Thus, efforts to improve PPG-based HR tracking in multi-sensor, multi-function activity trackers require continual focus on improving the overall control of extrinsic factors that have been shown to interrupt PPG signals and accurate HR computation.