The Particle Filter was developed to address the problem of tracking contour outlines through heavy image clutter (Isard and Blake, 1996; 1998). The filter's output at a given time-step, rather than being a single estimate of position and covariance as in a Kalman filter, is an approximation of an entire probability distribution of likely joint angles. This allows the filter to maintain multiple hypotheses and thus be robust to distracting clutter.

With about 32 DOFs of joint angles to be determined for each frame, there is the potential for exponential complexity when evaluating such a high-dimensional search space. MacCormick (2000) proposed Partitioned Sampling and Sullivan (1999) proposed Layered Sampling, both of which partition the search space for more efficient particle filtering. Annealed Particle Filtering (Deutscher et al., 2000) is an even more general and robust solution, but it is computationally expensive; Deutscher (2001) improves its efficiency with Partitioned Annealed Particle Filtering.

The Particle Filter is a considerably simpler algorithm than the Kalman Filter. Moreover, despite its use of random sampling, which is often thought to be computationally inefficient, the Particle Filter can run in real-time. This is because tracking over time maintains relatively tight distributions for shape at successive time steps, particularly given the availability of accurate learned models of shape and motion from the human-movement-recognition (CHMR) system.

Here, the particle filter involves three probability distributions in the problem specification and one probability distribution in the solution specification:

1. Prior density: Sample s′t from the prior density p(xt-1|zt-1), where xt-1 denotes the joint angles in the previous frame and zt-1 the observations up to that frame. The samples are possible alternative values for the joint angles.
When tracking through background clutter or occlusion, a joint angle may have N alternative possible values (samples) s with respective weights w, so the prior density is approximated by a sample set: p(x) ≈ St-1 = {(s(n), w(n)), n = 1..N}, where St-1 is the sample set for the previous frame and w(n) is the weight of the nth sample s(n). For the next frame, a new sample s′t = st-1(i) is selected by finding the smallest i for which c(i) ≥ r, where c(i) = ∑j=1..i w(j) is the cumulative weight and r is a random number drawn uniformly from [0, 1].

2. Process density: Predict st by sampling from the process density p(xt|xt-1 = s′t(n)), which encompasses the kinematic model, the clone-body-model and cost-function minimisation: joint angles are predicted for the next frame using these models and error minimisation. In this prediction step both edge and region information are used. The edge information is used to directly match the image gradients with the expected model edge gradients. The region information is used to directly match the values of pixels in the image with those of the clone-body-model's 3D colour texture map. The prediction step involves minimising the cost functions (measurement likelihood density): the edge error Ee using edge information (see Equation 2 in Appendix) and the region error Er using region information (see Equation 3 in Appendix).

3. Observation density: Measure and weight the new position in terms of the observation density, p(zt|xt). Weights wt = p(zt|xt = st) are estimated and then normalised so that ∑n w(n) = 1.
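The three steps above can be sketched as a single particle-filter time step. This is a minimal illustration only: the `predict` and `likelihood` callables below are toy stand-ins for the kinematic/clone-body models and the edge/region cost functions described in the text, and the 1-D state is a placeholder for the full joint-angle vector.

```python
import numpy as np

def resample(samples, weights, rng):
    """Step 1: for each draw r ~ U[0,1], select the sample with the
    smallest index i whose cumulative weight c(i) >= r."""
    c = np.cumsum(weights)
    c[-1] = 1.0  # guard against floating-point rounding
    idx = np.searchsorted(c, rng.random(len(samples)))
    return samples[idx]

def particle_filter_step(samples, weights, predict, likelihood, rng):
    """One time step: resample from the prior, predict through the
    process density, then weight by the observation density."""
    s_prime = resample(samples, weights, rng)   # 1. prior density
    s_t = predict(s_prime)                      # 2. process density
    w_t = likelihood(s_t)                       # 3. observation density
    w_t = w_t / w_t.sum()                       # normalise: sum(w) = 1
    return s_t, w_t

# Toy usage: track a 1-D "joint angle" whose true value is 0.5.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=(100, 1))
weights = np.full(100, 1.0 / 100)
predict = lambda s: s + rng.normal(0.0, 0.05, s.shape)        # random-walk stand-in
likelihood = lambda s: np.exp(-((s[:, 0] - 0.5) ** 2) / 0.02)  # peaked at 0.5
for _ in range(10):
    samples, weights = particle_filter_step(samples, weights, predict, likelihood, rng)
```

After a few iterations the weighted particle cloud concentrates around the observed value, which is the behaviour that lets a small number of particles suffice when the distributions stay tight from frame to frame.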
The new position in terms of the observation density p(zt|xt) is then measured and weighted with forward smoothing: the weights wt are smoothed over frames 1..t for the N trajectories; each sample set is replaced with its N trajectories {(st, wt)} over 1..t; and all weights w(n) are re-weighted over 1..t. Trajectories tend to merge within 10 frames, so the O(Nt) storage prunes down to O(N).

In this research, feedback from the CHMR system utilises the large training set of skills to achieve an even larger reduction of the search space. In practice, human movement is found to be highly efficient, with minimal DOFs rotating at any one time. The equilibrium positions and physical limits of each DOF further stabilise and minimise the dimensional space. With so few DOFs to track at any one time, a minimal number of particles is required, significantly raising the efficiency of the tracking process. Such highly constrained movement results in a sparse domain of motion projected by each motion vector.

Because the temporal variation of related joints and other parameters also contains information that helps the recognition process infer skill boundaries, the system computes the temporal first and second derivatives of these features and appends them to form the final motion vector. Hence the motion vector includes joint angles (32 DOF), body location and orientation (6 DOF), centre of mass (3 DOF) and principal axis (2 DOF), all with first and second derivatives.
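The construction of the final motion vector can be sketched as follows. The DOF counts are taken from the text (32 + 6 + 3 + 2 = 43 base features per frame); the use of finite differences via `np.gradient` is an assumption standing in for whatever derivative estimator the CHMR system actually uses.

```python
import numpy as np

# Base features per frame, from the text: joint angles (32) +
# body location/orientation (6) + centre of mass (3) + principal axis (2).
BASE_DOF = 32 + 6 + 3 + 2

def motion_vectors(frames):
    """Append first and second temporal derivatives to each frame's
    base features, giving a 3 * BASE_DOF motion vector per frame.
    `frames` is a (T, BASE_DOF) array of tracked parameters; finite
    differences here are a stand-in derivative estimator."""
    d1 = np.gradient(frames, axis=0)  # first temporal derivative
    d2 = np.gradient(d1, axis=0)      # second temporal derivative
    return np.concatenate([frames, d1, d2], axis=1)

# Usage: 100 frames of tracked parameters -> one 129-D vector per frame.
frames = np.random.default_rng(1).normal(size=(100, BASE_DOF))
vectors = motion_vectors(frames)
```

Tripling the 43 base features in this way yields a 129-dimensional motion vector per frame, carrying the velocity and acceleration information that helps the recogniser infer skill boundaries.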