Model Parameters and The HMM Formulation

Goals
1) Confirm the validity of the UW toolkit by means of example models, hand calculation, and comparison of toolkits.
2) Run a parallel comparison of the UMD-HMM and UW-HMM toolkits by inputting equivalent observation sequences with known model parameters.

Abstract
In the interest of objective skill assessment in surgical training, Hidden Markov Modeling has been proposed as a way to numerically analyze and assess the skill level of surgical trainees. It is a statistical method that can robustly characterize a variety of systems and has enjoyed substantial success in speech recognition. If valid, this approach would significantly augment surgical education, which, in its current state, leaves assessment and accreditation undesirably subjective.

The BioRobotics Lab has developed a Hidden Markov Modeling toolkit in the MATLAB programming environment, as well as an extensive database of compatible surgical data. To establish the legitimacy of their use, the accuracy of the toolkit must be validated before it is applied. To this end, a three-phase approach has been developed and, to date, the first stage of operation has been successfully verified. Furthermore, similar toolkits in other programming environments have been obtained to add credibility to the final analysis.

After toolkit verification is complete, an analysis of the surgical data is planned. Initial results using standard Markov modeling were confirmed by a tandem evaluation carried out by accredited surgeons, establishing proof-of-concept for this general approach. Implementing Hidden Markov Modeling on the same data sets makes an objective measure of efficacy possible, which in turn affords a reliable optimization of the Markov parameters. In particular, the goal is to establish the minimum number of states needed to effectively analyze surgical skill.
Compare P(O|λ)

Verification of Problem 1
To verify Problem 1 of HMMs, simplified example models were created so that hand calculations could be compared against code calculations. P(O|λ) was calculated by hand with and without scaling. Scaling must be used for large t (> 100) because of the finite precision of computer arithmetic. To test the scaling algorithm for small t, P(O|λ) was also calculated in code with and without scaling; both runs arrived at the same value as the hand calculations. Shown here is the comparison of the UW-HMM calculations with scaling and the hand calculations without scaling.

Lambda Parameter Training Comparison (Problem 3)
[Figure: a model in MATLAB generates an observation sequence (T = 1000; *.seq and *.txt files); training starts from a uniformly distributed lambda (Uniform Model). Result panels: UW-HMM Result and UMD-HMM Results 1, 2, and 3, reporting P(O|λ) and log P(O|λ). Panel notes: "less probable than result 1" (two panels); "Two distinctive lambda generated by identical training method"; "Error in the code resulted in unfinished training"; UMD-HMM Result 3: "Unfinished training. Out of bound parameters."]

A parallel comparison between the UW MATLAB toolkit and the UMD toolkit was performed to determine their degree of accuracy. A small lambda model, containing a limited number of states and observations, was used to create an observation sequence on which both toolkits were trained. Surprisingly, both toolkits appeared to be defective. Further analysis is needed to isolate the cause of the error.

Applications
Surgical Task Decomposition
Surgical Skill Evaluation
Synthesis of Artificial Surgical 'Skill Intelligence'
Use in Simulation and Robotics for Surgical Training and Assessment

Conclusion
Both HMM toolkits analyzed in this experiment failed validation. This suggests that either the toolkits themselves are problematic or the current formulation of the Hidden Markov model is not amenable to the complexity of surgical analysis.
Because HMMs have been shown to work on problems of similar complexity in speech recognition, it is most likely that the toolkits themselves are at fault. Thus, to realize the surgical application of Hidden Markov Models, the problem within these toolkits must be isolated and remedied. A validated tool must be in place before the minimum number of states needed to resolve surgical skill can be established.

Background
The standard (non-hidden) Markov model is a statistical method applied to time-sequenced observations. For example, in the case of three urns each holding a particular distribution of colored balls, an urn is selected at random and a colored ball is chosen from that urn. The standard Markov model thus observes both the sequence of colored balls (observations) and the sequence of urns (states) that produced them. In the Hidden Markov model, the urns are 'hidden' behind a curtain and only the ball colors can be observed. Using a statistical approach, however, the model can find the most likely conditions 'behind the curtain' that yield the given observations.

This black-box approach is favorable for modeling complex and highly varying tasks such as those found in surgery. Instead of implementing highly complex and expensive modeling algorithms, a black box is simply trained with expert vs. novice data and converges on a quantitatively accurate 'hidden' model of skill. An incoming trainee's data can then be measured against the trained models to quantitatively determine the trainee's position on the surgical learning curve. The BioRobotics Lab has conditioned an extensive surgical data set compatible with the Hidden Markov paradigm. Certain currently unknown model parameters (number of states, frequency of observation, initial conditions, etc.) must be established before an optimal HMM analysis can be directly implemented.

Current Status
1) The surgical database has been more extensively re-conditioned with vector quantization (VQ), which has already been shown to distinguish the 'vocabularies' of surgical skill levels.
2) The length of the observation sequence has been isolated as the source of toolkit instability, especially during model training (Problem 3).
3) To date, six alternate toolkits have been obtained which may be able to overcome the problems encountered with the first two.

The Three Basic Problems of HMMs
Problem 1: Given the observation sequence O = {Red, Blue, Green, Green, …, Obs T} and a model λ = (A, B, π), how do we efficiently compute P(O|λ), the probability of the fit?
… i.e., how 'close' is this surgical resident to the surgical expert, quantitatively?
Problem 2: Given the observation sequence O = {Red, Blue, Green, …, Obs T} and a model λ, how do we choose an optimal state sequence Q = {Urn 1, Urn 2, Urn 2, …, q T (final state)}?
… i.e., once the surgical model is trained, can we synthesize surgical intelligence?
Problem 3: How do we adjust the model λ to maximize P(O|λ)?
… i.e., how do we 'train' the black box to model surgery?

The Relevant Problems: Validation of Problems 1 and 3 confirms Problem 2, so only those two require verification.

References:
Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 1989;77(2):257–286.
Rosen J, Solazzo M, Hannaford B, Sinanan M. Task Decomposition of Laparoscopic Surgery for Objective Evaluation of Surgical Residents' Learning Curve Using Hidden Markov Model. Computer Aided Surgery, vol. 7, pp , July

Fumihiko Nagahisa, Jacques Le, Timothy Kowalewski, Blake Hannaford, Jacob Rosen
Dept. of Electrical Engineering and Biorobotics Lab, University of Washington, Seattle, WA
Support: UW Center for Videoendoscopic Surgery, National Science Foundation, ITR Program

Form of the HMM

Example Model
UW-HMM "α-matrix" calculations with scaling: P(O|λ) = (1.71E E E-04)^-1 = E-04
Hand "α-matrix" calculations without scaling: P(O|λ) = = E-04
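The comparison of scaled and unscaled α-matrix calculations for Problem 1 can be sketched as follows. This is a minimal illustration in Python, not the UW-HMM MATLAB code; the two-urn model values, the function names forward_unscaled and forward_scaled, and the integer color coding (0 = Red, 1 = Blue, 2 = Green) are all assumptions made for the example. It uses the standard forward recursion with per-step scaling coefficients as described in Rabiner's tutorial.

```python
import math

def forward_unscaled(A, B, pi, obs):
    """Return P(O|lambda) from the plain alpha recursion (no scaling)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def forward_scaled(A, B, pi, obs):
    """Return log P(O|lambda), rescaling alpha to sum to 1 at every step."""
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            o = obs[t]
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                     for j in range(n)]
        c = 1.0 / sum(alpha)            # scaling coefficient c_t
        alpha = [a * c for a in alpha]  # scaled alpha now sums to 1
        log_prob -= math.log(c)         # P(O|lambda) = 1 / (c_1 * ... * c_T)
    return log_prob

# Hypothetical two-urn, three-color model (values chosen for illustration only).
A = [[0.7, 0.3], [0.4, 0.6]]            # urn-to-urn transition probabilities
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # P(color | urn): Red, Blue, Green
pi = [0.6, 0.4]                         # initial urn probabilities
obs = [0, 1, 2, 2]                      # Red, Blue, Green, Green

print("unscaled log P(O|lambda):", math.log(forward_unscaled(A, B, pi, obs)))
print("scaled   log P(O|lambda):", forward_scaled(A, B, pi, obs))

# A long sequence drives the unscaled result below floating-point range,
# while the scaled version still returns a finite log-probability.
long_obs = obs * 500
print("unscaled P, T = 2000:", forward_unscaled(A, B, pi, long_obs))
print("scaled log P, T = 2000:", forward_scaled(A, B, pi, long_obs))
```

For the short sequence the two routines should agree to within rounding, which mirrors the hand-versus-code check described for small t. For the long sequence the unscaled product underflows, which is the reason scaling is required for large t.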