Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks


Half of all surgical complications are estimated to be preventable, many of which are attributed to poor individual and team performance. Yet, surgeons often receive inadequate training and feedback on their performance, as the manual assessment process is time-consuming and requires expert supervision. We introduce a deep learning approach to track and recognize surgical instruments in cholecystectomy videos, which enables us to gain rich insight into tool movements and usage patterns to efficiently and accurately analyze surgical skill. We approach this task of tool detection and localization by leveraging region-based convolutional neural networks, and we collect a new dataset, m2cai16-tool-locations, extending the existing m2cai16-tool dataset with spatial bounds of tools. We then apply our model over time to extract tool usage timelines, motion heat maps, and tool trajectory maps, which we validate as effective performance indicators, demonstrating the ability of spatial tool detection to facilitate operative skill assessment.

    title={Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks},
    author={Jin, Amy and Yeung, Serena and Jopling, Jeffrey and Krause, Jonathan and Azagury, Dan and Milstein, Arnold and Fei-Fei, Li},
    journal={IEEE Winter Conference on Applications of Computer Vision},



The m2cai16-tool-locations dataset contains spatial tool annotations for 2,532 frames from the first 10 of the 15 videos in the m2cai16-tool dataset. It consists of 3,141 annotations of 7 surgical instrument classes, with an average of 1.2 labels per frame and 7 instrument classes per video. Examples of the spatial tool annotations for each class are shown below.
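The summary statistics above can be recomputed from a flat list of bounding-box annotations. The sketch below assumes a hypothetical record format (`Annotation` with a video id, frame id, tool name, and pixel box). It is not the dataset's actual file format, and it assumes every annotated frame carries at least one box. It also checks the reported average: 3,141 annotations over 2,532 frames is about 1.24 labels per frame.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record format: one entry per bounding-box annotation.
@dataclass(frozen=True)
class Annotation:
    video_id: int
    frame_id: int
    tool: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def dataset_stats(annotations: list[Annotation]) -> dict:
    """Summarize annotation counts in the terms used in the text."""
    if not annotations:
        raise ValueError("need at least one annotation")
    # A frame is identified by (video, frame); only frames with boxes are counted.
    frames = {(a.video_id, a.frame_id) for a in annotations}
    per_class = Counter(a.tool for a in annotations)
    classes_per_video: dict[int, set[str]] = {}
    for a in annotations:
        classes_per_video.setdefault(a.video_id, set()).add(a.tool)
    return {
        "num_annotations": len(annotations),
        "num_frames": len(frames),
        "labels_per_frame": len(annotations) / len(frames),
        "per_class": dict(per_class),
        "avg_classes_per_video": sum(len(s) for s in classes_per_video.values())
        / len(classes_per_video),
    }


# Sanity check of the reported average: 3,141 annotations over 2,532 frames.
print(round(3141 / 2532, 2))  # 1.24
```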

Operative Skill Assessment

With the added spatial annotations in m2cai16-tool-locations, we can perform tool localization in addition to classification, which enables higher-level analysis of surgical performance. By applying our model to each video over time, we extract assessment metrics that reflect key aspects of surgical skill, such as motion economy and bimanual dexterity. Examples of our qualitative assessment metrics are shown below.
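To illustrate how per-frame localizations turn into motion-economy measures, the sketch below tracks one tool's bounding-box centers across frames. It computes the total path length and the extent of the region the tool sweeps. These are simple proxies chosen for illustration, not the metrics defined in the paper. The `Box` layout and function names are hypothetical.

```python
import math

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


def center(box: Box) -> tuple[float, float]:
    """Midpoint of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def path_length(boxes: list[Box]) -> float:
    """Sum of Euclidean distances between consecutive box centers (pixels).

    A shorter path for the same task suggests better economy of motion."""
    pts = [center(b) for b in boxes]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


def movement_range(boxes: list[Box]) -> tuple[float, float]:
    """Width and height of the region swept by the box centers."""
    if not boxes:
        raise ValueError("need at least one box")
    xs, ys = zip(*(center(b) for b in boxes))
    return (max(xs) - min(xs), max(ys) - min(ys))
```

For example, centers at (1, 1), (4, 5), and (4, 5) give a path length of 5.0 pixels and a movement range of (3.0, 4.0).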

Timelines (top), heat maps (middle), and trajectories (bottom) of tool usage for our test videos. In the timelines, (a)-(g) correspond to Grasper, Bipolar, Hook, Scissors, Clipper, Irrigator, and Specimen Bag, respectively. These metrics let us efficiently examine back-and-forth switching between instruments, the range of tool movement, and tool motion patterns. We find that test video 2 corresponds to the best-executed surgery, reflecting focused and skillful execution of each step of the procedure. In contrast, the surgeons in the other test videos show much less economy of motion, handle the instruments with less dexterity, and struggle with certain parts of the procedure.
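The timelines and heat maps described above can be sketched from per-frame detections. The code below assumes each frame's detector output has already been reduced to a set of tool names and, for heat maps, to tool-center coordinates. It builds a binary presence track per tool and counts how often a tool re-enters the scene, a rough proxy for back-and-forth switching. It also bins center positions into a coarse grid. The function names, tool-name strings, and the 32-pixel cell size are illustrative assumptions, not the paper's implementation.

```python
# Tool names follow the caption's (a)-(g) order; the exact strings are assumptions.
TOOLS = ["Grasper", "Bipolar", "Hook", "Scissors", "Clipper", "Irrigator", "Specimen Bag"]


def timeline(detections_per_frame: list[set[str]]) -> dict[str, list[int]]:
    """Binary presence track per tool: 1 if the tool is detected in that frame."""
    return {t: [int(t in dets) for dets in detections_per_frame] for t in TOOLS}


def count_tool_entries(track: list[int]) -> int:
    """Count 0 -> 1 transitions (a tool present in the first frame counts once).

    Many entries for the same tool suggest back-and-forth instrument switching."""
    entries, prev = 0, 0
    for v in track:
        if v and not prev:
            entries += 1
        prev = v
    return entries


def heat_map(centers, width: int, height: int, cell: int = 32) -> list[list[int]]:
    """Accumulate tool-center positions into a grid of counts (rows x cols)."""
    rows = (height + cell - 1) // cell
    cols = (width + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in centers:
        # Clamp to the frame so slightly out-of-bounds centers land in edge cells.
        r = min(max(int(y) // cell, 0), rows - 1)
        c = min(max(int(x) // cell, 0), cols - 1)
        grid[r][c] += 1
    return grid
```

With frames `[{"Grasper"}, {"Grasper", "Hook"}, set(), {"Hook"}, {"Grasper"}]`, the Grasper track is `[1, 1, 0, 0, 1]` with two entries. A finer `cell` gives a sharper heat map at the cost of sparser counts per cell.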