CN114005022B - Dynamic prediction method and system for surgical instrument - Google Patents
- Publication number
- CN114005022B (application CN202111636026.1A)
- Authority
- CN
- China
- Prior art keywords
- instrument
- model
- surgical
- head
- time
- Prior art date: 2021-12-30
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
Abstract
The invention relates to a dynamic prediction method and system for surgical instruments. The method comprises the following steps: S1, inputting images acquired in real time into a surgical stage identification model and an instrument head key point detection model to obtain an identification result and a detection result; S2, inputting the real-time images and the instrument head key point coordinate information into an image feature extraction model to obtain a feature vector of the scene information around the instrument head; S3, inputting the fused surgical elements into an instrument prediction model to obtain a ranking of instrument use likelihood over a future period, a nurse passing instruments according to the ranking; and S4, repeating the above steps to update the ranked list, the nurse passing instruments according to the new ranking. The invention shortens the reaction time of the instrument assistant, reduces the probability of cooperation errors, accelerates the running-in between the instrument assistant and the chief surgeon, strengthens the coordination of the surgical team, improves surgical efficiency, and increases the satisfaction of the medical staff.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a dynamic prediction method and system for surgical instruments.
Background
Good cooperation is needed among the surgical team during an operation: an instrument assistant who provides surgical instruments to the surgeon promptly and effectively can improve the smoothness of the operation, shorten the operation time, and raise overall surgical quality. Being well aware of how surgeons use surgical instruments is therefore one of the most fundamental requirements for an instrument assistant, and an important part of ensuring that the operation is completed efficiently and with tacit coordination.
Cooperation between the surgical instrument assistant and the chief surgeon currently suffers from the following problems, which lead to a long running-in period and weak coordination. First, operation types and surgical styles differ greatly: different departments and different procedures require different instruments to be operated in different orders at each stage. Second, in practice the instrument assistant role is usually filled by nurses, but nurse turnover is high, and nurses differ considerably in ability, understanding, and familiarity with different operations. Third, every surgeon has personal habits and characteristics in instrument use. For these reasons, an instrument assistant and a chief surgeon need a long running-in period before their cooperation becomes competent and effective, and the resulting friction is especially obvious, and most urgently in need of a solution, early in an instrument assistant's career.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a dynamic prediction method and system for surgical instruments, thereby addressing the cooperation problems between the surgical instrument assistant and the chief surgeon during surgery.
The purpose of the invention is realized by the following technical scheme. A dynamic prediction method for surgical instruments comprises:
S1, inputting images acquired in real time into a surgical stage identification model and an instrument head key point detection model to obtain a surgical stage identification result and a detection result of instrument head key point coordinate information;
S2, inputting the real-time images and the instrument head key point coordinate information into an image feature extraction model to obtain a feature vector of the scene information around the instrument head;
S3, inputting the fused surgical elements into an instrument prediction model to obtain a ranking of instrument use likelihood over a future period, a nurse passing instruments according to the ranking;
the surgical elements comprise the surgical stage, the category of the instrument currently in use, and the scene features around the instrument head;
and S4, repeating steps S1-S3 to update the instrument use likelihood ranking in real time, the nurse passing instruments according to the new ranking.
The dynamic prediction method further comprises a model construction step; the model construction step is executed before step S1 when the dynamic prediction method is run for the first time and is not executed in later runs, and specifically comprises:
uniformly transcoding the surgical videos with transcoding software and extracting surgical pictures at equal time intervals;
labeling the surgical stages and scenes in the surgical videos, labeling the instrument head key points in the surgical pictures, and building the corresponding databases;
and building the surgical stage identification model, the instrument head key point detection model, the image feature extraction model and the instrument prediction model from the labeled surgical stage data and instrument head key point data.
Specifically, from the instrument head key points labeled in the surgical pictures and the surgical stage data labeled in the surgical videos, the surgical stage identification model is trained with a time-series recognition algorithm, the instrument key point detection model with a key point detection algorithm, the image feature extraction model with a self-encoding algorithm, and the instrument prediction model with a time-series recognition algorithm.
Inputting the fused surgical elements into the instrument prediction model and obtaining the ranking of instrument use likelihood over a future period comprises:
feeding the images of the past N seconds into the surgical stage identification model and the instrument head key point detection model respectively to obtain the current surgical stage, the category of the instrument currently in use, and the instrument head scene data;
inputting the instrument head scene data into the image feature extraction model to obtain a feature vector representing the information features of the scene near the instrument head;
fusing the surgical stage category, the operation elapsed time, the stage elapsed time, the surgical instrument category and the scene information features near the instrument head to obtain a surgical element feature vector;
and inputting the surgical element feature vector into the instrument prediction model to obtain a ranked list of instrument use probabilities within m future time periods, as sketched below.
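The following minimal sketch, in PyTorch, illustrates one way such a fusion-and-ranking step could look. Every concrete value here (the 26 instrument classes, 6 stages, 128-dimensional scene vector, 5 future periods, the one-hot encoding and the time scaling) is an illustrative assumption, not a specification from this disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_INSTRUMENTS = 26   # assumed: matches the 26 instrument kinds listed later
NUM_PHASES = 6         # assumed: e.g. the six cholecystectomy stages
SCENE_DIM = 128        # assumed length of the scene feature vector
M_PERIODS = 5          # assumed number of future periods to predict

def fuse_elements(phase_id, op_time_s, phase_time_s, instrument_id, scene_vec):
    """Fuse one second's surgical elements into a single feature vector."""
    phase = F.one_hot(torch.tensor(phase_id), NUM_PHASES).float()
    inst = F.one_hot(torch.tensor(instrument_id), NUM_INSTRUMENTS).float()
    times = torch.tensor([op_time_s / 3600.0, phase_time_s / 3600.0])  # crude scaling
    return torch.cat([phase, inst, times, scene_vec])

class InstrumentPredictor(nn.Module):
    """LSTM over N seconds of element vectors -> per-period instrument probabilities."""
    def __init__(self, in_dim, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, M_PERIODS * NUM_INSTRUMENTS)

    def forward(self, seq):                        # seq: (batch, N, in_dim)
        _, (h, _) = self.lstm(seq)
        logits = self.head(h[-1]).view(-1, M_PERIODS, NUM_INSTRUMENTS)
        return logits.softmax(dim=-1)              # probability per instrument per period

# N = 10 seconds of fused elements; random scene vectors stand in for the extractor output
elems = torch.stack([fuse_elements(2, 1800 + i, 300 + i, 5, torch.randn(SCENE_DIM))
                     for i in range(10)]).unsqueeze(0)
probs = InstrumentPredictor(elems.shape[-1])(elems)   # (1, M_PERIODS, NUM_INSTRUMENTS)
ranking = probs[0, 0].argsort(descending=True)        # most-likely-first list, period 1
```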
A dynamic prediction system for surgical instruments comprises a first data acquisition module, a second data acquisition module, an instrument sequencing module and an iteration module;
the first data acquisition module is used for inputting images acquired in real time into the surgical stage identification model and the instrument head key point detection model to obtain a surgical stage identification result and a detection result of instrument head key point coordinate information;
the second data acquisition module is used for inputting the real-time images and the instrument head key point coordinate information into the image feature extraction model to obtain a feature vector of the scene information around the instrument head;
the instrument sequencing module is used for fusing the surgical elements and inputting them into the instrument prediction model to obtain a ranking of instrument use likelihood over a future period, a nurse passing instruments according to the ranking;
the iteration module is used for running the first data acquisition module, the second data acquisition module and the instrument sequencing module again in sequence, updating the instrument use likelihood ranking in real time, the nurse passing instruments according to the new ranking.
The system further comprises a model building module, used for uniformly transcoding the surgical videos with transcoding software and extracting surgical pictures at equal time intervals; labeling the surgical stages and scenes in the surgical videos, labeling the instrument head key points in the surgical pictures, and building the corresponding databases; and building the surgical stage identification model, the instrument head key point detection model, the image feature extraction model and the instrument prediction model from the labeled surgical stage data and instrument head key point data.
The instrument head key point detection model comprises a first down-sampling stage, a plurality of hourglass structures and a classifier; the first down-sampling stage comprises 2D convolution kernels, batch data normalization and a nonlinear activation function, and mainly fuses and compresses the original image data to obtain a feature map;
the hourglass structure comprises an encoding part and a decoding part: the encoding part performs feature fusion on the feature map to extract image semantic information, and the decoding part resamples the deep semantic information onto a higher-resolution feature image so that the semantic information is mapped back to positions in the original image;
the classifier is used for computing on the information extracted into the feature map and classifying each pixel of the original image.
The surgical stage identification model comprises a combination of a residual neural network and a long short-term memory (LSTM) network: the residual neural network extracts the semantic information of a single image, computing image features with tensor calculations and nonlinear transformations; the LSTM network computes the stage information of a continuous segment of the operation from the features of several consecutive images, so as to identify the stage of the operation over the past period.
The image feature extraction model comprises a second down-sampling part and an up-sampling part; image semantic features are extracted by the second down-sampling, the original image is restored from the semantic information during up-sampling, and the final output of the second down-sampling is taken as the image feature extraction result.
The invention has the following advantages:
1. The method dynamically predicts the surgical instruments about to be used according to the current surgical scene and the instruments currently in use, and outputs an ordered list of the instruments likely to be needed. This helps instrument assistants and scrub nurses prepare instruments for the surgeon quickly and accurately, provides effective prediction, shortens the reaction time of the instrument assistant (scrub nurse), and reduces the probability of cooperation errors. It lets instrument assistants carry out and become familiar with instrument passing in different types of operations faster and better, thereby accelerating the running-in between instrument assistant and chief surgeon, strengthening the coordination of the surgical team, improving surgical efficiency and increasing the work satisfaction of the medical staff.
2. Compared with traditional case teaching and limited on-site teaching, instrument assistants and scrub nurses can learn and familiarize themselves with instrument passing in different types of operations on site under the model's prompts, reducing the time and labor costs of learning and adaptation and giving trainee nurses more convenient and richer opportunities to learn the job. The system also suits surgical teams with high staff turnover, such as teaching hospitals and training bases: it shortens the running-in and adaptation time between team members, preserving the team's coordination and the quality of the operation despite the turnover.
3. The method can be combined with a robot assistant to automatically prepare surgical instruments for the chief surgeon, effectively improving work efficiency and reducing the pressure on the instrument assistant. Meanwhile, instruments can be counted as an auxiliary function according to the categories, counts and frequency of instrument use, supporting the safety and further progress of the operation.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a data annotation process;
FIG. 3 is a schematic diagram of an artificial intelligence computer model construction;
FIG. 4 is a schematic view of a surgical stage identification model;
FIG. 5 is a schematic structural diagram of a key point detection model of the head of the instrument;
FIG. 6 is a schematic diagram of an image feature extraction model structure;
FIG. 7 is a schematic diagram of an instrument prediction model.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention specifically includes the following:
Laparoscopic surgery videos are collected, including but not limited to laparoscopic cholecystectomy and laparoscopic pancreaticoduodenectomy, and the pictures and videos are labeled respectively to obtain surgical labeling databases containing surgical stage data and surgical instrument head key point data. From the labeled data, the artificial intelligence computer models that identify the surgical elements are trained: a clinically oriented surgical instrument identification model and a surgical stage and scene identification model; the prediction model is then established on the surgical elements produced by these artificial intelligence computer models.
Data on the surgical instrument category, the surgical stage identification and the scene feature vector are obtained through the instrument head key point detection model, the surgical stage identification model and the image feature extraction model built in the model construction step, and input into the constructed surgical instrument prediction model, which finally outputs a probability list of the surgical instruments likely to be used at each time point within the next M seconds.
Further, the surgical videos to be collected and labeled should have a resolution of no less than 720 × 560 and a frame rate of no less than 25 frames per second.
As shown in fig. 2, to build the database required by the instrument head key point detection model, the pictures to be labeled are extracted from all videos with FFmpeg at equal intervals of one frame per second; clear pictures containing surgical instruments are selected and then manually labeled with the Labelme software.
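A hedged sketch of that extraction step, assuming FFmpeg is installed and on the PATH (the file paths and naming pattern are illustrative):

```python
import subprocess
from pathlib import Path

def extract_frames(video: Path, out_dir: Path, fps: int = 1) -> None:
    """Extract frames at `fps` frames per second for later Labelme annotation."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}",
         str(out_dir / f"{video.stem}_%06d.jpg")],
        check=True)

extract_frames(Path("surgery_001.mp4"), Path("frames/surgery_001"))
```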
Further, the surgical instruments include, but are not limited to, 26 kinds, such as trocars, atraumatic graspers, electric hooks, ultrasonic knives, clip appliers, Maryland forceps, intestinal forceps, straight dissecting forceps, scissors, cutting staplers, staple cartridges, needle holders, large graspers, dissecting forceps, puncture needles, aspirators, electrocoagulation, drainage tubes and wave forceps. The head key point usually refers to the distal point of the instrument's functional part, i.e. the point farthest from the hand-held end, taken as the labeled point.
For labeling of the surgical stages and scenes, all surgical videos are uniformly transcoded into mp4 format with FFmpeg; the surgical stages are labeled manually using the Anvil Video Annotation Research Tool (Anvil for short).
For example, in laparoscopic cholecystectomy, laparoscopic pancreaticoduodenectomy and laparoscopic segmental lung resection, the surgical stages and scenes include the following:
the stages of laparoscopic cholecystectomy are six: 1. establishing pneumoperitoneum; 2. releasing adhesions; 3. dissecting the hepatocystic triangle; 4. freeing the gallbladder bed; 5. cleaning the surgical field; 6. removing the gallbladder. The scenes comprise: 1. clamping the cystic duct; 2. clamping the cystic artery; 3. dividing the cystic duct; 4. dividing the cystic artery;
the stages of laparoscopic pancreaticoduodenectomy comprise: 1. establishing pneumoperitoneum; 2. taking down the transverse colon; 3. making the Kocher incision; 4. treating the lower edge of the pancreas; 5. stripping the gallbladder and dissecting the common hepatic duct; 6. treating the upper edge of the pancreas; 7. dividing the stomach or duodenum; 8. dividing the jejunum and freeing the mesentery; 9. dividing the pancreatic neck; 10. resecting the uncinate process; 11. pancreatico-intestinal anastomosis; 12. biliary-intestinal anastomosis; 13. gastrointestinal anastomosis;
the stages of laparoscopic segmental lung resection comprise: 1. establishing access; 2. freeing the lung segment; 3. clearing lymph nodes; 4. freeing the pulmonary arteries and veins; 5. removing the specimen; 6. cleaning the surgical field;
the stages and scenes of the different operation types are defined and decomposed according to expert consensus, monographs, treatises and other literature, and the labeled content is extracted at one frame per second with FFmpeg for training;
the content used to train the surgical stage recognition model includes, but is not limited to, the stages of the above three procedures.
As shown in fig. 3, the artificial intelligence computer model for dynamically predicting surgical instrument use consists of the following four models: the instrument head key point detection model, the surgical stage identification model, the image feature extraction model and the instrument prediction model. The original image is input into the instrument head key point detection model and the stage identification model to obtain the instrument category, the image information near the instrument head and the stage identification information; the image information near the instrument head is input into the image feature extraction model to obtain a feature vector;
combined with the operation duration and the stage duration, the instrument category information, the surgical stage information, the feature vector and other information are fused and input into the instrument prediction model to obtain the probability of each instrument being used in each coming period.
As shown in fig. 5, the instrument head key point detection model mainly uses a 2D convolutional neural network, combining batch data normalization, a nonlinear activation function and a nearest-neighbor interpolation algorithm into a single network. The network model consists of a down-sampling part, a plurality of hourglass structures and a classifier.
$y = \sum_{i=1}^{n} w_i x_i + b$

The above is the convolution calculation formula, where $y$ is the convolution output, $n$ is the number of neurons, $w_i$ is the weight of the $i$-th neuron, $x_i$ is the input data of the $i$-th neuron, and $b$ is a bias added to the calculation result.
$\mathrm{BN}(x) = \gamma \cdot \dfrac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \xi}} + \beta$

The above is the batch normalization layer calculation formula, where $\mathrm{BN}$ is the batch normalization output, $x$ the input data, $\mathrm{E}[x]$ the mean of the tensor $x$, $\mathrm{Var}[x]$ its variance, $\xi$ a very small constant that keeps the denominator from being 0, and $\gamma$ and $\beta$ learnable coefficients.
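As a sanity check on the reconstructed formula, the sketch below (an illustration, not part of the disclosure) compares a manual computation against PyTorch's training-mode `BatchNorm2d`, whose `eps` plays the role of $\xi$ and whose $\gamma$ and $\beta$ are 1 and 0 at initialization:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16, 32, 32)
bn = nn.BatchNorm2d(16)            # a freshly built module is in training mode
y_ref = bn(x)                      # normalizes with the batch statistics

mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + bn.eps)   # gamma=1, beta=0 defaults

print(torch.allclose(y_ref, y_manual, atol=1e-5))  # True
```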
The down-sampling part consists of 2D convolution, batch data normalization and a nonlinear activation function; its role is to fuse and compress the original image data to obtain a feature map.
The hourglass structure is divided into two parts: an encoding part and a decoding part. The encoding part performs feature fusion on the feature map to extract image semantic information and consists of 2D convolution, batch data normalization and a nonlinear activation function. The decoding part up-samples the deep semantic information to a higher-resolution feature image so that the semantic information is mapped back to positions in the original image; it consists of 2D convolution, batch data normalization, a nonlinear activation function and a nearest-neighbor interpolation algorithm. Because deep semantics lose part of the position information, regression of the segmentation coordinates would not be accurate enough; therefore, feature maps of the same size as those in the decoding process are extracted during encoding and fused with the decoding feature maps, improving the accuracy of the segmented positions.
The classifier computes on the information extracted into the feature map and classifies each pixel of the original image; it consists of 2D convolution, batch data normalization and a nonlinear activation function.
The structures in the instrument head key point detection model are used in the following order: the original image is fed into the down-sampling part to obtain feature map A; feature map A is passed through several hourglass structures connected in series to obtain feature map B; and feature map B is input into the classifier to achieve pixel-level segmentation of the original image.
Further, when the instrument head key point model is built, the segmentation result is compared with the labeled data, and the gap is used to update the model.
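A minimal sketch of such a network, assuming illustrative channel counts, two hourglasses, and a two-class per-pixel output (key point vs. background); none of these numbers come from the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(cin, cout, stride=1):
    # 2D convolution + batch data normalization + nonlinear activation
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class Hourglass(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.enc = conv_bn_relu(ch, ch, stride=2)   # encode: fuse and compress
        self.dec = conv_bn_relu(ch, ch)             # decode after upsampling
        self.skip = conv_bn_relu(ch, ch)            # same-size features from encoding

    def forward(self, x):
        skip = self.skip(x)                          # keeps position information
        y = self.enc(x)
        y = F.interpolate(y, scale_factor=2, mode="nearest")  # nearest-neighbor upsample
        return self.dec(y) + skip                    # fuse decoder and encoder maps

class KeypointNet(nn.Module):
    def __init__(self, num_classes=2, ch=64):
        super().__init__()
        self.down = conv_bn_relu(3, ch, stride=2)                        # -> feature map A
        self.hourglasses = nn.Sequential(Hourglass(ch), Hourglass(ch))   # -> feature map B
        self.classifier = nn.Conv2d(ch, num_classes, 1)                  # per-pixel classes

    def forward(self, img):
        return self.classifier(self.hourglasses(self.down(img)))

heatmap = KeypointNet()(torch.randn(1, 3, 256, 256))  # (1, 2, 128, 128) logits
```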
As shown in fig. 4, the surgical stage recognition model consists of a residual neural network and a long short-term memory (LSTM) network. The residual neural network consists of 2D convolution, batch data normalization and nonlinear activation functions and extracts the semantic information of a single image; the model computes image features with tensor calculations and nonlinear transformations.
The stage information of a continuous segment of the operation is computed by the LSTM network from the features of several consecutive images; this network is used to identify the stage of the operation over the past period.
$(h_t, c_t) = f(h_{t-1}, c_{t-1}, x_t)$

The above is the calculation formula of the LSTM network, where $h_t$ and $h_{t-1}$ are the recognition results of the model at times $t$ and $t-1$; generally, when recognizing the past $t$ seconds, only $h_t$ is taken as the final model output. $c_t$ and $c_{t-1}$ are the cell states at times $t$ and $t-1$, i.e. the memory of the information up to those times, and $x_t$ is the feature vector input at time $t$.
In use, the image feature information of each required video frame from the past t seconds is first extracted by the residual neural network; the t seconds of image information are then input into the LSTM network in time order to obtain the final recognition result. When the model is built, the recognition result is compared with the labeled data, and the gap is used to update the model.
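A hedged sketch of this CNN+LSTM pairing; the torchvision ResNet-18 backbone, hidden size, and six-phase output are assumptions for illustration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PhaseRecognizer(nn.Module):
    def __init__(self, num_phases=6, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()        # keep the 512-d per-frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_phases)

    def forward(self, frames):             # frames: (B, T, 3, H, W), T = past seconds
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)  # per-frame features x_t
        out, _ = self.lstm(feats)          # (h_t, c_t) = f(h_{t-1}, c_{t-1}, x_t)
        return self.fc(out[:, -1])         # only the final h_t is used as the output

logits = PhaseRecognizer()(torch.randn(2, 8, 3, 224, 224))   # (2, 6) phase scores
```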
As shown in fig. 6, the image feature extraction model is a UNet-like network, modified so that down-sampling uses 2D convolution with zero padding instead of max pooling, and the up-sampling process uses a nearest-neighbor interpolation algorithm instead of 2D deconvolution.
The image feature extraction network consists of a down-sampling part and an up-sampling part, likewise built from 2D convolution, batch data normalization and nonlinear activation functions. Down-sampling extracts the semantic features of the image; up-sampling restores the original image from the semantic information.
The final output of the down-sampling part is taken as the image feature extraction result. During training, the down-sampling and up-sampling parts are used together: the up-sampling result is compared with the original image, and the gap is used to update the model.
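A minimal sketch of such an autoencoder, with assumed channel counts and input size; strided convolutions stand in for pooling on the way down and nearest-neighbor interpolation replaces deconvolution on the way up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneAutoencoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # down-sampling: strided 2D convolution instead of max pooling
        self.down1 = nn.Sequential(nn.Conv2d(3, ch, 3, 2, 1),
                                   nn.BatchNorm2d(ch), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(ch, 2 * ch, 3, 2, 1),
                                   nn.BatchNorm2d(2 * ch), nn.ReLU())
        # up-sampling: plain convolutions between nearest-neighbor interpolations
        self.up1 = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, 1, 1),
                                 nn.BatchNorm2d(ch), nn.ReLU())
        self.up2 = nn.Conv2d(ch, 3, 3, 1, 1)   # reconstruct the original image

    def forward(self, x):
        feat = self.down2(self.down1(x))        # bottleneck = scene feature
        y = F.interpolate(feat, scale_factor=2, mode="nearest")
        y = F.interpolate(self.up1(y), scale_factor=2, mode="nearest")
        return self.up2(y), feat.flatten(1)     # (reconstruction, feature vector)

x = torch.randn(1, 3, 64, 64)
recon, feature = SceneAutoencoder()(x)
loss = F.mse_loss(recon, x)   # gap between up-sampled result and original image
```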
As shown in fig. 7, the instrument prediction model uses an LSTM network: the stage recognition results, the operation elapsed time, the stage elapsed time, the instrument category results and the image features near the instrument head output by the models over the past t seconds are fused and input into the instrument prediction model, which predicts a list of the use likelihood of each kind of instrument in each of m future time periods.
As shown in fig. 7, during the operation, continuous endoscopic surgical image frames within t seconds of the current surgical time are obtained by the endoscopic video sampling device and processed by the stacked-Hourglass-Network-based key point detection model, the CNN+LSTM-based surgical stage identification model and the image feature extraction model. As the endoscopic field images are continuously acquired during the operation, the models analyze and extract, with their corresponding algorithms, the instrument categories appearing in the current image sequence, the scene characteristics around the instruments (such as the color tones around the surgical instruments and the contours of tissues and organs), and the current operation type with its corresponding surgical stage. Meanwhile, a timer built into the endoscopic video device synchronously times the duration of the operation and, combined with the label output of the surgical stage identification model, times the duration of the current surgical stage. In this way, the surgical instrument and the image information near it, the current surgical stage, the operation elapsed time and the stage elapsed time are obtained during the operation from the picture sequence and duration of the endoscopic lens.
The prediction of the instruments to be used, such as the ultrasonic knife, titanium clips, clip appliers and absorbable clips that may be used next, together with their predicted use times and order, is completed by the LSTM network. During the operation, the model receives the stage recognition results, operation elapsed time, stage elapsed time, instrument category results, instrument head image features and other content output over the past t seconds, integrates the results recognized by each model in that window according to the formula and principle above, and predicts a use-likelihood list of the various instruments for each of m future time periods, for example as sketched below.
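To make the output concrete, the short sketch below turns assumed per-period probabilities into the ordered, nurse-facing list; the instrument names and numbers are invented for illustration:

```python
INSTRUMENTS = ["ultrasonic knife", "clip applier", "absorbable clip", "grasper"]
probs = [[0.10, 0.55, 0.25, 0.10],   # period 1 (assumed model output)
         [0.05, 0.20, 0.60, 0.15]]   # period 2

for period, p in enumerate(probs, start=1):
    ranked = sorted(zip(INSTRUMENTS, p), key=lambda kv: kv[1], reverse=True)
    print(f"period {period}:", ", ".join(f"{name} ({v:.2f})" for name, v in ranked))
# period 1: clip applier (0.55), absorbable clip (0.25), ...
```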
Another embodiment of the invention relates to a semi-automatic instrument delivery assistance system for laparoscopic pancreaticoduodenectomy, characterized in that:
the system consists of an image sequence acquisition module, a surgical instrument detection module, a surgical stage detection module, a surgical instrument surrounding-scene classification module and a subsequent-instrument prediction module;
the image sequence acquisition module is configured to acquire the laparoscopic pancreaticoduodenectomy video stream, yielding a sequence of consecutive video pictures over the past period of the operation;
the surgical instrument detection module is configured to extract features with a stacked Hourglass Network, obtaining a feature map as the input of the subsequent detection network, and finally obtain the instrument categories and instrument tip positions in the corresponding image sequence, providing information support for the subsequent-instrument prediction module;
the surgical stage detection module is configured to extract features with a CNN+LSTM network, obtaining a feature map as the input of the subsequent detection network, and finally obtain the surgical stage corresponding to the picture sequence, providing information support for the subsequent-instrument prediction module;
the surgical instrument surrounding-scene classification module is configured to classify scenes with unsupervised learning, finally obtaining the scene corresponding to the picture sequence and providing information support for the subsequent-instrument prediction module;
and the subsequent-instrument prediction module is configured to predict what will appear next with an LSTM network, combining the image sequence with the information from the past period: it receives the outputs of the surgical instrument detection module, the surgical stage detection module and the surrounding-scene classification module over the past period and, combined with the time statistics of the synchronized timer, outputs the categories of the instruments that will appear next, their times of appearance and their order.
The foregoing is illustrative of the preferred embodiments of this invention. It is to be understood that the invention is not limited to the precise forms disclosed herein, and that various other combinations, modifications and environments may be resorted to within the scope of the inventive concept described herein, whether through the teachings above or through the skill or knowledge of the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A method for dynamic prediction of surgical instrument use, characterized in that the method comprises the following steps:
S1, inputting images acquired in real time into a surgical stage identification model and an instrument head key point detection model to obtain a surgical stage identification result and a detection result of instrument head key point coordinate information;
S2, inputting the real-time images and the instrument head key point coordinate information into an image feature extraction model to obtain a feature vector of the scene information around the instrument head;
S3, inputting the fused surgical elements into an instrument prediction model to obtain a ranking of instrument use likelihood over a future period, a nurse passing instruments according to the ranking; fusing the surgical elements comprises combining the operation duration and the stage duration, fusing the instrument category information, the surgical stage information and the feature vector information, and inputting the result into the instrument prediction model to obtain the probability of each instrument being used in each coming period; specifically:
feeding the images of the past N seconds into the surgical stage identification model and the instrument head key point detection model respectively to obtain the current surgical stage, the category of the instrument currently in use, and the instrument head scene data;
inputting the instrument head scene data into the image feature extraction model to obtain a feature vector representing the information features of the scene near the instrument head;
fusing the surgical stage category, the operation elapsed time, the stage elapsed time, the surgical instrument category and the scene information features near the instrument head to obtain a surgical element feature vector;
inputting the surgical element feature vector into the instrument prediction model to obtain a ranked list of instrument use probabilities within m future time periods;
and S4, repeating steps S1-S3 to update the instrument use likelihood ranking in real time, the nurse passing instruments according to the new ranking.
2. The method of claim 1, characterized in that the dynamic prediction method further comprises a model construction step; the model construction step is executed before step S1 when the dynamic prediction method is run for the first time and is not executed in later runs, and specifically comprises:
uniformly transcoding the surgical videos with transcoding software and extracting surgical pictures at equal time intervals;
labeling the surgical stages and scenes in the surgical videos, labeling the instrument head key points in the surgical pictures, and building the corresponding databases;
and building the surgical stage identification model, the instrument head key point detection model, the image feature extraction model and the instrument prediction model from the labeled surgical stage data and instrument head key point data.
3. A dynamic prediction system for surgical instruments, characterized by comprising a first data acquisition module, a second data acquisition module, an instrument sequencing module and an iteration module;
the first data acquisition module is used for inputting images acquired in real time into a surgical stage identification model and an instrument head key point detection model to obtain a surgical stage identification result and a detection result of instrument head key point coordinate information;
the second data acquisition module is used for inputting the real-time images and the instrument head key point coordinate information into an image feature extraction model to obtain a feature vector of the scene information around the instrument head;
the instrument sequencing module is used for fusing the surgical elements and inputting them into an instrument prediction model to obtain a ranking of instrument use likelihood over a future period, a nurse passing instruments according to the ranking; fusing the surgical elements comprises combining the operation duration and the stage duration, fusing the instrument category information, the surgical stage information and the feature vector information, and inputting the result into the instrument prediction model to obtain the probability of each instrument being used in each coming period; specifically:
feeding the images of the past N seconds into the surgical stage identification model and the instrument head key point detection model respectively to obtain the current surgical stage, the category of the instrument currently in use, and the instrument head scene data;
inputting the instrument head scene data into the image feature extraction model to obtain a feature vector representing the information features of the scene near the instrument head;
fusing the surgical stage category, the operation elapsed time, the stage elapsed time, the surgical instrument category and the scene information features near the instrument head to obtain a surgical element feature vector;
inputting the surgical element feature vector into the instrument prediction model to obtain a ranked list of instrument use probabilities within m future time periods;
the iteration module is used for running the first data acquisition module, the second data acquisition module and the instrument sequencing module again in sequence, updating the instrument use likelihood ranking in real time, the nurse passing instruments according to the new ranking.
4. The dynamic prediction system for surgical instruments of claim 3, characterized in that the system further comprises a model building module, used for uniformly transcoding the surgical videos with transcoding software and extracting surgical pictures at equal time intervals; labeling the surgical stages and scenes in the surgical videos, labeling the instrument head key points in the surgical pictures, and building the corresponding databases; and building the surgical stage identification model, the instrument head key point detection model, the image feature extraction model and the instrument prediction model from the labeled surgical stage data and instrument head key point data.
5. The dynamic prediction system for surgical instruments of claim 4, characterized in that the instrument head key point detection model comprises a first down-sampling stage, a plurality of hourglass structures and a classifier; the first down-sampling stage comprises 2D convolution kernels, batch data normalization and a nonlinear activation function, and is used for fusing and compressing the original image data to obtain a feature map;
the hourglass structure comprises an encoding part and a decoding part: the encoding part performs feature fusion on the feature map to extract image semantic information, and the decoding part resamples the deep semantic information onto a higher-resolution feature image so that the semantic information is mapped back to positions in the original image;
the classifier is used for computing on the information extracted into the feature map and classifying each pixel of the original image.
6. The dynamic prediction system for surgical instruments of claim 4, characterized in that the surgical stage identification model comprises a combination of a residual neural network and a long short-term memory (LSTM) network; the residual neural network is used for extracting the semantic information of a single image, computing image features with tensor calculations and nonlinear transformations; the LSTM network is used for computing the stage information of a continuous segment of the operation from the features of several consecutive images, so as to identify the stage of the operation over the past period.
7. The dynamic prediction system for surgical instruments of claim 4, characterized in that the image feature extraction model comprises a second down-sampling part and an up-sampling part; image semantic features are extracted by the second down-sampling, the original image is restored from the semantic information during up-sampling, and the final output of the second down-sampling is taken as the image feature extraction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111636026.1A CN114005022B (en) | 2021-12-30 | 2021-12-30 | Dynamic prediction method and system for surgical instrument |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114005022A CN114005022A (en) | 2022-02-01 |
CN114005022B true CN114005022B (en) | 2022-03-25 |
Family
ID=79932185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111636026.1A Active CN114005022B (en) | 2021-12-30 | 2021-12-30 | Dynamic prediction method and system for surgical instrument |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114005022B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114299072B (en) * | 2022-03-11 | 2022-06-07 | 四川大学华西医院 | Artificial intelligence-based anatomy variation identification prompting method and system |
CN117152260B (en) * | 2023-11-01 | 2024-02-06 | 张家港长三角生物安全研究中心 | Method and system for detecting residues of disinfection apparatus |
CN117481805B (en) * | 2023-11-30 | 2024-04-26 | 贵州医科大学 | Preoperative planning method and system for vascular intervention |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170323578A1 (en) * | 2016-05-04 | 2017-11-09 | Covidien Lp | Systems and methods for simulating prior use of a surgical instrument based on obtained surgical instrument data |
CN106344154B (en) * | 2016-09-14 | 2018-11-09 | 大连理工大学 | A kind of scaling method of the surgical instrument tip point based on maximal correlation entropy |
CN108875858A (en) * | 2018-04-02 | 2018-11-23 | 上海数斐信息科技有限公司 | A kind of automatic tracing of surgical instrument, management and analysis method |
CN112672709A (en) * | 2018-07-31 | 2021-04-16 | 直观外科手术操作公司 | System and method for tracking the position of a robotically-manipulated surgical instrument |
US11605161B2 (en) * | 2019-01-10 | 2023-03-14 | Verily Life Sciences Llc | Surgical workflow and activity detection based on surgical videos |
US11484384B2 (en) * | 2019-02-21 | 2022-11-01 | Theator inc. | Compilation video of differing events in surgeries on different patients |
CN111144271B (en) * | 2019-12-23 | 2021-02-05 | 山东大学齐鲁医院 | Method and system for automatically identifying biopsy parts and biopsy quantity under endoscope |
CN112348883B (en) * | 2020-11-24 | 2021-06-29 | 中国科学院自动化研究所 | Interventional instrument endpoint real-time positioning system, method and device in vascular interventional operation |
- 2021-12-30: application CN202111636026.1A filed in China; granted as patent CN114005022B (status: Active)
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105078576A (en) * | 2014-05-08 | 2015-11-25 | 三星电子株式会社 | Surgical robots and control methods thereof |
CN109996509A (en) * | 2016-11-11 | 2019-07-09 | 直观外科手术操作公司 | Remote operation surgery systems with the instrument control based on surgeon's level of skill |
JP2019045905A (en) * | 2017-08-29 | 2019-03-22 | 学校法人早稲田大学 | Behavior predicting system and behavior predicting method |
CN108682456A (en) * | 2018-03-27 | 2018-10-19 | 武汉康慧然信息技术咨询有限公司 | Operation simulation training method based on virtual reality technology |
CN109934125A (en) * | 2019-02-26 | 2019-06-25 | 中国科学院重庆绿色智能技术研究院 | A kind of semi-supervised operation video frequency process recognition methods |
CN111783520A (en) * | 2020-05-18 | 2020-10-16 | 北京理工大学 | Double-flow network-based laparoscopic surgery stage automatic identification method and device |
CN111652175A (en) * | 2020-06-11 | 2020-09-11 | 山东大学 | Real-time surgical tool detection method applied to robot-assisted surgical video analysis |
CN112364695A (en) * | 2020-10-13 | 2021-02-12 | 杭州城市大数据运营有限公司 | Behavior prediction method and device, computer equipment and storage medium |
CN112932663A (en) * | 2021-03-02 | 2021-06-11 | 成都与睿创新科技有限公司 | Intelligent auxiliary method and system for improving safety of laparoscopic cholecystectomy |
CN113705297A (en) * | 2021-03-11 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Training method and device for detection model, computer equipment and storage medium |
CN112818959A (en) * | 2021-03-25 | 2021-05-18 | 杭州海康威视数字技术股份有限公司 | Operation flow identification method, device, system and computer readable storage medium |
CN113361437A (en) * | 2021-06-16 | 2021-09-07 | 吉林建筑大学 | Method and system for detecting category and position of minimally invasive surgical instrument |
CN113673350A (en) * | 2021-07-21 | 2021-11-19 | 苏州爱医斯坦智能科技有限公司 | Surgical instrument identification method, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
Zhengyu Wang et al.; "Hybrid grey prediction model-based autotracking algorithm for the laparoscopic visual window of surgical robot"; Mechanism and Machine Theory; 2018-01-24; vol. 123; pp. 107-123 *
Yuwen Chen et al.; "Semi-supervised spatio-temporal CNN for recognition of surgical workflow"; EURASIP Journal on Image and Video Processing; 2018-08-25; no. 76; pp. 1-9 *
Wang Ziyun; "Research on surgical instrument estimation in endoscopic surgery and surgical instruments" (in Chinese); China Masters' Theses Full-text Database, Medicine and Health Sciences; 2021-01-15; no. 1; p. E066-7 *
Also Published As
Publication number | Publication date |
---|---|
CN114005022A (en) | 2022-02-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |