WO2020101108A1 - Artificial-intelligence model platform and method for operating artificial-intelligence model platform - Google Patents


Info

Publication number
WO2020101108A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature information
performance
normalization
model
combination
Application number
PCT/KR2018/015476
Other languages
French (fr)
Korean (ko)
Inventor
송중석
권태웅
최상수
최윤수
이윤수
박진학
신익수
이혁로
박학수
박진형
Original Assignee
Korea Institute of Science and Technology Information (한국과학기술정보연구원)
Application filed by Korea Institute of Science and Technology Information (한국과학기술정보연구원)
Publication of WO2020101108A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N20/00 Machine learning
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
              • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
              • G06F11/26 Functional testing
                • G06F11/263 Generation of test inputs, e.g. test vectors, patterns or sequences; with adaptation of the tested hardware for testability with external testers
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
              • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/08 Learning methods
    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • the present invention relates to a technology for generating artificial intelligence models for security control.
  • the Science and Technology Cyber Safety Center provides real-time security control services for public research institutes based on the TMS.
  • the real-time security control service is provided as a service structure that provides analysis and response support by security control personnel based on security events detected and collected by the intrusion threat management system (TMS).
  • TMS: intrusion threat management system
  • an artificial intelligence model platform capable of generating an artificial intelligence model for security control.
  • the present invention is intended to provide an artificial intelligence model platform that enables an ordinary user who is not familiar with security control technology to generate an optimal artificial intelligence model for security control.
  • the object of the present invention is to provide a method (technology) for implementing an AI model platform that can generate an AI model for security control.
  • An artificial intelligence model platform includes: a data collection module that collects, from source security data, security events to be used as learning/test data according to specific search conditions; a feature extraction module that extracts preset feature information from the collected security events; a normalization module that performs preset normalization on the extracted feature information of the security events; a data output module that extracts learning data or test data, according to a given condition, from the security events whose feature information has been normalized; and a model generation module that applies an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control.
  • the platform may further include a performance management module that tests the accuracy of the artificial intelligence model using the test data.
  • the platform may further include a UI module that provides a user interface (UI) for setting at least one of: the specific search conditions of the data collection module, the feature information of the feature extraction module, the normalization method of the normalization module, and the conditions of the data output module.
  • UI: user interface
  • when the number of collection cases exceeds the maximum number of collection cases that can be processed concurrently, the data collection module stores the excess collection cases in a queue and processes them sequentially.
  • for such queued collection cases, security events may be collected only from the source security data that predates the point at which the collection case occurred.
  • the feature extraction module may recommend a change to the feature information to increase the accuracy of the AI model based on the result of the accuracy test of the performance management module.
  • the normalization module may recommend changing the normalization method for the normalization to increase the accuracy of the artificial intelligence model.
  • a feature information recommendation apparatus includes: a model performance checking unit that checks the model performance of an AI model generated by learning preset feature information, selected from all the feature information that can be set when generating an AI model;
  • a combination performance checking unit that sets a plurality of feature information combinations from the entire feature information and checks the performance of the artificial intelligence model generated by learning each of the combinations; and
  • a recommendation unit that recommends, from among the plurality of feature information combinations, a specific combination whose performance is higher than the model performance checked by the model performance checking unit.
  • each of the plurality of feature information combinations may be formed by sequentially adding at least one of the remaining feature information (the entire feature information excluding the preset feature information) to the preset feature information, and the specific feature information combination may be the top N combinations whose performance is higher than the model performance.
  • when the preset feature information is the entire feature information, the combination performance checking unit may perform a single feature information performance comparison process that checks whether the maximum performance, among the performances of the artificial intelligence models generated by learning each single piece of feature information in the entire feature information, is higher than the model performance; if it is, the single feature information with the maximum performance is reset as the feature information, and further combinations are formed from the entire feature information excluding the reset feature information.
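The top-N rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: `evaluate` is a hypothetical callback that trains a model on a feature set and returns its performance score, and "sequentially adding" is read here as cumulative addition in the given order (preset, preset + f1, preset + f1 + f2, ...).

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical callback: trains a model on the given features and returns a score.
Evaluate = Callable[[FrozenSet[str]], float]


def recommend_top_n(
    all_features: Sequence[str],
    preset: Sequence[str],
    evaluate: Evaluate,
    n: int,
) -> List[Tuple[FrozenSet[str], float]]:
    """Return up to n combinations that outperform the preset feature set."""
    baseline = evaluate(frozenset(preset))  # model performance of the preset features
    remaining = [f for f in all_features if f not in preset]

    better: List[Tuple[FrozenSet[str], float]] = []
    current = list(preset)
    # Assumption: remaining features are added cumulatively, one per step.
    for feature in remaining:
        current.append(feature)
        combo = frozenset(current)
        score = evaluate(combo)
        if score > baseline:
            better.append((combo, score))

    # Highest-performing combinations first, truncated to the top N.
    better.sort(key=lambda item: item[1], reverse=True)
    return better[:n]
```

With toy additive scores such as `{"a": 1, "b": 2, "c": -5, "d": 3}` and preset `["a"]`, only `{a, b}` beats the baseline, so it is the single recommendation.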
  • an apparatus for recommending a normalization method includes: an attribute checking unit that checks the attribute of the feature information used for learning when generating an artificial intelligence model; a determining unit that determines, from among all settable normalization methods, a normalization method according to the attribute of the feature information; and a recommendation unit that recommends the determined normalization method.
  • when the same normalization method is applied to all feature information fields: if the attribute of the feature information is a numeric attribute, the determining unit determines a first normalization method according to the overall numeric pattern of the feature information; if the attribute is a category attribute, it determines a second normalization method, which represents each value as a vector whose length equals the total number of categories of the feature information and which holds a non-zero value only at the position designated for that category; and if the attribute is a combination of a numeric and a category attribute, it may determine both the second and the first normalization methods.
  • the first normalization method includes standard score normalization, mean normalization, and feature scaling normalization, ranked by a predefined priority.
  • based on whether a standard deviation exists for the overall numeric pattern of the feature information and whether upper/lower limits of the normalization scaling range exist, the determining unit may determine the highest-priority applicable normalization method among the first normalization methods.
  • for individual fields of the feature information, the determining unit may determine: for a field whose attribute is a type attribute, the highest-priority applicable normalization method among candidates that include feature scaling normalization; for a field whose attribute is a number attribute, the highest-priority applicable method among mean normalization and feature scaling normalization; for a field whose attribute is a ratio attribute, either no normalization method (the field is excluded from normalization) or standard score normalization; and for a field whose attribute indicates presence or absence, no normalization method, so that the field is excluded from normalization.
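The category rule (the "second normalization method") is what is commonly called one-hot encoding. A minimal Python sketch of it and of the attribute-to-method mapping, assuming hypothetical attribute labels `"numeric"`, `"category"`, and `"numeric+category"` and a non-zero value of 1.0:

```python
from typing import List, Sequence

# Labels for the two method families named in the text (hypothetical strings).
FIRST = "first (numeric) normalization"
SECOND = "second (category) normalization"


def methods_for_attribute(attribute: str) -> List[str]:
    """Methods chosen when one normalization method is applied to all fields."""
    if attribute == "numeric":
        return [FIRST]
    if attribute == "category":
        return [SECOND]
    if attribute == "numeric+category":
        # The text lists the second method first for mixed attributes.
        return [SECOND, FIRST]
    raise ValueError(f"unknown attribute: {attribute!r}")


def second_normalization(value: str, categories: Sequence[str], hot: float = 1.0) -> List[float]:
    """Vector of length len(categories), non-zero only at the value's category slot."""
    vector = [0.0] * len(categories)
    vector[list(categories).index(value)] = hot  # raises ValueError for unknown categories
    return vector
```

For a protocol field with categories `["tcp", "udp", "icmp"]`, the value `"udp"` becomes `[0.0, 1.0, 0.0]`.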
  • An artificial intelligence model platform operating method includes: a data collection step of collecting, from source security data, security events to be used as learning/test data according to specific search conditions; a feature extraction step of extracting preset feature information from the collected security events; a normalization step of performing preset normalization on the extracted feature information of the security events; a data output step of extracting learning data or test data, according to a given condition, from the security events whose feature information has been normalized; and a model generation step of applying an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control.
  • a performance management step of testing the accuracy of the artificial intelligence model using the test data may be further included.
  • the method may further include providing a user interface for setting at least one of: the specific search conditions of the data collection step, the feature information of the feature extraction step, the normalization method of the normalization step, and the conditions of the data output step.
  • in the data collection step, collection cases exceeding the maximum number of concurrent collection cases may be stored in a queue and processed sequentially.
  • for such queued collection cases, security events may be collected only from the source security data that predates the point at which the collection case occurred.
  • the method may further include a step of recommending a change to the feature information to increase the accuracy of the artificial intelligence model.
  • in the normalization step, a change to the normalization method may be recommended to increase the accuracy of the artificial intelligence model.
  • a computer program, in combination with hardware, executes a model performance checking step of checking the model performance of an artificial intelligence model generated by learning preset feature information, selected from all the feature information that can be set when generating an artificial intelligence model;
  • a combination performance checking step of setting a plurality of feature information combinations from the entire feature information and checking the performance of the artificial intelligence model generated by learning each of the combinations; and a recommendation step of recommending, from among the plurality of feature information combinations, a specific combination whose performance is higher than the model performance checked in the model performance checking step.
  • each of the plurality of feature information combinations may be formed by sequentially adding at least one of the remaining feature information (the entire feature information excluding the preset feature information) to the preset feature information, and the specific feature information combination may be the top N combinations whose performance is higher than the model performance.
  • when the preset feature information is the entire feature information, the combination performance checking step may perform a single feature information performance comparison process that checks whether the maximum performance, among the performances of the artificial intelligence models generated by learning each single piece of feature information in the entire feature information, is higher than the model performance; if it is, the single feature information with the maximum performance is reset as the feature information.
  • a combination setting process then sets the plurality of feature information combinations by adding, one at a time, each piece of feature information not already in the feature information; a reset process resets, as the feature information, each combination whose performance is higher than the model performance of the reset feature information, so that the combination setting process is repeated for each reset feature information; and when no combination has higher performance than the model performance, a process of delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination may be performed.
  • alternatively, when the preset feature information is the entire feature information and the single feature information performance comparison process finds that the maximum performance is not higher than the model performance, a combination setting process sets the plurality of feature information combinations by excluding one piece of feature information at a time from the feature information; a reset process resets, as the feature information, each combination whose performance is higher than the model performance, so that the combination setting process is repeated for each reset feature information; and when no combination has higher performance than the model performance, a process of delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination may be performed.
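The two search processes above amount to forward selection (grow from the best single feature) and backward elimination (shrink from the full set). The following Python sketch is a simplified, hypothetical rendering: `evaluate` is an assumed scoring callback, and where the source resets *each* improving combination and repeats the search for every one, this sketch follows only the best improving combination (a greedy simplification; exploring every branch would need, e.g., an explicit stack).

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical callback: trains a model on the given features and returns a score.
Evaluate = Callable[[FrozenSet[str]], float]


def search_feature_set(all_features: Sequence[str], evaluate: Evaluate) -> FrozenSet[str]:
    """Greedy forward/backward feature search following the processes described above."""
    full = frozenset(all_features)
    baseline = evaluate(full)  # model performance with the entire feature information

    # Single feature information performance comparison process.
    best_single = max(all_features, key=lambda f: evaluate(frozenset([f])))
    if evaluate(frozenset([best_single])) > baseline:
        current, grow = frozenset([best_single]), True  # forward: add one feature at a time
    else:
        current, grow = full, False  # backward: drop one feature at a time
    current_score = evaluate(current)

    while True:
        # Combination setting process.
        if grow:
            candidates: List[FrozenSet[str]] = [current | {f} for f in full - current]
        else:
            candidates = [current - {f} for f in current] if len(current) > 1 else []
        improving = [c for c in candidates if evaluate(c) > current_score]
        if not improving:
            return current  # deliver the immediately preceding feature set
        # Simplification: follow only the best improving combination.
        current = max(improving, key=evaluate)
        current_score = evaluate(current)
```

With additive toy scores, a strongly positive single feature triggers the forward branch, while a full set that already beats every single feature triggers the backward branch.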
  • a computer program, in combination with hardware, executes an attribute checking step of checking the attribute of the feature information used for learning when creating an artificial intelligence model; a determining step of determining, from among all settable normalization methods, a normalization method according to the attribute of the feature information; and a recommendation step of recommending the determined normalization method.
  • in the determining step, when the same normalization method is applied to all feature information fields: if the attribute of the feature information is a numeric attribute, a first normalization method according to the overall numeric pattern of the feature information is determined; if the attribute is a category attribute, a second normalization method is determined, which represents each value as a vector whose length equals the total number of categories of the feature information and which holds a non-zero value only at the position designated for that category; and if the attribute is a combination of a numeric and a category attribute, both the second and the first normalization methods may be determined.
  • the first normalization method includes standard score normalization, mean normalization, and feature scaling normalization, ranked by a predefined priority.
  • in the determining step, the highest-priority applicable method among the first normalization methods may be determined based on whether a standard deviation exists for the overall numeric pattern of the feature information and whether upper/lower limits of the normalization scaling range exist.
  • for individual fields of the feature information, the determining step may determine: for a field whose attribute is a type attribute, the highest-priority applicable normalization method among candidates that include feature scaling normalization; for a field whose attribute is a number attribute, the highest-priority applicable method among mean normalization and feature scaling normalization; for a field whose attribute is a ratio attribute, either no normalization method (the field is excluded from normalization) or standard score normalization; and for a field whose attribute indicates presence or absence, no normalization method, so that the field is excluded from normalization.
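The priority-based choice among the first normalization methods could look like the Python sketch below. The source only says the choice depends on whether a standard deviation exists and whether upper/lower limits of the scaling range exist; the concrete applicability rules here, and the assumption that the listing order is the priority order, are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple

# Assumption: the priority is the order in which the source lists the first methods.
PRIORITY = ("standard_score", "mean_normalization", "feature_scaling")


def pick_first_method(
    values: Sequence[float],
    scaling_range: Optional[Tuple[float, float]] = None,
) -> Optional[str]:
    """Return the highest-priority applicable first normalization method, or None.

    Hypothetical applicability rules (not taken from the source):
      * standard_score: needs a non-zero standard deviation and no required
        upper/lower limit, because z-scores are unbounded;
      * mean_normalization: needs max > min and either no limits or limits (-1, 1);
      * feature_scaling: needs max > min and can target any [a, b].
    """
    if not values:
        raise ValueError("values must not be empty")
    has_std = len(values) > 1 and statistics.pstdev(values) > 0
    has_spread = max(values) > min(values)
    applicable = {
        "standard_score": has_std and scaling_range is None,
        "mean_normalization": has_spread and scaling_range in (None, (-1, 1)),
        "feature_scaling": has_spread,
    }
    for method in PRIORITY:
        if applicable[method]:
            return method
    return None  # constant field: no first normalization method applies
```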
  • according to the present invention, an artificial intelligence model platform capable of generating an artificial intelligence model for security control is implemented; in particular, it supports the choice of feature information and normalization method, which directly affect the performance of the artificial intelligence model.
  • as a result, an optimal artificial intelligence model suited to the purposes and requirements of security control can be generated and applied flexibly and in various forms, the quality of the security control service can be improved as much as possible, and the construction of an AI-based incident response system that efficiently analyzes signs of large-scale cyberattacks and anomalies can be supported.
  • FIG. 1 is a conceptual diagram showing an AI model platform according to an embodiment of the present invention.
  • FIG. 2 is a configuration diagram showing the configuration of the AI model platform according to an embodiment of the present invention.
  • FIG. 3 is a block diagram showing the configuration of a feature information recommendation device according to an embodiment of the present invention.
  • FIG. 4 is a configuration diagram showing the configuration of a normalization method recommendation apparatus according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating an artificial intelligence model platform operating method according to an embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating a method of operating a feature information recommendation device according to an embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating an operation method of a normalization method recommendation apparatus according to an embodiment of the present invention.
  • the real-time security control service provided by the Science and Technology Cyber Safety Center has a service structure in which security control personnel provide rule-based analysis and response support, based on security events detected and collected by the intrusion threat management system (TMS).
  • the AI model platform of the present invention can be divided into: a collection function that collects and processes the various data needed to generate an AI model for security control; an artificial intelligence function that generates an AI model based on the data collected and processed by the collection function and manages the associated performance and history; and a management function that handles the various settings and user management related to the collection and artificial intelligence functions, based on the user interface (UI) provided to system administrators and general users.
  • the artificial intelligence model platform of the present invention includes a search engine that periodically collects newly generated source security data from the big data integrated storage; by loading the various data produced by the collection function into the search engine, the search engine can also serve as a searchable data store.
  • various modules belonging to the collection function may operate based on a search engine (data storage).
  • the artificial intelligence model platform 100 of the present invention includes a data collection module 110, a feature extraction module 120, a normalization module 130, a data output module 140, and a model generation module 150.
  • the AI model platform 100 of the present invention may further include a performance management module 160 and a UI module 170.
  • All or at least part of the configuration of the AI model platform 100 may be implemented in the form of a hardware module or a software module, or a combination of a hardware module and a software module.
  • the software module may be understood as, for example, instructions executed by a processor that controls operations within the AI model platform 100, and these instructions may be loaded into memory in the AI model platform 100.
  • through the configuration described above, the artificial intelligence model platform 100 realizes the technology proposed in the present invention, namely generating an optimal artificial intelligence model for security control; each component of the AI model platform 100 is described in more detail below.
  • the UI module 170 provides a UI (user interface) for setting at least one of: the specific search conditions of the data collection module 110, the feature information of the feature extraction module 120, the normalization method of the normalization module 130, and the conditions of the data output module 140.
  • in response to operations by a system administrator or a general user (hereinafter, a user) who wants to create an AI model for security control on the AI model platform 100 of the present invention, the UI module 170 provides a UI for setting at least one of: the specific search conditions of the data collection module 110, the feature information of the feature extraction module 120, the normalization method of the normalization module 130, and the conditions of the data output module 140.
  • based on the provided UI, the UI module 170 stores and manages, in the user information/setting information storage, various settings related to the collection and artificial intelligence functions; specifically, for the artificial intelligence model to be generated, the specific search conditions of the data collection module 110, the feature information of the feature extraction module 120, the normalization method of the normalization module 130, the conditions of the data output module 140, and the like.
  • the data collection module 110 collects security events to be used as learning / testing data based on specific search conditions, that is, predetermined search conditions previously set by the user, from the source security data.
  • as the search conditions, the date (or period), the number of cases to be used as learning/test data, the IP, the detection pattern name, the detection pattern type, and the like may be set.
  • the detection pattern name means a representative name of security logs detected by the intrusion threat management system (TMS)
  • the detection pattern type means a group of detection patterns having similar detection pattern characteristics (property, type).
  • the detection pattern types can be divided into six: worm/virus damage, data corruption and leakage, abuse as a relay point (waypoint), website (homepage) defacement, denial-of-service attack damage, and simple intrusion attempts.
  • the data collection module 110 may collect security events belonging to a set date (or period) from the source security data.
  • the data collection module 110 may collect a set number of security events (for example, 500,000) as of a specified time from the source security data.
  • the data collection module 110 may collect, from the source security data, security events whose source IP or destination IP matches the set IP.
  • a combination of date (or period), number of cases, IP, detection pattern name, and detection pattern type may be set.
  • the data collection module 110 may collect security events according to a combination of date (or period), number, IP, detection pattern name, detection pattern type, etc. set from the source security data.
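The combined search conditions above can be sketched as a simple filter in Python. This is illustrative only: the `SecurityEvent` and `SearchCondition` types and their field names are hypothetical, and "number of cases" is interpreted as taking the earliest matching events in time order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SecurityEvent:
    detected_at: datetime
    src_ip: str
    dst_ip: str
    pattern_name: str  # representative name of the detected log
    pattern_type: str  # one of the detection pattern types


@dataclass(frozen=True)
class SearchCondition:
    start: Optional[datetime] = None  # date (or period) lower bound, inclusive
    end: Optional[datetime] = None  # date (or period) upper bound, inclusive
    max_count: Optional[int] = None  # number of cases to collect
    ip: Optional[str] = None  # matched against source OR destination IP
    pattern_name: Optional[str] = None
    pattern_type: Optional[str] = None


def matches(ev: SecurityEvent, cond: SearchCondition) -> bool:
    """True if the event satisfies every condition that is set."""
    if cond.start is not None and ev.detected_at < cond.start:
        return False
    if cond.end is not None and ev.detected_at > cond.end:
        return False
    if cond.ip is not None and cond.ip not in (ev.src_ip, ev.dst_ip):
        return False
    if cond.pattern_name is not None and ev.pattern_name != cond.pattern_name:
        return False
    if cond.pattern_type is not None and ev.pattern_type != cond.pattern_type:
        return False
    return True


def collect(events: Iterable[SecurityEvent], cond: SearchCondition) -> List[SecurityEvent]:
    """Return matching events in time order, capped at cond.max_count if set."""
    hits = [ev for ev in sorted(events, key=lambda e: e.detected_at) if matches(ev, cond)]
    return hits if cond.max_count is None else hits[: cond.max_count]
```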
  • when collecting security events from the source security data as described above, the data collection module 110 may limit the maximum number of collections executed concurrently to reduce the load on the system.
  • for example, assume that the total number of security event collection cases belonging to the set date (or period) is 1,000,000 and that the maximum number of collection cases that can run concurrently is 500,000.
  • in this case, the data collection module 110 determines that the total number of collection cases exceeds the maximum that can run concurrently, stores the collection cases exceeding the maximum in a queue, and then processes them sequentially.
  • that is, of the 1,000,000 collection cases, the data collection module 110 collects the first 500,000 (the maximum) in time order, stores the 500,000 cases exceeding the maximum in a queue, and then collects them sequentially.
  • for the 500,000 collection cases that are processed after being queued, the data collection module 110 collects security events only from the source security data that predates the point at which the collection case occurred.
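The queueing behavior above can be sketched in Python. This is a hypothetical illustration, not the platform's code: a request is split into batches of at most `max_concurrent` events, the first batch runs immediately, the rest wait in a FIFO queue, and every batch carries the request time as a cutoff so that queued batches never pick up events that arrived after the request. Timestamps are plain integers (assumed epoch seconds).

```python
from collections import deque
from typing import Deque, Iterable, List, NamedTuple, Tuple


class Batch(NamedTuple):
    offset: int  # position of the first event, in time order
    count: int  # number of events in this batch
    cutoff: int  # only events detected at or before this time are eligible


def plan_collection(total: int, max_concurrent: int, requested_at: int) -> Tuple[Batch, Deque[Batch]]:
    """Split a collection request into a running batch and a FIFO queue of the rest."""
    if total <= 0 or max_concurrent <= 0:
        raise ValueError("total and max_concurrent must be positive")
    batches = [
        Batch(offset, min(max_concurrent, total - offset), requested_at)
        for offset in range(0, total, max_concurrent)
    ]
    return batches[0], deque(batches[1:])


def select(timestamps: Iterable[int], batch: Batch) -> List[int]:
    """Pick this batch's slice from events that predate the request time."""
    eligible = sorted(t for t in timestamps if t <= batch.cutoff)
    return eligible[batch.offset : batch.offset + batch.count]
```

For the example in the text, `plan_collection(1_000_000, 500_000, t)` yields one running batch of 500,000 events and one queued batch of 500,000, both bounded by the same cutoff `t`.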
  • as mentioned above, the artificial intelligence model platform 100 of the present invention includes a search engine that periodically collects newly generated source security data from the big data integrated storage.
  • the data collection module 110 may collect security events from the source security data held in this search engine (data store).
  • the big data integrated storage is used not only by the AI model platform 100 of the present invention but also by other systems; if a large amount of data (security events) were collected directly from it, the resulting load could also affect those other systems.
  • the data collection module 110 therefore does not collect security events directly from the big data integrated storage; instead, it collects them from the search engine, which periodically pulls only newly generated source security data from the big data integrated storage, thereby avoiding the load problem described above.
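A common way to "periodically collect only newly generated data" is a high-water mark (watermark) on the detection time. The Python sketch below assumes that approach; `SearchEngineMirror`, `fetch_newer_than`, and the `(detected_at, raw event)` record shape are hypothetical stand-ins for the search engine and the big data integrated storage.

```python
from typing import Callable, Iterable, List, Tuple

Record = Tuple[int, str]  # (detected_at, raw event); hypothetical record shape


class SearchEngineMirror:
    """Local store that pulls only records newer than the last synchronization."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.watermark = -1  # detected_at of the newest record copied so far

    def sync(self, fetch_newer_than: Callable[[int], Iterable[Record]]) -> int:
        """Copy records newer than the watermark; return how many were added."""
        new = [r for r in fetch_newer_than(self.watermark) if r[0] > self.watermark]
        self.records.extend(new)
        if new:
            self.watermark = max(r[0] for r in new)
        return len(new)
```

A strict timestamp watermark can miss records that arrive late with a timestamp at or below the watermark; a production design would need a tie-breaking key or an overlap window.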
  • the feature extraction module 120 extracts feature information preset by the user from the security events collected by the data collection module 110.
  • the feature extraction module 120 is responsible for performing a feature information extraction process for security events collected by the data collection module 110.
  • the feature information of each security event extracted by the feature extraction module 120 is used for machine learning (e.g., deep learning) when creating the artificial intelligence model described later.
  • the user can set single features, composite features, or both as feature information.
  • the single feature means features that can be extracted from one security event.
  • for example, detection time, source IP, source port, destination IP, destination port, protocol, security event name, security event type, number of attacks, attack direction, packet size, automatic analysis result, dynamic analysis result, organization number, whether the payload is a jumbo payload, and the payload converted using Word2Vec may be single features.
  • Word2Vec converts words into vectors; it determines the vector of each word from that word's relationships with its neighboring words.
  • in ordinary text, words can be separated at spaces, but a payload is very difficult to divide into semantic units and contains many special characters, so preprocessing is required before Word2Vec can be applied.
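One possible preprocessing step, sketched in Python under the assumption that payloads are text: keep alphanumeric runs as tokens and split every other non-space character into its own token, so that special characters become separate "words". The pattern and `tokenize_payload` are hypothetical; the resulting token lists could then be passed as sentences to a Word2Vec implementation such as gensim's `Word2Vec`.

```python
import re
from typing import List

# Alphanumeric runs stay whole; every other non-space character becomes its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")


def tokenize_payload(payload: str) -> List[str]:
    """Split a raw payload into word-like tokens for Word2Vec training."""
    return TOKEN_PATTERN.findall(payload.lower())
```

For example, the SQL-injection fragment `id=1' OR 1=1` is split into `id`, `=`, `1`, `'`, `or`, `1`, `=`, `1`.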
  • the composite feature means a feature that can be extracted by using aggregate and statistical techniques between various security events.
  • for example, a security event group may be formed based on a period or a number of cases, and a feature that can be extracted through an intra-group operation (e.g., aggregation or a statistical technique), such as the result of that operation, may be a composite feature.
  • a security event group as shown in Table 1 below is formed based on a period (8.22 to 9.3).
  • the feature extraction module 120 may extract pre-set feature information (single feature and / or composite feature) with respect to the security event collected by the data collection module 110.
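A composite feature can be sketched in Python as a period-based group with per-group aggregates. The event shape, the choice of grouping by source IP, and the two aggregates (event count, distinct destination ports) are hypothetical examples; Table 1 is not reproduced here, and the year in the usage example is assumed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical event shape: (ISO date "YYYY-MM-DD", source IP, destination port)
Event = Tuple[str, str, int]


def composite_features(events: Iterable[Event], start: str, end: str) -> Dict[str, Dict[str, int]]:
    """Group events in [start, end] by source IP and aggregate each group."""
    counts: Dict[str, int] = defaultdict(int)
    ports: Dict[str, Set[int]] = defaultdict(set)
    for day, src_ip, dst_port in events:
        if start <= day <= end:  # ISO-8601 dates compare correctly as strings
            counts[src_ip] += 1
            ports[src_ip].add(dst_port)
    return {
        ip: {"event_count": counts[ip], "distinct_dst_ports": len(ports[ip])}
        for ip in counts
    }
```

Called with `start="2018-08-22"` and `end="2018-09-03"`, this mirrors a period-based group like the 8.22 to 9.3 example.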
  • the normalization module 130 performs predetermined normalization on the extracted feature information of the security event.
  • normalization is the process of bringing the value ranges of the extracted features into a consistent range. If field A ranges from 50 to 100 and field B ranges from 0 to 100, the same value 50 means different things because it is measured on different scales: it is the minimum of field A but the midpoint of field B. The values of different fields therefore need to be adjusted to a common scale so that they carry consistent meaning; this adjustment is called normalization.
  • the normalization module 130 normalizes the extracted feature information of the security events according to the preset normalization method, adjusting the values of different fields to a common scale so that they carry consistent meaning.
  • the preset normalization scheme means a normalization scheme preset by the user.
  • the following three normalization methods are provided to allow a user to pre-set.
  • Equation 1 denotes feature scaling normalization to the range [a, b].
  • Equation 2 denotes mean normalization, which maps values into [-1, 1].
  • Equation 3 denotes standard score normalization.
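  • Equations 1 to 3 themselves are not reproduced in this text. The sketch below assumes the textbook forms usually given for these names: feature scaling x' = a + (x - min)(b - a)/(max - min), mean normalization x' = (x - mean)/(max - min), and standard score x' = (x - mean)/sigma, here with the population standard deviation. It is an illustration, not the platform's implementation.

```python
from statistics import mean, pstdev
from typing import List


def feature_scaling(xs: List[float], a: float = 0.0, b: float = 1.0) -> List[float]:
    """Equation 1 (assumed form): rescale values linearly into [a, b]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("feature scaling needs max > min")
    return [a + (x - lo) * (b - a) / (hi - lo) for x in xs]


def mean_normalization(xs: List[float]) -> List[float]:
    """Equation 2 (assumed form): centre on the mean and divide by the range."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("mean normalization needs max > min")
    mu = mean(xs)
    return [(x - mu) / (hi - lo) for x in xs]


def standard_score(xs: List[float]) -> List[float]:
    """Equation 3 (assumed form): z-score with the population standard deviation."""
    sigma = pstdev(xs)
    if sigma == 0:
        raise ValueError("standard score needs a non-zero standard deviation")
    mu = mean(xs)
    return [(x - mu) / sigma for x in xs]


# Fields A (50..100) and B (0..100) from the example above, scaled to [0, 1]:
print(feature_scaling([50, 75, 100]))  # -> [0.0, 0.5, 1.0]
print(feature_scaling([0, 50, 100]))   # -> [0.0, 0.5, 1.0]
```

After scaling, the raw value 50 becomes 0.0 in field A but 0.5 in field B, which is exactly the difference in meaning that normalization is meant to expose.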
  • The normalization module 130 normalizes the extracted feature information of the security events according to whichever of the three normalization methods described above the user has preset.
  • The data output module 140 extracts training data or test data from the security events whose feature information has been normalized, based on a given condition, that is, a condition preset by the user.
  • The data output module 140 outputs the normalized security events to a screen or a file according to the values, order, format, learning/test data ratio, and file division method desired by the user.
  • The output training data or test data is managed in a database or file storage by date and by user, so that it can be used immediately when an artificial intelligence model is created.
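  • The learning/test data ratio mentioned above can be illustrated with a small split function. Random shuffling with a fixed seed is an assumption made here for reproducibility; the text leaves the sampling method to the user's settings.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(events: Sequence[T], train_ratio: float,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle normalized events and split them into learning and test data.

    train_ratio is the user's learning/test ratio, e.g. 0.8 for an 8:2 split.
    A fixed seed makes the split reproducible, so a stored dataset can be re-created.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # private RNG: global state is untouched
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```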
  • The model generation module 150 applies an artificial intelligence algorithm to the learning data output by the data output module 140 and managed in file storage, to generate an artificial intelligence model for security control.
  • For example, the model generation module 150 may apply an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control that provides a function required by the user.
  • For example, at the user's request, the model generation module 150 may generate an artificial intelligence detection model that detects whether a security event is malicious, or an artificial intelligence classification model that classifies a security event as a true positive or a false positive.
  • The model generation module 150 may generate an artificial intelligence model for security control by applying, to the learning data output by the data output module 140 and managed in file storage, an artificial intelligence algorithm previously selected by the user, such as a machine learning (e.g., deep learning) algorithm.
  • For example, using a machine learning technique based on backpropagation, the model generation module 150 may generate from the learning data an artificial intelligence model whose loss function, which represents the deviation between the result predicted by the model and the actual result, is driven toward zero.
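  • The idea of training by backpropagating a loss can be shown on the smallest possible model. The sketch below fits a one-layer logistic model by gradient descent on the mean cross-entropy loss; it is a stand-in chosen for brevity, not the deep learning algorithms the platform lets the user select. Labels 1 and 0 could stand for "malicious" and "benign".

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean_cross_entropy(xs: Sequence[Sequence[float]], ys: Sequence[int],
                       w: Sequence[float], b: float) -> float:
    """Loss: average deviation between predicted probabilities and actual labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by gradient descent; returns (weights, bias)."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: predicted probability of the positive class.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Backward pass: d(loss)/d(logit) for cross-entropy is (p - y).
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient so the loss shrinks toward zero.
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b
```

On separable data the loss keeps decreasing toward zero without ever reaching it exactly, which is the practical meaning of a deviation "driven toward zero".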
  • In this way, the artificial intelligence model platform 100 of the present invention provides a platform environment in which an artificial intelligence model for security control can be created through a UI without any programming, so even general users unfamiliar with security control technology can create artificial intelligence models suited to their own security control purposes and requirements.
  • The performance management module 160 tests the accuracy of the generated artificial intelligence model using the test data output by the data output module 140 and managed in file storage.
  • To manage the artificial intelligence models generated by the model generation module 150, the performance management module 160 records and manages performance information in the system (file storage): who created each model, when, with which data, fields, sampling method, normalization method, and model type, and what performance (correct answer rate) the created model achieved.
  • Based on this performance information, the performance management module 160 lets the conditions and performance of model generation be compared at a glance, making it easy to grasp the correlation between conditions and performance.
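  • One way to keep such records comparable is sketched below, assuming a JSON Lines file as the file storage. The field names of `PerformanceRecord` are hypothetical labels for the "who, when, which data, field, sampling method, normalization method, model" conditions listed above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PerformanceRecord:
    # Hypothetical field names mirroring the recorded conditions.
    who: str
    when: str             # ISO-8601 timestamp
    data: str             # dataset identifier
    fields: List[str]     # feature fields used for learning
    sampling: str
    normalization: str
    model: str
    accuracy: float       # correct answer rate on the test data


def append_record(path: str, rec: PerformanceRecord) -> None:
    """Append one record as a JSON line so the history grows without rewrites."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(rec)) + "\n")


def best_records(path: str, top: int = 3) -> List[PerformanceRecord]:
    """Load all records and return the highest-accuracy ones first."""
    with open(path, encoding="utf-8") as fh:
        recs = [PerformanceRecord(**json.loads(line)) for line in fh if line.strip()]
    return sorted(recs, key=lambda r: r.accuracy, reverse=True)[:top]
```

Sorting by accuracy while keeping every condition in the same record is what makes the condition/performance correlation visible at a glance.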
  • Because the platform environment of the present invention lets ordinary users who are unfamiliar with security control technology generate artificial intelligence models, the accuracy (performance) of the models generated in it may need to be tested.
  • Accordingly, the performance management module 160 tests the accuracy of the generated artificial intelligence model using the test data output by the data output module 140 and managed in file storage (security events whose actual results, i.e., whether they are malicious and whether they are true or false positives, are known).
  • That is, the performance management module 160 tests the generated artificial intelligence model on the test data and outputs the model's accuracy (performance) as the test result.
  • Based on the accuracy test result of the performance management module 160, the feature extraction module 120 may recommend a change to the feature information (features) to increase the accuracy of the artificial intelligence model.
  • Likewise, the normalization module 130 may recommend a change of normalization method to increase the accuracy of the artificial intelligence model.
  • FIG. 3 is a block diagram of a feature information recommendation apparatus according to an embodiment of the present invention.
  • the feature information recommendation device 200 of the present invention includes a model performance confirmation unit 210, a combination performance confirmation unit 220, and a recommendation unit 230.
  • All or at least a part of the configuration of the feature information recommendation device 200 may be implemented in the form of a hardware module or a software module, or a combination of a hardware module and a software module.
  • The software module may be understood as, for example, instructions executed by a processor that controls operations within the feature information recommendation apparatus 200, and these instructions may be loaded into memory within the feature information recommendation apparatus 200.
  • Through the above configuration, the feature information recommendation apparatus 200 according to an embodiment of the present invention realizes the technique proposed in the present invention, that is, recommending changes to the feature information (features) to increase the accuracy of the artificial intelligence model.
  • The model performance checking unit 210 checks the model performance of the artificial intelligence model generated by learning preset feature information, selected from all the feature information that can be set when an artificial intelligence model is generated.
  • That is, the model performance checking unit 210 checks the performance (accuracy) of the artificial intelligence model generated by learning the feature information set by the user.
  • Hereinafter, the feature information set by the user and learned to generate the model in the artificial intelligence model platform 100 of the present invention is referred to as user-set feature information.
  • The model performance checking unit 210 checks the model performance of the artificial intelligence model generated by learning the user-set feature information in the artificial intelligence model platform 100, as described above.
  • For this model, the model performance checking unit 210 can test/verify the model performance (accuracy) using the test data output by the artificial intelligence model platform 100 of the present invention (in particular, the data output module 140), i.e., security events whose actual results (true or false positive, malicious or not) are known.
  • That is, the model performance checking unit 210 tests the artificial intelligence model generated in the artificial intelligence model platform 100 of the present invention using the test data output by the data output module 140, and can output the matching ratio between the result values predicted by the model and the known actual result values as the model's accuracy (performance), that is, as the test result.
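  • The matching ratio itself is a one-line computation; the sketch below spells it out with string labels, which are an illustrative choice.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Share of test events whose predicted label matches the known actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))  # True counts as 1
    return hits / len(actual)


print(accuracy(["malicious", "benign", "benign", "malicious"],
               ["malicious", "benign", "malicious", "malicious"]))  # -> 0.75
```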
  • The combination performance checking unit 220 sets a plurality of feature information combinations from all the feature information and checks the performance of the artificial intelligence model generated by learning each combination.
  • That is, from all the feature information that can be set when an artificial intelligence model is generated, the combination performance checking unit 220 sets various feature information combinations in addition to the user-set feature information learned when the model was generated, and checks the performance of the model generated by learning each combination.
  • Among the per-combination performances checked by the combination performance checking unit 220, the recommendation unit 230 can recommend a specific feature information combination whose performance is higher than the model performance checked by the model performance checking unit 210, that is, the performance of the artificial intelligence model generated from the user's current settings.
  • Here, each of the feature information combinations set by the combination performance checking unit 220 may be formed by sequentially adding at least one piece of feature information, taken from all the feature information other than the user-set feature information, to the user-set feature information learned when the model was generated.
  • For example, the combination performance checking unit 220 may set a plurality of feature information combinations by sequentially adding, to the user-set feature information (a, b, c, d, e, f), at least one of the feature information items among all the feature information (n) other than the user-set feature information (a, b, c, d, e, f).
  • More specifically, the combination performance checking unit 220 can sequentially add 1 to (n - k) items of the remaining feature information, where k is the number of user-set feature information items, to the user-set feature information (a, b, c, d, e, f) to set a plurality of feature information combinations as follows.
  • The combination performance checking unit 220 can confirm the performance of the artificial intelligence model generated by learning each of these feature information combinations, e.g., 82%, 80%, ... 88%, ... 85%.
  • Here, N in the top N is a number that the system administrator or user can specify or change.
  • In another example, the combination performance checking unit 220 can set a plurality of feature information combinations as follows by adding, one at a time, each remaining feature information item among all the feature information (n), excluding the user-set feature information (a, b, c, d, e, f), to the user-set feature information (a, b, c, d, e, f).
  • The combination performance checking unit 220 can confirm the performance of the artificial intelligence model generated by learning each of these feature information combinations, e.g., 82%, 80%, ... 90%.
  • Here, N in the top N is a number that the system administrator or user can specify or change.
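  • Both examples above can be sketched with one function: `max_extra=1` adds the remaining features one at a time, while a larger `max_extra` also adds groups of them. The `evaluate` callback is hypothetical; it stands in for "train a model on this combination and test its accuracy", which the platform would do with its own modules.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple

Combo = FrozenSet[str]


def recommend_top_n(user_set: Sequence[str], all_features: Sequence[str],
                    evaluate: Callable[[Combo], float], max_extra: int,
                    top_n: int) -> List[Tuple[Combo, float]]:
    """Add 1..max_extra remaining features to the user's set and rank the results."""
    base = frozenset(user_set)
    baseline = evaluate(base)                  # model performance of the user's set
    remaining = [f for f in all_features if f not in base]
    scored = []
    for k in range(1, max_extra + 1):
        for extra in combinations(remaining, k):
            combo = base | frozenset(extra)
            scored.append((combo, evaluate(combo)))
    # Only combinations that beat the user's own model are worth recommending.
    better = [item for item in scored if item[1] > baseline]
    better.sort(key=lambda item: item[1], reverse=True)
    return better[:top_n]
```

Note that with `max_extra = n - k` the number of evaluated combinations grows exponentially, which is one reason a one-at-a-time variant is attractive.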
  • In yet another example, a single feature information comparison process may be performed: the performance of the artificial intelligence model generated by learning each single feature information item is checked, and it is determined whether the maximum of these single-feature performances, Max(m_1), is higher than the model performance m_26.
  • If Max(m_1) is higher, the single feature information with the maximum performance (c) is reset as the feature information, and a combination setting process can be performed in which each of the remaining feature information items among all the feature information (n), other than (c), is added one at a time to (c) to set a plurality of feature information combinations.
  • The combination performance checking unit 220 may check the performance of each of the feature information combinations set as described above.
  • The combination performance checking unit 220 may then perform a reset process: each feature information combination whose performance is higher than the model performance (m_1) of the reset feature information (c) is reset as feature information, and the combination setting process is repeated for each reset feature information.
  • That is, the combination performance checking unit 220 deletes the feature information combinations whose performance is equal to or lower than the model performance (m_1) of the feature information (c), leaves only the combinations whose performance is higher than m_1 as follows, and resets each of them as feature information, so that the combination setting process is repeated for each reset feature information as shown in Table 2 below.
  • The combination performance checking unit 220 repeats the combination setting process and the reset process until none of the feature information combinations has higher performance than the artificial intelligence model generated from the previous feature information; at that point it selects the previous feature information as the specific feature information combination and passes it to the recommendation unit 230.
  • The recommendation unit 230 can recommend the feature information passed from the combination performance checking unit 220 as the specific feature information combination if its performance is higher than that of the artificial intelligence model generated with the preset (user-set) feature information.
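  • The single feature comparison, combination setting, and reset processes can be sketched as a search that keeps expanding every combination that improves on its parent. Assumptions: `evaluate` is a hypothetical "train and test" callback; starting from the user-set features when Max(m_1) does not exceed m_26 is this sketch's reading of the alternative branch; and when several branches end, the best final candidate is returned.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Combo = FrozenSet[str]


def recommend_by_iterative_search(user_set: Sequence[str], all_features: Sequence[str],
                                  evaluate: Callable[[Combo], float]
                                  ) -> Optional[Tuple[Combo, float]]:
    cache: Dict[Combo, float] = {}

    def perf(combo: Combo) -> float:
        # Each evaluation means training and testing a model, so never repeat one.
        if combo not in cache:
            cache[combo] = evaluate(combo)
        return cache[combo]

    user = frozenset(user_set)
    user_perf = perf(user)                                        # m_26 in the text
    best_single = max((frozenset([f]) for f in all_features), key=perf)  # Max(m_1)
    # Single feature comparison: start from the best single feature only if it
    # beats the user's model; otherwise grow the user-set features instead.
    start = best_single if perf(best_single) > user_perf else user

    frontier: List[Combo] = [start]
    queued = {start}
    finals: List[Combo] = []
    while frontier:
        current = frontier.pop()
        # Combination setting: add each remaining feature to `current` one at a time.
        better = [current | {f} for f in all_features
                  if f not in current and perf(current | {f}) > perf(current)]
        if not better:
            # No combination beats the previous feature information: keep it.
            finals.append(current)
        # Reset: every improving combination becomes new feature information.
        for combo in better:
            if combo not in queued:
                queued.add(combo)
                frontier.append(combo)

    best = max(finals, key=perf)
    # The recommender only recommends a combination that beats the user's model.
    return (best, perf(best)) if perf(best) > user_perf else None
```

Caching matters here because the same combination can be reached from several parents, and each evaluation is a full training run.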
  • The combination performance checking unit 220 may check the performance of each of the feature information combinations set as described above.
  • The combination performance checking unit 220 may perform a reset process in which each feature information combination whose performance is higher than the model performance (m_26) is reset as feature information, and the combination setting process is repeated for each reset feature information.
  • That is, the combination performance checking unit 220 deletes the feature information combinations whose performance is equal to or lower than the model performance (m_26), leaves only the combinations whose performance is higher than m_26 as follows, and resets each of them as feature information, so that the combination setting process is repeated for each reset feature information.
  • The combination performance checking unit 220 repeats the combination setting process and the reset process until none of the feature information combinations has higher performance than the artificial intelligence model generated from the previous feature information; at that point it selects the previous feature information as the specific feature information combination and passes it to the recommendation unit 230.
  • The recommendation unit 230 can recommend the feature information passed from the combination performance checking unit 220 as the specific feature information combination if its performance is higher than that of the artificial intelligence model generated with the preset (user-set) feature information.
  • By recommending and applying the optimal features with the best performance (accuracy) to a user who creates an artificial intelligence model for security control through the UI, the artificial intelligence model platform 100 allows even an average user unfamiliar with security control technology to create an optimal artificial intelligence model for security control.
  • FIG. 4 illustrates a configuration of a normalization method recommendation apparatus according to an embodiment of the present invention.
  • the normalization method recommendation apparatus 300 of the present invention includes an attribute confirmation unit 310, a determination unit 320, and a recommendation unit 330.
  • the whole or at least part of the configuration of the normalization method recommendation device 300 may be implemented in the form of a hardware module or a software module, or a combination of a hardware module and a software module.
  • The software module may be understood as, for example, instructions executed by a processor that controls operations within the normalization method recommendation apparatus 300, and these instructions may be loaded into memory within the normalization method recommendation apparatus 300.
  • Through the above configuration, the normalization method recommendation apparatus 300 realizes the technique proposed in the present invention, that is, recommending a change of normalization method to increase the accuracy of the artificial intelligence model.
  • Each component of the normalization method recommendation apparatus 300 that realizes this is described in more detail below.
  • the attribute checking unit 310 checks the attribute of feature information used for learning when the artificial intelligence model is generated.
  • The feature information used for learning when the artificial intelligence model is generated may be feature information set directly by the user through the UI from all the feature information that can be set, or it may be a specific feature information combination recommended from all the feature information and then applied/set.
  • Attributes of feature information can be broadly divided into numeric attributes and category attributes.
  • That is, the attribute checking unit 310 may check whether the attribute of the feature information used for learning when the artificial intelligence model is generated (whether set directly or applied from a recommendation) is a numeric attribute, a category attribute, or a combined numeric-and-category attribute.
  • The determination unit 320 determines, among all the normalization methods that can be set, a normalization method according to the attribute of the feature information checked by the attribute checking unit 310.
  • The determination unit 320 may first distinguish whether the same normalization method is applied to all fields of the feature information or whether a normalization method is applied per field.
  • In one case, the determination unit 320 may determine that the same normalization method is applied to all fields of the feature information.
  • In this case, the determination unit 320 determines a first normalization method according to the overall numeric pattern of the feature information when its attribute is a numeric attribute; a second normalization method, which expresses the feature information as a non-zero characteristic value only at the position designated for its category in a vector whose length is the total number of categories, when its attribute is a category attribute; and both the second normalization method and the first normalization method when its attribute is a combined numeric-and-category attribute.
  • The first normalization method comprises, in a predefined priority order, standard score normalization, mean normalization, and feature scaling normalization (see Equations 1 to 3).
  • When only numeric data exist in all fields of the feature information, the determination unit 320 classifies the attribute of the feature information as a numeric attribute and determines the first normalization method according to the overall numeric pattern of the feature information.
  • Specifically, within the first normalization method, the determination unit 320 ranks standard score normalization, mean normalization, and feature scaling normalization by priority, and determines the highest-priority method that is applicable, based on the standard deviation of the overall numeric pattern of the feature information and on whether upper/lower limits of the normalization scaling range exist.
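  • The priority rule can be sketched as a walk down the priority list that stops at the first applicable method. The applicability tests are assumptions, since the text names only the criteria (standard deviation, existence of scaling-range limits) and not the exact conditions: standard score is taken to need a non-zero standard deviation and no fixed output range, mean normalization a non-zero spread and either no range or [-1, 1], and feature scaling only a non-zero spread.

```python
from statistics import pstdev
from typing import Optional, Sequence, Tuple

# Priority order of the first normalization method, highest first.
PRIORITY = ("standard_score", "mean_normalization", "feature_scaling")


def choose_numeric_method(values: Sequence[float],
                          scaling_range: Optional[Tuple[float, float]] = None
                          ) -> Optional[str]:
    """Return the highest-priority applicable method, or None for a constant field.

    scaling_range is the required output range (lower, upper), if any.
    """
    has_spread = max(values) > min(values)
    applicable = {
        # Assumed: z-scores are unbounded, so they fit only when no range is fixed.
        "standard_score": pstdev(values) > 0 and scaling_range is None,
        # Assumed: mean normalization yields values within [-1, 1].
        "mean_normalization": has_spread and scaling_range in (None, (-1, 1)),
        # Feature scaling can target any [a, b].
        "feature_scaling": has_spread,
    }
    for method in PRIORITY:
        if applicable[method]:
            return method
    return None
```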
  • When it classifies the attribute of the feature information as a category attribute, the determination unit 320 may determine the second normalization method, which expresses a non-zero characteristic value only at the position designated for each category of the feature information in a vector whose length is the total number of categories of the feature information.
  • That is, the determination unit 320 may determine the second normalization method (one-hot encoding), which places a non-zero characteristic value (e.g., 1) only at the position designated for each category of the feature information in a vector whose length is the total number of categories.
  • To explain the second normalization method (one-hot encoding) briefly, assume that the feature information has a "fruit" category attribute and that the categories are apple, pear, and persimmon; since there are three kinds of fruit, each value is expressed as a three-dimensional vector.
  • Under the second normalization method (one-hot encoding), feature information whose data are apple, pear, and persimmon, respectively, may then be expressed as follows.
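  • The fruit example can be reproduced with a few lines of code. The position of each category in the vector follows the order of the `categories` list, which is an arbitrary choice made for this sketch.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Second normalization method: 1 at the category's own position, 0 elsewhere."""
    index: Dict[str, int] = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)  # vector length = total number of categories
        vec[index[v]] = 1            # an unknown category raises KeyError
        vectors.append(vec)
    return vectors


fruits = ["apple", "pear", "persimmon"]
print(one_hot(fruits, fruits))  # -> [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```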
  • When it classifies the attribute of the feature information as a combined numeric-and-category attribute, the determination unit 320 may determine both the second normalization method and the first normalization method described above.
  • Specifically, the determination unit 320 may determine the second and first normalization methods in that order: it first applies the second normalization method (one-hot encoding) described above to the category-attribute data in the feature information, and then determines the highest-priority applicable first normalization method based on the standard deviation of the overall numeric pattern of the feature information and on whether upper/lower limits of the normalization scaling range exist.
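  • The attribute check and the resulting order of normalization steps can be summarized as follows. Treating string values as categories and all other values as numbers is an assumption of this sketch; the step names are hypothetical labels, not platform identifiers.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def classify_attribute(fields: Dict[str, Value]) -> str:
    """Label feature information as 'numeric', 'category' or 'mixed'."""
    if not fields:
        raise ValueError("feature information has no fields")
    # Assumption: strings are category data, everything else is numeric data.
    kinds = {"category" if isinstance(v, str) else "numeric" for v in fields.values()}
    if kinds == {"numeric"}:
        return "numeric"
    if kinds == {"category"}:
        return "category"
    return "mixed"


def normalization_plan(fields: Dict[str, Value]) -> List[str]:
    """Return the normalization steps, in the order they are applied."""
    attribute = classify_attribute(fields)
    if attribute == "numeric":
        return ["first_normalization"]        # standard score / mean / feature scaling
    if attribute == "category":
        return ["one_hot"]                    # second normalization method
    # Mixed: one-hot the category data first, then normalize the numeric pattern.
    return ["one_hot", "first_normalization"]
```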
  • When the feature information is a composite feature (a feature that can be extracted by applying aggregation and statistical techniques across multiple security events), the determination unit 320 may determine that a normalization method is applied per field of the feature information.
  • For a field of the feature information with a type attribute, the determination unit 320 may determine the highest-priority applicable method among mean normalization and feature scaling normalization.
  • For a field of the feature information with a number attribute, the determination unit 320 may determine the highest-priority applicable method among mean normalization and feature scaling normalization.
  • For an attribute-attribute field of the feature information, the determination unit 320 may either exclude the field from normalization without determining a normalization method, or determine the standard score normalization method.
  • For a field of the feature information with a presence/absence attribute (e.g., whether an operation result value exists), the determination unit 320 may exclude the field from normalization without determining a normalization method.
  • the recommendation unit 330 recommends the normalization method determined by the determination unit 320.
  • In this way, by recommending and applying the optimal normalization method with the best performance (accuracy) to a user who creates an artificial intelligence model for security control through the UI, even an average user unfamiliar with security control technology can create an optimal artificial intelligence model for security control.
  • According to the present invention, an artificial intelligence model platform that enables the creation of artificial intelligence models for security control is implemented; in particular, by optimally recommending and applying the feature information and normalization methods that directly determine model performance, the platform allows an ordinary user unfamiliar with security control technology to generate an optimal artificial intelligence model for security control.
  • As a result, optimal artificial intelligence models suited to security control purposes and requirements can be generated and applied flexibly and in many forms, the quality of security control services can be greatly improved, and the construction of an AI-based cyber incident response system that efficiently analyzes large-scale signs of cyber attacks and anomalies can be supported.
  • The artificial intelligence model platform 100 of the present invention periodically collects newly generated source security data from the big-data integrated storage (S10).
  • Through the UI, the artificial intelligence model platform 100 of the present invention receives various settings related to data collection and AI functions, according to the operations of a system administrator or general user (hereinafter, the user) who wants to create an artificial intelligence model for security control, and stores and manages them as setting information (S20).
  • The artificial intelligence model platform 100 of the present invention collects, from the source security data, the security events to be used as learning/test data, based on a search condition preset by the user (S30).
  • The artificial intelligence model platform 100 of the present invention extracts the feature information preset by the user from the security events collected in step S30 (S40).
  • The artificial intelligence model platform 100 of the present invention performs the normalization preset by the user on the extracted feature information of the security events (S50).
  • The three normalization methods described above are provided for the user to choose from in advance.
  • Because the normalization method set by the user may not be optimal, the artificial intelligence model platform 100 of the present invention may recommend the optimal normalization method to increase the accuracy of the artificial intelligence model (S50).
  • The artificial intelligence model platform 100 of the present invention extracts training data or test data from the security events whose feature information has been normalized, based on a given condition, that is, a condition preset by the user (S60).
  • That is, the artificial intelligence model platform 100 of the present invention outputs the normalized security events to a screen or a file according to the values, order, format, learning/test data ratio, file division method, and so on desired by the user.
  • the artificial intelligence model platform 100 of the present invention applies an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control (S70).
  • For example, the artificial intelligence model platform 100 of the present invention may apply an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control that provides a function required by the user.
  • For example, at the user's request, the artificial intelligence model platform 100 of the present invention may generate an artificial intelligence detection model that detects whether a security event is malicious, or an artificial intelligence classification model that classifies a security event as a true positive or a false positive.
  • The artificial intelligence model platform 100 of the present invention may generate an artificial intelligence model for security control by applying, to the learning data output in step S60 and managed in file storage, an AI algorithm previously selected by the user, such as a machine learning (e.g., deep learning) algorithm.
  • For example, using a machine learning technique based on backpropagation, the artificial intelligence model platform 100 of the present invention may generate from the learning data an artificial intelligence model whose loss function, which represents the deviation between the result predicted by the model and the actual result, is driven toward zero.
  • the artificial intelligence model platform 100 of the present invention utilizes test data (security events that know the actual result of detection and detection of malicious or false positives) managed in the output / file storage in step S60, The accuracy of the artificial intelligence model generated above is tested (S80).
  • That is, the artificial intelligence model platform 100 tests the generated artificial intelligence model with the test data and can output, as the test result, the accuracy (performance) of the model, i.e., the ratio at which the values predicted by the model match the known actual values.
  • The AI model platform 100 of the present invention can record and manage, in the system (file storage), performance information stating who generated the artificial intelligence model, when, and with which data, which fields, which sampling method, which normalization method and which model, together with the performance (correct-answer rate) of the generated model.
  • Based on this performance information management, the artificial intelligence model platform 100 lets the user compare model-generation conditions and performance at a glance, making the correlation between conditions and performance easy to grasp.
  • The AI model platform 100 of the present invention may also recommend a change to the feature information (Feature) that increases the accuracy of the generated AI model, based on the accuracy test result of step S80 (S90, S100).
  • That is, when there is a combination of other feature information that could improve the accuracy of the AI model compared with the feature information used for learning when the AI model was generated (hereinafter, user-set feature information) (S90 Yes), the AI model platform 100 of the present invention recommends that combination (S100).
  • For convenience of description, the operation method of the feature information recommendation apparatus 200 is described below with reference to FIG. 6.
  • First, the performance (accuracy) of the artificial intelligence model generated by learning the feature information set by the user is checked (S110).
  • That is, the AI model platform 100 of the present invention generates an artificial intelligence model by learning the feature information set by the user (hereinafter, user-set feature information).
  • As described above, the operation method of the feature information recommendation apparatus 200 checks the model performance of the artificial intelligence model generated by learning the user-set feature information in the artificial intelligence model platform 100 (S110).
  • For this artificial intelligence model, the operation method of the feature information recommendation apparatus 200 tests the model using the test data output from the artificial intelligence model platform 100 (in particular, the data output module 140), i.e., security events whose actual outcome (true or false positive, malicious or benign) is known, and outputs the accuracy (performance) of the model as the test result, namely the ratio at which the results predicted by the model match the known actual results.
  • The operation method of the feature information recommendation apparatus 200 then sets a plurality of feature information combinations from the entire feature information and checks the performance of the artificial intelligence model generated by learning each combination (S120, S130).
  • In other words, from all the feature information that can be set when generating an artificial intelligence model, the apparatus 200 sets various feature information combinations in addition to the user-set feature information learned when the model was generated, and checks the performance of the artificial intelligence model generated by learning each combination.
  • For example, suppose the user-set feature information is (a, b, c, d, e, f) and the verified artificial intelligence model performance m_k is 85%.
  • The operation method of the feature information recommendation apparatus 200 may set a plurality of feature information combinations by sequentially adding, to the user-set feature information (a, b, c, d, e, f), at least one item of the remaining feature information among all n feature items, that is, the items other than a, b, c, d, e, f (S120).
  • That is, the apparatus 200 may set a plurality of feature information combinations, as follows, by sequentially adding one to (n-k) items of the remaining feature information (all n feature items except the user-set feature information a, b, c, d, e, f) to the user-set feature information.
  • The apparatus 200 can then confirm the performance of the artificial intelligence model generated by learning each of these feature information combinations, e.g., 82%, 80%, ..., 88%, ..., 85% (S130).
  • Among these, the top N combinations (e.g., 4) whose performance is higher than the model performance (85%) may be selected/recommended as specific feature information combinations (S140 Yes, S150).
  • Alternatively, the operation method of the feature information recommendation apparatus 200 may set a plurality of feature information combinations, as follows, by sequentially adding the remaining feature information of all n feature items, one item at a time, to the user-set feature information (a, b, c, d, e, f) (S120).
  • The apparatus 200 can then confirm the performance of the artificial intelligence model generated by learning each of these feature information combinations, e.g., 82%, 80%, ..., 90% (S130).
  • The top N combinations (e.g., three) whose performance is higher than the model performance may be selected/recommended as specific feature information combinations (S140 Yes, S150).
  • In this way, the artificial intelligence model platform 100 recommends/applies, through the UI, the optimal features with the best performance (accuracy) to a user who is generating an artificial intelligence model for security control, so that even an ordinary user who is not familiar with security control technology can create an optimal AI model for security control.
  • The normalization method recommendation apparatus 300 of the present invention checks the attribute of the feature information used for learning when the artificial intelligence model is generated (S200).
  • The feature information used for learning may be feature information that the user set directly through the UI from all the feature information that can be set when generating the AI model, or a recommended specific feature information combination that has been applied/set.
  • The attributes of feature information can be broadly divided into numeric attributes and category attributes.
  • The operation method of the normalization method recommendation apparatus 300 can check whether the attribute of the feature information used for learning (set directly or applied from a recommendation) is a numeric attribute, a category attribute, or a numeric-and-category combination attribute (S200).
  • The operation method of the normalization method recommendation apparatus 300 then determines, from among all settable normalization methods, a normalization method according to the attribute of the feature information identified in step S200.
  • Before determining the normalization method according to the attribute, the operation method of the normalization method recommendation apparatus 300 may first distinguish whether the same normalization method is applied to all fields of the feature information or a normalization method is applied per field (S210).
  • The case in which the same normalization method is applied to all feature information fields is classified as S210 Yes.
  • In this case, when the attribute of the feature information is a numeric attribute, the first normalization method according to the entire numeric pattern of the feature information is determined; when the attribute is a category attribute, a second normalization method is determined, which represents each category by a non-zero value placed only at that category's designated position in a vector whose length is the total number of categories of the feature information; and when the attribute is a numeric-and-category combination attribute, both the second and the first normalization methods may be determined (S220).
  • The first normalization method includes, in order of predefined priority, the Standard score normalization method, the Mean normalization method, and the Feature scaling normalization method (see Equations 1, 2, and 3).
  • When only numeric data exists in all fields of the feature information, the operation method of the normalization method recommendation apparatus 300 classifies the attribute as a numeric attribute and, in this case, determines the first normalization method according to the entire numeric pattern of the feature information.
  • Specifically, following the priority of the first normalization methods (Standard score, then Mean normalization, then Feature scaling), the apparatus 300 can determine the applicable method with the highest priority based on whether a standard deviation and upper/lower limits of the normalization scaling range exist for the entire numeric pattern.
  • When the attribute of the feature information is classified as a category attribute, the second normalization method (One Hot Encoding) can be determined, which expresses a non-zero value (e.g., 1) only at the position designated for each category in a vector whose length is the total number of categories of the feature information.
  • When the attribute of the feature information is classified as a numeric-and-category combination attribute, both the second normalization method and the first normalization method may be determined.
  • That is, after the second normalization method (One Hot Encoding) described above is applied, the applicable first normalization method with the highest priority is determined based on whether a standard deviation and upper/lower limits of the normalization scaling range exist for the entire numeric pattern of the feature information, so that the second and the first normalization methods are determined together.
  • When the feature information is a composite feature (a single feature extracted by applying statistical methods across multiple security events), the case can be classified as one in which a normalization method is applied per field across the feature information fields (S210 No).
  • In this case, for fields of the feature information whose attribute is a type attribute, the operation method of the normalization method recommendation apparatus 300 can determine the applicable method with the highest priority among the Mean normalization and Feature scaling methods (S230).
  • For fields whose attribute is a count attribute, it can likewise determine the applicable method with the highest priority among the Mean normalization and Feature scaling methods (S230).
  • For fields whose attribute is a ratio attribute, it can either leave the normalization method undetermined and exclude the field from normalization, or determine the Standard score method (S230).
  • For fields whose attribute is a presence attribute (e.g., whether an operation result value exists), it leaves the normalization method undetermined and can decide to exclude the field from normalization (S230).
  • the operation method of the normalization method recommendation apparatus 300 recommends the normalization method determined in step S220 or step S230 (S240).
  • By recommending/applying, through the UI, the optimal normalization method with the best performance (accuracy) to a user who is generating an AI model for security control, even an ordinary user who is not familiar with security control technology can create an optimal artificial intelligence model for security control.
  • According to the present invention, an artificial intelligence model platform that enables the creation of an artificial intelligence model for security control is implemented; in particular, by optimally recommending/applying the feature information and normalization methods directly related to model performance, the platform allows even an ordinary user who is not familiar with security control technology to generate an optimal AI model for security control.
  • Accordingly, an optimal artificial intelligence model suited to the purposes and requirements of security control can be generated and applied flexibly and in various forms, the quality of the security control service can be greatly improved, and the construction of an AI-based intrusion response system that efficiently analyzes signs of large-scale cyber attacks and anomalies can be supported.
  • the artificial intelligence model platform operating method may be implemented in a form of program instructions that can be executed through various computer means and may be recorded in a computer readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, or the like alone or in combination.
  • the program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include high-level language codes that can be executed by a computer using an interpreter, etc., as well as machine language codes produced by a compiler.
  • the hardware device described above may be configured to operate as one or more software modules to perform the operation of the present invention, and vice versa.
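The performance information described for step S80 (who generated a model, when, with which data, fields, sampling method, normalization method and model, and with what correct-answer rate) can be pictured as a simple record that is sorted for side-by-side comparison. The following is a minimal Python sketch; the `PerformanceRecord` class, its field names and the `compare` helper are hypothetical, since the platform's actual storage schema is not described.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PerformanceRecord:
    """Conditions used to generate one model, plus its accuracy (hypothetical schema)."""
    who: str
    when: datetime
    data: str            # identifier of the learning data set
    fields: tuple        # feature information fields used for learning
    sampling: str        # sampling method
    normalization: str   # normalization method
    model: str           # AI algorithm / model type
    accuracy: float      # correct-answer rate on the test data, 0.0 to 1.0


def compare(records):
    """Return one summary line per record, highest accuracy first."""
    ranked = sorted(records, key=lambda r: r.accuracy, reverse=True)
    return [
        f"{r.accuracy:.0%} model={r.model} norm={r.normalization} "
        f"sampling={r.sampling} fields={','.join(r.fields)} "
        f"data={r.data} by {r.who} at {r.when:%Y-%m-%d}"
        for r in ranked
    ]
```

Sorting by accuracy is only one simple way to put conditions and performance side by side; a real platform would likely also filter or group records by condition.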

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a technique, which implements an artificial-intelligence model platform capable of creating an artificial-intelligence model for security control and, in particular, can optimally recommend/apply feature information and normalization methods directly related to artificial-intelligence model performance, thereby enabling general users, who are not familiar with a security control technique, to create an optimal artificial-intelligence model for security control.

Description

Artificial Intelligence Model Platform and Method for Operating the Artificial Intelligence Model Platform
The present invention relates to a technology for generating an artificial intelligence model for security control.
More specifically, it relates to providing an artificial intelligence model platform that enables even an ordinary user who is not familiar with security control technology to generate an optimal artificial intelligence model for security control.
Currently, the Science and Technology Cyber Safety Center provides real-time security control services to public research institutes based on an intrusion threat management system (TMS).
The real-time security control service is structured so that security control personnel analyze, and support the response to, the security events detected and collected by the TMS.
However, the number of security events detected by the TMS is increasing explosively, and it has reached the point where it is practically impossible for security control personnel to analyze this entire large volume of security events.
In addition, because the existing security control service depends on the expertise and experience of security control personnel, consistent (leveled) analysis is not achieved: work becomes concentrated on the analysis of particular security events, and analysis results vary.
Consequently, given the explosive growth in the number of security events detected by the TMS, the service structure of the existing security control service, which relies on analysis by security control personnel, itself needs to be reinvented.
Accordingly, one can consider a security control service structure that uses an artificial intelligence model capable of replacing the analysis performed by security control personnel.
The present invention aims to provide an artificial intelligence model platform capable of generating an artificial intelligence model for security control.
In particular, the present invention aims to provide an artificial intelligence model platform that enables even an ordinary user who is not familiar with security control technology to generate an optimal artificial intelligence model for security control.
An object of the present invention is to provide a method (technology) for implementing an artificial intelligence model platform that enables the generation of an artificial intelligence model for security control.
An artificial intelligence model platform according to an embodiment of the present invention includes: a data collection module that collects, from source security data according to a specific search condition, the security events to be used as learning/test data; a feature extraction module that extracts preset feature information from the collected security events; a normalization module that performs preset normalization on the extracted feature information of the security events; a data output module that extracts learning data or test data, according to a given condition, from the security events whose feature information has been normalized; and a model generation module that applies an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control.
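The five modules form a linear pipeline. The Python sketch below chains them on toy data to show the data flow; every concrete choice in it is an assumption made for illustration: events are dictionaries of numeric fields with a known 0/1 label, normalization is min-max feature scaling, the output condition sends every fourth event to the test set, and a nearest-centroid classifier stands in for the user-selected AI algorithm. All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SecurityEvent:
    fields: dict   # raw numeric event fields (hypothetical layout)
    label: int     # known outcome: 1 = true positive, 0 = false positive


def collect(source, condition):
    """Data collection module: keep the events matching the search condition."""
    return [e for e in source if condition(e)]


def extract(events, feature_names):
    """Feature extraction module: turn each event into (feature vector, label)."""
    return [([e.fields[n] for n in feature_names], e.label) for e in events]


def normalize(rows):
    """Normalization module: min-max feature scaling of each column into [0, 1]."""
    columns = list(zip(*(x for x, _ in rows)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        ([(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(x, lows, highs)], y)
        for x, y in rows
    ]


def output(rows, test_every=4):
    """Data output module: every `test_every`-th row becomes test data (hypothetical condition)."""
    train = [r for i, r in enumerate(rows) if i % test_every != 0]
    test = [r for i, r in enumerate(rows) if i % test_every == 0]
    return train, test


def generate_model(train):
    """Model generation module: nearest-centroid classifier standing in for the AI algorithm."""
    centroids = {}
    for label in {y for _, y in train}:
        xs = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(xs) for col in zip(*xs)]

    def predict(x):
        # choose the label whose centroid is closest in squared Euclidean distance
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict
```

For brevity the sketch normalizes before splitting, mirroring the module order above; in practice one would usually fit normalization statistics on the learning data only.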
Specifically, the platform may further include a performance management module that tests the accuracy of the artificial intelligence model using the test data.
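The accuracy test reduces to a matching ratio: the share of test events, whose actual outcome is known, for which the model's prediction equals that outcome. A minimal Python sketch, assuming `predict` is any callable that returns a label (the platform's model interface is not specified):

```python
def accuracy(predict, test_rows):
    """Share of (features, actual_label) rows where predict(features) == actual_label."""
    if not test_rows:
        raise ValueError("test data is empty")
    hits = sum(1 for x, actual in test_rows if predict(x) == actual)
    return hits / len(test_rows)
```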
Specifically, the platform may further include a UI module that provides a user interface (UI) for setting at least one of the specific search condition of the data collection module, the feature information of the feature extraction module, the normalization method of the normalization module, and the condition of the data output module.
Specifically, when the total number of collection jobs exceeds the maximum number of collections that can be performed simultaneously, the data collection module stores the collection jobs in excess of that maximum in a queue and then processes them sequentially; for a collection job processed after being stored in the queue, the data collection module may collect the security events only from the source security data that precede the time at which the collection job occurred.
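The queueing rule can be sketched as follows. This is a simplified, single-threaded Python illustration under stated assumptions: source records are `(timestamp, event)` pairs, each job carries the time its request occurred, and "prior to the occurrence point" is read as a strict `<` comparison. `CollectionJob`, `collect_for` and `run_collections` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class CollectionJob:
    name: str
    requested_at: int                  # when the collection request occurred (hypothetical epoch seconds)
    condition: Callable[[dict], bool]  # the specific search condition


def collect_for(job, source):
    """Collect matching events from source records that precede the job's request time."""
    return [ev for ts, ev in source if ts < job.requested_at and job.condition(ev)]


def run_collections(jobs, source, max_concurrent):
    """Start up to `max_concurrent` jobs at once; queue the rest and run them one by one.

    Returns (results by job name, execution order). `source` may already hold
    records that arrived while queued jobs were waiting; the time cut excludes them.
    """
    results, order = {}, []
    queue = deque(jobs[max_concurrent:])
    for job in jobs[:max_concurrent]:   # these would run concurrently in a real system
        results[job.name] = collect_for(job, source)
        order.append(job.name)
    while queue:                        # overflow jobs run sequentially afterwards
        job = queue.popleft()
        results[job.name] = collect_for(job, source)
        order.append(job.name)
    return results, order
```

For a job that starts immediately the time cut changes nothing in practice, because no newer data exists yet; for a queued job it ensures the result matches what the job would have collected had it started when requested.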
Specifically, the feature extraction module may recommend a change to the feature information that increases the accuracy of the artificial intelligence model, based on the accuracy test result of the performance management module.
Specifically, the normalization module may recommend a change of the normalization method that increases the accuracy of the artificial intelligence model.
A feature information recommendation apparatus according to an embodiment of the present invention includes: a model performance checking unit that checks the model performance of an artificial intelligence model generated by learning preset feature information among all of the feature information that can be set when generating an artificial intelligence model; a combination performance checking unit that sets a plurality of feature information combinations from the entire feature information and checks the performance of the artificial intelligence model generated by learning each of the plurality of feature information combinations; and a recommendation unit that recommends, among the performances of the plurality of feature information combinations, a specific feature information combination whose performance is higher than the model performance checked by the model performance checking unit.
Specifically, each of the plurality of feature information combinations is formed by sequentially adding, to the preset feature information, at least one item of the remaining feature information (the entire feature information excluding the preset feature information), and the specific feature information combination may be the top N combinations, among the plurality of feature information combinations, whose performance is higher than the model performance.
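One way to read the combination search and top-N selection is sketched below in Python. Assumptions: `evaluate(features)` is a hypothetical callback that trains a model on the given features and returns its test accuracy, and "sequentially adding at least one" remaining feature is read as enumerating every subset of 1 to `max_added` remaining features (`max_added=1` gives the one-at-a-time variant described earlier). The number of subsets grows exponentially, so this exhaustive form is only practical for small feature sets.

```python
from itertools import combinations


def recommend_combinations(preset, all_features, evaluate, top_n, max_added=None):
    """Return (baseline, top combinations), keeping only combinations that beat the preset features.

    `evaluate(features)` is a hypothetical callback returning model accuracy.
    """
    base = tuple(preset)
    baseline = evaluate(base)                        # model performance of the user-set features
    remaining = [f for f in all_features if f not in base]
    limit = len(remaining) if max_added is None else max_added
    scored = []
    for size in range(1, limit + 1):                 # add 1 .. limit remaining features
        for extra in combinations(remaining, size):
            combo = base + extra
            scored.append((combo, evaluate(combo)))
    better = [pair for pair in scored if pair[1] > baseline]
    better.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep enumeration order
    return baseline, better[:top_n]
```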
Specifically, when the preset feature information is the entire feature information, the combination performance checking unit may perform: a single-feature performance comparison process that checks whether the maximum performance among the artificial intelligence models generated by learning each single feature in the entire feature information is higher than the model performance; a combination setting process that, when the maximum performance is higher than the model performance, resets the feature information to the single feature with the maximum performance and sets the plurality of feature information combinations by sequentially adding, one at a time, each item of the remaining feature information (the entire feature information excluding the feature information already included) to that feature information; a resetting process that resets, as feature information, each feature information combination whose performance is higher than the model performance of the reset feature information, so that the combination setting process is repeated for each reset feature information; and a process that, when no feature information combination has higher performance than the model performance, delivers the immediately preceding feature information to the recommendation unit as the specific feature information combination.
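The four processes above amount to a branching forward feature search. A Python sketch under stated assumptions: `evaluate` is a hypothetical callback mapping a `frozenset` of features to model accuracy (cached here because the same set is scored repeatedly), and if no single feature beats the all-features model the sketch returns an empty list, a case the text does not spell out. The name `forward_search` is hypothetical.

```python
from functools import lru_cache


def forward_search(all_features, evaluate):
    """Branching forward selection following the single-feature, combination-setting
    and resetting processes. Returns the feature sets at which no one-feature
    addition improves the score, best first."""
    score = lru_cache(maxsize=None)(evaluate)    # avoid re-training on the same set
    baseline = score(frozenset(all_features))    # model performance with all features
    best_single = max(all_features, key=lambda f: score(frozenset([f])))
    start = frozenset([best_single])
    if score(start) <= baseline:                 # assumption: nothing to recommend
        return []

    results, seen = set(), set()

    def expand(current):
        if current in seen:
            return
        seen.add(current)
        # combination setting: add each remaining feature one at a time
        better = [current | {f} for f in all_features
                  if f not in current and score(current | {f}) > score(current)]
        if not better:
            results.add(current)                 # deliver the preceding feature set
        for nxt in better:                       # resetting: recurse on each improvement
            expand(nxt)

    expand(start)
    return sorted(results, key=score, reverse=True)
```

Because every improving combination is expanded, the search can branch widely; the `seen` set prevents re-expanding a feature set reached along different paths.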
A normalization method recommendation apparatus according to an embodiment of the present invention includes: an attribute checking unit that checks the attribute of the feature information used for learning when generating an artificial intelligence model; a determination unit that determines, from among all settable normalization methods, a normalization method according to the attribute of the feature information; and a recommendation unit that recommends the determined normalization method.
Specifically, when the same normalization method is applied to all fields of the feature information, the determination unit may determine: if the attribute of the feature information is a numeric attribute, a first normalization method according to the entire numeric pattern of the feature information; if the attribute is a category attribute, a second normalization method that, in a vector whose length equals the total number of categories of the feature information, places a non-zero value only at the position designated for each category; and if the attribute is a numeric-and-category combination attribute, both the second normalization method and the first normalization method.
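A sketch of the uniform case in Python: classify the values as numeric, category or mixed, map the attribute to the methods the text prescribes, and implement the second normalization method as one-hot encoding. Treating a "numeric-and-category combination attribute" as a value set that contains both numbers and non-numeric values is an assumption, as are the function names.

```python
def attribute_of(values):
    """Classify feature values as 'numeric', 'category' or 'mixed' (numeric and category)."""
    if not values:
        raise ValueError("no feature values")
    is_number = {isinstance(v, (int, float)) and not isinstance(v, bool) for v in values}
    if is_number == {True}:
        return "numeric"
    if is_number == {False}:
        return "category"
    return "mixed"


def recommend_uniform(values):
    """Normalization methods when one method is applied to all feature fields (S220)."""
    return {
        "numeric": ["first_normalization"],
        "category": ["one_hot"],
        "mixed": ["one_hot", "first_normalization"],  # one-hot first, then the first method
    }[attribute_of(values)]


def one_hot(values):
    """Second normalization method: 1 only at each category's slot, in a vector of length #categories."""
    categories = sorted(set(values))
    slot = {c: i for i, c in enumerate(categories)}
    vectors = [[1 if i == slot[v] else 0 for i in range(len(categories))] for v in values]
    return categories, vectors
```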
Specifically, the first normalization method includes, in order of predefined priority, the Standard score normalization method, the Mean normalization method, and the Feature scaling normalization method, and the determination unit may determine the applicable normalization method with the highest priority among the first normalization methods based on whether a standard deviation and upper/lower limits of the normalization scaling range exist for the entire numeric pattern of the feature information.
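Equations 1 to 3 are not reproduced in this text. The conventional definitions of the three methods, which they presumably follow, are given below, with μ and σ the mean and standard deviation of a field and x_min, x_max its lower and upper bounds. They also suggest why the availability of σ and of range bounds governs the choice: the standard score is defined only when σ > 0, and the other two only when x_max > x_min.

```latex
\begin{aligned}
\text{(1) Standard score:}\quad    & x' = \frac{x - \mu}{\sigma} \\
\text{(2) Mean normalization:}\quad & x' = \frac{x - \mu}{x_{\max} - x_{\min}} \\
\text{(3) Feature scaling:}\quad    & x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\end{aligned}
```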
Specifically, when a normalization method is applied per field across the fields of the feature information, the determination unit may: for fields whose attribute is a type attribute, determine the applicable normalization method with the highest priority among the Mean normalization and Feature scaling methods; for fields whose attribute is a count attribute, likewise determine the applicable method with the highest priority among the Mean normalization and Feature scaling methods; for fields whose attribute is a ratio attribute, either leave the normalization method undetermined and exclude the field from normalization, or determine the Standard score method; and for fields whose attribute is a presence attribute, leave the normalization method undetermined and exclude the field from normalization.
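The per-field rules (S230) can be written as a small lookup. In this Python sketch the attribute names, the `applicable` set supplied by the caller, and the fallback to exclusion when neither method applies are assumptions; for ratio fields the text allows either exclusion or the Standard score method, so both are returned as alternatives.

```python
PRIORITY = ("mean_normalization", "feature_scaling")   # priority order for type and count fields


def recommend_field(attribute, applicable=PRIORITY):
    """Recommended option(s) for one field in the per-field case (S230)."""
    if attribute in ("type", "count"):
        for method in PRIORITY:
            if method in applicable:
                return [method]
        return ["exclude"]                    # assumption: nothing applicable, leave as-is
    if attribute == "ratio":
        return ["exclude", "standard_score"]  # the text allows either choice
    if attribute == "presence":
        return ["exclude"]                    # presence/absence fields are not normalized
    raise ValueError(f"unknown field attribute: {attribute!r}")
```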
본 발명의 일 실시예에 따른 인공지능 모델 플랫폼 운영 방법은, 원천 보안데이터로부터 특정 검색 조건에 의해 학습/테스트 데이터로 사용하고자 하는 보안이벤트를 수집하는 데이터수집단계; 상기 수집된 보안이벤트에 대하여 기 설정된 특징정보를 추출하는 특징추출단계; 상기 보안이벤트의 추출된 특징정보에 대하여 기 설정된 정규화를 수행하는 정규화단계; 상기 특정정보 정규화가 완료된 보안이벤트에서 학습 데이터 또는 테스트 데이터를 주어진 조건에 의해 추출하는 데이터출력단계; 및 상기 학습 데이터에 인공지능 알고리즘을 적용하여, 보안관제를 위한 인공지능 모델을 생성하는 모델생성단계를 포함한다.An artificial intelligence model platform operating method according to an embodiment of the present invention includes: a data collection step of collecting security events to be used as learning / test data according to specific search conditions from source security data; A feature extraction step of extracting predetermined feature information for the collected security event; A normalization step of performing preset normalization on the extracted feature information of the security event; A data output step of extracting training data or test data according to a given condition from the security event in which the specific information normalization is completed; And a model generation step of applying an artificial intelligence algorithm to the learning data to generate an artificial intelligence model for security control.
Specifically, the method may further include a performance management step of testing the accuracy of the artificial intelligence model using the test data.
Specifically, the method may further include a step of providing a user interface (UI) for setting at least one of the following:
- the specific search condition of the data collection step;
- the feature information of the feature extraction step;
- the normalization method of the normalization step; and
- the condition of the data output step.
Specifically, in the data collection step, the total number of items to collect may exceed the maximum number that can be collected simultaneously. In that case, the items beyond that maximum are stored in a queue and processed sequentially. For an item processed after being queued, security events are collected only from source security data dated before the time at which that collection item was issued.
Specifically, the method may further include a step of recommending, based on the result of the accuracy test in the performance management step, a change to the feature information that increases the accuracy of the artificial intelligence model.
Specifically, the normalization step may recommend a change of normalization method that increases the accuracy of the artificial intelligence model.
A computer program according to an embodiment of the present invention is stored in a medium and, in combination with hardware, executes the following steps:
- a model performance checking step of checking the performance of an artificial intelligence model generated by learning preset feature information, selected from all the feature information that can be set when generating an artificial intelligence model;
- a combination performance checking step of setting a plurality of feature information combinations from the whole feature information and checking the performance of the artificial intelligence model generated by learning each combination; and
- a recommendation step of recommending, from among the plurality of combinations, a specific feature information combination whose performance is higher than the model performance checked in the model performance checking step.
Specifically, each of the plurality of feature information combinations may be formed by sequentially adding to the preset feature information at least one of the remaining features, that is, the features of the whole feature information that are not in the preset feature information. The specific feature information combination may be the top N combinations whose performance is higher than the model performance.
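A minimal sketch of the forward combination and top-N recommendation described above. It assumes a hypothetical `evaluate` callback that trains a model on a given feature set and returns a score where higher is better. The text does not fix the metric; accuracy on the test data is one option. The sketch adds one remaining feature at a time, which is the simplest reading of "at least one … sequentially added".

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical callback: trains a model on the given features and returns a
# score where higher is better.
Evaluate = Callable[[FrozenSet[str]], float]


def recommend_combinations(preset: Sequence[str], all_features: Sequence[str],
                           evaluate: Evaluate,
                           top_n: int = 3) -> List[Tuple[FrozenSet[str], float]]:
    """Return up to top_n combinations that beat the preset feature set."""
    base = frozenset(preset)
    baseline = evaluate(base)  # model performance of the preset features
    better = []
    for feature in all_features:
        if feature in base:
            continue  # only features outside the preset set are added
        combo = base | {feature}
        score = evaluate(combo)
        if score > baseline:
            better.append((combo, score))
    better.sort(key=lambda item: item[1], reverse=True)  # best first
    return better[:top_n]
```

In practice each `evaluate` call means training a model, so caching scores per feature set would be worthwhile.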
Specifically, the preset feature information may be the whole feature information. In this case, the combination performance checking step may perform the following processes:
- A single-feature performance comparison process: train a model on each single feature of the whole feature information, and check whether the maximum performance among these models is higher than the model performance.
- A combination setting process: when the maximum performance is higher than the model performance, reset the feature information to the single feature with that maximum performance. Then form the plurality of feature information combinations by adding to it, one at a time, each remaining feature not yet included.
- A resetting process: reset, as feature information, each combination whose performance is higher than the model performance of the reset feature information, and repeat the combination setting process for each of them.
- A final process: when none of the combinations performs better than the model performance, pass the immediately preceding feature information to the recommendation step as the specific feature information combination.
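The forward procedure above can be sketched as follows. The same hypothetical `evaluate` callback is assumed: it trains and scores a model on a feature set. Because the text repeats the combination setting process for every improving combination, the search can branch. This sketch expands each improving set once and returns every set for which no further addition helps. That set is the "immediately preceding feature information" handed to the recommendation step.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Evaluate = Callable[[FrozenSet[str]], float]  # hypothetical train-and-score callback


def forward_search(all_features: Sequence[str],
                   evaluate: Evaluate) -> Set[FrozenSet[str]]:
    """Grow feature sets from the best single feature; return the final sets."""
    full_score = evaluate(frozenset(all_features))
    singles = [(frozenset([f]), evaluate(frozenset([f]))) for f in all_features]
    best_single, best_score = max(singles, key=lambda item: item[1])
    if best_score <= full_score:
        return set()  # no single feature beats the full set: use the removal variant
    results: Set[FrozenSet[str]] = set()
    frontier: List[Tuple[FrozenSet[str], float]] = [(best_single, best_score)]
    seen = {best_single}
    while frontier:
        current, current_score = frontier.pop()
        improved = False
        for feature in all_features:
            if feature in current:
                continue
            combo = current | {feature}
            score = evaluate(combo)
            if score > current_score:
                improved = True
                if combo not in seen:  # expand each improving combination once
                    seen.add(combo)
                    frontier.append((combo, score))
        if not improved:
            results.add(current)  # no addition helps: this set is recommended
    return results
```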
Specifically, the preset feature information may be the whole feature information. In this case, the combination performance checking step may perform the following processes:
- A single-feature performance comparison process: train a model on each single feature of the whole feature information, and check whether the maximum performance among these models is higher than the model performance.
- A combination setting process: when the maximum performance is not higher than the model performance, form the plurality of feature information combinations, each of which omits a different single feature from the feature information.
- A resetting process: reset, as feature information, each combination whose performance is higher than the model performance, and repeat the combination setting process for each of them.
- A final process: when none of the combinations performs better than the model performance, pass the immediately preceding feature information to the recommendation step as the specific feature information combination.
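The removal variant works like backward elimination, starting from the whole feature set. The sketch below rests on the same hypothetical `evaluate` callback and the same branching interpretation as the forward case. It drops one different feature at a time, keeps every subset that improves on its parent, and returns the sets for which no removal helps.

```python
from typing import Callable, FrozenSet, List, Sequence, Set, Tuple

Evaluate = Callable[[FrozenSet[str]], float]  # hypothetical train-and-score callback


def backward_search(all_features: Sequence[str],
                    evaluate: Evaluate) -> Set[FrozenSet[str]]:
    """Shrink the full feature set one feature at a time; return the final sets."""
    full = frozenset(all_features)
    full_score = evaluate(full)
    best_single = max(evaluate(frozenset([f])) for f in all_features)
    if best_single > full_score:
        return set()  # a single feature beats the full set: use the forward variant
    results: Set[FrozenSet[str]] = set()
    frontier: List[Tuple[FrozenSet[str], float]] = [(full, full_score)]
    seen = {full}
    while frontier:
        current, current_score = frontier.pop()
        improved = False
        if len(current) > 1:  # never remove the last remaining feature
            for feature in current:
                combo = current - {feature}  # omit one different feature each time
                score = evaluate(combo)
                if score > current_score:
                    improved = True
                    if combo not in seen:
                        seen.add(combo)
                        frontier.append((combo, score))
        if not improved:
            results.add(current)  # no removal helps: this set is recommended
    return results
```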
A computer program according to an embodiment of the present invention is stored in a medium and, in combination with hardware, executes the following steps:
- an attribute checking step of checking the attribute of the feature information used for learning when generating an artificial intelligence model;
- a determining step of determining, from among all configurable normalization methods, a normalization method according to the attribute of the feature information; and
- a recommendation step of recommending the determined normalization method.
Specifically, when the same normalization method is applied to all fields of the feature information, the determining step may do the following:
- When the attribute of the feature information is numeric, determine a first normalization method according to the overall numeric pattern of the feature information.
- When the attribute is categorical, determine a second normalization method. This method represents each value as a vector whose length equals the total number of categories of the feature information, with a non-zero value only at the position assigned to that value's category.
- When the attribute combines numeric and categorical values, determine both the second and the first normalization methods.
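The second normalization method described above is, in effect, one-hot encoding. A minimal sketch follows. Two choices here are assumptions rather than details from the text: categories are ordered alphabetically, and the non-zero value is 1.

```python
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode categorical values as one-hot vectors.

    The vector length equals the number of distinct categories, and each value
    gets a 1 only at the position assigned to its category.
    """
    categories = sorted(set(values))  # fixed position per category
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for value in values:
        vec = [0] * len(categories)
        vec[position[value]] = 1
        vectors.append(vec)
    return categories, vectors
```

For example, a protocol field with values `tcp`, `udp`, `icmp` becomes three-element vectors.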
Specifically, the first normalization method includes, in a predefined priority order, Standard score normalization, Mean normalization, and Feature scaling normalization. The determining step may select the applicable method with the highest priority among them. The selection is based on the standard deviation of the overall numeric pattern of the feature information and on whether upper/lower bounds of the normalization scaling range exist.
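The three numeric methods use standard formulas:
- Standard score: (x - μ) / σ.
- Mean normalization: (x - μ) / (max - min), whose output lies within [-1, 1].
- Feature scaling: (x - min) / (max - min), whose output lies within [0, 1].

The text does not spell out the applicability test, so the selection rule below is a hypothetical reading:
- Standard score needs σ > 0 and no bounded target range, since its output is unbounded.
- Mean normalization suits a bounded range that allows negative values.
- Feature scaling suits any other bounded range.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def standard_score(xs: Sequence[float]) -> List[float]:
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mean_normalization(xs: Sequence[float]) -> List[float]:
    mu, spread = statistics.fmean(xs), max(xs) - min(xs)
    return [(x - mu) / spread for x in xs]  # values fall within [-1, 1]


def feature_scaling(xs: Sequence[float]) -> List[float]:
    lo, spread = min(xs), max(xs) - min(xs)
    return [(x - lo) / spread for x in xs]  # values fall within [0, 1]


def choose_numeric_method(xs: Sequence[float],
                          scaling_range: Optional[Tuple[float, float]] = None) -> Optional[str]:
    """Hypothetical priority rule: Standard score > Mean normalization > Feature scaling."""
    sigma = statistics.pstdev(xs)
    spread = max(xs) - min(xs)
    if scaling_range is None and sigma > 0:
        return "standard_score"  # unbounded output is acceptable
    if spread > 0 and scaling_range is not None and scaling_range[0] < 0:
        return "mean_normalization"  # bounded range that admits negatives
    if spread > 0:
        return "feature_scaling"
    return None  # constant values: no method applies
```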
Specifically, when a normalization method is applied field by field across all fields of the feature information, the determining step may do the following:
- For fields whose attribute is a type attribute, determine the applicable normalization method with the highest priority among Mean normalization and Feature scaling.
- For fields whose attribute is a count attribute, likewise determine the applicable normalization method with the highest priority among Mean normalization and Feature scaling.
- For fields whose attribute is a ratio attribute, either leave the normalization method undetermined and exclude the field from normalization, or determine the Standard score normalization method.
- For fields whose attribute is a presence (existence) attribute, leave the normalization method undetermined and exclude the field from normalization.
Accordingly, the artificial intelligence model platform and its operating method according to the present invention implement a platform for generating artificial intelligence models for security monitoring. In particular, the platform optimally recommends and applies the feature information and normalization method, which directly determine model performance. This lets even ordinary users unfamiliar with security monitoring technology generate an optimal artificial intelligence model for security monitoring.
As a result, optimal artificial intelligence models suited to the purposes and requirements of security monitoring can be generated and applied flexibly and in many forms. This maximizes the improvement in the quality of security monitoring services. It can also be expected to support building an AI-based incident response system that efficiently analyzes signs of large-scale cyberattacks and anomalous behavior.
FIG. 1 is a conceptual diagram showing an artificial intelligence model platform according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the artificial intelligence model platform according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a feature information recommendation apparatus according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of a normalization method recommendation apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method for operating the artificial intelligence model platform according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an operating method of the feature information recommendation apparatus according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating an operating method of the normalization method recommendation apparatus according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings.
Currently, the Science and Technology Cyber Safety Center provides a real-time security monitoring service. In this service, the intrusion threat management system (TMS) detects and collects security events, and security monitoring personnel then analyze them and support the response using rules.
However, the number of security events detected by the TMS is growing explosively. It has become practically infeasible for security monitoring personnel to analyze this entire volume of events.
In addition, the existing security monitoring service relies on the expertise and experience of its personnel, so analysis is not consistent. Workload becomes skewed, with analysis concentrated on particular security events, and results vary from analyst to analyst.
Consequently, given the explosive growth in security events detected by the TMS, the existing service structure, which depends on analysis by security monitoring personnel, itself needs to be fundamentally redesigned.
One option is therefore a security monitoring service structure that uses artificial intelligence models capable of replacing analysis by security monitoring personnel.
The present invention aims to provide an artificial intelligence model platform for generating artificial intelligence models for security monitoring.
In particular, the present invention aims to provide an artificial intelligence model platform that lets even ordinary users unfamiliar with security monitoring technology generate an optimal artificial intelligence model for security monitoring.
FIG. 1 conceptually shows an embodiment of the artificial intelligence model platform proposed in the present invention.
As shown in FIG. 1, the artificial intelligence model platform of the present invention can be divided into three functions:
- a collection function that collects and processes the various data required to generate artificial intelligence models for security monitoring;
- an artificial intelligence function that generates models from the data collected and processed by the collection function, and manages their performance and history; and
- a management function that handles the settings related to the collection and artificial intelligence functions, as well as user management, through a user interface (UI) provided to system administrators and ordinary users.
The artificial intelligence model platform of the present invention also includes a search engine that periodically collects newly generated source security data from the big data integrated storage. The various data of the collection function are loaded into this search engine, which then serves as the platform's data store.
In this way, the modules belonging to the collection function (e.g., collection, feature extraction, normalization, and output) can operate on top of the search engine (data store).
The configuration of the artificial intelligence model platform according to an embodiment of the present invention, and the role of each component, will now be described in detail with reference to FIG. 2.
The artificial intelligence model platform 100 of the present invention includes a data collection module 110, a feature extraction module 120, a normalization module 130, a data output module 140, and a model generation module 150.
The artificial intelligence model platform 100 may further include a performance management module 160 and a UI module 170.
All or some of the components of the artificial intelligence model platform 100 may be implemented as hardware modules, as software modules, or as a combination of the two.
Here, a software module may be understood as, for example, instructions executed by a processor that controls operations within the artificial intelligence model platform 100. Such instructions may be loaded into memory within the platform 100.
Through this configuration, the artificial intelligence model platform 100 according to an embodiment of the present invention realizes the technique proposed by the invention: generating an optimal artificial intelligence model for security monitoring. Each component of the platform 100 that makes this possible is described in more detail below.
First, the UI module 170 provides a user interface (UI) for setting at least one of the following:
- the specific search condition of the data collection module 110;
- the feature information of the feature extraction module 120;
- the normalization method of the normalization module 130; and
- the condition of the data output module 140.
For example, the UI module 170 provides this UI in response to operations by a system administrator or an ordinary user (hereinafter collectively, a user) who wants to generate an artificial intelligence model for security monitoring on the artificial intelligence model platform 100.
The UI module 170 then stores and manages, in the user information/settings store, the settings related to the collection and artificial intelligence functions that were entered through the UI. Specifically, for the artificial intelligence model to be generated as described below, it stores the specific search condition of the data collection module 110, the feature information of the feature extraction module 120, the normalization method of the normalization module 130, and the condition of the data output module 140.
The data collection module 110 collects, from the source security data, the security events to be used as training/test data according to the specific search condition previously set by the user.
For example, the specific search condition of the data collection module 110 may specify the date (or period), number of events, IP, detection pattern name, and detection pattern type of the data to be used for training/testing.
Here, a detection pattern name is the representative name of the security logs detected by the intrusion threat management system (TMS). A detection pattern type is a group of detection patterns with similar characteristics (properties, types). For example, detection pattern types may be divided into six categories:
- worm/virus damage;
- data corruption and leakage;
- misuse as a relay (stepping stone);
- website defacement;
- denial-of-service attack damage; and
- simple intrusion attempts.
When the specific search condition is a date (or period), the data collection module 110 may collect from the source security data the security events belonging to that date (or period).
When the specific search condition is a number of events, the data collection module 110 may collect that number of security events (e.g., 500,000) from the source security data, starting at a specified point in time.
When the specific search condition is an IP, the data collection module 110 may collect from the source security data the security events whose source IP or destination IP matches that IP.
A combination of date (or period), number of events, IP, detection pattern name, detection pattern type, and similar criteria may also be set as the specific search condition.
In this case as well, the data collection module 110 may collect from the source security data the security events that match the configured combination of criteria.
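A sketch of how the search conditions above could be combined into a single filter. Several details are hypothetical:
- The event fields and the `SearchCondition` structure are names invented for this example.
- The period end is treated as exclusive.
- "Number of events" is read as keeping the first N matching events in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class SecurityEvent:  # hypothetical subset of event fields
    detected_at: datetime
    src_ip: str
    dst_ip: str
    pattern_name: str
    pattern_type: str


@dataclass
class SearchCondition:  # every field is optional; unset fields do not filter
    start: Optional[datetime] = None
    end: Optional[datetime] = None  # exclusive upper bound of the period
    ip: Optional[str] = None
    pattern_name: Optional[str] = None
    pattern_type: Optional[str] = None
    max_count: Optional[int] = None


def matches(event: SecurityEvent, cond: SearchCondition) -> bool:
    if cond.start is not None and event.detected_at < cond.start:
        return False
    if cond.end is not None and event.detected_at >= cond.end:
        return False
    if cond.ip is not None and cond.ip not in (event.src_ip, event.dst_ip):
        return False  # the IP may match either the source or the destination
    if cond.pattern_name is not None and event.pattern_name != cond.pattern_name:
        return False
    if cond.pattern_type is not None and event.pattern_type != cond.pattern_type:
        return False
    return True


def collect_events(events: Iterable[SecurityEvent],
                   cond: SearchCondition) -> List[SecurityEvent]:
    """Apply all set conditions, then keep at most max_count events in time order."""
    hits = sorted((e for e in events if matches(e, cond)),
                  key=lambda e: e.detected_at)
    return hits if cond.max_count is None else hits[:cond.max_count]
```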
More specifically, when collecting security events from the source security data as described above, the data collection module 110 may cap the number of items collected simultaneously in order to reduce system load.
For example, suppose security events belonging to a set date (or period) are being collected from the source security data. Suppose also that 1,000,000 security events belong to that date (or period), and that at most 500,000 can be collected simultaneously.
In this case, the data collection module 110 determines that the current collection exceeds the simultaneous maximum. It stores the items beyond the maximum in a queue and then processes them sequentially.
That is, of the 1,000,000 items in the current collection, the data collection module 110 immediately collects the first 500,000 (the maximum) in chronological order. It stores the remaining 500,000 items in a queue and collects them sequentially afterwards.
For the 500,000 queued items, the data collection module 110 collects security events only from source security data dated before the time at which the collection was requested.
The reason is that the 500,000 queued items are collected later than the collection was requested. Without a cutoff, data arriving in the meantime could be picked up by mistake. To prevent such collection errors, security events are collected only from source security data dated before the request time.
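The queued collection above can be sketched as a FIFO of batch offsets. Each batch re-reads the source when it actually runs, but keeps only events detected before the request time. The details are assumptions made for this example:
- the event shape;
- the `read_source` callback, which stands in for querying the search engine;
- the choice to apply the cutoff to every batch, which is harmless for the first batch because it runs at request time.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

# Hypothetical event shape: (detection timestamp, event id).
Event = Tuple[float, str]


def collect_with_queue(read_source: Callable[[], Sequence[Event]],
                       requested_at: float, total: int,
                       max_concurrent: int) -> List[List[Event]]:
    """Collect `total` events in batches of at most `max_concurrent`."""
    # Start offset of each batch: the first runs at once, the rest wait in FIFO order.
    queue: Deque[int] = deque(range(0, total, max_concurrent))
    batches: List[List[Event]] = []
    while queue:
        offset = queue.popleft()
        # The source may have grown while this batch waited in the queue, so
        # keep only events detected before the collection was requested.
        snapshot = sorted(e for e in read_source() if e[0] < requested_at)
        size = min(max_concurrent, total - offset)
        batches.append(snapshot[offset:offset + size])
    return batches
```

With 1,000,000 items and `max_concurrent=500_000`, this yields one immediate batch and one queued batch, matching the example in the text.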
As mentioned above, the artificial intelligence model platform 100 includes a search engine that periodically collects newly generated source security data from the big data integrated storage.
In this case, the data collection module 110 may collect security data from the source security data held in the search engine (data store).
The big data integrated storage is shared by the artificial intelligence model platform 100 and other systems. Collecting a large amount of data (security events) directly from it would therefore load the storage and could affect those other systems.
In the present invention, however, the data collection module 110 does not collect security events directly from the big data integrated storage. Instead, it collects them through the search engine, which periodically pulls only newly generated source security data from that storage. This avoids the storage load problem described above.
The feature extraction module 120 extracts, from the security events collected by the data collection module 110, the feature information (features) previously set by the user.
To classify data (security events) with an artificial intelligence algorithm when generating a model, the features of the data must be identified and converted into vectors. This process is called feature extraction.
The feature extraction module 120 performs this feature extraction on the security events collected by the data collection module 110.
The feature information extracted for each security event is then used for machine learning (e.g., deep learning) when the artificial intelligence model is generated, as described below.
In particular, the present invention lets the user set both single features and composite features as feature information.
Here, a single feature is a feature that can be extracted from one security event.
For example, any of the following may be a single feature:
- detection time;
- source IP and source port;
- destination IP and destination port;
- protocol;
- security event name and security event type;
- number of attacks and attack direction;
- packet size;
- automatic analysis result and dynamic analysis result;
- institution number;
- whether the payload is a jumbo payload;
- the payload itself; and
- the payload converted with word2vec.
For reference, Word2Vec converts words into vectors, determining the vector of each word from its relationships with neighboring words. In ordinary sentences, words can be separated at spaces. A payload, however, is very difficult to divide into meaningful units and contains many special characters, so preprocessing is required before word2vec can be applied.
In the present invention, the following four preprocessing steps can be performed before word2vec is applied:
1) Convert the hexadecimal-encoded string into an ASCII string (characters whose ASCII code is outside the range 32 to 127 are converted to spaces).
2) Decode URL-encoded sequences (%25 -> '%', %26 -> '&', %2A -> '*', ...).
3) Replace every special character except '@', '\', '-', ':', '%', '_', '.', '!', '/', and '`' with a space, and convert all uppercase letters to lowercase.
4) Apply the word2vec algorithm, excluding words that consist of a single character.
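The four preprocessing steps can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. It assumes the payload arrives as a hex string and that the standard-library `urllib.parse.unquote` is an acceptable stand-in for the URL-decoding step. The function names are hypothetical. Only the tokenization half of step 4 is shown; the word2vec training itself is left to a separate library.

```python
from urllib.parse import unquote

# Special characters that step 3 keeps; all other punctuation becomes a space.
KEPT_SPECIALS = set("@\\-:%_.!/`")


def hex_to_ascii(hex_payload: str) -> str:
    """Step 1: decode hex; bytes outside ASCII 32..127 become spaces."""
    raw = bytes.fromhex(hex_payload)
    return "".join(chr(b) if 32 <= b <= 127 else " " for b in raw)


def preprocess_payload(hex_payload: str) -> list:
    """Return the word tokens of one payload, ready for word2vec training."""
    text = hex_to_ascii(hex_payload)
    # Step 2: decode URL-encoded sequences (%25 -> '%', %26 -> '&', %2A -> '*').
    text = unquote(text)
    # Step 3: replace non-kept special characters with spaces, then lowercase.
    text = "".join(
        ch if ch.isalnum() or ch.isspace() or ch in KEPT_SPECIALS else " "
        for ch in text
    ).lower()
    # Step 4 (tokenization part): split on whitespace, drop one-character words.
    return [tok for tok in text.split() if len(tok) > 1]


tokens = preprocess_payload("GET /A%2Ab x".encode().hex())
print(tokens)  # ['get', '/a']
```

Each token list could then be passed as one "sentence" to a word2vec implementation, for example gensim's `Word2Vec` class. That library is an assumed choice; the patent does not name one.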
A composite feature, on the other hand, is a single feature extracted by applying aggregation or statistical techniques across multiple security events.
For example, security events can be grouped by a criterion such as a time period or an event count. One value computed within the group (e.g., by aggregation or a statistical technique) can then serve as a composite feature.
For example, assume that the security event group shown in Table 1 below is formed using the period from 8.22 to 9.3 as the criterion.
[Table 1 (image not reproduced)]
A value extracted through an operation within this group can be a composite feature. One example is the number of security events whose source IP, destination IP, and security event name are 100.100.100.100, 111.111.111.11, and AAA respectively, which here is 4. Accordingly, the feature extraction module 120 can extract the preset feature information (single features and/or composite features) from the security events collected by the data collection module 110.
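The composite-feature example above (counting, within a period, the events that share a source IP, destination IP, and event name) can be sketched as follows. The dictionary field names (`detected`, `src_ip`, `dst_ip`, `name`, `triplet_count`) are hypothetical. The patent gives only "8.22 to 9.3" without a year, so the year 2018 is an assumption made for illustration.

```python
from collections import Counter
from datetime import date


def triplet_counts(events, start, end):
    """Count events per (source IP, destination IP, event name) in [start, end]."""
    in_period = (e for e in events if start <= e["detected"] <= end)
    return Counter((e["src_ip"], e["dst_ip"], e["name"]) for e in in_period)


def add_composite_feature(events, start, end):
    """Attach each in-period event's triplet count as a composite feature."""
    counts = triplet_counts(events, start, end)
    for e in events:
        if start <= e["detected"] <= end:
            e["triplet_count"] = counts[(e["src_ip"], e["dst_ip"], e["name"])]
    return events


# Hypothetical events: four matching events inside the period, one outside it.
events = [
    {"detected": date(2018, 8, 22 + i), "src_ip": "100.100.100.100",
     "dst_ip": "111.111.111.11", "name": "AAA"}
    for i in range(4)
] + [{"detected": date(2018, 9, 10), "src_ip": "100.100.100.100",
      "dst_ip": "111.111.111.11", "name": "AAA"}]

add_composite_feature(events, date(2018, 8, 22), date(2018, 9, 3))
print(events[0]["triplet_count"])  # 4
```

The same grouping pattern would apply to other aggregations (sums, averages, distinct counts) by replacing the `Counter`.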
The normalization module 130 applies a preset normalization to the extracted feature information of each security event.
Normalization is the process of bringing the value ranges of the extracted features onto a consistent scale. Suppose field A ranges from 50 to 100 and field B ranges from 0 to 100. The same value 50 then means different things in the two fields because the values are measured on different scales. Values from different fields must therefore be adjusted to a common scale so that they carry consistent meaning. This adjustment is called normalization.
The normalization module 130 normalizes the extracted feature information of each security event by adjusting the values of different fields to a common scale according to a preset normalization method.
Here, the preset normalization method is the normalization method set in advance by the user.
The artificial intelligence model platform 100 of the present invention offers the following three normalization methods from which the user can choose.
Equation 1 is Feature scaling [a, b] normalization, Equation 2 is Mean normalization [-1, 1], and Equation 3 is Standard score normalization.
Equation 1 (Feature scaling [a, b]):
$$x' = a + \frac{(x - \min(x))(b - a)}{\max(x) - \min(x)}$$
Equation 2 (Mean normalization [-1, 1]):
$$x' = \frac{x - \bar{x}}{\max(x) - \min(x)}$$
Equation 3 (Standard score):
$$x' = \frac{x - \bar{x}}{\sigma}$$
where $x$ is an original value of a feature, $x'$ is its normalized value, $\bar{x}$ is the mean of the feature's values, and $\sigma$ is their standard deviation.
The normalization module 130 normalizes the extracted feature information of each security event using whichever of the three methods above the user has preset.
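The three normalization methods can be sketched as plain Python functions. These follow the standard textbook forms of feature scaling, mean normalization, and the standard score. Several choices here are assumptions rather than details from the patent:

- The population standard deviation is used.
- Constant fields are handled with a zero-division guard.
- The `NORMALIZERS` registry keyed by method name is hypothetical.

```python
from statistics import mean, pstdev


def feature_scaling(values, a=0.0, b=1.0):
    """Equation 1: rescale values linearly into [a, b]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant field: map everything to the lower bound
        return [float(a)] * len(values)
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


def mean_normalization(values):
    """Equation 2: center on the mean and divide by the range; result in [-1, 1]."""
    mu, lo, hi = mean(values), min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(x - mu) / (hi - lo) for x in values]


def standard_score(values):
    """Equation 3: subtract the mean and divide by the (population) std dev."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical registry keyed by the method the user preset.
NORMALIZERS = {
    "feature_scaling": feature_scaling,
    "mean_normalization": mean_normalization,
    "standard_score": standard_score,
}

field_a = [50, 75, 100]   # field A ranges over 50..100
field_b = [0, 50, 100]    # field B ranges over 0..100
print(NORMALIZERS["feature_scaling"](field_a))  # [0.0, 0.5, 1.0]
print(NORMALIZERS["feature_scaling"](field_b))  # [0.0, 0.5, 1.0]
```

The example mirrors the field A / field B discussion. The raw value 50 becomes 0.0 in field A but 0.5 in field B, so after scaling the two fields are directly comparable.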
The data output module 140 extracts training data or test data from the security events whose feature information has been normalized, according to conditions given in advance by the user.
Specifically, the data output module 140 outputs the normalized security events to the screen or to a file according to the user's choice of values, order, format, training/test data ratio, and file-splitting method.
The resulting training and test data are managed by date and by user in a database or file store so that they can be used immediately when an artificial intelligence model is generated.
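The training/test split by a user-chosen ratio, and a per-user, per-date storage layout, could look roughly like this. Both functions are hypothetical illustrations. The shuffle-then-cut strategy and the `user/date/kind.csv` path scheme are assumptions, not details given in the patent.

```python
import random
from datetime import date


def split_train_test(events, train_ratio=0.8, seed=None):
    """Shuffle events and split them into training and test sets by ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def storage_key(user, day, kind):
    """Hypothetical storage path: one directory per user and per date."""
    return f"{user}/{day.isoformat()}/{kind}.csv"


train, test = split_train_test(range(10), train_ratio=0.7, seed=42)
print(len(train), len(test))                              # 7 3
print(storage_key("analyst1", date(2018, 12, 7), "train"))  # analyst1/2018-12-07/train.csv
```

Passing a fixed `seed` makes the split reproducible, which matters if the same split must later be reused to compare models.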
The model generation module 150 applies an artificial intelligence algorithm to the training data output by the data output module 140 and managed in the file store, generating an artificial intelligence model for security monitoring.
That is, the model generation module 150 applies an artificial intelligence algorithm to the training data to generate a model for security monitoring, for example a model that provides a function requested by the user.
For example, at the user's request, the model generation module 150 can generate either of two kinds of model. One is an AI detection model that detects whether a security event is malicious. The other is an AI classification model that classifies security events as correctly detected (true positives) or falsely detected (false positives).
Specifically, the model generation module 150 can generate the model for security monitoring from the training data output by the data output module 140 and managed in the file store. It does so using an artificial intelligence algorithm such as a machine learning (e.g., deep learning) algorithm previously selected by the user.
For example, in a machine learning technique based on backpropagation, the model generation module 150 can use a loss function, which expresses the deviation between the values predicted by the model and the actual values. It trains on the training data so as to minimize this loss, ideally driving it toward zero.
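The loss-minimization principle can be illustrated with the smallest possible case: a single-layer logistic-regression model trained by batch gradient descent on binary cross-entropy. In a single layer, the backpropagated gradient of that loss reduces to the error term (prediction minus label). This is a substituted toy technique chosen for clarity. A deep learning model would propagate the same kind of loss gradient back through several layers. The data, labels, learning rate, and epoch count are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(w, b, X, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and true labels."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent; err is dLoss/dz, the backpropagated error term."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [[0.0], [0.2], [0.8], [1.0]]  # one normalized feature per event
y = [0, 0, 1, 1]                  # 1 = malicious, 0 = benign (hypothetical)
print(round(bce_loss([0.0], 0.0, X, y), 3))  # 0.693 before training
w, b = train_logistic(X, y)
print(bce_loss(w, b, X, y) < 0.3)            # True
```

On separable data like this the loss keeps shrinking but never reaches exactly zero. This is why "driving the loss toward zero" is the realistic reading of the goal.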
As described above, the artificial intelligence model platform 100 of the present invention provides a platform environment in which an artificial intelligence model for security monitoring can be generated through a UI without separate programming. Even general users unfamiliar with security monitoring technology can therefore generate models that suit their own purposes and requirements.
Furthermore, in the artificial intelligence model platform 100, the performance management module 160 tests the accuracy of the generated model using the test data output by the data output module 140 and managed in the file store.
The performance management module 160 manages the artificial intelligence models generated by the model generation module 150. It records performance information in the system (file store). This covers who created each model, when, and with which data, fields, sampling method, normalization method, and model type, as well as how well the resulting model performs (accuracy).
Based on this performance information, the performance management module 160 lets the user compare model-generation conditions and performance at a glance, making it easy to see how the conditions correlate with performance.
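A performance record holding the "who, when, which data, which fields, which sampling, which normalization, which model" information could be modeled as below. The sketch also shows one simple way to compare conditions against performance: averaging accuracy per value of a chosen condition. The record layout, field names, and example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PerformanceRecord:
    """One row of performance information for a generated model."""
    who: str              # user who created the model
    when: datetime        # creation time
    dataset: str          # which data
    fields: tuple         # which feature fields
    sampling: str         # which sampling method
    normalization: str    # which normalization method
    model: str            # which model/algorithm
    accuracy: float       # measured performance on test data


def mean_accuracy_by(records, attribute):
    """Average accuracy grouped by one generation condition, for comparison."""
    groups = defaultdict(list)
    for r in records:
        groups[getattr(r, attribute)].append(r.accuracy)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


records = [
    PerformanceRecord("user1", datetime(2018, 12, 1), "events_v1",
                      ("src_ip", "dst_ip"), "random", "standard_score", "dnn", 0.9),
    PerformanceRecord("user2", datetime(2018, 12, 2), "events_v1",
                      ("src_ip", "dst_ip"), "random", "feature_scaling", "dnn", 0.8),
]
print(mean_accuracy_by(records, "normalization"))
# {'standard_score': 0.9, 'feature_scaling': 0.8}
```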
The present invention lets even general users unfamiliar with security monitoring technology generate artificial intelligence models. Testing the accuracy (performance) of the models generated on the platform can therefore be essential.
Specifically, the performance management module 160 tests the accuracy of the generated model using the test data output by the data output module 140 and managed in the file store. The test data are security events whose actual results (true/false positive classification and maliciousness) are known.
For example, the performance management module 160 can run the generated model on the test data and output the model's accuracy (performance) as the test result. The accuracy is the proportion of test events for which the model's predicted result matches the known actual result.
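The accuracy described here (the proportion of predicted results that match the known actual results) is a one-line computation. The sketch below adds input validation. The "TP"/"FP" labels are hypothetical examples of true-positive/false-positive classes.

```python
def accuracy(predicted, actual):
    """Fraction of test events whose predicted label matches the known label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally sized, non-empty label lists")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


predicted = ["TP", "FP", "TP", "TP"]   # model output on four test events
actual = ["TP", "FP", "FP", "TP"]      # known ground truth
print(accuracy(predicted, actual))     # 0.75
```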
When generating an artificial intelligence model, which features are used and which normalization method is applied greatly affect model performance (accuracy).
However, people, and especially general users unfamiliar with security monitoring technology, will find it difficult to choose and combine the features that give optimal performance for the model they want.
Accordingly, in the present invention, the feature extraction module 120 can recommend changes to the feature information based on the accuracy test result of the performance management module 160, in order to increase the accuracy of the generated model.
Likewise, such users will find it difficult to know and set the normalization method that gives optimal performance for the model they want.
Accordingly, in the present invention, the normalization module 130 can also recommend a change of normalization method to increase the accuracy of the artificial intelligence model.
Hereinafter, a technique for recommending feature changes that increase the accuracy of the artificial intelligence model will be described with reference to FIG. 3. Specifically, the description covers a feature information recommendation apparatus that implements this technique.
FIG. 3 shows the configuration of a feature information recommendation apparatus according to an embodiment of the present invention.
As shown in FIG. 3, the feature information recommendation apparatus 200 includes a model performance checking unit 210, a combination performance checking unit 220, and a recommendation unit 230.
All or at least part of the feature information recommendation apparatus 200 may be implemented as a hardware module, a software module, or a combination of hardware and software modules.
Here, a software module can be understood as, for example, instructions executed by a processor that controls operations within the feature information recommendation apparatus 200. These instructions may be loaded into memory within the apparatus 200.
Through this configuration, the feature information recommendation apparatus 200 implements the technique proposed by the present invention, namely recommending feature changes that increase the accuracy of the artificial intelligence model. Each component of the apparatus 200 is described in more detail below.
The model performance checking unit 210 checks the performance of an artificial intelligence model trained on preset feature information, i.e., a subset of all the feature information that can be set when the model is generated.
In other words, the model performance checking unit 210 checks the performance (accuracy) of the model trained on the features set by the user.
For concreteness, the following description assumes an artificial intelligence model trained, in the platform 100, on the feature information set by the user (hereinafter, user-set feature information).
The model performance checking unit 210 checks the performance of the model generated by training on the user-set feature information in the platform 100, as described above.
For example, the model performance checking unit 210 can test and verify the model's performance (accuracy) using the test data output by the platform 100, in particular by the data output module 140. The test data are security events whose actual results (true/false positive classification and maliciousness) are known.
The model performance checking unit 210 thus tests each artificial intelligence model generated in the platform 100 on the test data. It can output, as the test result, the model's accuracy (performance): the proportion of predicted results that match the known actual results.
The combination performance checking unit 220 sets up multiple feature combinations drawn from the full feature set and checks the performance of a model trained on each combination.
Specifically, the combination performance checking unit 220 starts from all the feature information that can be set when generating a model. From it, the unit can set up various feature combinations other than the user-set feature information used this time, and check the performance of a model trained on each combination.
The recommendation unit 230 compares the per-combination performances checked by the combination performance checking unit 220 with the model performance checked by the model performance checking unit 210, i.e., the performance of the model generated from the current user settings. It can recommend a specific feature combination whose performance exceeds that model performance.
Specific embodiments for recommending a feature combination are described below.
According to one embodiment, each feature combination set by the combination performance checking unit 220 can be formed from the user-set feature information learned this time. To it are added, one or more at a time, features selected from the remaining features (all features excluding the user-set ones).
The following assumes that the full feature set is a, b, c, ..., z (n = 26) and that the user-set features used to generate the current model are a, b, c, d, e, f (k = 6). Assume further that the model performance m_k checked by the model performance checking unit 210 is 85%.
The combination performance checking unit 220 can then set up multiple feature combinations. It does this by successively adding at least one of the remaining features (those in the full set n other than a, b, c, d, e, f) to the user-set features a, b, c, d, e, f.
For example, the combination performance checking unit 220 can add 1 to (n - k) of the remaining features to the user-set features a, b, c, d, e, f, producing combinations such as the following:
a, b, c, d, e, f, g -> m_{k+1}^{1} = 82%
a, b, c, d, e, f, h -> m_{k+1}^{2} = 80%
...
a, b, c, d, e, f, g, h, i -> m_{k+3}^{1} = 88%
...
a, b, c, d, e, f, ..., z -> m_{n} = 85%
The combination performance checking unit 220 then checks the performance of the model trained on each combination: 82%, 80%, ..., 88%, ..., 85%.
In this case, the recommendation unit 230 can select and recommend the top N combinations (e.g., 4) as specific feature combinations. These are the combinations whose performance exceeds that of the model generated from the current user settings (m_k = 85%).
The number N can be specified or changed by the system administrator or the user.
As another example, the combination performance checking unit 220 can add the remaining features (all features other than a, b, c, d, e, f) one at a time to the user-set features a, b, c, d, e, f, producing combinations such as the following:
a, b, c, d, e, f, g -> m_{k+1}^{1} = 82%
a, b, c, d, e, f, h -> m_{k+1}^{2} = 80%
...
a, b, c, d, e, f, z -> m_{k+1}^{n-k} = 90%
The combination performance checking unit 220 then checks the performance of the model trained on each combination: 82%, 80%, ..., 90%.
In this case, the recommendation unit 230 can select and recommend the top N combinations (e.g., 3) as specific feature combinations. These are the combinations whose performance exceeds that of the model generated from the current user settings (m_k = 85%).
Again, N can be specified or changed by the system administrator or the user.
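Both examples of this embodiment can be sketched with one function. It keeps the user-set features, adds subsets of the remaining features, scores each combination, and returns the top N that beat the baseline m_k. `max_added=1` gives the one-at-a-time variant. `evaluate` is a hypothetical callback standing in for "train a model on these features and test its accuracy", and `toy_evaluate` is fabricated purely for illustration.

The patent only lists example combinations. Enumerating every subset up to `max_added` is an assumption, and it grows exponentially with the number of remaining features.

```python
from itertools import combinations


def recommend_by_addition(all_features, user_features, evaluate,
                          top_n=3, max_added=None):
    """Return (baseline, top-N (score, combo) pairs that beat the baseline)."""
    base = evaluate(tuple(user_features))
    remaining = [f for f in all_features if f not in user_features]
    limit = len(remaining) if max_added is None else max_added
    better = []
    for r in range(1, limit + 1):
        for extra in combinations(remaining, r):
            combo = tuple(user_features) + extra
            score = evaluate(combo)
            if score > base:  # keep only combinations above m_k
                better.append((score, combo))
    better.sort(key=lambda item: item[0], reverse=True)
    return base, better[:top_n]


def toy_evaluate(combo):
    """Stand-in scorer: pretend 'z' helps and 'g' hurts (hypothetical)."""
    return 0.85 + 0.05 * ("z" in combo) - 0.03 * ("g" in combo)


base, recs = recommend_by_addition(list("abcdefgz"), list("abcdef"),
                                   toy_evaluate, top_n=3, max_added=1)
print([combo[-1] for _, combo in recs])  # ['z']
```

In practice each `evaluate` call means training and testing a full model, so the one-at-a-time variant is the cheaper option when the feature set is large.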
According to another embodiment, the preset feature information used to generate the current model may be the full feature set (k = n = 26).
In this case, the combination performance checking unit 220 can perform a single-feature performance comparison. It checks the performance of a model trained on each individual feature of the preset feature set (the full set a, b, c, ..., z, n = 26). It then determines whether the best of these single-feature performances, Max(m_1), exceeds the model performance m_26.
Suppose the best single-feature performance Max(m_1), achieved by a single feature (e.g., c), exceeds the model performance m_26 of the preset features a, b, c, ..., z (n = 26). The combination performance checking unit 220 can then perform a combination-setting process. It resets the feature information to that best single feature c. It then sets up multiple feature combinations by adding to c, one at a time, each of the remaining features (all features other than c).
The combination performance checking unit 220 can then check the performance of each combination, as before:
c, a -> m_{2}^{1} = 81%
c, b -> m_{2}^{2} = 90.5%
...
c, z -> m_{2}^{25} = 85%
The combination performance checking unit 220 can then perform a resetting process. Each combination whose performance exceeds the model performance m_1 of the reset feature c is itself reset as the feature information, and the combination-setting process is repeated for each reset feature set.
That is, the combination performance checking unit 220 deletes combinations whose performance is lower than or equal to the model performance m_1 of feature c and keeps only those with higher performance, shown below. It resets each of them as the feature information and repeats the combination-setting process for each, as shown in Table 2 below.
c, l -> m_{2}^{12} = 92.5%
c, m -> m_{2}^{13} = 93%
c, n -> m_{2}^{14} = 94%
[Table 2 (image not reproduced)]
The combination performance checking unit 220 repeats the combination-setting and resetting processes. When no combination performs better than the model generated from the immediately preceding feature set, it selects that preceding feature set as the specific feature combination and passes it to the recommendation unit 230.
In this case, the recommendation unit 230 can recommend the feature set passed from the combination performance checking unit 220 as a specific feature combination whose performance exceeds that of the model generated from the preset feature information.
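The forward embodiment can be sketched as a branching search. It starts from the best single feature if that feature beats the full set. It keeps every one-feature extension that improves on its parent, and records a set as a result once none of its extensions improves it. `evaluate` is a hypothetical "train and test" callback, and scores are cached so that each feature set is evaluated only once. Exploring all improving branches (rather than only the best one) is how this sketch reads the patent's "reset each combination" step, which is an interpretation.

```python
def forward_search(all_features, evaluate):
    """Grow feature sets from the best single feature while performance improves.

    Returns (feature_set, score) pairs that beat the full feature set, best
    first, or None when no single feature beats the full set.
    """
    cache = {}

    def score(features):
        key = tuple(sorted(features))
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]

    full_score = score(all_features)
    best_single = max(((f,) for f in all_features), key=score)
    if score(best_single) <= full_score:
        return None  # the elimination embodiment applies instead

    frontier, expanded, results = [best_single], {best_single}, {}
    while frontier:
        current = frontier.pop()
        improved = []
        for f in all_features:
            if f in current:
                continue
            cand = tuple(sorted(current + (f,)))
            if score(cand) > score(current):
                improved.append(cand)
        if not improved:
            results[current] = score(current)  # no better extension: keep it
        for cand in improved:
            if cand not in expanded:
                expanded.add(cand)
                frontier.append(cand)
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    return [(fs, s) for fs, s in ranked if s > full_score]


def toy_evaluate(features):
    """Stand-in scorer: 'c' and 'b' help, 'd' and 'a' hurt (hypothetical)."""
    fs = set(features)
    return (0.6 + 0.2 * ("c" in fs) + 0.1 * ("b" in fs)
            - 0.05 * ("d" in fs) - 0.1 * ("a" in fs))


print([fs for fs, _ in forward_search(list("abcd"), toy_evaluate)])  # [('b', 'c')]
```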
Suppose, on the other hand, that the best single-feature performance Max(m_1) does not exceed the model performance m_26 of the preset features a, b, c, ..., z (n = 26). The combination performance checking unit 220 can then perform a combination-setting process that sets up multiple feature combinations by removing a different single feature from a, b, c, ..., z each time.
The combination performance checking unit 220 can then check the performance of each combination, as before:
b, c, d~z -> m_{25}^{1} = 96%
a, c, d~z -> m_{25}^{2} = 95.6%
...
a, b, c~y -> m_{25}^{25} = 90%
The combination performance checking unit 220 can then perform a resetting process. Each combination whose performance exceeds the model performance m_26 is reset as the feature information, and the combination-setting process is repeated for each reset feature set.
That is, the combination performance checking unit 220 deletes combinations whose performance is lower than or equal to m_26 and keeps only those with higher performance, shown below. It resets each of them as the feature information and repeats the combination-setting process for each.
b, c, d, ..., z -> m_25^1 -> 96%
a, c, d, ..., z -> m_25^2 -> 95.6%
a, b, d, ..., z -> m_25^3 -> 96%
The combination performance checking unit 220 repeats the combination setting and reset processes. If no feature combination outperforms the AI model generated from the immediately preceding feature set, it selects that preceding feature set as the specific feature combination and passes it to the recommendation unit 230.
In this case, the recommendation unit 230 may recommend the feature set passed from the combination performance checking unit 220 as a specific feature combination. This combination performs better than the AI model generated with the preset feature information.
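The combination setting and reset processes above amount to a branching form of backward feature elimination. The Python sketch below is one possible reading, not the patent's implementation. `evaluate` is a hypothetical callback that trains a model on a feature subset and returns its test performance. The sketch also omits the preliminary single-feature check (Max(m_1) versus m_26) that gates the procedure.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

FeatureSet = FrozenSet[str]


def backward_elimination(
    features: Iterable[str],
    evaluate: Callable[[FeatureSet], float],
) -> List[Tuple[FeatureSet, float]]:
    """Return terminal feature sets with their scores, best first.

    `evaluate` is a hypothetical callback: train on the subset, return test performance.
    """
    cache: Dict[FeatureSet, float] = {}

    def score(subset: FeatureSet) -> float:
        # Each subset is trained and tested at most once.
        if subset not in cache:
            cache[subset] = evaluate(subset)
        return cache[subset]

    root = frozenset(features)
    frontier = [root]
    seen = {root}
    terminal: Dict[FeatureSet, float] = {}
    while frontier:
        next_frontier: List[FeatureSet] = []
        for current in frontier:
            parent_score = score(current)
            # Combination setting: drop a different single feature each time.
            children = [current - {f} for f in current] if len(current) > 1 else []
            better = [c for c in children if score(c) > parent_score]
            if not better:
                # No combination beats the preceding set, so that set is selected.
                terminal[current] = parent_score
            # Reset: every better combination becomes a new feature set.
            for child in better:
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return sorted(terminal.items(), key=lambda item: item[1], reverse=True)
```

Memoizing `evaluate` matters here, because different removal orders can reach the same subset. Even so, the search can still grow exponentially with the number of features.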
As described above, the present invention works within the environment provided by the AI model platform 100. When a user creates an AI model for security control through the UI, optimal features with the best performance (accuracy) can be recommended or applied. As a result, even ordinary users unfamiliar with security control technology can create an optimal AI model for security control.
Hereinafter, with reference to FIG. 4, a technique for recommending a change of normalization method to improve the accuracy of an AI model will be described. Specifically, this covers a normalization method recommendation apparatus that implements the technique.
FIG. 4 illustrates the configuration of a normalization method recommendation apparatus according to an embodiment of the present invention.
As shown in FIG. 4, the normalization method recommendation apparatus 300 of the present invention includes an attribute checking unit 310, a determination unit 320, and a recommendation unit 330.
All or at least part of the configuration of the normalization method recommendation apparatus 300 may be implemented as a hardware module, as a software module, or as a combination of the two.
Here, a software module may be understood as, for example, instructions executed by a processor that controls operations within the normalization method recommendation apparatus 300. These instructions may be loaded in a memory within the apparatus 300.
In short, through the above configuration, the normalization method recommendation apparatus 300 according to an embodiment realizes the technique proposed by the present invention: recommending a change of normalization method to improve the accuracy of the AI model. Each component of the apparatus 300 is described in more detail below.
The attribute checking unit 310 checks the attribute of the feature information used for training when the AI model is generated.
Here, the feature information used for training may come from either of two sources. It may be feature information that the user sets directly through the UI, chosen from all feature information that can be set when generating the AI model. Alternatively, it may be a recommended specific feature combination, chosen from the full feature set, that has been applied.
The attributes of feature information can be broadly divided into numeric attributes and categorical attributes.
That is, the attribute checking unit 310 may check whether the attribute of the feature information used for training (whether set directly or applied from a recommendation) is numeric, categorical, or a combination of numeric and categorical.
The determination unit 320 selects a normalization method from all configurable normalization methods, according to the attribute checked by the attribute checking unit 310.
Specifically, before determining a normalization method from the attribute, the determination unit 320 may first decide between two cases. Either one normalization method applies to all fields of the current feature information, or a normalization method is applied field by field.
When all fields of the current feature information contain only numeric and/or categorical data (including the single-feature case), the determination unit 320 may determine that the same normalization method applies to all fields.
In this case, the determination unit 320 decides as follows:
- If the attribute is numeric, it determines a first normalization method based on the overall numeric pattern of the feature information.
- If the attribute is categorical, it determines a second normalization method. This method uses a vector whose length equals the total number of categories and represents each category with a non-zero value only at that category's designated position.
- If the attribute is a combination of numeric and categorical, it determines both the second and the first normalization methods.
Specifically, the first normalization method comprises, in a predefined order of priority, standard score normalization, mean normalization, and feature scaling (see Equations 1, 2, and 3).
When all fields contain only numeric data, the determination unit 320 classifies the attribute as numeric. It then determines the first normalization method according to the overall numeric pattern of the feature information.
In doing so, the determination unit 320 considers the first normalization methods in priority order: standard score, mean normalization, then feature scaling. It selects the highest-priority method that is applicable, based on two things: the standard deviation of the overall numeric pattern, and whether upper/lower bounds of the normalization scaling range exist.
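The three first normalization methods are commonly defined as in the comments below. The patent's Equations 1 to 3 are not reproduced in this section, so these textbook forms are an assumption. The selection rule is also a hypothetical reading of "standard deviation and presence of scaling-range bounds":
- standard score when no bounded range is required;
- mean normalization when a signed bounded range is acceptable;
- feature scaling when a non-negative bounded range is required;
- nothing when the standard deviation is zero.

```python
import statistics
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def standard_score(xs: Sequence[float]) -> List[float]:
    # z = (x - mean) / std
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mean_normalization(xs: Sequence[float]) -> List[float]:
    # x' = (x - mean) / (max - min), within (-1, 1)
    mu, span = statistics.fmean(xs), max(xs) - min(xs)
    return [(x - mu) / span for x in xs]


def feature_scaling(xs: Sequence[float]) -> List[float]:
    # x' = (x - min) / (max - min), within [0, 1]
    lo, span = min(xs), max(xs) - min(xs)
    return [(x - lo) / span for x in xs]


NORMALIZERS: Dict[str, Callable[[Sequence[float]], List[float]]] = {
    "standard_score": standard_score,
    "mean_normalization": mean_normalization,
    "feature_scaling": feature_scaling,
}


def choose_first_method(
    xs: Sequence[float], bounds: Optional[Tuple[float, float]] = None
) -> Optional[str]:
    """Pick the highest-priority applicable method (hypothetical rule)."""
    if statistics.pstdev(xs) == 0:
        return None  # constant field: all three formulas divide by zero
    if bounds is None:
        return "standard_score"  # priority 1: no bounded range required
    lower, _upper = bounds
    if lower < 0:
        return "mean_normalization"  # priority 2: signed bounded range
    return "feature_scaling"  # priority 3: non-negative bounded range
```

A caller would apply the result as `NORMALIZERS[choose_first_method(xs)](xs)`, after checking for `None`.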
When all fields contain only categorical data, the determination unit 320 classifies the attribute as categorical. It may then determine the second normalization method, which uses a vector whose length equals the total number of categories and represents each category with a non-zero value only at that category's designated position.
To generate an AI model by applying an AI algorithm (e.g., machine learning) to training data, the data must first be converted into numerical form that a machine can process. The present invention may adopt one-hot encoding as this conversion method (the second normalization method).
Accordingly, when the attribute is categorical, the determination unit 320 may determine one-hot encoding as the second normalization method. One-hot encoding represents each category with a non-zero value (e.g., 1) only at that category's designated position in a vector whose length equals the total number of categories.
As a brief illustration of one-hot encoding, suppose the feature information has the categorical attribute "fruit". Suppose also that the full set of categories is apple, pear, and persimmon. Because there are three kinds of fruit, each value is represented as a three-dimensional vector.
Each feature value (apple, pear, or persimmon) is then represented under one-hot encoding as follows:
Apple = {1, 0, 0}
Pear = {0, 1, 0}
Persimmon = {0, 0, 1}
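The fruit example maps directly onto a few lines of code. The sketch below is a minimal one-hot encoder over a fixed category order. The function name `one_hot` is illustrative, not taken from the patent.

```python
from typing import List, Sequence


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each value as a vector with a single 1 at its category's position."""
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)  # one slot per category
        vector[position[value]] = 1  # raises KeyError for an unknown category
        encoded.append(vector)
    return encoded


fruits = ["apple", "pear", "persimmon"]
print(one_hot(fruits, fruits))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```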
When the fields contain both numeric and categorical data, the determination unit 320 classifies the attribute as a numeric-categorical combination. It may then determine both the second and the first normalization methods described above.
That is, for a numeric-categorical combination, the determination unit 320 determines the second and first normalization methods in sequence. First, one-hot encoding (the second method) is applied to the categorical data in the feature information. Then the highest-priority applicable first normalization method is determined, based on the standard deviation of the overall numeric pattern and on whether upper/lower scaling bounds exist.
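The two-step order for mixed data can be sketched as follows. First, categorical columns are one-hot encoded; then numeric columns are normalized. Two points are assumptions: feature scaling is taken to be the first-method choice for the numeric columns, and the record layout (dicts keyed by hypothetical field names such as `proto` and `bytes`) is illustrative only.

```python
from typing import Any, Dict, List, Sequence


def normalize_mixed(
    rows: Sequence[Dict[str, Any]],
    categorical: Sequence[str],
    numeric: Sequence[str],
) -> List[List[float]]:
    # Category vocabulary per column, in first-seen order.
    vocab = {c: list(dict.fromkeys(r[c] for r in rows)) for c in categorical}
    # Min and span per numeric column (feature scaling assumed chosen).
    stats = {}
    for c in numeric:
        col = [float(r[c]) for r in rows]
        stats[c] = (min(col), max(col) - min(col))
    out = []
    for r in rows:
        vec: List[float] = []
        for c in categorical:  # step 1: one-hot (second method)
            vec.extend(1.0 if r[c] == v else 0.0 for v in vocab[c])
        for c in numeric:  # step 2: feature scaling (first method)
            lo, span = stats[c]
            vec.append((float(r[c]) - lo) / span if span else 0.0)
        out.append(vec)
    return out
```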
On the other hand, the feature information may be a composite feature, i.e., a single feature extracted from multiple security events through aggregation and statistical techniques. In that case, the determination unit 320 may determine that normalization is applied field by field across the fields of the current feature information.
In this case, for fields with the type attribute, the determination unit 320 may choose the highest-priority applicable method of the two: mean normalization or feature scaling.
Likewise, for fields with the count attribute, the determination unit 320 may choose the highest-priority applicable method of the two: mean normalization or feature scaling.
For fields with the ratio attribute, the determination unit 320 may either leave the normalization method undetermined and exclude the field from normalization, or choose standard score normalization.
For fields with the presence attribute (e.g., whether an operation result value exists), the determination unit 320 may leave the normalization method undetermined and exclude the field from normalization.
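The per-field rules for composite features can be written as a small dispatch function. In this sketch the enum and function names are hypothetical. Two flags are also assumptions: `needs_nonnegative` stands in for the unspecified applicability test that decides between mean normalization and feature scaling, and `normalize_ratio` selects between the two options the text allows for ratio fields.

```python
from enum import Enum
from typing import Optional


class FieldKind(Enum):
    TYPE = "type"
    COUNT = "count"
    RATIO = "ratio"
    PRESENCE = "presence"  # e.g. whether an operation result value exists


def method_for_field(
    kind: FieldKind, needs_nonnegative: bool = False, normalize_ratio: bool = False
) -> Optional[str]:
    if kind in (FieldKind.TYPE, FieldKind.COUNT):
        # Mean normalization has the higher priority; fall back to feature
        # scaling when a non-negative range is required (assumed criterion).
        return "feature_scaling" if needs_nonnegative else "mean_normalization"
    if kind is FieldKind.RATIO:
        # The text allows either exclusion or standard score normalization.
        return "standard_score" if normalize_ratio else None
    return None  # PRESENCE fields are excluded from normalization
```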
The recommendation unit 330 recommends the normalization method determined by the determination unit 320.
As described above, the present invention works within the environment provided by the AI model platform 100. When a user creates an AI model for security control through the UI, the optimal normalization method with the best performance (accuracy) can be recommended or applied. As a result, even ordinary users unfamiliar with security control technology can create an optimal AI model for security control.
As described above, the present invention implements an AI model platform for creating AI models for security control. In particular, the platform optimally recommends or applies the feature information and normalization methods that directly affect AI model performance. Even ordinary users unfamiliar with security control technology can therefore create an optimal AI model for security control.
Consequently, the present invention allows optimal AI models suited to the purposes and requirements of security control to be generated and applied flexibly and in various forms. This can maximize improvements in the quality of security control services. It can also be expected to support building an AI-based incident response system that efficiently analyzes signs of large-scale cyberattacks and anomalous behavior.
Hereinafter, a method for operating an AI model platform according to an embodiment of the present invention will be described with reference to FIG. 5.
The AI model platform 100 of the present invention periodically collects newly generated source security data from the big data integrated storage (S10).
The user, a system administrator or ordinary user (hereinafter collectively, the user) who wishes to create an AI model for security control, operates the UI. Through it, the AI model platform 100 receives various settings related to the collection and AI functions, and stores and manages them as configuration information (S20).
The AI model platform 100 then collects, from the source security data, the security events to be used as training/test data, according to a specific search condition preset by the user (S30).
The AI model platform 100 extracts the feature information preset by the user from the security events collected in step S30 (S40).
The AI model platform 100 then applies the normalization preset by the user to the extracted feature information of the security events (S50).
The AI model platform 100 provides the three normalization methods described above so that the user can preset one of them.
Because the normalization method set by the user may not be optimal, the AI model platform 100 may recommend an optimal normalization method that improves the accuracy of the AI model (S50).
The normalization method recommendation is described in detail with reference to FIG. 7 below.
The AI model platform 100 then extracts training data or test data from the security events whose feature information has been normalized. The extraction follows given conditions, that is, conditions preset by the user (S60).
Specifically, the AI model platform 100 outputs the normalized security events to the screen or to a file. The output follows the user's chosen values, order, format, training/test data ratio, file splitting method, and so on.
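The training/test ratio setting can be illustrated with a seeded random split. This is a minimal sketch. The function name and its parameters are hypothetical, and the patent does not say whether its platform shuffles before splitting.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    events: Sequence[T], train_ratio: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle normalized events reproducibly, then cut at the requested ratio."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable runs
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed makes a run repeatable. That is useful when comparing models that differ only in their features or normalization.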
The AI model platform 100 then applies an AI algorithm to the training data to generate an AI model for security control (S70).
That is, the AI model platform 100 may apply an AI algorithm to the training data to generate an AI model for security control, for example, an AI model with the function the user requests.
For example, depending on the user's request, the AI model platform 100 may generate an AI detection model that detects whether a security event is malicious. It may also generate an AI classification model that classifies security events as true positives or false positives.
Specifically, the AI model platform 100 works from the training data managed in the output/file storage in step S60. From this data it may generate an AI model for security control using an AI algorithm, for example a machine learning algorithm (e.g., deep learning) previously selected by the user.
For example, the AI model platform 100 may use machine learning based on backpropagation (error back-propagation). It uses a loss function that measures the deviation between the values the model predicts and the actual values. From the training data, it then generates an AI model that drives this loss toward zero.
The AI model platform 100 then tests the accuracy of the generated AI model (S80). The test uses the test data managed in the output/file storage in step S60: security events whose actual true/false-positive classification and maliciousness labels are known.
For example, the AI model platform 100 may run the generated AI model on the test data. It then outputs, as the test result, the model's accuracy (performance): the proportion of predictions that match the known actual values.
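The accuracy described here is the fraction of test events whose predicted label equals the known label. A minimal sketch follows; the label strings are illustrative only.

```python
from typing import Hashable, Sequence


def accuracy(predicted: Sequence[Hashable], actual: Sequence[Hashable]) -> float:
    """Fraction of test events whose predicted label matches the known label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equally long, non-empty label sequences")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


print(accuracy(["malicious", "benign", "benign", "malicious"],
               ["malicious", "benign", "malicious", "malicious"]))  # 0.75
```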
Accordingly, the AI model platform 100 can record and manage performance information in the system (file storage). This information covers who created the AI model, when, and with which data, fields, sampling method, normalization method, and model. It also records how well the generated model performs (accuracy rate).
Based on this performance information, the AI model platform 100 lets the user compare model-generation conditions and the resulting performance at a glance. This makes it easy to see how the conditions correlate with performance.
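A performance record of the kind described can be sketched as a plain data class with one field per recorded item (who, when, data, fields, sampling, normalization, model, accuracy). A helper then ranks runs so that conditions and accuracy can be compared side by side. The class, field names, and helper are all hypothetical; the patent does not specify a storage schema.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ModelRun:
    """One performance record (field names are hypothetical)."""
    creator: str               # who
    created_at: str            # when (ISO 8601 string)
    dataset: str               # which data
    fields: Tuple[str, ...]    # which fields
    sampling: str              # which sampling method
    normalization: str         # which normalization method
    model: str                 # which model
    accuracy: float            # resulting performance


def compare_runs(runs: Sequence[ModelRun]) -> List[Tuple[str, str, str, float]]:
    """Tabulate key conditions against accuracy, best run first."""
    ranked = sorted(runs, key=lambda r: r.accuracy, reverse=True)
    return [(r.sampling, r.normalization, r.model, r.accuracy) for r in ranked]
```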
At this point, based on the accuracy test result of step S80, the AI model platform 100 may recommend a change to the feature information to improve the accuracy of the generated AI model (S90, S100).
That is, the feature information used for training when the AI model was generated is hereinafter called the user-set feature information. If another feature combination can improve the AI model's accuracy relative to it (S90: Yes), the AI model platform 100 recommends that combination (S100).
Hereinafter, with reference to FIG. 6, the computer program executed on the hardware (recommendation apparatus) of the present invention, i.e., the computer program for recommending feature information, will be described. For convenience, it is referred to as the operating method of the feature information recommendation apparatus 200.
According to the computer program, that is, the operating method of the feature information recommendation apparatus 200, the performance (accuracy) of the AI model generated by training on the user-set feature information is checked first (S110).
For the detailed description below, assume an AI model that the AI model platform 100 generated by training on the feature information set by the user (hereinafter, the user-set feature information).
As described above, the operating method of the feature information recommendation apparatus 200 checks the model performance of this AI model, generated by the AI model platform 100 from the user-set feature information (S110).
For example, the operating method may test and check the model performance (accuracy) using the test data output by the AI model platform 100 (in particular, the data output module 140). This test data consists of security events whose actual true/false-positive classification and maliciousness labels are known.
Accordingly, the operating method may test the AI model generated by the AI model platform 100 (in particular, the data output module 140) with the test data. It then outputs, as the test result, the model's accuracy (performance): the proportion of predictions that match the known actual values.
The operating method then sets multiple feature combinations from the full feature set. For each combination it checks the performance of the AI model generated by training on it (S120, S130).
Specifically, the full feature set contains all feature information that can be set when generating an AI model. From it, the operating method may set various feature combinations beyond the user-set feature information used in the current training, and check the performance of the AI model trained on each combination.
Specific embodiments for recommending a specific feature combination are described below.
In the following, the full feature set is assumed to be (a, b, c, ..., z (n=26)). The user-set feature information used to train the current AI model is assumed to be (a, b, c, d, e, f (k=6)). The measured performance of that AI model (m_k) is assumed to be 85%.
Accordingly, the operating method may set multiple feature combinations by adding features to the user-set feature information (a, b, c, d, e, f) (S120). The features added are the remaining ones, i.e., those in the full feature set (n) other than the user-set features, and they are added at least one at a time in sequence.
For example, the operating method may sequentially add between 1 and (n-k) of the remaining features to the user-set feature information (a, b, c, d, e, f). This yields multiple feature combinations such as the following:
a, b, c, d, e, f, g -> m_(k+1)^1 -> 82%
a, b, c, d, e, f, h -> m_(k+1)^2 -> 80%
......
a, b, c, d, e, f, g, h, i -> m_(k+3)^1 -> 88%
......
a, b, c, d, e, f, ..., z -> m_n -> 85%
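The enumeration step (S120) can be sketched as follows. This is a minimal illustration, assuming the candidates are the user-set features extended by every combination of 1 to (n - k) remaining features, produced in order of how many features are added; the function name `candidate_combinations` and the `max_added` cap are hypothetical and not part of the original description.

```python
from itertools import combinations


def candidate_combinations(all_features, user_features, max_added=None):
    """Yield user_features extended by 1..max_added of the remaining features.

    Candidates come out grouped by the number of added features, mirroring
    the m_(k+1), ..., m_n listing (an assumed ordering).
    """
    chosen = set(user_features)
    remaining = [f for f in all_features if f not in chosen]
    if max_added is None:
        max_added = len(remaining)  # up to n - k additions
    for r in range(1, max_added + 1):
        for extra in combinations(remaining, r):
            yield tuple(user_features) + extra


all_features = list("abcdefghijklmnopqrstuvwxyz")  # n = 26
user_features = ["a", "b", "c", "d", "e", "f"]       # k = 6

single = list(candidate_combinations(all_features, user_features, max_added=1))
print(len(single))  # 20 = n - k
print(single[0])    # ('a', 'b', 'c', 'd', 'e', 'f', 'g')
```

Without a cap, the example yields 2^(n - k) - 1 = 1,048,575 candidates, each requiring a separate training run, which is why a bound such as the hypothetical `max_added` may be useful in practice.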
Then, the operation method of the feature information recommendation apparatus 200 according to the present invention can check the performance of the artificial intelligence model generated by learning each of the plurality of feature information combinations described above, i.e., 82%, 80%, ..., 88%, ..., 85% (S130).
In this case, the operation method of the feature information recommendation apparatus 200 according to the present invention may select/recommend, as the specific feature information combination, the top N combinations (e.g., 4) whose performance is higher than that of the artificial intelligence model generated from the current user setting (m_k = 85%) (S140 Yes, S150).
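The selection step (S140, S150) reduces to filtering and ranking. The sketch below assumes that "higher performance" means strictly greater than the baseline m_k and that ties are broken by input order; the function name `recommend_top_n` is hypothetical, and only the four scores actually listed above are used.

```python
def recommend_top_n(scores, baseline, n):
    """Return up to n (combination, score) pairs that beat baseline, best first.

    Only scores strictly greater than the baseline m_k qualify (S140 Yes).
    """
    better = [(combo, s) for combo, s in scores.items() if s > baseline]
    better.sort(key=lambda item: item[1], reverse=True)
    return better[:n]


# Only the four scores listed above; the elided rows are omitted.
scores = {
    "abcdefg": 82,
    "abcdefh": 80,
    "abcdefghi": 88,
    "abcdefghijklmnopqrstuvwxyz": 85,
}
print(recommend_top_n(scores, baseline=85, n=4))  # [('abcdefghi', 88)]
```

Note that the full combination (85%) is not recommended because it only ties the baseline.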
As another example, the operation method of the feature information recommendation apparatus 200 according to the present invention may set the following plurality of feature information combinations by sequentially adding, one at a time, each item of the remaining feature information, that is, all feature information (n) excluding the user-set feature information (a, b, c, d, e, f), to the user-set feature information set by the user (S120).
a, b, c, d, e, f, g -> m_(k+1)^1 -> 82%
a, b, c, d, e, f, h -> m_(k+1)^2 -> 80%
......
a, b, c, d, e, f, z -> m_(k+1)^(ζ+1) -> 90%
Then, the operation method of the feature information recommendation apparatus 200 according to the present invention can check the performance of the artificial intelligence model generated by learning each of the plurality of feature information combinations described above, i.e., 82%, 80%, ..., 90% (S130).
In this case, the operation method of the feature information recommendation apparatus 200 according to the present invention may select/recommend, as the specific feature information combination, the top N combinations (e.g., 3) whose performance is higher than that of the artificial intelligence model generated from the current user setting (m_k = 85%) (S140 Yes, S150).
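The one-at-a-time variant can be sketched end to end as follows. The `evaluate` callback is hypothetical: it stands in for training a model on a feature combination and returning its test accuracy in percent. In the usage example, g, h and z take the values listed above, and every other single addition is assumed, purely for illustration, to score 84%.

```python
def recommend_single_additions(all_features, user_features, evaluate, baseline, n):
    """Add one remaining feature at a time (S120), score each (S130),
    and keep the top n that beat the baseline (S140, S150).

    `evaluate` is a hypothetical callback: combination -> accuracy in percent.
    """
    chosen = set(user_features)
    scored = []
    for f in all_features:
        if f in chosen:
            continue
        combo = tuple(user_features) + (f,)
        scored.append((combo, evaluate(combo)))
    better = [item for item in scored if item[1] > baseline]
    better.sort(key=lambda item: item[1], reverse=True)
    return better[:n]


# Stub scores: g, h and z use the listed values; others assumed to be 84%.
listed = {"g": 82, "h": 80, "z": 90}
stub_evaluate = lambda combo: listed.get(combo[-1], 84)

result = recommend_single_additions(
    list("abcdefghijklmnopqrstuvwxyz"), list("abcdef"), stub_evaluate,
    baseline=85, n=3,
)
print(result)  # [(('a', 'b', 'c', 'd', 'e', 'f', 'z'), 90)]
```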
As described above, according to the present invention, in the environment provided by the artificial intelligence model platform 100, optimal features yielding the best performance (accuracy) can be recommended/applied to a user who generates an artificial intelligence model for security control through the UI, so that even an ordinary user unfamiliar with security control technology can generate an optimal artificial intelligence model for security control.
Hereinafter, with reference to FIG. 7, a computer program executed on the hardware (recommendation apparatus) of the present invention, namely a computer program for recommending a normalization method, will be described; for convenience of description, it will be referred to as the operation method of the normalization method recommendation apparatus 300.
According to the computer program of the present invention, that is, the operation method of the normalization method recommendation apparatus 300, the attribute of the feature information used for learning when generating an artificial intelligence model is checked (S200).
Here, the feature information used for learning when generating the artificial intelligence model may be feature information set directly by the user through the UI from among all feature information that can be set when generating an artificial intelligence model, or feature information to which a recommended specific feature information combination, selected from all the feature information, has been applied/set.
The attributes of the feature information can be broadly divided into numeric attributes and categorical attributes.
That is, the operation method of the normalization method recommendation apparatus 300 according to the present invention can check whether the attribute of the feature information used for learning when generating an artificial intelligence model (directly set or recommended and applied) is a numeric attribute, a categorical attribute, or a combined numeric and categorical attribute (S200).
The operation method of the normalization method recommendation apparatus 300 according to the present invention determines, from among all settable normalization methods, a normalization method according to the attribute of the feature information checked in step S200.
Specifically, before determining the normalization method according to the attribute of the feature information, the operation method of the normalization method recommendation apparatus 300 according to the present invention may first distinguish whether the same normalization method is applied to all fields of the current feature information or whether a normalization method is applied field by field across all fields of the current feature information (S210).
When only numeric and/or categorical data exist in all fields of the current feature information (including the single-feature case), the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine that the same normalization method is applied to all fields of the current feature information (S210 Yes).
In this case, the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine: when the attribute of the feature information is numeric, a first normalization method according to the overall numeric pattern of the feature information; when the attribute is categorical, a second normalization method that represents each category of the feature information with a non-zero value only at the position designated for that category in a vector whose length is the total number of categories of the feature information; and when the attribute is a combined numeric and categorical attribute, both the second normalization method and the first normalization method (S220).
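The attribute check (S200) and the dispatch in S220 can be sketched as below. The classification rule is an assumption: int and float values count as numeric, and everything else (including strings, and `bool`, which Python treats as an int subtype) counts as categorical. The names `classify_attribute` and `methods_for` are hypothetical.

```python
from numbers import Number


def classify_attribute(values):
    """Classify feature values as 'numeric', 'categorical' or 'mixed' (S200).

    Assumption: int/float values are numeric; everything else
    (strings, and bool, which Python treats as an int) is categorical.
    """
    if not values:
        raise ValueError("no feature values to classify")
    kinds = {
        "numeric" if isinstance(v, Number) and not isinstance(v, bool) else "categorical"
        for v in values
    }
    return kinds.pop() if len(kinds) == 1 else "mixed"


def methods_for(attribute):
    """Methods determined in S220 when all fields share one normalization."""
    return {
        "numeric": ["first"],          # standard score / mean norm. / feature scaling
        "categorical": ["second"],     # one-hot encoding
        "mixed": ["second", "first"],  # one-hot first, then a first method
    }[attribute]


print(classify_attribute([443, 80, 8080]))            # numeric
print(classify_attribute(["tcp", "udp"]))             # categorical
print(methods_for(classify_attribute([443, "tcp"])))  # ['second', 'first']
```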
Specifically, the first normalization method includes, in a predefined order of priority, standard score normalization, mean normalization, and feature scaling (see Equations 1, 2, and 3).
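Equations 1 to 3 are not reproduced in this section. Assuming they are the usual textbook definitions (standard score (x - μ)/σ, mean normalization (x - μ)/(max - min), and min-max feature scaling (x - min)/(max - min)), the three first normalization methods can be sketched as follows; the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def standard_score(xs):
    """Equation 1 (assumed): z = (x - mean) / std, unbounded output."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mean_normalization(xs):
    """Equation 2 (assumed): (x - mean) / (max - min), output in [-1, 1]."""
    mu, lo, hi = mean(xs), min(xs), max(xs)
    return [(x - mu) / (hi - lo) for x in xs]


def feature_scaling(xs):
    """Equation 3 (assumed): (x - min) / (max - min), output in [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


xs = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population std 2
print(standard_score(xs))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

All three divide by a spread term, so none of them is applicable to a field whose values are all identical.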
When only numeric data exist in all fields of the feature information, the operation method of the normalization method recommendation apparatus 300 according to the present invention classifies the attribute of the feature information as numeric and, in this case, determines the first normalization method according to the overall numeric pattern of the feature information.
At this time, the operation method of the normalization method recommendation apparatus 300 according to the present invention considers the first normalization methods in order of priority (standard score normalization, then mean normalization, then feature scaling) and, based on the standard deviation of the overall numeric pattern of the feature information and on whether upper/lower bounds of the normalization scaling range exist, may determine the highest-priority normalization method that is applicable among the first normalization methods.
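The text states the inputs to this decision (standard deviation and whether the scaling range has upper/lower bounds) but not the exact rules. The sketch below uses hypothetical applicability rules chosen only to illustrate "highest-priority applicable method": standard score is used when σ > 0 and no bounded output range is required, mean normalization when a bounded range admitting negatives is required, and feature scaling otherwise. The parameter `required_range` is an assumption.

```python
from statistics import pstdev

PRIORITY = ("standard_score", "mean_normalization", "feature_scaling")


def choose_first_method(xs, required_range=None):
    """Return the highest-priority applicable first normalization method.

    Hypothetical applicability rules (the text does not spell them out):
      standard_score     - std > 0 and no bounded output range is required
      mean_normalization - max > min and the required range admits negatives
      feature_scaling    - max > min (output in [0, 1])
    Returns None when all values are equal (nothing is applicable).
    """
    sigma = pstdev(xs)
    spread = max(xs) - min(xs)
    applicable = {
        "standard_score": sigma > 0 and required_range is None,
        "mean_normalization": (
            spread > 0 and (required_range is None or required_range[0] < 0)
        ),
        "feature_scaling": spread > 0,
    }
    for name in PRIORITY:
        if applicable[name]:
            return name
    return None


print(choose_first_method([1, 2, 3]))           # standard_score
print(choose_first_method([1, 2, 3], (-1, 1)))  # mean_normalization
print(choose_first_method([1, 2, 3], (0, 1)))   # feature_scaling
print(choose_first_method([5, 5, 5]))           # None
```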
Also, when only categorical data exist in all fields of the feature information, the operation method of the normalization method recommendation apparatus 300 according to the present invention classifies the attribute of the feature information as categorical and, in this case, may determine the second normalization method, One Hot Encoding, which represents each category of the feature information with a non-zero value (e.g., 1) only at the position designated for that category in a vector whose length is the total number of categories of the feature information.
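One Hot Encoding as described above can be sketched in a few lines. When no category order is supplied, a sorted order is assumed here; the function name `one_hot` and the example protocol values are illustrative only.

```python
def one_hot(values, categories=None):
    """Second normalization method: One Hot Encoding.

    The vector length equals the number of categories; each value gets a 1
    at its category's position and 0 elsewhere. If `categories` is not
    given, a sorted category order is assumed.
    """
    if categories is None:
        categories = sorted(set(values))
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[position[v]] = 1
        vectors.append(vec)
    return categories, vectors


cats, vecs = one_hot(["tcp", "udp", "icmp", "tcp"])
print(cats)  # ['icmp', 'tcp', 'udp']
print(vecs)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```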
Also, when both numeric and categorical data exist in all fields of the feature information, the operation method of the normalization method recommendation apparatus 300 according to the present invention classifies the attribute of the feature information as a combined numeric and categorical attribute and, in this case, may determine both the second normalization method and the first normalization method described above.
That is, when the attribute of the feature information is a combined numeric and categorical attribute, the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine both the second and the first normalization methods: the second normalization method (One Hot Encoding) is first applied to the categorical data in the feature information, and then the highest-priority applicable method among the first normalization methods is determined based on the standard deviation of the overall numeric pattern of the feature information and on whether upper/lower bounds of the normalization scaling range exist.
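The combined case can be sketched as a two-stage encoder: one-hot encode the categorical columns first, then normalize the numeric columns. For brevity, standard score stands in for whichever first normalization method the priority rule would actually select; this substitution, the column names, and the function name `encode_mixed` are assumptions.

```python
from statistics import mean, pstdev


def encode_mixed(rows, categorical_cols, numeric_cols):
    """One-hot encode categorical columns, then z-score numeric columns.

    Assumption: standard score stands in for whichever first normalization
    method the priority rule would select for the numeric columns.
    """
    # Second normalization method: one-hot levels per categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    # First normalization method: statistics over each numeric column.
    stats = {}
    for c in numeric_cols:
        col = [r[c] for r in rows]
        stats[c] = (mean(col), pstdev(col))
    encoded = []
    for r in rows:
        vec = []
        for c in categorical_cols:
            vec.extend(1 if r[c] == level else 0 for level in levels[c])
        for c in numeric_cols:
            mu, sigma = stats[c]
            vec.append((r[c] - mu) / sigma if sigma > 0 else 0.0)
        encoded.append(vec)
    return encoded


rows = [{"proto": "tcp", "bytes": 100}, {"proto": "udp", "bytes": 300}]
print(encode_mixed(rows, ["proto"], ["bytes"]))  # [[1, 0, -1.0], [0, 1, 1.0]]
```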
Meanwhile, when the feature information is a composite feature (a single feature that can be extracted from multiple security events using aggregation and statistical techniques), the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine that a normalization method is applied field by field across all fields of the current feature information (S210 No).
In this case, for fields of the feature information having a type attribute, the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine the highest-priority applicable method between mean normalization and feature scaling (S230).
Also, for fields of the feature information having a count attribute, the operation method of the normalization method recommendation apparatus 300 according to the present invention may determine the highest-priority applicable method between mean normalization and feature scaling (S230).
Also, for fields of the feature information having a ratio attribute, the operation method of the normalization method recommendation apparatus 300 according to the present invention may either leave the normalization method undetermined and exclude the field from normalization, or determine standard score normalization (S230).
Also, for fields of the feature information having a presence attribute (e.g., whether an operation result value exists), the operation method of the normalization method recommendation apparatus 300 according to the present invention may leave the normalization method undetermined and exclude the field from normalization (S230).
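The per-field rules of S230 amount to a lookup table from field attribute to candidate methods. In this sketch the attribute labels, the `"exclude"` marker, and the example field names are hypothetical. For type and count fields the list is in priority order; for ratio fields the text permits either option without ranking them.

```python
# Hypothetical attribute labels for the fields of a composite feature.
FIELD_RULES = {
    "type": ["mean_normalization", "feature_scaling"],   # in priority order
    "count": ["mean_normalization", "feature_scaling"],  # in priority order
    "ratio": ["exclude", "standard_score"],              # either option is allowed
    "presence": ["exclude"],                             # never normalized
}


def plan_composite(fields):
    """Map each field of a composite feature to its S230 candidates.

    `fields` maps field name -> attribute label. For 'type' and 'count'
    the first applicable entry would be chosen; for 'ratio' the text leaves
    the choice between exclusion and standard score open.
    """
    return {name: FIELD_RULES[attr] for name, attr in fields.items()}


plan = plan_composite({
    "distinct_dst_ports": "type",
    "event_count": "count",
    "failure_ratio": "ratio",
    "has_result": "presence",
})
print(plan["has_result"])  # ['exclude']
```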
The operation method of the normalization method recommendation apparatus 300 according to the present invention recommends the normalization method determined in step S220 or step S230 (S240).
As described above, according to the present invention, in the environment provided by the artificial intelligence model platform 100, an optimal normalization method yielding the best performance (accuracy) can be recommended/applied to a user who generates an artificial intelligence model for security control through the UI, so that even an ordinary user unfamiliar with security control technology can generate an optimal artificial intelligence model for security control.
As described above, the present invention implements an artificial intelligence model platform that enables the generation of artificial intelligence models for security control and, in particular, optimally recommends/applies the feature information and normalization methods that directly affect artificial intelligence model performance, so that even an ordinary user unfamiliar with security control technology can generate an optimal artificial intelligence model for security control.
Consequently, according to the present invention, because an optimal artificial intelligence model suited to the purposes and requirements of security control can be generated and applied flexibly and in various ways, the quality of security control services can be greatly improved, and the invention can further be expected to support building an AI-based incident response system for efficiently analyzing signs of large-scale cyberattacks and anomalous behavior.
The method of operating an artificial intelligence model platform according to an embodiment of the present invention described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code executable by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
Although the present invention has been described in detail with reference to preferred embodiments, the present invention is not limited to those embodiments, and the technical idea of the present invention extends to the range in which anyone of ordinary skill in the art to which the present invention pertains can make various changes or modifications without departing from the gist of the present invention as claimed in the following claims.

Claims (28)

  1. An artificial intelligence model platform, comprising: a data collection module that collects, from source security data, security events to be used as training/test data according to a specific search condition;
    a feature extraction module that extracts preset feature information from the collected security events;
    a normalization module that performs preset normalization on the extracted feature information of the security events;
    a data output module that extracts training data or test data, according to a given condition, from the security events whose feature information has been normalized; and
    a model generation module that generates an artificial intelligence model for security control by applying an artificial intelligence algorithm to the training data.
  2. The artificial intelligence model platform of claim 1,
    further comprising a performance management module that tests the accuracy of the artificial intelligence model using the test data.
  3. The artificial intelligence model platform of claim 1,
    further comprising a UI module that provides a user interface (UI) for setting at least one of the specific search condition of the data collection module, the feature information of the feature extraction module, the normalization method of the normalization module, and the condition of the data output module.
  4. The artificial intelligence model platform of claim 1,
    wherein the data collection module,
    when the total number of collection jobs exceeds the maximum number of collection jobs that can be performed concurrently, stores the collection jobs in excess of the maximum in a queue and then processes them sequentially, and,
    for a collection job processed after being stored in the queue, collects the security events only from the source security data prior to the time at which the collection job was created.
  5. The artificial intelligence model platform of claim 2,
    wherein the feature extraction module
    recommends, based on the accuracy test result of the performance management module, a change to the feature information so as to increase the accuracy of the artificial intelligence model.
  6. The artificial intelligence model platform of claim 1,
    wherein the normalization module
    recommends a change of the normalization method used for the normalization so as to increase the accuracy of the artificial intelligence model.
  7. A feature information recommendation apparatus, comprising: a model performance checking unit that checks the model performance of an artificial intelligence model generated by learning preset feature information from among all feature information that can be set when generating an artificial intelligence model;
    a combination performance checking unit that sets a plurality of feature information combinations from all the feature information and checks the performance of the artificial intelligence model generated by learning each of the plurality of feature information combinations; and
    a recommendation unit that recommends, from among the performances of the plurality of feature information combinations, a specific feature information combination whose performance is higher than the model performance checked by the model performance checking unit.
  8. The feature information recommendation apparatus of claim 7,
    wherein the plurality of feature information combinations
    are combinations obtained by sequentially adding, to the preset feature information, at least one item at a time from the remaining feature information, that is, all the feature information excluding the preset feature information, and
    the specific feature information combination
    is the top N combinations, among the plurality of feature information combinations, having higher performance than the model performance.
  9. The feature information recommendation apparatus of claim 7,
    wherein the preset feature information is all the feature information, and
    the combination performance checking unit performs:
    a single-feature performance comparison process of checking whether the maximum performance, among the performances of the artificial intelligence models generated by learning each single item of feature information within all the feature information, is higher than the model performance;
    a combination setting process of, when the maximum performance is higher than the model performance, resetting the feature information to the single item of feature information having the maximum performance, and setting the plurality of feature information combinations by sequentially adding to that feature information, one at a time, the remaining feature information, that is, all the feature information excluding the preset feature information;
    a resetting process of resetting, as feature information, each feature information combination among the plurality of feature information combinations whose performance is higher than the model performance of the reset feature information, so that the combination setting process is repeated for each item of reset feature information; and
    a process of, when no feature information combination among the plurality of feature information combinations has higher performance than the model performance, delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination.
  10. The feature information recommendation apparatus of claim 7,
    wherein the preset feature information is all the feature information, and
    the combination performance checking unit performs:
    a single-feature performance comparison process of checking whether the maximum performance, among the performances of the artificial intelligence models generated by learning each single item of feature information within all the feature information, is higher than the model performance;
    a combination setting process of, when the maximum performance is not higher than the model performance, setting the plurality of feature information combinations, each obtained by excluding a different single item of feature information from the feature information;
    a resetting process of resetting, as feature information, each feature information combination among the plurality of feature information combinations whose performance is higher than the model performance, so that the combination setting process is repeated for each item of reset feature information; and
    a process of, when no feature information combination among the plurality of feature information combinations has higher performance than the model performance, delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination.
  11. A normalization method recommendation apparatus, comprising: an attribute checking unit that checks the attribute of feature information used for learning when generating an artificial intelligence model;
    a determination unit that determines, from among all settable normalization methods, a normalization method according to the attribute of the feature information; and
    a recommendation unit that recommends the determined normalization method.
  12. The normalization method recommendation apparatus of claim 11,
    wherein the determination unit,
    when the same normalization method is applied to all fields of the feature information,
    determines, when the attribute of the feature information is a numeric attribute, a first normalization method according to the overall numeric pattern of the feature information;
    determines, when the attribute of the feature information is a categorical attribute, a second normalization method that represents each category of the feature information with a non-zero value only at the position designated for that category in a vector whose length is the total number of categories of the feature information; and
    determines, when the attribute of the feature information is a combined numeric and categorical attribute, both the second normalization method and the first normalization method.
  13. The apparatus of claim 12, wherein
    the first normalization method comprises, in a predefined order of priority, a standard score normalization method, a mean normalization method, and a feature scaling normalization method, and
    the determination unit
    determines, from among the first normalization methods, the applicable method with the highest priority, based on whether a standard deviation of the overall numeric pattern of the feature information exists and whether upper and lower bounds of the normalization scaling range exist.
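The three numeric methods named in claim 13 have standard textbook forms: standard score $(x-\mu)/\sigma$, mean normalization $(x-\mu)/(\max-\min)$, and feature scaling $(x-\min)/(\max-\min)$, optionally mapped into a target range $[a, b]$. The claim does not spell out how the standard deviation and the scaling-range bounds decide between them, so `choose_numeric_method` below uses a hypothetical rule: no usable standard deviation means nothing applies; no requested bounds means standard score; bounds of $[-1, 1]$ suit mean normalization (its output always lies in that interval); any other bounds mean feature scaling.

```python
from __future__ import annotations

import statistics


def standard_score(xs: list[float]) -> list[float]:
    mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mean_normalization(xs: list[float]) -> list[float]:
    mu, lo, hi = statistics.fmean(xs), min(xs), max(xs)
    return [(x - mu) / (hi - lo) for x in xs]        # result lies within [-1, 1]


def feature_scaling(xs: list[float], lower: float = 0.0, upper: float = 1.0) -> list[float]:
    lo, hi = min(xs), max(xs)
    return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in xs]


def choose_numeric_method(xs: list[float], bounds: tuple[float, float] | None = None) -> str | None:
    """Highest-priority applicable method under a HYPOTHETICAL rule (see lead-in)."""
    if statistics.pstdev(xs) == 0:
        return None                  # constant field: every formula divides by zero
    if bounds is None:
        return "standard_score"      # priority 1: unbounded output is acceptable
    if tuple(bounds) == (-1.0, 1.0):
        return "mean_normalization"  # priority 2: output stays within [-1, 1]
    return "feature_scaling"         # priority 3: rescale into [lower, upper]
```

With `[1, 2, 3]`, mean normalization gives `[-0.5, 0.0, 0.5]` and feature scaling gives `[0.0, 0.5, 1.0]`.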
  14. The apparatus of claim 11, wherein,
    when a normalization method is applied separately to each field of the feature information, the determination unit:
    determines, for a field whose attribute is a type attribute, the applicable method with the highest priority from among the mean normalization method and the feature scaling normalization method;
    determines, for a field whose attribute is a count attribute, the applicable method with the highest priority from among the mean normalization method and the feature scaling normalization method;
    for a field whose attribute is a ratio attribute, either determines no normalization method and excludes the field from normalization, or determines the standard score normalization method; and
    for a field whose attribute is a presence (present/absent) attribute, determines no normalization method and excludes the field from normalization.
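Claim 14's per-field rules map naturally onto a small lookup function. In this sketch the field-kind labels, the `normalize_ratio` switch (choosing between the two options the claim allows for ratio fields), and the `non_negative` flag used to decide when mean normalization is "not applicable" are all hypothetical; the claim itself does not define the applicability test.

```python
from __future__ import annotations

import statistics


def recommend_field(kind: str, values: list[float], *,
                    normalize_ratio: bool = False,
                    non_negative: bool = False) -> str | None:
    """Per-field recommendation following claim 14; None means 'excluded'."""
    if kind == "presence":
        return None                        # present/absent flags are left untouched
    if kind == "ratio":
        # Claim 14 allows either exclusion or standard score for ratio fields.
        if normalize_ratio and statistics.pstdev(values) > 0:
            return "standard_score"
        return None
    if kind in ("type", "count"):
        if max(values) == min(values):
            return None                    # constant field: both formulas divide by zero
        # Mean normalization has priority; it is (hypothetically) skipped when the
        # caller needs non-negative outputs, which feature scaling guarantees.
        return "feature_scaling" if non_negative else "mean_normalization"
    raise ValueError(f"unknown field kind: {kind!r}")


def recommend_per_field(fields: dict[str, tuple[str, list[float]]],
                        **options) -> dict[str, str | None]:
    """Apply recommend_field to every (kind, values) field by name."""
    return {name: recommend_field(kind, values, **options)
            for name, (kind, values) in fields.items()}
```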
  15. A method of operating an artificial intelligence model platform, comprising:
    a data collection step of collecting, from source security data according to a specific search condition, security events to be used as training/test data;
    a feature extraction step of extracting preset feature information from the collected security events;
    a normalization step of performing preset normalization on the extracted feature information of the security events;
    a data output step of extracting training data or test data, according to a given condition, from the security events whose feature information has been normalized; and
    a model generation step of generating an artificial intelligence model for security monitoring by applying an artificial intelligence algorithm to the training data.
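The five steps of claim 15, plus the accuracy test of claim 16, can be sketched end to end in plain Python. Everything concrete here is an assumption for illustration: events are dicts with a `"label"` key, the search condition is a predicate, normalization is per-column feature scaling, the output condition is a seeded shuffle and split, and a nearest-centroid classifier stands in for whatever "artificial intelligence algorithm" the platform actually uses.

```python
from __future__ import annotations

import random
from typing import Callable

Event = dict  # hypothetical security-event record


def collect(source: list[Event], condition: Callable[[Event], bool]) -> list[Event]:
    """Data collection step: keep events that match the search condition."""
    return [e for e in source if condition(e)]


def extract(events: list[Event], names: list[str]) -> tuple[list[list[float]], list[str]]:
    """Feature extraction step: preset numeric fields plus the label."""
    return [[float(e[n]) for n in names] for e in events], [e["label"] for e in events]


def normalize(rows: list[list[float]]) -> list[list[float]]:
    """Normalization step: per-column feature scaling into [0, 1]."""
    cols = list(zip(*rows))
    lows, highs = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def output(rows, labels, test_ratio=0.25, seed=0):
    """Data output step: seeded shuffle, then ((train_X, train_y), (test_X, test_y))."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_ratio)

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[n_test:]), pick(idx[:n_test])


def generate_model(X, y):
    """Model generation step: nearest-centroid classifier as a stand-in algorithm."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def accuracy(predict, X, y) -> float:
    """Performance management (claim 16): fraction of correct predictions."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)
```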
  16. The method of claim 15,
    further comprising a performance management step of testing the accuracy of the artificial intelligence model using the test data.
  17. The method of claim 15,
    further comprising providing a user interface (UI) for setting at least one of the specific search condition of the data collection step, the feature information of the feature extraction module, the normalization method of the normalization module, and the condition of the data output module.
  18. The method of claim 15, wherein,
    in the data collection step,
    when the total number of collection jobs exceeds the maximum number of collection jobs that can be performed concurrently, the collection jobs in excess of that maximum are stored in a queue and then executed sequentially, and
    for a collection job executed after being stored in the queue, the security events are collected only from the data in the source security data that precede the point in time at which that collection job occurred.
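Claim 18's queueing rule can be illustrated with a small simulation. The `CollectionJob` type, the integer timestamps, and the strict "before" cutoff (`time < created_at`) are assumptions; the first `max_concurrent` jobs would run in parallel in a real system but are processed in a loop here for simplicity. The point of the cutoff is that a queued job, however late it actually runs, never picks up events that arrived after it was requested.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionJob:
    name: str
    created_at: int  # time the collection request occurred (hypothetical units)


def collect_before(source: list[dict], cutoff: int) -> list[dict]:
    """Only events that occurred strictly before the job's creation time."""
    return [e for e in source if e["time"] < cutoff]


def run_collections(jobs: list[CollectionJob], source: list[dict], max_concurrent: int):
    """Start up to max_concurrent jobs at once, queue the rest (FIFO) and run them
    one by one afterwards. Returns (execution order, events per job name)."""
    active = jobs[:max_concurrent]
    waiting = deque(jobs[max_concurrent:])
    order: list[str] = []
    results: dict[str, list[dict]] = {}
    for job in active:                 # these would run concurrently in practice
        order.append(job.name)
        results[job.name] = collect_before(source, job.created_at)
    while waiting:                     # queued jobs run sequentially
        job = waiting.popleft()
        order.append(job.name)
        # Runs later, but still sees only data older than its own request time.
        results[job.name] = collect_before(source, job.created_at)
    return order, results
```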
  19. The method of claim 16,
    further comprising recommending, based on the result of the accuracy test in the performance management step, a change to the feature information that increases the accuracy of the artificial intelligence model.
  20. The method of claim 15, wherein
    the normalization step
    recommends a change of the normalization method that increases the accuracy of the artificial intelligence model.
  21. A computer program stored in a medium which, in combination with hardware, executes:
    a model performance checking step of checking the model performance of an artificial intelligence model generated by training on preset feature information selected from among all the feature information that can be set when the artificial intelligence model is generated;
    a combination performance checking step of setting a plurality of feature information combinations from all the feature information and checking the performance of the artificial intelligence model generated by training on each of the plurality of feature information combinations; and
    a recommendation step of recommending, from among the performances of the plurality of feature information combinations, a specific feature information combination whose performance is higher than the model performance checked in the model performance checking step.
  22. The computer program of claim 21, wherein
    the plurality of feature information combinations
    are combinations obtained by sequentially adding to the preset feature information, at least one at a time, pieces of the remaining feature information, namely all the feature information other than the preset feature information, and
    the specific feature information combination
    is the top N combinations, among the plurality of feature information combinations, whose performance is higher than the model performance.
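Claims 21 and 22 describe a forward-addition search that ranks extended feature sets against the baseline model. The sketch below reads "at least one at a time" as adding a single remaining feature per candidate; `evaluate` is a hypothetical callback that would train a model on a feature set and return its score (e.g. test accuracy), replaced in the usage example by a toy additive score.

```python
from __future__ import annotations

from typing import Callable, Iterable

Score = Callable[[frozenset], float]  # hypothetical: train on a feature set, return a score


def recommend_top_n(all_features: Iterable[str], preset: Iterable[str],
                    evaluate: Score, n: int):
    """Return (baseline score, up to n (score, combination) pairs that beat it)."""
    base = frozenset(preset)
    baseline = evaluate(base)                      # model performance of the preset set
    better = []
    for f in sorted(set(all_features) - base):     # add one remaining feature per candidate
        combo = base | {f}
        score = evaluate(combo)
        if score > baseline:
            better.append((score, combo))
    better.sort(key=lambda pair: pair[0], reverse=True)
    return baseline, better[:n]
```

With weights `{"a": 5, "b": 2, "c": -1, "d": 3}` summed as the score and `{"a"}` as the preset, the top two recommendations are `{a, d}` and then `{a, b}`.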
  23. The computer program of claim 21, wherein
    the preset feature information is all of the feature information, and
    the combination performance checking step performs:
    a single-feature performance comparison process of checking whether the maximum performance, among the performances of the artificial intelligence models generated by training on each single piece of feature information within all the feature information, is higher than the model performance;
    a combination setting process of, when the maximum performance is higher than the model performance, resetting the feature information to the single piece of feature information having the maximum performance, and setting the plurality of feature information combinations by sequentially adding to that feature information, one at a time, each piece of the remaining feature information other than the preset feature information among all the feature information;
    a resetting process of resetting, as the feature information, each feature information combination among the plurality of feature information combinations whose performance is higher than the model performance of the reset feature information, so that the combination setting process is repeated for each piece of reset feature information; and
    a process of, when no feature information combination among the plurality of feature information combinations has higher performance than the model performance, delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination.
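Claim 23 amounts to a branching forward selection that starts from the best single feature. In this sketch each improving combination is compared against its parent's score (one reading of "the model performance of the reset feature information"), every improving combination is explored once, and among the dead ends (sets that no single addition improves) the highest-scoring, then smallest, set is returned. `evaluate` is a hypothetical train-and-score callback.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Tuple

Score = Callable[[frozenset], float]  # hypothetical: train on a feature set, return a score


def forward_search(all_features: Iterable[str],
                   evaluate: Score) -> Optional[Tuple[float, frozenset]]:
    full = frozenset(all_features)
    baseline = evaluate(full)                          # model trained on all features
    start = max((frozenset([f]) for f in sorted(full)), key=evaluate)
    start_score = evaluate(start)
    if start_score <= baseline:
        return None                                    # claim 24's backward path applies instead
    stack, seen, finals = [(start, start_score)], {start}, []
    while stack:
        current, score = stack.pop()
        improved = False
        for f in sorted(full - current):               # add one remaining feature at a time
            combo = current | {f}
            combo_score = evaluate(combo)
            if combo_score > score:
                improved = True
                if combo not in seen:                  # reset this combination and repeat
                    seen.add(combo)
                    stack.append((combo, combo_score))
        if not improved:
            finals.append((score, current))            # the "immediately preceding" set
    return max(finals, key=lambda pair: (pair[0], -len(pair[1])))
```

With additive toy weights `{"a": 4, "b": 2, "c": -5, "d": 1}`, the full set scores 2, `{a}` scores 4, and the search ends at `{a, b, d}` with score 7.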
  24. The computer program of claim 21, wherein
    the preset feature information is all of the feature information, and
    the combination performance checking step performs:
    a single-feature performance comparison process of checking whether the maximum performance, among the performances of the artificial intelligence models generated by training on each single piece of feature information within all the feature information, is higher than the model performance;
    a combination setting process of, when the maximum performance is not higher than the model performance, setting the plurality of feature information combinations, each obtained by excluding a different single piece of feature information from the feature information;
    a resetting process of resetting, as the feature information, each feature information combination among the plurality of feature information combinations whose performance is higher than the model performance, so that the combination setting process is repeated for each piece of reset feature information; and
    a process of, when no feature information combination among the plurality of feature information combinations has higher performance than the model performance, delivering the immediately preceding feature information to the recommendation unit as the specific feature information combination.
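Claim 24 is the mirror image: when no single feature beats the full set, features are removed one at a time (backward elimination). As in the forward sketch, this assumes each candidate is compared against its parent's score, explores every improving subset once, never drops the last feature, and returns the best dead end; `evaluate` remains a hypothetical train-and-score callback.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Tuple

Score = Callable[[frozenset], float]  # hypothetical: train on a feature set, return a score


def backward_search(all_features: Iterable[str],
                    evaluate: Score) -> Optional[Tuple[float, frozenset]]:
    full = frozenset(all_features)
    baseline = evaluate(full)                          # model trained on all features
    best_single = max(evaluate(frozenset([f])) for f in full)
    if best_single > baseline:
        return None                                    # claim 23's forward path applies instead
    stack, seen, finals = [(full, baseline)], {full}, []
    while stack:
        current, score = stack.pop()
        improved = False
        if len(current) > 1:                           # never drop the last feature
            for f in sorted(current):                  # exclude a different single feature
                combo = current - {f}
                combo_score = evaluate(combo)
                if combo_score > score:
                    improved = True
                    if combo not in seen:              # reset this subset and repeat
                        seen.add(combo)
                        stack.append((combo, combo_score))
        if not improved:
            finals.append((score, current))            # the "immediately preceding" set
    return max(finals, key=lambda pair: (pair[0], -len(pair[1])))
```

With weights `{"a": 3, "b": 3, "c": -1}`, no single feature (score 3) beats the full set (score 5), and dropping `c` gives the recommended `{a, b}` with score 6.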
  25. A computer program stored in a medium which, in combination with hardware, executes:
    an attribute checking step of checking the attribute of the feature information used for training when an artificial intelligence model is generated;
    a determination step of determining, from among all settable normalization methods, a normalization method according to the attribute of the feature information; and
    a recommendation step of recommending the determined normalization method.
  26. The computer program of claim 25, wherein,
    when the same normalization method is applied to all fields of the feature information, the determination step:
    determines, when the attribute of the feature information is numeric, a first normalization method according to the overall numeric pattern of the feature information;
    determines, when the attribute of the feature information is categorical, a second normalization method that represents each category with a non-zero value only at the position assigned to that category within a vector whose length is the total number of categories of the feature information; and
    determines both the second normalization method and the first normalization method when the attribute of the feature information is a combination of numeric and categorical.
  27. The computer program of claim 26, wherein
    the first normalization method comprises, in a predefined order of priority, a standard score normalization method, a mean normalization method, and a feature scaling normalization method, and
    the determination step
    determines, from among the first normalization methods, the applicable method with the highest priority, based on whether a standard deviation of the overall numeric pattern of the feature information exists and whether upper and lower bounds of the normalization scaling range exist.
  28. The computer program of claim 25, wherein,
    when a normalization method is applied separately to each field of the feature information, the determination step:
    determines, for a field whose attribute is a type attribute, the applicable method with the highest priority from among the mean normalization method and the feature scaling normalization method;
    determines, for a field whose attribute is a count attribute, the applicable method with the highest priority from among the mean normalization method and the feature scaling normalization method;
    for a field whose attribute is a ratio attribute, either determines no normalization method and excludes the field from normalization, or determines the standard score normalization method; and
    for a field whose attribute is a presence (present/absent) attribute, determines no normalization method and excludes the field from normalization.
PCT/KR2018/015476 2018-11-17 2018-12-07 Artificial-intelligence model platform and method for operating artificial-intelligence model platform WO2020101108A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180142166A KR102271449B1 (en) 2018-11-17 2018-11-17 Artificial intelligence model platform and operation method thereof
KR10-2018-0142166 2018-11-17

Publications (1)

Publication Number Publication Date
WO2020101108A1 true WO2020101108A1 (en) 2020-05-22

Family

ID=70731462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/015476 WO2020101108A1 (en) 2018-11-17 2018-12-07 Artificial-intelligence model platform and method for operating artificial-intelligence model platform

Country Status (2)

Country Link
KR (1) KR102271449B1 (en)
WO (1) WO2020101108A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102357630B1 (en) * 2020-07-10 2022-02-07 한국전자통신연구원 Apparatus and Method for Classifying Attack Tactics of Security Event in Industrial Control System
KR102532757B1 (en) * 2020-09-24 2023-05-12 서강대학교산학협력단 Apparatus for predicting dissolved gas concentration in aqueous solution based on Raman spectral signal and method therefor
KR102470364B1 (en) * 2020-11-27 2022-11-25 한국과학기술정보연구원 A method for generating security event traning data and an apparatus for generating security event traning data
EP4254237A4 (en) * 2020-11-27 2024-10-30 Korea Inst Sci & Tech Inf Security data processing device, security data processing method, and computer-readable storage medium for storing program for processing security data
KR102433830B1 (en) * 2021-11-10 2022-08-18 한국인터넷진흥원 System and method for security threats anomaly detection based on artificial intelligence
CN116151601A (en) * 2021-11-15 2023-05-23 中兴通讯股份有限公司 Stream service modeling method, device, platform, electronic equipment and storage medium
KR20230076389A (en) 2021-11-24 2023-05-31 주식회사 윈스 Method and apparatus for generating artificial intelligence-based reconnaissance false positive identification model and method and apparatus for artificial intelligence-based reconnaissance false positive identification
KR102620130B1 (en) * 2021-12-08 2024-01-03 한국과학기술정보연구원 APT attack detection method and device
KR102381776B1 (en) * 2021-12-24 2022-04-01 주식회사 코난테크놀로지 Apparatus for data processing for simultaneously performing artificial intelligence function processing and data collection and method thereof
KR102491688B1 (en) * 2022-02-03 2023-01-26 주식회사 데이터스튜디오 Control method of electronic apparatus for determining predictive modeling of direction of financial investment products

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6018345B2 (en) * 1981-07-24 1985-05-09 東洋工業株式会社 3-piece knotless net knitting machine and knitting method
KR101623071B1 (en) * 2015-01-28 2016-05-31 한국인터넷진흥원 System for detecting attack suspected anomal event
US20160358099A1 (en) * 2015-06-04 2016-12-08 The Boeing Company Advanced analytical infrastructure for machine learning
KR20180080111A (en) * 2017-01-03 2018-07-11 한국전자통신연구원 Data meta-scaling Apparatus and method for continuous learning
KR20180120056A (en) * 2017-04-26 2018-11-05 김정희 Method and system for pre-processing machine learning data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3139297B1 (en) * 2014-06-11 2019-04-03 Nippon Telegraph and Telephone Corporation Malware determination device, malware determination system, malware determination method, and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306829A (en) * 2020-10-12 2021-02-02 成都安易迅科技有限公司 Method and device for determining performance information, storage medium and terminal
CN112306829B (en) * 2020-10-12 2023-05-09 成都安易迅科技有限公司 Method and device for determining performance information, storage medium and terminal

Also Published As

Publication number Publication date
KR102271449B1 (en) 2021-07-01
KR20200057903A (en) 2020-05-27

Similar Documents

Publication Publication Date Title
WO2020101108A1 (en) Artificial-intelligence model platform and method for operating artificial-intelligence model platform
WO2016017975A1 (en) Method of modifying image including photographing restricted element, and device and system for performing the method
WO2018117619A1 (en) Display apparatus, content recognizing method thereof, and non-transitory computer readable recording medium
WO2016089009A1 (en) Method and cloud server for managing device
WO2012074338A2 (en) Natural language and mathematical formula processing method and device therefor
WO2010021527A2 (en) System and method for indexing object in image
WO2017084337A1 (en) Identity verification method, apparatus and system
WO2016032021A1 (en) Apparatus and method for recognizing voice commands
WO2020222539A1 (en) Hub device, multi-device system including the hub device and plurality of devices, and method of operating the same
WO2023153818A1 (en) Method of providing neural network model and electronic apparatus for performing the same
WO2019177182A1 (en) Multimedia content search apparatus and search method using attribute information analysis
WO2023080379A1 (en) Disease onset information generating apparatus based on time-dependent correlation using polygenic risk score and method therefor
WO2017054592A1 (en) Interface display method and terminal
WO2014021567A1 (en) Method for providing message service, and device and system therefor
WO2021215787A1 (en) Wireless ip camera detection system and method
WO2019035491A1 (en) Method and device for user authentication
CN107113177A (en) Data cube computation, transmission, reception, the method and system of interaction, and memory, aircraft
WO2021040419A1 (en) Electronic apparatus for applying personalized artificial intelligence model to another model
WO2019000466A1 (en) Face recognition method and apparatus, storage medium, and electronic device
WO2017113587A1 (en) Method and apparatus for creating wep password
WO2022075609A1 (en) Electronic apparatus for responding to question using multi chat-bot and control method thereof
WO2020017827A1 (en) Electronic device and control method for electronic device
WO2017188497A1 (en) User authentication method having strengthened integrity and security
WO2021187733A1 (en) Electronic device and control method thereof
WO2024185962A1 (en) Water quality management device, operation method therefor, and water quality management method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18939927

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18939927

Country of ref document: EP

Kind code of ref document: A1