Authors:
Tobias Eljasik-Swoboda (1) and Wilhelm Demuth (2)
Affiliations:
(1) ONTEC AG, Ernst-Melchior-Gasse 24/DG, 1100 Vienna, Austria; (2) Schoeller Network Control GmbH, Ernst-Melchior-Gasse 24/DG, 1100 Vienna, Austria
Keyword(s):
Industrial Applications of AI, Intelligence and Cybersecurity, Machine Learning, Natural Language Processing, Trainer/Athlete Pattern, Log Analysis, Log Management, Event Normalization, Security Information and Event Management, Big Data.
Abstract:
When introducing log management or Security Information and Event Management (SIEM) practices, organizations are frequently challenged by Gartner’s 3 Vs of Big Data: a large volume of data is generated at a rapid velocity. These first two Vs can be handled effectively by current scale-out architectures. The third V, variety, affects log management efforts because log files lack a common mandatory format. Essentially every component can log its events differently, and the way events are logged can change with every software update. This paper describes the Log Analysis Machine Learner (LAMaLearner) system. It uses a blend of different Artificial Intelligence techniques to overcome variety issues and identify relevant events within log files. LAMaLearner is able to cluster events and generate human-readable representations for all events within a cluster. A human can annotate these clusters with specific labels. Once these labels exist, LAMaLearner leverages machine-learning-based natural language processing techniques to label events even in changing log formats. Additionally, LAMaLearner is capable of identifying previously known named entities occurring anywhere within a logged event, as well as identifying frequently co-occurring variables in otherwise fixed log events. To stay up to date, LAMaLearner includes a continuous feedback interface that facilitates active learning. In experiments with multiple differently formatted log files, LAMaLearner was capable of reducing the labeling effort by up to three orders of magnitude. Models trained on this labeled data achieved more than 93% F1 in detecting relevant event classes. LAMaLearner thus helps log management and SIEM operations in three ways: Firstly, it creates a quick overview of the content of previously unknown log files. Secondly, it can massively reduce the manual effort required in log management and SIEM operations. Thirdly, it identifies commonly co-occurring values within logs, which can be used to identify otherwise unknown aspects of large log files.
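The abstract does not state which clustering method LAMaLearner uses. Purely as an illustration of the general idea of clustering events and giving each cluster a human-readable representation, the following minimal sketch masks variable-looking tokens with placeholders and groups events by the resulting template. The masking rules, the function names `to_template` and `cluster_events`, and the sample events are all hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict

# Hypothetical masking rules (an assumption for illustration, not the paper's method).
# Order matters: IP addresses are masked before plain numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(event: str) -> str:
    """Replace variable-looking tokens with placeholders to obtain a template."""
    for pattern, placeholder in _MASKS:
        event = pattern.sub(placeholder, event)
    return event


def cluster_events(events):
    """Group raw events by template; the template is the readable cluster label."""
    clusters = defaultdict(list)
    for event in events:
        clusters[to_template(event)].append(event)
    return dict(clusters)


# Made-up sample events, used only to exercise the sketch.
events = [
    "Failed login for user 1001 from 10.0.0.5",
    "Failed login for user 1002 from 10.0.0.9",
    "Disk usage at 91 percent",
]
clusters = cluster_events(events)
for template, members in clusters.items():
    print(f"{len(members)} event(s): {template}")
```

In a sketch like this, a human would annotate one template per cluster instead of every raw event, which is the kind of labeling-effort reduction the abstract describes; the real system may cluster in an entirely different way.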