| Menu | Next Post: Elastic Anomaly Detection and Data Visualizer HandsOn|
For categorization analysis, the learning process is the same, but additional steps are needed to process the text first.
The input data must be a text field, typically one containing repeated elements such as log messages. Categorization is not natural language processing (NLP), so it works best on machine-written messages.
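To make the text field requirement more concrete, here is a minimal sketch of what the body of a categorization job could look like, built as a Python dictionary. The keys follow the Elasticsearch anomaly detection job API as I understand it; the `message` field, the `@timestamp` field, the 15 minute bucket span and the description are assumptions chosen for illustration, not values from a real deployment.

```python
import json

# Sketch of a categorization job body. Field values are illustrative assumptions.
job_config = {
    "description": "Categorize sshd log messages (illustrative)",
    "analysis_config": {
        "bucket_span": "15m",                    # assumed bucket size
        "categorization_field_name": "message",  # assumed name of the text field
        "detectors": [
            # Count how often each category appears in every bucket.
            {"function": "count", "by_field_name": "mlcategory"}
        ],
    },
    "data_description": {"time_field": "@timestamp"},  # assumed time field
}

print(json.dumps(job_config, indent=2))
```

The detector counts events split by `mlcategory`, which is the category the model assigns to each message in the steps described in this post.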
When you create a categorization anomaly detection job, the machine learning model processes the input text into different categories, identifying patterns over time, as you can see in this example:
Input text
Log message:
Jul 20 15:02:19 localhost sshd[8903]: Invalid user admin from 58.218.92.41 port 26062
Jul 20 15:02:19 localhost sshd[8903]: input_userauth_request: invalid user admin [preauth]
Jul 20 15:02:20 localhost sshd[8903]: Connection closed by 58.218.92.41 port 26062 [preauth]
Jul 20 17:10:23 localhost sshd[2074]: Received disconnect from 41.43.112.199 port 41805:11: disconnected by user
Jul 20 17:10:23 localhost sshd[2074]: Disconnected from 41.43.112.199 port 26062
Jul 20 17:10:23 localhost sshd[2072]: pam_unix (sshd:session): session closed for user ec2-user
Jul 20 19:14:55 localhost sshd[8944]: pam_unix (sshd:session): session closed for user ec2-user by (uid=0)
Jul 20 19:17:22 localhost runner: pam_unix(runuser-1:session): session closed for user ec2-user
Jul 20 19:17:22 localhost runner: pam_unix(runuser-1:session): session opened for user ec2-user by (uid=0)
Jul 20 19:17:23 localhost runner: pam_unix(runuser-1:session): session closed for user ec2-user
Step 1 - Remove mutable text
Mutable text, such as the date and time, is ignored. Because its value is always changing, keeping it would flag anomalies or patterns that have no relevance.
localhost sshd: Invalid user from port
localhost sshd: input_userauth_request: invalid user [preauth]
localhost sshd: Connection closed by port [preauth]
localhost sshd: Received disconnect from port disconnected by user
localhost sshd: Disconnected from port
localhost sshd: pam_unix session: session closed for user ec2-user
localhost sshd: pam_unix session: session closed for user ec2-user by (uid=0)
localhost runner: pam_unix session: session closed for user ec2-user
localhost runner: pam_unix session: session opened for user ec2-user by (uid=0)
localhost runner: pam_unix session: session closed for user ec2-user
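The idea behind this step can be sketched with a few regular expressions. This is only an illustration and not how Elastic's categorization analyzer is implemented: the patterns and the `strip_mutable` helper are hypothetical, and they only cover timestamps, process IDs, IP addresses and port numbers (user names such as `admin` are left untouched).

```python
import re

# Hypothetical (pattern, replacement) pairs for the mutable parts of a syslog line.
MUTABLE_PATTERNS = [
    (re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+"), ""),  # timestamp
    (re.compile(r"\[\d+\]"), ""),                                           # process id
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), ""),                       # IPv4 address
    (re.compile(r"\bport \d+(?::\d+)*:?"), "port"),                         # port number
]

def strip_mutable(line: str) -> str:
    """Remove the mutable tokens and collapse the leftover whitespace."""
    for pattern, replacement in MUTABLE_PATTERNS:
        line = pattern.sub(replacement, line)
    return " ".join(line.split())

raw = "Jul 20 15:02:20 localhost sshd[8903]: Connection closed by 58.218.92.41 port 26062 [preauth]"
print(strip_mutable(raw))
# localhost sshd: Connection closed by port [preauth]
```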
Step 2 - Cluster similar messages together
A cluster can contain a single line or several lines, for example lines that are part of the same task, as long as they follow a common pattern.
->mlcategory:1
localhost sshd: Invalid user from port
->mlcategory:2
localhost sshd: input_userauth_request: invalid user [preauth]
->mlcategory:3
localhost sshd: Connection closed by port [preauth]
->mlcategory:4
localhost sshd: Received disconnect from port disconnected by user
->mlcategory:5
localhost sshd: Disconnected from port
->mlcategory:6
localhost sshd: pam_unix session: session closed for user ec2-user
localhost sshd: pam_unix session: session closed for user ec2-user by (uid=0)
localhost runner: pam_unix session: session closed for user ec2-user
localhost runner: pam_unix session: session opened for user ec2-user by (uid=0)
localhost runner: pam_unix session: session closed for user ec2-user
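A rough way to picture the clustering is a greedy grouping based on how many tokens two messages share. The sketch below uses Jaccard similarity with a threshold of 0.6; both the `categorize` helper and the threshold are assumptions picked so that this small sample behaves like the example above, and they do not describe the algorithm Elastic actually uses.

```python
def jaccard(a: set, b: set) -> float:
    """Share of tokens two messages have in common."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def categorize(messages, threshold=0.6):
    """Assign each message to the most similar existing category, or open a new one."""
    categories = []  # each category is a list of token sets
    labels = []
    for message in messages:
        tokens = set(message.split())
        best_id, best_score = None, threshold
        for cat_id, members in enumerate(categories, start=1):
            score = max(jaccard(tokens, member) for member in members)
            if score >= best_score:
                best_id, best_score = cat_id, score
        if best_id is None:
            categories.append([tokens])
            best_id = len(categories)
        else:
            categories[best_id - 1].append(tokens)
        labels.append(best_id)
    return labels

cleaned = [
    "localhost sshd: Invalid user from port",
    "localhost sshd: Connection closed by port [preauth]",
    "localhost sshd: pam_unix session: session closed for user ec2-user",
    "localhost runner: pam_unix session: session closed for user ec2-user",
]
print(categorize(cleaned))
# [1, 2, 3, 3]
```

The two `pam_unix` lines differ only in the process name, so they end up in the same category, while the other two messages each get their own.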
Step 3 - Count per time bucket
Analyzing the messages per time bucket makes the behavior of each cluster easier to identify and to check for anomalies.
The image below shows an example of how each mlcategory behaves over time, which is the basis for the time bucket analysis:
As an example, at a specific time bucket, we could see an mlcategory:1 followed by an mlcategory:4, twice:
mlcategory:1 -> mlcategory:4 -> mlcategory:1 -> mlcategory:4
For reference, we could call this bucket 1, the next one bucket 2, and so on.
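Counting per time bucket can be sketched as follows. The `bucket_counts` helper, the 15 minute bucket span and the timestamps (including the year) are hypothetical values used only to reproduce the mlcategory:1 and mlcategory:4 example; they are not taken from a real job.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def bucket_counts(events, bucket_span):
    """Count how many times each category appears in every time bucket."""
    counts = {}
    for timestamp, category in events:
        # Round the timestamp down to the start of its bucket.
        bucket_start = timestamp - (timestamp - EPOCH) % bucket_span
        counts.setdefault(bucket_start, Counter())[category] += 1
    return counts

# Hypothetical events: (timestamp, mlcategory)
events = [
    (datetime(2024, 7, 20, 15, 2, 19), 1),
    (datetime(2024, 7, 20, 15, 3, 5), 4),
    (datetime(2024, 7, 20, 15, 7, 41), 1),
    (datetime(2024, 7, 20, 15, 9, 2), 4),
    (datetime(2024, 7, 20, 15, 17, 30), 6),
]

for start, counter in sorted(bucket_counts(events, timedelta(minutes=15)).items()):
    print(start, dict(counter))
```

Here the first bucket holds two occurrences of mlcategory:1 and two of mlcategory:4, and the next bucket holds a single mlcategory:6. These per-bucket counts are the kind of values that are then checked for anomalies.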
This post is part of a series on Artificial Intelligence that focuses on the Machine Learning solution from Elastic (the creators of Elasticsearch). The series aims to introduce and illustrate the possibilities and options available, and also addresses context and usability.