SPLK-1003 Splunk Enterprise Certified Admin – Splunk Post Installation Activities : Knowledge Objects Part 2
- Source Parameter Explanation
The next default field is the source field. It is similar to the sourcetype field, but it typically holds the location of the logs. It can, however, be renamed to hold something more meaningful than just the location, such as the method used to collect the logs: for example Bash, Python, API, or PowerShell. We can see this in our Windows logs from the last 24 hours, which are collected by our universal forwarder. The source contains values such as PerfMon CPU Load and PerfMon Available Memory, and the sourcetype contains similar values.
This represents the collection method, that is, performance monitoring. We are collecting PerfMon CPU Load and PerfMon Network Interface, which is much more meaningful than simply holding the location. In our lab, which covers source and sourcetype together, we will see how to modify this information by editing the configuration file, that is, inputs.conf.
- Field Extraction Using IFX
In this discussion we will go through one of the most important configurations in Splunk: field extractions. Say you have 100 GB of data and you dump it into Splunk. It is just one big chunk, and you will not be able to make any sense of it. Without field extraction, the data is meaningless. The most important configuration files that hold field extraction settings are props.conf and transforms.conf. Just as we saw inputs.conf and outputs.conf during data collection, props.conf and transforms.conf hold the field extraction information. props.conf is the file where you configure line breaking, set character encoding, and enable processing of binary files (by default, binary file processing is disabled in Splunk). It is also where you set timestamps, field overrides such as source and sourcetype, field extractions, and much more, as we will see in later lectures.
There are multiple ways of doing field extraction in Splunk. We will go through them one by one, starting with IFX. IFX stands for Interactive Field Extractor. It is aimed at complete beginners who want to quickly extract a field, and it is fully supported in Splunk Web, that is, the GUI. Creating some field extractions with IFX is a good lab exercise. To access IFX, run any search on which you want to extract fields. I’ll search the default index, that is, index=main. You can select All Fields and click Extract New Fields, or you can scroll down to the bottom of the fields list, where you will also find Extract New Fields. Either way, you enter IFX mode.
Once you enter IFX mode, make sure you have your sourcetype defined so that it is used for the extraction. Let me select one of the sourcetypes whose logs I need to extract fields from. Let us go back a couple of days to where we have our access log data, because those logs hold a lot of detailed information that adds meaningful value. So for the sourcetype I’ll choose access_combined. Here I’ll show a sample of how to extract using a delimiter. When we get to the rex and regex commands, we will discuss how to use regular expressions to extract these fields. To keep IFX simple to understand, we’ll use only the delimiter option available in IFX. I’ll click All Fields, then click Extract New Fields to enter IFX mode. Now that we have a sourcetype, it shows these events, and I have selected one of the sample events.
You can select any of the sample events. I’ll select one and click Next. Regular expressions will be dealt with in our next tutorials, where we’ll cover the basics of regular expressions and the rex and regex commands. For now, click Delimiters, then click Next. Here you can select your delimiter: comma, space, tab, pipe, or any other delimiter you specify. Here we know it is space. As you can see, it breaks the event into fields using the space delimiter you provided. You can rename these fields and click Next, which saves them, and you can quickly review the fields here. Let’s say I rename one field to IP address. I can see all the fields that are extracted. For testing purposes, it samples only 1,000 events; if you want more, you can increase it.
Here you can see it is not extracting the IP address in the first two events, but it is extracting it in the third. This is basic delimiter functionality. With the regex method, where you write regex patterns, you can extract each field more accurately; this walkthrough is only for a basic understanding of IFX. Choose a field, rename it to IP address, and click Next. Give the extraction a name like Custom Field or My First Field Extraction; I’ll give it a simple name. If you keep it private to the owner, that is, the creator of the extraction, it will be visible only to you. If you click App, anybody who uses Search & Reporting will be able to see your newly added fields. Similarly, if you click All apps, it will be global, so anybody who uses Splunk will be able to see your newly added fields. I will click Finish, then go back to my search to see how my newly created fields look. Search over All time and look at the fields: these are the fields we have just extracted using IFX. We didn’t rename all of them, so the rest are the space-delimited fields auto-extracted by IFX.
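To make the delimiter idea concrete outside Splunk, here is a minimal Python sketch of what splitting on a space does to one event. The sample event is made up for illustration, and the `field1`, `field2`, ... names and the `split_by_delimiter` helper are assumptions of this sketch, not Splunk internals.

```python
def split_by_delimiter(event, delimiter=" "):
    """Split one raw event on a delimiter and label the pieces field1, field2, ..."""
    parts = event.split(delimiter)
    return {f"field{i}": value for i, value in enumerate(parts, start=1)}


# Made-up sample event in a web access log layout (not taken from the lecture data).
sample_event = (
    '91.205.189.15 - - [13/Aug/2019:18:22:16 +0000] '
    '"GET /cart.do?action=addtocart HTTP/1.1" 200 1665'
)

fields = split_by_delimiter(sample_event)
for name, value in fields.items():
    print(f"{name} = {value}")
```

In this sketch the IP address lands cleanly in `field1`, but the timestamp and the quoted request are each cut into several pieces because they contain spaces themselves. That is one reason a plain delimiter can be less accurate than a regular expression on this kind of log.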
- Field Extraction Using REX
In our previous lecture we saw how to extract fields using IFX, the Interactive Field Extractor. Now let us see one more method of field extraction: the rex and regex commands, which you use as part of your search query. Fields extracted this way are gone once your search query changes, and that is one of the major benefits of Splunk: nothing is permanent. You can create and destroy anything, such as fields, alerts, reports, and dashboards. As an admin, this lets you create these objects, drop them if they are not adding any value, and start again from scratch. As part of our lab exercise we’ll also see how to extract fields using rex and regex and how to make them permanent using props.conf and transforms.conf. Let us go to our search.
So this is our search, and this is the basic practice I follow. Take an access log, and let’s assume we are not parsing the logs: Splunk is not able to understand them, and they look something like this. We don’t have any field information, and I need the IP address, or any of these other fields, to show up here. I regularly use a website called Rubular.com for practicing my regex on test or sample data. It has a quick reference that gives you a good idea of how to write a regex for rex. Here I’ll be extracting my IP address. I know the first field is always my IP address, and there is a regular expression syntax that marks the start of a line. I’ll type that: the caret symbol (^). As you can see, I have not matched any text yet; I’ve only stated that my field comes at the beginning of the line.
So it is already showing me the match result, that is, we have not matched anything yet. Once we match something, it will be highlighted, as we’ll see as we progress. The next step: I know an IP address is made of numbers, so I’ll look for a digit, that is, \d. As you can see, the first digit of our IP address is highlighted. Each part of an IP address has one to three digits, so I need one to three digits. Putting {1,3} in curly braces after \d means it will match one, two, or three digits. So here we wrote: start of the line, then a digit with a minimum of one and a maximum of three characters.
So it has matched our 91. The next character is the dot. In regex syntax, a dot already means any single character, so to match a literal dot you need to escape it: type a backslash, then the dot. As you can see, we have now matched 91 followed by a dot. The next part is simple: copy the previous digit pattern and paste it after the dot. As you can see, it matches again: digits, with a minimum of one and a maximum of three. So now we have the second octet of our IP address, and we need to match the rest. I’ll copy the complete pattern including the dot, because I know that after the second octet comes another dot. Here, as you can see, we have successfully matched our IP address, which is our primary focus of interest. So how do you add this as a field?
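The step-by-step build-up described above can be replayed with Python’s `re` module. This is only a sketch: the sample event is invented, and the list of intermediate patterns is my reading of the steps in the walkthrough rather than a copy of what was typed on screen.

```python
import re

# Made-up sample event; only the leading IP address matters here.
sample_event = '91.205.189.15 - - [13/Aug/2019:18:22:16 +0000] "GET /cart.do HTTP/1.1" 200 1665'

# Each entry adds one piece to the previous pattern.
steps = [
    r"^",                                      # start of line, nothing consumed yet
    r"^\d",                                    # a single digit
    r"^\d{1,3}",                               # one to three digits (first octet)
    r"^\d{1,3}\.",                             # escaped dot = literal dot
    r"^\d{1,3}\.\d{1,3}",                      # second octet
    r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",    # all four octets
]

# re.match anchors at the start of the string; group(0) is the matched text.
matched = [re.match(pattern, sample_event).group(0) for pattern in steps]

for pattern, text in zip(steps, matched):
    print(f"{pattern!r:45} matched {text!r}")
```

Printing the matched text after each step plays the same role as the highlighting in the online regex tester: you can see the match grow from nothing to the full address.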
- Adding Field Extraction to Search
To add this as a field, there is a regex syntax for defining a field: open parenthesis, question mark, then the field name between a less-than and a greater-than sign, followed by the matching condition and a closing parenthesis, written as (?<fieldname>pattern). Around it you have a beginning condition and an ending condition. In our scenario of extracting an IP address, the beginning condition is the caret symbol, which denotes the start of the line. Then comes the field definition with the match condition. I’ll copy it from here: this is our IP matching condition. As you can see, it sits inside the definition: open parenthesis, question mark, then the field name between less-than and greater-than signs.
After the matching condition closes comes the ending condition. Here we can see that after the IP address ends, there is a space followed by a hyphen. So I’ll add \s for the space; as the reference shows, \s is any whitespace character. Then the hyphen: treating the hyphen as a special character, we’ll escape it so that it matches the literal character. So this is our ending condition, and we have our complete regex to extract the IP address from our logs. Now let’s see how to use it with our search command, that is, rex. I’ll use rex without any other options. We could specify options such as the field to extract from, but for simplicity we’ll keep it as it is: rex, followed by the regex I built here, including the field definition, pasted in. You can paste it here to get the idea: you have extracted an IP address, and it matches this value. So we have the beginning condition, the field name definition, the matching condition, and the ending condition. Make sure you extract these fields as part of your lab exercise.
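As a rough illustration of the four parts (beginning condition, field name, matching condition, ending condition), here is the same idea in Python. Two assumptions to flag: Python’s `re` module writes a named group as `(?P<name>...)`, whereas the syntax described in the lecture is `(?<name>...)`; and the field name `ip_address`, the helper `extract_ip`, and the sample event are chosen for this sketch only.

```python
import re

IP_PATTERN = re.compile(
    r"^"                                                  # beginning condition: start of line
    r"(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" # field name + matching condition
    r"\s\-"                                               # ending condition: whitespace, then a literal hyphen
)


def extract_ip(event):
    """Return the leading IP address of an event, or None if the pattern does not match."""
    match = IP_PATTERN.search(event)
    return match.group("ip_address") if match else None


# Made-up sample event for illustration.
sample_event = '91.205.189.15 - - [13/Aug/2019:18:22:16 +0000] "GET /cart.do HTTP/1.1" 200 1665'
print(extract_ip(sample_event))
print(extract_ip("service started, no client address"))
```

In a Splunk search, the equivalent pattern would be passed to rex in quotes, along the lines of `rex "^(?<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-"`, which by default runs against the raw event text.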
Once you are thorough with field extraction in Splunk, nobody can stop you from getting what you want from the data in Splunk, so make sure you master it. If you have difficulties, leave a comment in the discussion so that I can assist you and perhaps find an easier way to explain it. And always go through this reference to find the syntax you need to match fields in your logs. Moving on: I just used quotation marks and pasted in the regex I wrote here. I’ll hit Enter, and let’s see if we get a field named IP address. For now we can’t see it here. Don’t go searching through this huge list of fields; just click All Fields and type the name of the field you are looking for.
As you can see, the IP address has been extracted successfully, and we see all the IP address values that are present. Now I’ll mark this as a selected field, so here we have a newly extracted IP address field. Using this field I can write any command, such as stats count by IP address. As you can see, we have a count based on IP address. Similarly, we can run iplocation on the newly extracted field. By default, iplocation adds location fields such as Country, Region, and City. If you want to see this information on a map, you can use geostats, which displays it nicely. So I’ll display geostats of the count. A pie chart is not the recommended visualization; as you can see, the cluster map is the recommended one. Here you can see the locations for the IP address field you extracted, which was never extracted by default in Splunk, and we have plotted a map entirely from our custom field.
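For readers who want to see what counting by the extracted field amounts to, here is a small Python sketch that extracts the address from each event and tallies the values, roughly analogous to a stats count by the new field. The events, the `ip_address` group name, and the pattern are illustrative assumptions, not the lecture’s data.

```python
import re
from collections import Counter

IP_PATTERN = re.compile(r"^(?P<ip_address>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\-")

# Made-up events; the last one has no leading IP address.
events = [
    '91.205.189.15 - - [13/Aug/2019:18:22:16 +0000] "GET /cart.do HTTP/1.1" 200 1665',
    '182.236.164.11 - - [13/Aug/2019:18:22:20 +0000] "GET /product.screen HTTP/1.1" 200 2252',
    '91.205.189.15 - - [13/Aug/2019:18:23:01 +0000] "POST /cart.do HTTP/1.1" 200 1803',
    'service started, no client address',
]

# Count events per extracted address, skipping events where nothing was extracted.
counts = Counter()
for event in events:
    match = IP_PATTERN.search(event)
    if match:
        counts[match.group("ip_address")] += 1

for ip_address, count in counts.most_common():
    print(ip_address, count)
```

Events where the pattern does not match simply contribute nothing to the tally, which mirrors how an event without the field drops out of a count grouped by that field.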
- REGEX searching in Splunk
We have learned how to extract a field using rex, and now we will learn how to search using regular expressions with the regex command. This is our base search. Say we have run a search over the last 30 days for sourcetype equal to access, and I want only the events that match some specific item, say action=addtocart. I could just click on this value, but to understand regex, I’ll write a query that matches addtocart. You can use \w: from our previous encounter with regex, we know \w means any word character. Let’s say we need to match add. We’ll start with a literal a, followed by two more word characters. To cover both add and cart, it will be a minimum of one and a maximum of two, similar to what we used for the IP address previously. It’s better if we try it out here.
So, addtocart. My initial condition is that it starts with an a followed by word characters, a minimum of one and a maximum of two. As you can see, it has matched two terms, add and cart, and there can be any number of characters in between. And it ends with a t, an actual t. Since it is a literal character, we must not escape it, because \t represents a tab character. So we’ll keep it as it is. Here I’ll be getting all the strings that match my regex, that is, addtocart. This is one of the methods you can use if you want to filter all the logs that contain an IP address, or all the logs that contain some specific string that you know only partially but whose syntax you know. Instead of specifying sourcetype, index=main, and so on, you can write a regex to search for the logs, directly writing a pattern that matches your search criteria.
As you can see, there is an addtocart that has been matched successfully, and also a purchase event: if you look at the purchase, the referrer contains addtocart, so it matches here too. So our regex is working fine: wherever addtocart is mentioned in the logs, it filters those logs. Now let’s take what we did previously to match the IP address. We’ll tell our regex command to display only the results that contain an IP address. I’ve entered the regex used for matching the IP address, and we start to see results.
We match all the events that have an IP address in them. Let’s say I add another octet to the IP pattern, which doesn’t make any sense, just to prove that we will not get any results, because there is no five-octet IP address. As you can see, we do not get any results, because our regex is trying to match something that is not there. Let us go back and see how our regex filters all these IP addresses without specifying any particular IP address. Now let’s say I want to search for all the IP addresses beginning with a given string. I can specify the three octets I know as expected values and leave the last as a regular expression. Let us see those results. Or let’s say I want to search for a complete subnet. As you can see, the only IP addresses that come back are from these two subnets. This is how you can use regex to narrow down your searches in your day-to-day operations.
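The filtering behaviour walked through above (keep only events that match a pattern) can be sketched in Python as follows. The events, the `filter_events` helper, and the specific prefix `91.205.189.` are assumptions made for this illustration; the lecture does not state which prefix was used.

```python
import re

# Made-up events for illustration.
events = [
    '91.205.189.15 - - "GET /cart.do?action=addtocart HTTP/1.1" 200',
    '182.236.164.11 - - "GET /product.screen HTTP/1.1" 200',
    '91.205.189.27 - - "POST /cart.do?action=purchase HTTP/1.1" 200',
    'service started, no client address',
]


def filter_events(pattern, candidates):
    """Keep only the events in which the pattern is found somewhere."""
    compiled = re.compile(pattern)
    return [event for event in candidates if compiled.search(event)]


# Any four-octet address.
any_ip = filter_events(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", events)

# A fifth octet that never occurs in the data, so nothing is kept.
five_octets = filter_events(r"\d{1,3}(\.\d{1,3}){4}", events)

# Three known octets written literally, the last one left as a pattern.
known_prefix = filter_events(r"^91\.205\.189\.\d{1,3}", events)

print(len(any_ip), len(five_octets), len(known_prefix))
```

The same pattern text is doing all the work here: loosening it widens the result set, and adding a condition the data cannot satisfy empties it.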