Unlocking the Potential of Splunk: A Deep Dive into Its Core Components and Features

Ulises Magana
16 min read · May 16, 2023


When you first hear about Splunk, what it does and where it is used can seem a little overwhelming. That is why, in this article, I want to make it simple for you to get started: what you can do with Splunk, and the essentials you need to know to build your own projects.

Splunk is a data analysis and monitoring tool for gaining insights from machine-generated data such as log files, metrics, events, and even IoT sensor readings, and for spotting trends within that data. You can search the data for specific information, compute useful statistics, and create customized reports and dashboards. It is used across IT fields such as cybersecurity, IT operations, data analytics, and cloud engineering.

Domino’s Pizza’s successful and yummy story!

First, I would like to tell you why Splunk is so great. If you are not yet convinced to learn this tool, there is a great success story: Domino’s Pizza implemented Splunk on its ordering systems and gained valuable insights into customer behavior. It contributed to their most successful Super Bowl, where they maximized profits by learning about response times, the most-used coupons, and patterns in the popularity of toppings and sides.

You can learn more about this story in this video.

Photo by Brett Jordan on Unsplash

Where should you start?

My top recommendation is to follow this article since it covers the most important parts of the Splunk tutorial and gives a brief overview of the most relevant components.

By following the Splunk tutorial, I learned how to install Splunk on Windows and get started right away. I then explored its different capabilities before uploading the zip file containing the sample data.

The first thing you do in the tutorial is learn how to look for specific data using search queries. Once you know how to do that, you move on to commands that transform data, for example into statistics. Afterwards, you use your own data sets to create lookups, making your data more meaningful and revealing relationships within it. Finally, you learn to analyze the information and interact with the visualizations Splunk provides, such as reports and dashboards.

Understanding the key concepts

Before getting hands-on, let’s first understand the Splunk architecture and what each component does.

Splunk Architecture

In the picture above we can see how Splunk works. The data source, which is typically a file, sends its data to a Splunk Forwarder, which collects the logs. There are three types of forwarders:

  • Universal Forwarder: The most commonly used forwarder in Splunk. It is a lightweight agent that forwards data from a variety of sources, including logs, metrics, and other types of data, without parsing it first.
  • Heavy Forwarder: Provides advanced features, such as parsing and transforming data before forwarding it to the Splunk instance. It is often used in environments where data requires additional processing before indexing. It analyzes, examines, and transforms the data to extract the relevant information, also known as event processing, where the data stream is broken into individual events.
  • Light Forwarder: A legacy forwarder type with a reduced feature set that sat between the other two. It has been deprecated since Splunk Enterprise 6.0 in favor of the universal forwarder.
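As an illustration of how a universal forwarder is typically wired up, here is a minimal configuration sketch. The log path, sourcetype, and indexer hostname are placeholders of my own, not values from the tutorial:

```
# inputs.conf — what the forwarder collects
[monitor:///var/log/myapp/app.log]
sourcetype = myapp:log
index = main

# outputs.conf — where the forwarder sends it
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = splunk-indexer.example.com:9997
```

In a real deployment these are two separate files under the forwarder’s configuration directory; port 9997 is the conventional receiving port on the indexer.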

Then the Splunk Indexer parses and stores the incoming data as events (this is indexing), making it available for searching and analysis with Splunk’s Search Processing Language (SPL). An event in Splunk is a record or message; it can be any kind of machine-generated data, such as a log entry or a metric. Each event contains a timestamp, source, host, and the raw data.

Afterwards, the Splunk Search Head searches and retrieves data from the indexer and displays it in the Splunk Web Interface, where users can interact with and visualize the data.
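Once data is loaded (as we will do in the tutorial below), a quick way to inspect the default fields each event carries is a search like this sketch, which assumes the tutorial’s access logs have been indexed:

```
sourcetype=access_*
| head 5
| table _time host source sourcetype _raw
```

The `table` command lays out the timestamp, host, source, sourcetype, and raw text of the first five matching events.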

Installing Splunk

To get started, go to their official website and create an account where it says ‘Free Splunk’.

Splunk’s official website

Once you have your account, go to ‘Free Trials and Downloads Page’.

Getting started with Splunk’s trials

Download Splunk Enterprise. In my case I downloaded it for Windows for practical purposes.

Splunk Enterprise for Windows

Follow the wizard to set up your admin account.

Splunk’s Windows Wizard

Start Splunk after it is installed and download the tutorial data.

Splunk Sign In

Recommendation: After signing in, navigate around and explore whatever draws your attention on the Splunk Home page. That way, you can see everything mentioned previously and get a general idea of how Splunk works.

Splunk Home

Making sense out of the uploaded data

Click on ‘Add Data’ from the Splunk Home menu and upload the tutorialdata.zip file leaving it compressed.

Uploading the tutorial data

Since a ZIP file will be used, the Host setting needs to be modified to use a portion of the path name of the files within the ZIP file.

Host setting for Linux or MAC:

  • Select ‘Segment in path’
  • Segment number: 1

Host setting for Windows:

  • Select ‘Regular expression on path’
  • Regular expression: \\(.*)\/

Leave the ‘Source type’ and ‘Index’ as shown in the picture below.

Input Settings

Click review to check your input settings.

Input Review

Click submit to add the data and then start searching.

Input data submitted

Now you’re able to see the data on the Search App.

Search App

Data Summary

In the Apps Bar go to the ‘Search’ view.

Apps Bar

Then click the ‘Data Summary’ button, just below where it says ‘How to Search’.

The Data Summary section in Splunk summarizes the structure and content of the data that has been indexed, which helps you build search queries and reports. Here you can also find the number of hosts and sources from which the data was collected, as well as the total number of events, their size, and their time range.

Hosts Data Summary
Sources Data Summary

Sourcetypes are a way of categorizing data based on where it came from or how it was generated. They help Splunk understand how to parse and interpret the data.

In the picture below, the data from the web server access log has a sourcetype of ‘access_combined’, while data from a firewall log, for example, may have a sourcetype of ‘cisco:asa’. By categorizing data this way, Splunk can better organize and make sense of it as it is indexed and searched.

Sourcetypes Data Summary
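You can also list the sourcetypes yourself with a quick transforming search. This sketch assumes the tutorial data was uploaded to the default main index:

```
index=main | stats count by sourcetype
```

The result is a small statistics table with one row per sourcetype and the number of events indexed under each.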

Basic Searches, fields, search language, and subsearches

Basic search

  • Search for ‘buttercupgames’ as a keyword in the events.
  • From the time range at the top right, select ‘All time’.
  • Note: As you can see below, there are several options when you want to search a specific time range; you can even search data in real time.
Time Ranges

The keyword that you searched will be highlighted in yellow in the ‘Event’ column.

Keyword search results
Relative time ranges
  • You can type the following to understand more about the behavior of Splunk when searching:
# The Search Assistant shows matching data as you type.
category
# Finding errors using Boolean operators. 'AND' and 'NOT' are other Booleans.
# '*' is a wildcard, thus, below it will look for: 'fail', 'failed', 'failure', etc.
buttercupgames (error OR fail* OR severe)
# Retrieve only events from your web access logs
# Look for successful requests with the '200' HTTP status
sourcetype=access_* status=200

Fields sidebar

The fields sidebar displays the list of fields extracted from the events that match your search. It also shows the count of unique values for each field, allowing you to quickly identify the most relevant fields for your analysis.

You can use the fields sidebar to add, remove, and modify fields in your search, as well as to sort and filter your search results based on specific fields.

For example, to add a new field to ‘Selected Fields’, just click where it says ‘All Fields’ and then check the boxes of the fields you want to add, modify, or remove.

Fields Sidebar
Adding fields to the Selected Fields

When you click on one of the selected fields as shown below, you’ll see its values, their counts, and the percentage each value represents of the total.

categoryId Reports and Values

When you select one of the values of a field, like the value ‘SIMULATION’ in the picture above, you’ll see it added to the search bar. Play with the fields by selecting them; they work like query filters:

Adding to the search bar a value from the Fields Sidebar

Patterns, Statistics, and Visualizations Tabs

  • Patterns: It shows the most common patterns among the set of events according to your search.
  • Statistics: It populates when you run a search with transforming commands such as ‘stats’, ‘top’, ‘chart’, etc.
  • Visualizations: It also populates itself with transforming commands. Their results include a chart and the statistics table used to generate the chart.
Events, Patterns, Statistics, and Visualization Tabs

Search Processing Language (SPL)

SPL is mainly used to search, filter, and manipulate data. It allows you to perform complex data analysis and extract meaningful insights from your data, refining your search results by filtering, grouping data, calculating statistics, and creating visualizations. In this way, you can find patterns, anomalies, trends, and correlations.

In the Splunk tutorial, you want to find the most popular items bought at the Buttercup Games online store. To achieve this, the ‘top’ command is used to find the most frequently occurring values in a field.

  • Type the following command.
# The pipe character passes the results of one command to another command
sourcetype=access_* status=200 action=purchase | top categoryId

Since ‘top’ is a transforming command, all of its results will be shown in the Statistics and Visualization tabs.

Statistics Results Tab
Visualization Results Tab — Column Chart

You can change the type of visualization as shown below:

Visualization Type
Visualization Results Tab — Pie Chart

Subsearches

A subsearch is embedded inside the main search and is used to filter the results of one search based on the results of another: the results of the subsearch are used as input to the main search to refine its results.

Subsearches are useful when you need to perform searches that involve multiple search terms or conditions. They also simplify your searches and make them more efficient by breaking them down into smaller, more manageable pieces. Finally, subsearches are evaluated first, and you can recognize them by their enclosing square brackets.

Note: In the following search, you can see the ‘AS’ keyword, which renames a field in the search results.

sourcetype=access_* status=200 action=purchase 
[search sourcetype=access_* status=200 action=purchase
| top limit=1 clientip
| table clientip]
| stats count AS "Total Purchased", distinct_count(productId) AS "Total Products", values(productId) AS "Product IDs" by clientip
| rename clientip AS "VIP Customer"
Subsearch to find the most frequent shopper

Understanding Lookups

Lookups are tables of field-value pairs that enrich events with additional information. They are relevant because they add context to events, such as attaching the product name to the product ID, as will be shown in the examples.

Among the different types of lookups, we can find CSV lookups, key-value (KV) lookups, and external lookups which allow you to retrieve data from an API or an external database.

Overall, lookups help you better understand your data and gain additional insights. You can also use the lookup command in SPL to add the lookup fields to your search results.
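For example, once the prices_lookup definition has been created (as we do in the steps below), you could invoke it manually in a search; this is a sketch of what that looks like:

```
sourcetype=access_* action=purchase
| lookup prices_lookup productId OUTPUT productName price
| table productId productName price
```

Here `productId` is the field shared between the events and the lookup table, and `OUTPUT` names the lookup fields to append to each event.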

To enable field lookups, you need to follow the steps below:

  • Go to ‘Settings’ and select ‘Lookups’ under ‘Knowledge’.
Splunk Settings
  • Create a new lookup table file by clicking ‘Add new’. In this tutorial, you will upload the prices.csv file.
  • Destination app: search
  • Destination filename: prices.csv
Adding a new lookup
Adding a lookup table file
  • Change the permissions of the prices.csv lookup file to share it with other applications. Its sharing defaults to ‘Private’ and should be changed so it appears as ‘Global’.
Lookup table files
  • Select ‘All apps (system)’ and save.
prices.csv permissions
prices.csv set to Global
  • Define how the information within the lookup table file relates to the fields in your events; this is known as a lookup definition.
  • Go to the ‘Lookups’ breadcrumb.
Lookup table files
  • Add a new Lookup definition
Lookups
  • Destination app: search
  • Name: prices_lookup
  • Type: File based
  • Lookup file: prices.csv
Adding a new lookup definition
  • Your lookup definition should appear as below.
  • Go to the ‘Sharing’ column and select ‘Permissions’ to share the lookup definition with all the apps.
Lookup definitions
  • Select ‘All apps (system)’ and save.
prices_lookup permissions
  • To avoid writing the ‘lookup’ command in every search, the lookup will be enabled to run automatically.
  • Add a new automatic lookup.
Lookups
  • Fill the fields as in the picture below.
  • Input fields: productId = productId
  • Output fields: product_name = productName
  • Output fields: price = price
  • Note 1: Lookup input fields are used to connect fields in lookup tables with fields in your events. These fields should have the same name and values in both the lookup table and the events. By doing this, you can link information from the lookup table with the events, making it easier to analyze and understand your data.
  • Note 2: When you create a lookup in Splunk, you can choose which fields from the lookup table file you want to add to your event data. These fields are called lookup output fields. You can also give them different names than what they are called in the lookup table file.
Adding a new automatic lookup
  • Share the automatic lookups with all the apps as it has been done above with the other lookups.
Automatic lookups
  • Move the price, productId, and productName fields from ‘Interesting Fields’ to ‘Selected Fields’. Just click each of them and, as shown below, select ‘Yes’ next to ‘Selected’.
productName values in ‘Selected Fields’
  • Perform the following search.
  • The results of the search are more meaningful now, since the product names come from the lookup table and are clearer than showing only the productId for each item. Here, Splunk used the lookup to map each productId to its productName.
sourcetype=access_* status=200 action=purchase 
[search sourcetype=access_* status=200 action=purchase
| top limit=1 clientip
| table clientip]
| stats count AS "Total Purchased", dc(productId) AS "Total Products", values(productName) AS "Product Names" BY clientip
| rename clientip AS "VIP Customer"
Search results with lookups

Creating Reports, Charts, and Dashboards

In this last part covering Splunk’s capabilities, we look at reports with charts and their integration into dashboards. These tools make it easier to interpret and communicate complex data, enabling users to spot patterns, identify anomalies, and make data-driven decisions. Knowing how to use them matters because they provide visual context and communicate insightful findings to technical and non-technical audiences alike.

Reports

A report in Splunk can include tables, charts, and other visual elements, and reports are useful for sharing detailed findings or analysis in an organized format.

We will save the last search from the lookups section as a report, as shown in the following picture.

Saving a report
  • A popup will appear. Fill in the details as below.
Saving a report
  • After saving it, select ‘Permissions’ on ‘Additional Settings’ and save as shown below.
Report’s Permissions
  • Finally, go to reports and you will see the VIP Customer report there.
Splunk’s Reports

Creating charts

Charts allow you to visualize trends, comparisons and distributions in your data with graphical representations, such as line charts, bar charts, or pie charts.

  • Perform the following search. Just like the ‘top’ command, the ‘chart’ command is a transforming command, so the results are displayed only in ‘Statistics’ and ‘Visualization’.
sourcetype=access_* status=200 
| chart count AS views count(eval(action="addtocart")) AS addtocart
count(eval(action="purchase")) AS purchases by productName
| rename productName AS "Product Name", views AS "Views", addtocart AS "Adds to Cart",
purchases AS "Purchases"
Search to create a new chart
Column Chart
  • Perform the following search.
  • Note 1: The eval command is used to create or modify fields in search results. It allows you to perform calculations, manipulate strings, extract values, and apply transformations to data.
  • Note 2: The count function counts the number of matching events; wrapped around an eval expression, it counts only the events where the expression is true.
sourcetype=access_* status=200 
| stats count AS views count(eval(action="addtocart")) AS addtocart
count(eval(action="purchase")) AS purchases by productName
| eval viewsToPurchases=(purchases/views)*100
| eval cartToPurchases=(purchases/addtocart)*100
| table productName views addtocart purchases viewsToPurchases cartToPurchases
| rename productName AS "Product Name", views AS "Views",
addtocart as "Adds To Cart", purchases AS "Purchases"
  • Go to the ‘Visualization’ tab and then go to the format button, just beside the types of charts.
  • Go to ‘Chart Overlay’ and in ‘Overlay’ add ‘viewsToPurchases’ and ‘cartToPurchases’ just as shown below.
Chart overlay
  • Finally, perform the following search and save the report as ‘Purchasing trends’.
sourcetype=access_* status=200 action=purchase
| chart sparkline(count) AS "Purchases Trend" count AS Total BY categoryId
| rename categoryId AS Category
Sparkline chart

Creating dashboards

Finally, we will cover dashboards. Dashboards are customizable views that combine multiple charts, reports, and other components into a single page. They are useful when you want a comprehensive, real-time overview of your data, tracking performance and key metrics at a glance.

  • Perform the following search.
sourcetype=access_* status=200 action=purchase | top categoryId
  • Go to ‘Save As’ and then ‘New Dashboard’. Then save the details as below.
Defining a new dashboard
  • Go to ‘Dashboards’ in the App bar to edit the new dashboard and look for it there.
  • To add new panels into the dashboard, go again to ‘Dashboards’ and then to ‘Add Panel’. You can add your saved reports there as shown in the picture below.
Adding a new panel into the dashboard
  • When adding new panels to your dashboard, you can rearrange them by dragging and dropping, ending up with something similar to the following pictures.
Adding reports to our Dashboard
Adding and customizing our dashboard
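Under the hood, a classic Splunk dashboard like the one above is stored as Simple XML, which you can view and edit from the dashboard’s Edit menu. A minimal sketch for a single-panel version might look like this (the label and chart type are illustrative):

```
<dashboard>
  <label>Buttercup Games Overview</label>
  <row>
    <panel>
      <chart>
        <search>
          <query>sourcetype=access_* status=200 action=purchase | top categoryId</query>
          <earliest>0</earliest>
        </search>
        <option name="charting.chart">pie</option>
      </chart>
    </panel>
  </row>
</dashboard>
```

Each `<panel>` holds one visualization with its own embedded search, which is how the drag-and-drop editor arranges the panels you add.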

Summary and key takeaways

Having completed the walkthrough above, been inspired by Domino’s Pizza’s success story, and understood Splunk’s components and architecture, we should keep in mind its capabilities: it can help organizations monitor and troubleshoot IT infrastructure, detect anomalies in real time, analyze logs and user activity, and gain insights that propel business objectives, among other uses.

If you made it this far, you already know how to search, transform, and visualize data, as well as create reports and dashboards that leverage Splunk’s powerful features. Learning Splunk also improves your career prospects, since it is used in a wide variety of fields, such as IT operations, cybersecurity, data analytics, and cloud engineering.

Now, let’s implement Splunk in future projects alongside other technologies to get an even better understanding of it!


Ulises Magana

Cloud & Infrastructure Engineer with diverse experience in software development, database administration, SRE & DevOps.