Log Analytics with ElasticSearch, Kibana, and Logstash

Update: Automate analytics and monitoring install and configuration.

How often do you find yourself SSH’ing into a server (or several servers) and grepping through log files trying to identify a problem?  It’s like looking for a needle in a haystack.  It takes a lot of time, you have to remember where each log file lives on the filesystem, and there is no way to visually represent your log data.  SSH’ing and grepping is also only an option for the technically savvy…  There is a better way!

There are multiple tools available to help “aggregate” your log data from multiple sources into a single central location.  Some of these tools also provide added benefits like log parsing and normalization of things like timestamps, full-text indexing for efficient searching, and visualization of log data through a web based dashboard.

We will be using a combination of open source tools that integrate well together to provide each of the benefits described above.

ElasticSearch



ElasticSearch is a highly scalable full-text indexing, search, and storage system built on top of Apache Lucene.  It provides a REST API for searching and analyzing your data.
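
For example, once ElasticSearch is running you can talk to it with nothing more than curl (the index name below is a placeholder; Logstash creates one index per day named logstash-YYYY.MM.DD):

    # confirm the node is up and see its version
    curl http://localhost:9200/
    # full-text search an index for the word "error" (placeholder index name)
    curl 'http://localhost:9200/logstash-2013.10.01/_search?q=error&pretty=true'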

ElasticSearch allows you to start small but will grow with your business.  It is built to scale horizontally out of the box: as you need more capacity, just add more nodes and let the cluster reorganize itself to take advantage of the extra hardware.
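
A minimal sketch of what that looks like in practice: every node that shares the same cluster.name in its /etc/elasticsearch/elasticsearch.yml (the path used by the deb package below) will discover the others and join the same cluster.  The names here are placeholders:

    # /etc/elasticsearch/elasticsearch.yml (excerpt)
    cluster.name: my-log-cluster    # nodes sharing this name form one cluster
    node.name: "log-node-1"         # optional; a random name is chosen otherwise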

ElasticSearch will be the storage and search engine for our data.

Logstash



Logstash helps you take logs and other event data from your systems and store them in a central place.

Logstash is mainly used as the server receiving the incoming data to aggregate.  It can move the received data through a series of “filters” to transform and/or normalize the data, before sending the data to an output.  In our case the output for our Logstash server will be ElasticSearch for storage and indexing.
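
As a quick illustration (separate from the config we will build later), a grok filter can pull named fields out of an Apache access-log line and a date filter can normalize its timestamp into the event’s @timestamp:

    filter {
      if [type] == "apache" {
        grok {
          # parse the standard combined access-log format into named fields
          match => [ "message", "%{COMBINEDAPACHELOG}" ]
        }
        date {
          # use the request time from the log line as the event timestamp
          match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
      }
    }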

Logstash v1.2.1 is the latest as of these notes.
There is also a great (and funny) Logstash presentation from PuppetConf 2012 covering sysadmin duties (that sometimes suck), log formats, crazy date formats, monitoring, and how to make sysadmin lives better.

Logstash can also be used to “ship” the data from the remote machines whose log files you want to aggregate and monitor.  Since Logstash runs on the JVM, it has a significant memory footprint (typically 512MB to 1GB) and requires Java to be installed on each system.  There are lighter-weight solutions for “shipping” data from remote servers to the central Logstash receiver, like Lumberjack, which we will discuss next.

Lumberjack

Lumberjack is a lightweight daemon for “shipping” log data to a central location.  Its memory usage has been reported to sit consistently around 64K, making it a much lighter process than using Logstash to ship.

Update: Lumberjack has since been renamed simply “Logstash Forwarder” and taken under the umbrella of ElasticSearch products, it seems – https://github.com/elasticsearch/logstash-forwarder. The Lumberjack link above now redirects there.

Kibana

Kibana is the front-end web dashboard for searching, analyzing, and exploring your data stored within ElasticSearch.  It is completely standalone HTML, CSS, and JavaScript, meaning there is no backend infrastructure required besides the ElasticSearch server(s) it communicates with directly.  All that is required is a simple webserver, like Apache, to serve up the static HTML, CSS, and JavaScript files. Below are a couple of images of the Kibana dashboard.
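
Since the browser talks to ElasticSearch directly, Kibana needs to know where ElasticSearch lives.  In Kibana 3 that setting is in config.js at the top of the Kibana directory; the default (roughly as shown below, so verify against your copy) points at port 9200 on whichever host served the page, which suits our single-server setup:

    // config.js (excerpt)
    elasticsearch: "http://"+window.location.hostname+":9200",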


Our installation setup

Since we are just starting out with these tools, we will begin with a fairly simple deployment and installation.  We will run ElasticSearch, the Logstash receiver, and Kibana all on the same server.  Lumberjack will get installed on each of the remote servers we wish to monitor.

ElasticSearch 0.90.5 is the latest as of these notes; however, the Logstash getting-started guide linked below indicates that ElasticSearch 0.90.3 must be used.

http://logstash.net/docs/1.2.1/tutorials/getting-started-centralized

Note: Redis is not required when using Lumberjack, rather than Logstash, to ship logs.

Central aggregation server:

  1. Make sure Java is installed.  The latest version of Java 7 is a good choice.  ElasticSearch and Logstash both require Java.
  2. Download and install ElasticSearch:
        wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.3.deb
        sudo dpkg -i elasticsearch-0.90.3.deb
        

    This will install and start ElasticSearch.  It creates a start/stop script in /etc/init.d

  3. Download the LogStash JAR:
        mkdir -p /opt/logstash/conf
        cd /opt/logstash
        wget https://download.elasticsearch.org/logstash/logstash/logstash-1.2.1-flatjar.jar
        
  4. Generate an SSL certificate and key that Lumberjack will use to verify and encrypt its connection to Logstash (on Debian/Ubuntu you may need to create the /etc/pki/tls/certs and /etc/pki/tls/private directories first):
        sudo openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout /etc/pki/tls/private/logstash.key -out /etc/pki/tls/certs/logstash.crt
        
  5. Create a config file for Logstash (See: http://logstash.net/docs/1.2.1/filters/multiline about the multiline filter used in the below config.):
    /opt/logstash/conf/logstash.conf

    input {
      lumberjack {
        port => 5555
        type => notype
        ssl_certificate => "/etc/pki/tls/certs/logstash.crt"
        ssl_key => "/etc/pki/tls/private/logstash.key"
      }
    }
    filter {
      if [type] == "java" {
        multiline {
          pattern => "^\s"
          what => "previous"
        }
        multiline {
          pattern => "^Caused by:"
          what => "previous"
        }
      }
    }
    output {
      elasticsearch {
        host => "127.0.0.1"
      }
    }
    
  6. Run Logstash (you should create an upstart script/job to run and manage this process; a minimal example is sketched after this list. RedHat 6 now supports upstart too!):
        cd /opt/logstash
        java -jar logstash-1.2.1-flatjar.jar agent -f conf/logstash.conf &
        
  7. Install Kibana – place the Kibana HTML, CSS, and JavaScript files onto your webserver (this example installs the Apache webserver and puts Kibana into the webroot):
        sudo apt-get install apache2
        cd /var/www
        wget https://download.elasticsearch.org/kibana/kibana/kibana-latest.tar.gz
        tar zxf kibana-latest.tar.gz
        mv kibana-latest kibana
        
  8. You should now be able to point your browser to: http://<your_central_log_host>/kibana
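
As noted in step 6, backgrounding the Logstash agent with & is fine for experimenting, but an upstart job is a better way to keep it running.  A minimal sketch, assuming Ubuntu with upstart and the paths used above (this is my own example, not from the Logstash docs; adjust names and paths for your environment):

    # /etc/init/logstash.conf
    description "Logstash agent"
    start on runlevel [2345]
    stop on runlevel [016]
    respawn
    script
      cd /opt/logstash
      exec java -jar logstash-1.2.1-flatjar.jar agent -f conf/logstash.conf
    end script

With that in place, sudo start logstash and sudo stop logstash manage the process, and upstart will restart it if it dies.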

Remote server installs of Lumberjack:

I plan on a later blog entry that will show how to automate the installation and configuration of lumberjack via Puppet on a cloud VM utilizing userdata and cloud-init to bootstrap the whole process.

  1. Download and install the packaged Lumberjack Debian package:
    1. Retrieve the deb package for lumberjack (I had to build it from source). Once you have created your deb package, a good place to keep it is in object storage if you are using cloud infrastructure, such as S3 on AWS or Swift on OpenStack. Once the installer is available in a Linux package repository, it may be better to install via your package manager.
    2. sudo dpkg -i lumberjack_0.1.2_amd64.deb
      1. This will install the Lumberjack binaries to /opt/lumberjack/bin
      2. An RPM installer can also be created if running RedHat-based systems. See: https://github.com/elasticsearch/logstash-forwarder#packaging-it-optional
  2. Create the Lumberjack config file (it is a single JSON object).
    /etc/lumberjack.conf
    {
    "network": {
      "servers": [ "<host_or_instance_private_IP>:5555" ],  # same port as lumberjack input in logstash config
      "ssl ca": "/etc/pki/tls/certs/logstash.crt",  # same cert as logstash server
      "timeout": 15
    },
    "files": [
      {
        "paths": [
          "/var/log/syslog"
        ],
        "fields": { "type": "syslog" }
      }, {
        "paths": [
          "/var/log/apache2/*.log"
        ],
        "fields": { "type": "apache" }
      }, {
        "paths": [
          "/var/log/tomcat6/catalina.out"
        ],
        "fields": { "type": "java", "source": "tomcat" }
      }
    ]
    }
    
  3. scp (or transfer by some other means) the SSL .crt file generated on the Logstash server to this remote server.
    1. Place it at the same path as on the Logstash server for consistency: /etc/pki/tls/certs/logstash.crt
    2. This path is used as the "ssl ca" value in the "network" section of the Lumberjack config above.
  4. Run the Lumberjack shipper:
        sudo service lumberjack start
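
The service command in step 4 assumes your deb package installed an init or upstart script for Lumberjack.  If yours did not, you can run the shipper in the foreground to test things; the -config flag here is taken from the logstash-forwarder README, so double-check it against your build:

    sudo /opt/lumberjack/bin/lumberjack -config /etc/lumberjack.conf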
        

Now generate some logs and search for them in your browser using Kibana!
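
For a quick end-to-end test, write something recognizable into syslog on one of the remote servers (logger ships with most Linux distributions):

    logger "lumberjack end-to-end test"

That line lands in /var/log/syslog, Lumberjack ships it to the central Logstash server, and a search for “lumberjack end-to-end test” in Kibana should show the event within a few seconds.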