Intro to the ELK Cluster

A bit better than tail

James Fuller, DevOps for Web Express
Office of Strategic Relations

Follow along at: http://rawgit.com/jwfuller/elk_tech_talk/master/index.html#/

Afternoon!

  • I work on the Web Express team.
  • We run a SaaS that currently has ~1000 instances of Drupal (a CMS written in PHP).

Overview

  • Our logging problem
  • Tour of the ELK stack
  • Easy to deploy - examples of some basic configuration
  • Easy to use - answering real questions

Web Express and its logging problem

Everything has data

Log files

  • Drupal application
  • Drupal authentication (for ITSO compliance)
  • Apache access
  • Apache error
  • MySQL slow transaction
  • Python application
  • Apache for python application
  • MongoDB for python application
  • Varnish access
  • Error logging from 3rd party hosting
  • Logs from CI server

Metrics

  • DevOps applications
  • CI server
  • Sysadmin-y stuff like CPU and memory utilization

They are all over the place

  • Drupal and Apache logs come from 5 servers
  • Varnish logs from 4 servers
  • Database logs from 2 servers
  • The cluster has ~16 servers
  • That is just our Production cluster; we also have DEV and TEST environments

No one cares unless something goes wrong

When something goes wrong, you need grep, awk, | and RegEx black magic to find what you are looking for

grep "Invalid user" /var/log/httpd/auth.log | grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}" | uniq

Enter ELK

Cat wearing antlers hat

http://www.lalocadelosgatos.com/wp-content/uploads/2012/10/gato-disfraz-ciervo.jpeg

Elasticsearch logo
logstash logo
Kibana logo
Beats logo

Gathering logs

Beats

Lightweight shippers written in Golang that send data from edge machines to Logstash and Elasticsearch.

  • Filebeat
  • Metricbeat
  • Packetbeat
  • Winlogbeat
  • Heartbeat
  • And a bunch of community beats

Beats

  • Has templates for some common use cases
  • Can handle basic parsing, like NCSA common log format (see the sketch below)
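
A minimal filebeat.yml sketch (the log path and Logstash host are placeholders, and option names have shifted between Filebeat versions):

filebeat:
  prospectors:
    - paths:
        - /var/log/httpd/access_log
      document_type: access   # becomes the [type] field the Logstash filters test
output:
  logstash:
    hosts: ["logstash.example.edu:5044"]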

Broker

You don't have to use a broker, but there are some benefits:

  • Enhances performance of the indexer
  • Provides some resiliency if the indexer goes down

Redis is commonly used as a broker; a minimal shipper/indexer pair is sketched below.

Beats now fulfills this need to some degree.
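
A sketch of the Redis broker pattern, assuming a hypothetical broker host; the shipper and indexer are two separate Logstash instances:

# shipper: push events onto a Redis list
output {
  redis {
    host      => "broker.example.edu"
    data_type => "list"
    key       => "logstash"
  }
}

# indexer: pop events off the same list
input {
  redis {
    host      => "broker.example.edu"
    data_type => "list"
    key       => "logstash"
  }
}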

Transform

Logstash has three basic parts; a minimal runnable pipeline is sketched after the list.

  • Input - where is this coming from?
    • File, Beat, HTTP, etc.
  • Filter - break monolithic messages into data
    • Grok, date, geoip, mutate, etc.
  • Output - what should we do with the events after we chop?
    • Elasticsearch, a ton of messaging protocols, other databases
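
A minimal pipeline wiring the three parts together (stdin/stdout instead of real sources, so it can be run standalone):

# read lines from stdin, structure them, print the parsed event
input { stdin { } }
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output { stdout { codec => rubydebug } }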

Store and index

Elasticsearch

Elasticsearch

  • Distributed, multitenant-capable full-text search engine built on Apache Lucene
  • Provides a RESTful web interface and schema-free JSON documents
  • Logstash creates indices in Elasticsearch
  • Indices are divided into Shards
  • Shards are replicated across Nodes (a _cat example follows)
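
The _cat API will show the index and shard layout, for example:

curl 'localhost:9200/_cat/shards/logstash-*?v'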

Visualization

Types include:

  • Bar, line, pie and donut charts
  • Data tables
  • Maps

Kibana

  • Browser-based analytics and search dashboard for Elasticsearch
Kibana dashboard

Input


input {
  beats {
    port => 5044
    # ssl must be enabled explicitly or the cert/key settings are ignored
    ssl => true
    ssl_certificate => "/usr/local/openssl/certs/crt/beat-selfsigned.crt"
    ssl_key => "/usr/local/openssl/certs/key/beat.key"
  }
  file {
    type => "mysql-slow"
    path => "/var/log/mysql/mysql-slow.log"
    # join multi-line slow-query entries into one event: any line that does
    # NOT start with the "# User@Host:" header belongs to the previous event
    codec => multiline {
      pattern => "^# User@Host:"
      negate => true
      what => "previous"
    }
  }
}

Filter - Varnish


filter {
  if [type] == "varnishncsa" {
    grok {
      match => { "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:varnish_timestamp}\] \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response} %{NOTSPACE:bytes} %{QS:referrer} %{QS:agent} %{QS:varnish_response} %{QS:varnish_backend} %{QS:varnish_f5_destination}' }
    }
    # Varnish logs "-" when no body was sent; normalize before converting,
    # otherwise the conditional would never see the raw "-" value
    if [bytes] == "-" {
      mutate {
        replace => { "bytes" => "0" }
      }
    }
    mutate {
      convert => [ "bytes", "integer" ]
      add_field => { "signal" => "signal" }
    }
    date {
      locale => "en"
      match => [ "varnish_timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      target => "@timestamp"
      add_tag => [ "tmatch" ]
    }
  }
}

Grok

From the logstash docs:

Grok is currently the best way in logstash to parse crappy unstructured log data into something structured and queryable.

Grok

Grok patterns include:

  • Integers, Numbers, Words, Spaces
  • IPv4, IPv6, MAC addresses
  • URIs, Paths (unix and windows)
  • Whole mess of date/timestamp components and formats
  • Common log formats: Syslog, Apache, Ruby, Java (example below)
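
For example, a few stock patterns turn a one-off log line into named fields (the field names here are arbitrary):

# input line: "55.3.244.1 GET /index.html 15824 0.043"
filter {
  grok {
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}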

Filter - Apache


filter {
  if [type] == "access" {
    # the line starts with the real client IP, followed by the standard
    # combined log, whose clientip field is actually the Varnish address
    grok {
      match => {
        "message" => '%{IPORHOST:first_ip}? %{COMBINEDAPACHELOG}'
      }
    }
    date {
      locale => "en"
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      timezone => "America/Denver"
      target => "@timestamp"
      add_tag => [ "tmatch" ]
    }
    mutate {
      rename => {
        "clientip" => "varnish_ip"
        "first_ip" => "client_ip"
      }
      convert => [ "bytes", "integer" ]
    }
    geoip {
      source => "client_ip"
    }
  }
}

Outputs


output {
  # route each event type to its own daily index
  if [type] == "drupal_syslog" {
    elasticsearch {
      index => "logstash-drupal_syslog-%{+YYYY.MM.dd}"
      hosts => ["welastic.colorado.edu"]
    }
  } else if [type] == "mysql" {
    elasticsearch {
      index => "logstash-mysql-%{+YYYY.MM.dd}"
      hosts => ["welastic.colorado.edu"]
    }
  } else if [type] == "varnishncsa" {
    elasticsearch {
      index => "logstash-varnishncsa-%{+YYYY.MM.dd}"
      hosts => ["welastic.colorado.edu"]
    }
  } else {
    elasticsearch {
      hosts => ["welastic.colorado.edu"]
    }
  }
}

Stories from the Support queue

I could use some post-launch support and help for {URL}

At times, including right now, the home page displays the words "Information for People" instead of the correct home page. I thought I had made it go away myself just before launch, but clearly not.

It may have started a few weeks ago when I renamed the page that is now "Overview" to "Home."

Thanks for your help.
Pat

A few weeks? Let's grab 30 days of logs

Kibana dashboard showing 163 million events

163 million events

Limit search to Production and our Application log

gif of Kibana dashboard filtering from 163 million to 10 million events

Down to 10 million events

Limit further to the specific instance and search for 'Interference'

gif of Kibana dashboard filtering from 10 million to 309 events

Down to 309 events

Limit further to 'content update' events

gif of Kibana dashboard filtering from 309 to 14 events

14 events

Answer?

The user and another person got into an edit war

Is this build stable on Test and ready for Production?

Kibana dashboard showing an increase in error rate
Kibana dashboard selecting a specific time range

Answer?

Nope, increase in 'signal' logging

Future for OSR

  • Adding more Beats, particularly Metricbeat
  • Getting more logs
  • Grafana - Trending, analytics, alerting.

Questions?

Other tools

  • Graylog
  • Splunk ($$$)
  • SaaS (Loggly, Papertrail)

Elasticsearch API

curl 'localhost:9200/_cat/health?v'

            epoch      timestamp cluster       status node.total node.data shards pri relo init unassign
            1394735289 14:28:09  elasticsearch yellow        1         1      5   5    0    0        5
          
curl 'localhost:9200/_cat/indices?v'

            health status index    pri rep docs.count docs.deleted store.size pri.store.size
            green  open   varnish    5   0      11434            0       64mb           32mb
            green  open   access     5   0       2030            0      5.8mb          5.8mb
            green  open   mysql      5   0       1054            0      8.8mb           45mb
            green  open   drupal     5   0      12030            0      1.2gb          1.2gb
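
The same API handles ad hoc queries; a URI-search sketch against the daily indices created by the outputs above (field values assumed):

curl 'localhost:9200/logstash-*/_search?q=type:access+AND+response:404&size=5&pretty'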
          

Additional Kibana Visualizations

Default screen for Kibana
Kibana visualization creation screen
Kibana map visualization
Kibana Dashboard of 4 indices
Kibana Dashboard of Web Express Overview

Filter - Drupal Syslog


filter {
  if [type] == "drupal_syslog" {
    grok {
      match => {
        "message" => '%{SYSLOGBASE} %{URI:drupal_base_url}\|%{INT:drupal_unix_timestamp}\|%{DATA:drupal_category}\|%{IP:ip}\|%{URI:drupal_request_url}\|(?:%{URI:drupal_referrer}|)\|%{INT:drupal_uid}\|(?:%{URI:drupal_link}|)\|%{GREEDYDATA:drupal_message}'
      }
    }
    mutate {
      # category names can contain spaces; underscores are easier to query
      gsub => [ "drupal_category", "\s", "_" ]
      add_field => { "signal" => "signal" }
    }
  }
}

Filter - MySQL Slow Queries


filter {
  if [type] == "mysql" {
    # each grok below parses a different header line of the multi-line
    # slow-query entry assembled by the multiline codec in the input
    grok {
      match => [
        "message",
        "^# User@Host: %{USER:user}(?:\[[^\]]+\])?\s+@\s+%{HOST:host}?\s+\[%{IP:ip}?\]"
      ]
    }
    grok {
      match => [
        "message",
        "^# Query_time: %{NUMBER:duration:float}\s+Lock_time: %{NUMBER:lock_wait:float} Rows_sent: %{NUMBER:results:int} \s*Rows_examined: %{NUMBER:scanned:int}"
      ]
    }
    grok {
      match => [
        "message",
        "^SET timestamp=%{NUMBER:timestamp};"
      ]
    }
    # the SET timestamp line is a UNIX epoch; use it as the event time
    date {
      match => [ "timestamp", "UNIX" ]
    }
  }
}