logstash


Jordan Sissel

DreamHost

# problems
## access
* Who manages access?
* How fine-grained are access controls?
* Who needs access?

## consumption
* who can understand/process the logs?
* are they the same people who need to consume them?
* do you have any tools that can process the logs?

## curation
* how long to keep logs?
* until they fill up the disk
* ... and crash production
* ... when you're asleep and on-call.

## compliance
* PCI
* FISMA
* Sarbanes-Oxley
* HIPAA

## tooling
* what helps you do things with logs?
* search, analyze, report!
* too specialized (webalizer, etc.)
* don't scale (ssh + grep)

case study:

email at DreamHost

Help! I'm not getting my mail!

- A DreamHost Customer

## Why do we care?
* does this need fixing?
* what's the business requirement?

A DreamHost Core Value

Provide Superhero Service: DreamHost strives to max out the value and customer service we put into everything we do.
- The DreamHost Way

Situation

  • Customer is having email problems,
  • and Customers having email problems is bad,
  • Logs are necessary for diagnosis,
  • so Technical Support needs access,
  • but Operations has access,
  • and InfoSec policies restrict certain access.
## Old Solution
* access: Ops gives Technical Support ssh access.
* tooling: scripts that ssh+grep across hundreds of servers.
* curation: many GBs of logs per day, per server.

## Old Solution
* problem: grep is slow on gigabytes of data.
* result: *many minutes* to search today's logs.
* reduced support efficiency due to bad tooling
## Let's Reduce MTTSR (Mean Time to Technical Support Resolution)
## New Solution
* hundreds of servers ship logs to a central logstash cluster
* provide a single search interface on the web

## results
* log search results in seconds, not minutes
* search specific time spans, not just "today" or "yesterday"
* happier, more effective customer support folks

## helps the business
Empower People: By empowering our employees to be truly helpful, we empower our customers with products and services they need to do their business.
- The DreamHost Way

How much improved?

Technical Support Says

by an order of awesome
10000000000%
## implementation details
* /var/log/mail.{info,warn,log}, /var/log/amavisd.log
* 600-ish servers shipping logs with lumberjack
* 7-node logstash/elasticsearch cluster
* each node: 8 cores, 4TB disk, 16GB RAM

## rough numbers
* peak 10,000 events/second
* 500,000,000 events/day
* 4,000,000,000 events/week
* 1 terabyte/week
* peak 10% cpu utilization across cluster

So, about logstash.

how can it help you?


  • powerful and flexible log processing
  • excellent search/analytics
  • integrates well with your infrastructure
  • helpful and friendly community

got logs, now what?

apache response codes to graphite

how do we get there?

## the agent
inputs | filters | outputs
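
Every agent runs from a single config file with those three sections. A minimal sketch, with illustrative plugin choices:

    input {
      # read events from standard input
      stdin { }
    }
    filter {
      # event processing goes here
    }
    output {
      # print events back out
      stdout { }
    }
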
## inputs
where logs come from.

## inputs: local
* files
* standard input
* windows eventlog
* external programs
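
As a sketch, tailing a local file (path and type label are illustrative):

    input {
      file {
        # follow the apache access log; "type" tags events for later filtering
        path => "/var/log/apache/access.log"
        type => "apache-access"
      }
    }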

## inputs: network
* gemfire
* irc
* lumberjack
* rabbitmq
* redis
* relp
* stomp
* syslog
* tcp and udp
* websockets
* xmpp
* zeromq
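
A sketch of a couple of network inputs; the port numbers are illustrative:

    input {
      syslog {
        # accept syslog messages over tcp and udp
        port => 5514
      }
      tcp {
        # accept raw lines over tcp
        port => 5000
        type => "raw-tcp"
      }
    }
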
## inputs: external
* Heroku
* Amazon SQS
* Twitter Stream API
* Drupal DBLog

## filters
processing events for awesome

## filters: grok
See this log:

    37.57.128.238 - - [13/Sep/2012:02:37:24 -0400] "GET / HTTP/1.1" 200 41687
    (ip address)      (some timestamp thing)       (http request)   rsp bytes

Describe it in grok:

    %{IP:client} - - \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATH:request} HTTP/%{NUMBER:http_version}" %{NUMBER:response} %{NUMBER:bytes}

becomes:

    {
      "client": "37.57.128.238",
      "timestamp": "13/Sep/2012:02:37:24 -0400",
      "verb": "GET",
      "request": "/",
      "http_version": "1.1",
      "response": 200,
      "bytes": 41687
    }

## filters: date
timestamps come in many formats:

* 1304060505 (unix epoch)
* 29/Apr/2011:07:05:26 +0000
* Fri, 21 Nov 1997 09:55:06 -0600
* Oct 11 20:21:47
* 020805 13:51:24
* 110429.071055,118
* @4000000037c219bf2ef02e94 (tai64n)

            filter {
              date {
                # Parse apache's "13/Sep/2012:02:37:24 -0400"
                match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
              }
            }
            

## filters: complex parsing

  • key=value things="like this"
  • { "json": "parsing" }
  • <xml parsing="true" />
  • csv,parsing,enabled
  • also=have&url=parsing
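
As a sketch, the kv filter handles the key=value case above; relying on its default source field is an assumption here:

    filter {
      kv {
        # split 'key=value things="like this"' pairs into event fields
      }
    }
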
## filters: improve event data
* geoip lookups
* user agent parsing
* syslog decoding
* dns lookups
* zeromq call outs
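
For example, a geoip lookup on the address the earlier grok example captured; the "client" field name assumes that example:

    filter {
      geoip {
        # add geographic fields (country, city, ...) for the client ip
        source => "client"
      }
    }
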
## filters: modify events
* mutate
* anonymize
* translate
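
A small mutate sketch; the field names are illustrative:

    filter {
      mutate {
        # rename a field, then normalize its case
        rename => [ "verb", "method" ]
        lowercase => [ "method" ]
      }
    }
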
## outputs
where events go

## outputs: metrics
* boundary
* circonus
* cloudwatch
* datadog
* ganglia
* graphite
* librato
* opentsdb
* riemann
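
Tying back to the earlier apache-to-graphite teaser, a sketch of counting response codes; the hostname and metric name are illustrative:

    output {
      graphite {
        # increment one counter per http response code
        host => "graphite.example.com"
        metrics => [ "apache.response.%{response}", "1" ]
      }
    }
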
## outputs: storage
* elasticsearch
* file
* mongodb
* redis
* riak
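
The storage side can be as small as an elasticsearch output with defaults, assuming a reachable local cluster:

    output {
      # index events into elasticsearch for searching
      elasticsearch { }
    }
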
## outputs: monitor & report
* nagios
* email
* pagerduty
* zabbix
* irc

## deployment scenarios

simple standalone

* inputs: /var/log/messages, /var/log/apache/access.log, /var/log/mysql/slow.log
* filters: grok, date, multiline, anonymize
* outputs: elasticsearch, ops irc channel
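
A hedged sketch of that standalone agent as one config; the multiline pattern and irc settings are illustrative, and the anonymize step is omitted for brevity:

    input {
      file { path => "/var/log/messages" type => "syslog" }
      file { path => "/var/log/apache/access.log" type => "apache" }
      file { path => "/var/log/mysql/slow.log" type => "mysql-slow" }
    }
    filter {
      multiline {
        # mysql slow-query entries span several lines; join them
        type => "mysql-slow"
        pattern => "^# Time:"
        negate => true
        what => "previous"
      }
      grok {
        # parse the apache combined log format
        type => "apache"
        match => [ "message", "%{COMBINEDAPACHELOG}" ]
      }
      date {
        type => "apache"
        match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
      }
    }
    output {
      elasticsearch { }
      irc {
        # announce events in the ops channel
        host => "irc.example.com"
        channels => [ "#ops" ]
      }
    }
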
## scale out?
* expand on (inputs | filters | outputs)
* "inputs" are your business machines
* "filters" are a processing tier
* "outputs" are a storage and integration tier

tiered

* inputs: frontends, databases, routers
* filters: parse, anonymize
* outputs: elasticsearch, graphite, nagios, live tail

* inputs are your business machines
* filters are machines that process events
* outputs are machines that store events
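
One common way to wire the tiers together is a broker between shippers and processors; the redis broker and hostnames here are assumptions, not from the slides. On each business machine, a thin shipper:

    input {
      file { path => "/var/log/messages" type => "syslog" }
    }
    output {
      # hand events to the processing tier through a redis list
      redis { host => "broker.example.com" data_type => "list" key => "logstash" }
    }

On the processing tier, consume, parse, and fan out to the storage and integration tier:

    input {
      redis { host => "broker.example.com" data_type => "list" key => "logstash" }
    }
    filter {
      # parse standard syslog lines
      grok { match => [ "message", "%{SYSLOGLINE}" ] }
    }
    output {
      elasticsearch { }
      graphite { host => "graphite.example.com" metrics => [ "events.syslog", "1" ] }
    }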

[image made by Rashid Khan]

## about the project
## extendable
* 100+ plugins (inputs, filters, outputs)
* something missing? Easy to add!

## project values
* newbies have a bad time? It's a bug.
* all contributions are valued (code, bugs, docs, support, etc.)

## community
* communication: irc, mailing list
* book: http://www.logstashbook.com/
* tips/tricks: http://cookbook.logstash.net/
* extra goodness: kibana, logstash-cli, puppet modules, chef cookbooks

## project metrics
* 18,000 unique monthly visitors to logstash.net
* 736 users on [email protected]
* 540 users on logstash.jira.com
* 275 folks in #logstash on freenode IRC
* 130 code contributors