Search this site


Metadata

Articles

Projects

Presentations

MITRE's CEE is a failure for profit.

I wrote this post a few months ago, but never got around to publishing it.

Anyway, someone mentioned 'project lumberjack' and I found it was based on CEE: Common Event Expression. CEE is a sort of comedic tragedy of design.

The effort is owned by a "non-profit" (MITRE), but the complexity and obfuscation in CEE can only drive towards one thing: consultant profits. I had a go at explaining what I describe in this post on the 'project lumberjack' mailing list, but I did it quite poorly and got a few foot-stomps in response, so I gave up.

CEE is a failure because, while claiming to be a standards effort, it maximizes incompatibility between implementations by doing the following:

  • poorly describes multiple serialization formats, requires none of them. This ensures maximum incompatibility in event serialization
  • defines requirements for log transport protocols, but does not describe an actual protocol. This ensures maximum protocol incompatibility
  • naming style inconsistencies This ensures confusion

In general, the goal of CEE sounds like, but is actually not, creating a standard for common event expression. Instead, CEE is aimed at ensuring consulting dollars through obfuscation, complexity, and inconsistency.

Inconsistency.

No consistency in naming. Some names are abbreviations like ‘crit’, some are prefixed abbreviations (“p_” prefix), some are full english words like ‘action’.

If the goal was to be inconsistent, CEE has succeeded.

  • Mysterious ‘p_’ prefix. Base fields are abbreviated names like “crit”, “id”, “pri”, yet others are called “p_app”, “p_proc”, and “p_proc_id”.
  • Some fields are abbreviated, like “crit” and “pri”, but others are full english words, like “action” and “domain”

  • there’s “id” which marks the “event type” by identifier, but “uuid” which marks the event instance identifier. You are confusing. I’m still not sure what the purpose of ‘uuid’ is.

ETOOMANYPROTOCOLS

  • Serializations: JSON, RFC3164, RFC5424, and XML serializations
  • 4 conformance levels.

Pick one serialization and one transport (“conformance”) requirements list. Describe those two. Drop all the others.

If I pick JSON, and you pick XML, we can both be CEE-conforming and have zero interoperability between our tools. Nice!

Serialization underspecified

JSON for event serialization is fine, but no message framing is defined. Newline terminated? Length prefixed? You don’t know :(

JSON

  • “Reserved Characters” - I don’t think you have read the JSON spec. Most (all?) of the ‘escaping’ detailed in CEE JSON is already specified in JSON: http://www.json.org/string.gif

Specific comments on the ‘json’ format inline as comments, this example was copied verbatim from the CEE :

{
    # Forget this "Event" crap, just move everything up a level.
    "Event": {
        "p_proc": "proc1",
        "p_sys": "host.vendor.com",
        "time": "2012-01-18T05:55:12.4321-05:00",
        "Type": {
            # Action is like, edge-trigger information.
            # Status is like, line-trigger information.
            # You don't usually have both edge and line triggers in the
            # same event. Confusing.
            "action": "login",
            "status": "ongoing",

            # Custom tag values also MUST start with a colon? It's silly to make
            # the common case (custom tags) the longer form.
            # Also, tags is a plural word, but the value is a string. What?
            "tags": ":hipaa"
        },
        "Profile": {
            # This is a *huge* bloat, seriously. Stop making JSON into XML, guys.
            "CustomProfile": {
                "schema": "http://vendor.com/events/cee-profile.xsd",
                "new_field": "a string value",
                "new_val": 1234,
                "product_host": "source.example.com"
            }
        }
    }
}

If you include the schema (CEE calls it a Profile) in every message, you’re just funding companies who’s business model relies on you paying them per-byte of log transit.

Prior art here for sharing schema can be observed in the Apache Avro project, for example.

CLT Protocol

Just define the protocol, all these requirements and conformance levels make things more confusing and harder to write tooling for.

If you don’t define the protocol, there’s no incentive for vendors to use prior art and are just as likely to invent their own protocols.

Combining this incentive-to-invent with the whole of CEE that underspecifies just about every feature, this guarantees incompatibilities in implementations.

Smart logging hacks in ruby

Ruby has Logger. It is good, but strings suck. In a world where more and more people are using log data for inputs and analysis, structured data is good. I want to log structured data.

This lead to me subclassing Logger and additionally providing my own logger format class. The code for this is in logstash, logging.rb.

What did I add? Two main goals: First, improve context. Second, log structured data (objects). This is achieved by style changes (log objects, not strings), adding awesome_print support, adding code context to each log (line/file/method), etc.

To support the first goal (context), if the loglevel is 'debug' I will inspect the call stack and include the file and line of code that is logging. I also set the 'progname' to the name of the program by default. To support the second goal, log objects and improve how objects are formatted into strings with Object#inspect (or awesome_inspect, if available).

Some examples:

>> logger = LogStash::Logger.new(STDOUT)
I, [2010-11-12T15:19:48.388469 #18782]  INFO -- : ["Failed: require 'ap' (aka awesome_print); some logging features may be disabled", #<LoadError: no such file to load -- ap>]
# This is an example of what javascript folks call 'progressive enhancement'
# - we still function if awesome_print is not available.

>> logger.level = Logger::WARN
>> logger.warn("Hello")
W, [2010-11-12T15:20:05.465705 #18782]  WARN -- irb: Hello

>> logger.warn(["rejecting bad client", { :client => "1.2.3.4" }])
W, [2010-11-12T15:21:04.639404 #18782]  WARN -- irb: ["rejecting bad client", {:client=>"1.2.3.4"}]

>> logger.level = Logger::DEBUG
>> logger.warn("Hello")
W, [2010-11-12T15:21:57.754874 #18782]  WARN -- (irb):12#irb_binding: Hello
# Notice the context (file, line, method)       ^^^^^^^^^^^^^^^^^^^^
# When DEBUG level is set only due to performance and verbosity concerns.
The main benefit personally is logging objects instead of strings, which you can do, today, with the standard Logger. However, standard logger doesn't make nice with awesome_print or add file/line/method context. Anyway, logging objects lets you later hook a smarter error handling tool up to your logging that can inspect the structured data rather than having to regex your way through a single string.

If you have awesome_print available, the object output by my formatter gets even more useful for human viewing:

Why log structured? Easier to parse and query later, like in a logstash query.

Net-SNMP and tcp-wrappers verbosity

I see this in my server logs quiet often:
Oct 23 05:37:48 pww-5 snmpd[23946]: Connection from UDP: [XX.XX.XX.XX]:34650 
Oct 23 05:37:48 pww-5 last message repeated 16 times
Oct 23 05:37:48 pww-5 snmpd[23946]: Connection from UDP: [XX.XX.XX.XX]:34652 
Oct 23 05:37:48 pww-5 last message repeated 24 times
Googling points out that in snmpd.conf we should use "dontLogTCPWrappersConnects" - but thet top search results claim that it doesn't work (syntax errors, etc). I tried this:
dontLogTCPWrappersConnects
This makes an error of:
/etc/snmp/snmpd.conf: line 29: Error: Blank line following dontLogTCPWrappersConnects token.
So I took a guess and changed it to:
dontLogTCPWrappersConnects 1
This works to quiet the 'Connection from UDP: ...' messages. However, it still logs things like:
Oct 22 23:17:35 pww-4 snmpd[29383]: Received SNMP packet(s) from UDP: [XX.XX.XX.XX]:42926
Fixing this requires telling snmpd to log less stuff to syslog. The '-L' logging options support upper-case versions which set the level at which it will log. Fixing syslog to not log the snmp packet info means setting this flag "-LSnd". This means we'll log at 'notice' levels and above to syslog with the daemon facility. Setting this flag seems to make snmpd less chatty in logs about packets it gets. Setting the log level to '-LSid' (info level) will make it once again log the packet receipts.

In CentOS (and other redhat variants) you'll edit this file to make this change permanent: /etc/sysconfig/snmpd.options - just change "-Lsd" (default in my version of net-snmp) to "-LSnd" and make sure the OPTIONS line is uncommented.