Puppet "pure fact-driven" nodeless configuration

Truth should guide your configuration management tools.

Truth in this case is: what machines you have, properties of those machines, roles for those machines, etc. For example " is a webserver" is a piece of truth. Where and how you store truth is up to you and out of scope for this post.

My goal is to have truth steer everything about my infrastructure. Roles, jobs, and even long-term one-offs get put into the truth source (like a machine role, etc). That way, if the machine with a one-off dies, I can just add that machine role to another system and puppet will configure it - no pain and no fire to fight.

A simple example of truth in puppet is puppet's "node" type:

node "" {
  include apache
}

Specifying each node in puppet doesn't scale very well for many people. Further, you may already have your node information in another tool (ldap, mysql, etc). To allow this, puppet lets you feed 'node' information from an external tool (called an external node classifier). However, I found that using an external node classifier also has its drawbacks (also out of scope for this post).

To avoid complex logic in a node classifier, I've gone with a pure fact-based puppet configuration which I call "nodeless."

My puppet site.pp looks basically like this:

node default {
  include truth::enforcer
}

That's it. No extra nodes, no 'include' or conditionals randomly sprayed around.

From there, I have the truth::enforcer class include other classes, do sanity checking, etc, all based on facts (and properties, if you use an external node classifier).
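
The kind of decision such an enforcer class makes can be sketched outside of puppet. This is only a minimal Ruby illustration of "facts in, class list out", not the actual truth::enforcer class: the fact name `roles`, its comma-separated format, and the role-to-class table are all assumptions made up for the example.

```ruby
# Hypothetical role-to-class table; a real setup would define its own.
ROLE_CLASSES = {
  "webserver" => ["apache"],
  "monitor"   => ["nagios::server"],
}

# Given a hash of facts, return the list of classes to include.
# Unknown roles raise an error, standing in for "sanity checking".
def classes_for(facts)
  # Assumption: roles arrive as one comma-separated string fact.
  roles = facts.fetch("roles", "").split(",").map { |r| r.strip }
  unknown = roles - ROLE_CLASSES.keys
  raise ArgumentError, "unknown roles: #{unknown.join(', ')}" unless unknown.empty?
  roles.map { |role| ROLE_CLASSES[role] }.flatten.uniq
end

puts classes_for("roles" => "webserver, monitor").inspect
```

The point of the design is that moving a role to another machine only changes the fact, never the manifest.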

Fully standalone example with code can be found here:

Puppet data into mcollective facts

There's a plugin for mcollective that lets you use puppet as your mcollective fact source. However, it doesn't seem to support the plugins-in-modules approach that puppet allows, and I don't want to have to regenerate and restart mcollective any time I add new fact modules.

Mcollective comes with a yaml fact plugin by default which will load facts from a yaml file of your choice. Exporting puppet facts in yaml format is super trivial during a puppet run:

  package {
    "mcollective": ensure => "0.4.10-1";
  }

  file {
      ensure => file,
      content => inline_template("<%= scope.to_hash.reject { |k,v| !( k.is_a?(String) && v.is_a?(String) ) }.to_yaml %>"),
      require => Package["mcollective"];
  }

Easy. Now each puppet run dumps its fact/parameter knowledge to that file, and mcollective can use those facts:

% mc-facts lsbdistrelease
Report for fact: lsbdistrelease                            

        10.04                                   found 18 times

Finished processing 18 hosts in 5533.86 ms
An added benefit of this is that any puppet variables (not just facts!) that are in scope are included in the yaml output. This lets you write "facts" to feed mcollective and puppet from plain puppet manifests. Awesome!

Update: Looks like there's a bug/feature somewhere that causes puppet to output yaml that mcollective can't handle due to sorting problems (like '!ruby/sym _timestamp'). To solve this, filter the scope hash so that only entries whose key and value are both strings are kept. I have updated the code above to reflect this. Future mcollective releases will handle funky data more safely.
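
To see what that filter does, here is a small standalone Ruby sketch that applies the same reject block as the inline_template to a plain hash. The sample hash is made up for illustration and only stands in for scope.to_hash; the helper name `string_only` is hypothetical.

```ruby
require "yaml"

# Keep only entries whose key and value are both Strings, mirroring the
# reject block used in the inline_template.
def string_only(scope_hash)
  scope_hash.reject { |k, v| !(k.is_a?(String) && v.is_a?(String)) }
end

# Made-up sample standing in for scope.to_hash.
sample = {
  "lsbdistrelease" => "10.04",
  :_timestamp      => Time.now,   # symbol key: dropped
  "processorcount" => 2,          # non-string value: dropped
  "role"           => "webserver",
}

puts string_only(sample).to_yaml
```

Anything that is not a plain string-to-string pair never reaches the yaml file, so mcollective only sees data it can sort and compare.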

Puppet Trick - Exported Resource Expiration

I've finally taken the plunge with puppet's exported resources.

"Exported resources" is a feature of puppet that allows your nodes to export resources to other nodes. You can read more about this feature on puppet's exported resources documentation. Covering how to setup exported resources or storeconfigs is out of scope, but if you need help read the docs and come to #puppet on freenode IRC.

Exported resources are pretty cool, but they lack one important feature - expiration/purging. The storeconfigs database has no idea about nodes that you have decommissioned or repurposed, so it's very possible to leave exported resources orphaned in your database.

I worked around this by making my resources such that I can expire them. This is done with a custom define that has a 'timestamp' field defaulting to the current time, so the timestamp is refreshed every time a node registers its resources. If a node has not checked in (and updated its resources) recently, I will consider that resource expired and will purge it.
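
The expiry rule itself is just a timestamp comparison. The following Ruby sketch shows that rule in isolation; it is not the code from the demo, and the resource hashes, the one-hour maximum age, and the function names are assumptions chosen for the example.

```ruby
# A resource counts as expired when its last check-in is older than
# max_age seconds.
def expired?(timestamp, now, max_age)
  (now - timestamp) > max_age
end

# Split resources into those still live and those to purge.
def partition_resources(resources, now, max_age)
  expired, live = resources.partition { |r| expired?(r[:timestamp], now, max_age) }
  { :live => live, :expired => expired }
end

now = Time.at(1_000_000)  # fixed "current time" so the example is repeatable
resources = [
  { :name => "web1", :timestamp => now - 60 },    # checked in a minute ago
  { :name => "old1", :timestamp => now - 7200 },  # silent for two hours
]

result = partition_resources(resources, now, 3600)
puts "live:    #{result[:live].map { |r| r[:name] }.inspect}"
puts "expired: #{result[:expired].map { |r| r[:name] }.inspect}"
```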

I made a demo of this and put the code on github: jordansissel/puppet-examples/exported-expiration. More details (and example output of multiple runs with expiration) are available in the README.

The demo is runnable by itself (standalone, no puppet master), so you can test it without needing to mess with your own puppet installations.

Puppet Camp San Francisco 2010

Another puppet camp has come and gone, and I'm certainly glad I went. Puppet, the surrounding ecosystem, and its community have grown quickly since last year.

The conference was the same format as last year. The morning was single-track presentations from various puppet users, and the afternoon was openspace/barcamp-style break out sessions. It was good to see some old faces and also to finally put faces to IRC and twitter names.

One of the bigger announcements was that mcollective would join the Puppet project. Other announcements included new employees and other good news. Beyond that, I picked up a few tricks and learned more about the puppet roadmap.

In no particular order - some thoughts and notes.

Facter 2.0 will be good. The plan is to take lessons learned from Facter 1.x and improve things: make the DSL for making facts simpler, add structured data, add caching, etc.

Puppet supports a "config_version" option that specifies a script that will override how the version of a given catalog is determined. Useful for tagging based on revision control or deployment versions.

Scoped defaults such as 'File { owner => root }' apply downwards in all cases, something I hadn't considered before. That is, if you are class 'foo' and define a default and also include class 'bar', the default in foo will apply to bar as well. This was new to me, and I will be cleaning up some of my manifests as a result (I use defaults in some classes but not others). Best practice here is to either use no class-specific defaults or use class-specific defaults in every class.

Twitter operations (John Adams) gave a talk covering their automation/puppet stuff. John talked about problems with sysadmins trying to hack around puppet by using chattr +i to prevent puppet from modifying certain files - a practice they heavily discouraged. He also mentioned problems with poor cron scheduling and presented the usual sleep $(($RANDOM % 600))-style solution. I didn't get around to sharing my cron practices (sysadvent) with John before the end of the con. He also mentioned having problems with home directory syncing using puppet, which is another problem I've covered here and solved better previously on sysadvent.

During some downtime at the conference, I started working on an ssh key authorization module for mcollective. The ruby ssh key code is here and the mcollective fork with the sshkey security plugin is here. It works pretty well:

snack(~) % sudo grep security /etc/mcollective/{server,client}.cfg
/etc/mcollective/server.cfg:securityprovider = sshkey
/etc/mcollective/client.cfg:securityprovider = sshkey
snack(~) % mc-ping                                         
snack.home                               time=97.81 ms
The gist of the key signing pieces is that your ssh agent signing authenticates you as a user for requests, and for responses the server signs messages with its own ssh host key (like /etc/ssh/ssh_host_rsa_key). Validation of you as a user is done through your authorized_keys file, and validation for the reply uses your known_hosts file to verify the host signature.
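
The sign-then-verify idea can be shown with Ruby's standard OpenSSL bindings. This is a generic RSA sketch under stated assumptions, not the plugin's code: it generates a throwaway key instead of talking to an ssh agent or reading authorized_keys and known_hosts, and the SHA256 digest and helper names are choices made for the example.

```ruby
require "openssl"

DIGEST = "SHA256"  # assumption: the digest choice is illustrative only

# Sign a message with a private key (the sender's side).
def sign(private_key, message)
  private_key.sign(OpenSSL::Digest.new(DIGEST), message)
end

# Check a signature using only the public key (the receiver's side).
def verify(public_key, message, signature)
  public_key.verify(OpenSSL::Digest.new(DIGEST), signature, message)
end

key = OpenSSL::PKey::RSA.new(2048)  # throwaway key for the demo
message = "ping"
signature = sign(key, message)

puts verify(key.public_key, message, signature)     # untouched message
puts verify(key.public_key, "tampered", signature)  # altered message
```

The receiver never needs the private key, which is why a list of known public keys is enough to validate both requests and replies.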

It was a good conference, though I would've enjoyed a more hackathon-style atmosphere. We tried to do a facter hackathon, but there wasn't enough time, so instead we code reviewed some of the sillier parts of facter and talked about the future.

Puppet to manage $HOME dotfiles

On SysAdvent Day 11 I covered the importance of having your own personal environment carry with you. This includes your rc files like .vimrc, etc.

If you manage users with puppet, here's how to manage their home directories with puppet while falling back to a default skeleton (similar to /etc/skel) directory for users who don't have individual ones:

class user::people {
  # ...

  define localuser($uid, ...) {
    user {
        uid => $uid,
    }

    file {
        ensure => directory,
        owner => $name,
        group => $gid,
        source => [ "puppet:///user/home/$name/",
                    "puppet:///user/skel/" ];
    }
  }

  localuser {
    "jls": uid => 10000;
  }
}

Puppet will first search for 'user/home/$name' on its file server, and sync it if it exists. If it does not exist, it falls back to 'user/skel'. The benefit here is that whenever you update the default 'skel' directory, all users who depend on it will be automatically updated by puppet.
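
The fallback behaves like "use the first source that exists". A rough Ruby sketch of that selection rule, using temporary local directories in place of the puppet file server, looks like this. The helper name and the `someone_else` user are hypothetical, and this models only the selection, not puppet's syncing.

```ruby
require "tmpdir"
require "fileutils"

# Return the first candidate directory that exists, or nil if none do.
def first_existing_source(candidates)
  candidates.find { |path| File.directory?(path) }
end

Dir.mktmpdir do |root|
  # Only "jls" has an individual home source; everyone else gets skel.
  FileUtils.mkdir_p(File.join(root, "home", "jls"))
  FileUtils.mkdir_p(File.join(root, "skel"))

  %w[jls someone_else].each do |user|
    chosen = first_existing_source([
      File.join(root, "home", user),
      File.join(root, "skel"),
    ])
    puts "#{user}: #{chosen.sub(root, '')}"
  end
end
```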

Update: I've since stopped using this method to sync home directories. Puppet (at time of writing, 0.25.1) is not smart about how it scans for files to update when recursion is on - puppet walks all files in each homedir, even if they aren't part of the file list to be pushed by puppet. I use the script mentioned in sysadvent 2008 day 11 with a cron job. This cron job is deployed with puppet.