De-registering an instance from Chef and Sensu on termination in EC2


howto chef sensu aws

Posted by Rafael Fonseca on .

There are many posts on the Internet about Amazon Web Services (AWS) AutoScaling, just as many about Chef and, as you'd expect, a fair selection about using Chef to bootstrap new instances in AutoScaling.

What seems to be missing from that picture, however, are posts explaining what happens when your instances get terminated. This post aims to address that. Sysadmins who have set up AutoScaling clusters to both scale out (add servers) and scale in (remove servers), pay attention.

The company formerly known as Opscode created Chef, a brilliant tool for configuration management. One key aspect of Chef is the way it auto-registers a new node against your Chef server during the initial bootstrap process. This makes AutoScaling a breeze.

Sensu, an open source monitoring framework, takes the 'node registration' process to an even higher level, letting nodes register themselves without any user intervention, provided the node knows where to find the Sensu server. When the Sensu agent starts up, it automatically notifies the server about the checks it is configured to run and how they're doing.

The beauty of these tools, in this particular case, is how they're driven by an API. Both Chef and Sensu servers provide an easy way to automate certain actions with very little coding. As it happens, both of them will let a node de-register itself with only a couple of commands.

To de-register a node from Chef, delete both the node object and its API client with knife:

knife node delete -y NODE_NAME
knife client delete -y NODE_NAME

Because knife requires a valid knife.rb file saved somewhere, and a knife.rb must point to a valid client key, we can be cheeky and reuse the node's own key by saving a file similar to the one below as /etc/chef/knife.rb (paths applicable to Ubuntu servers):

log_level           :info
log_location        STDOUT
node_name           'NODE_NAME'
client_key          '/etc/chef/client.pem'
chef_server_url     'https://YOUR_CHEF_SERVER_URL'

Then we can reference this file with the -c flag when running knife:

knife node delete -y -c /etc/chef/knife.rb NODE_NAME
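Since Chef nodes already ship with Ruby, the two knife calls can also be wrapped in a small script. The following is only a sketch under some assumptions: the helper names `knife_deregister_commands` and `deregister_from_chef` are made up for illustration, and the knife path is the Ubuntu one used in this post.

```ruby
# Hypothetical helper (not part of Chef): builds the two knife invocations
# from this post as argument arrays, so no shell quoting is involved.
def knife_deregister_commands(node_name, config: '/etc/chef/knife.rb', knife: '/usr/bin/knife')
  %w[node client].map do |object|
    [knife, object, 'delete', '-y', '-c', config, node_name]
  end
end

# Runs both deletions, node first and then client.
# Unlike a shell script using `set -e`, the second command still runs if the
# first one fails. Returns true only when every command succeeded.
# `runner` is injectable so the logic can be exercised without knife installed.
def deregister_from_chef(node_name, config: '/etc/chef/knife.rb',
                         runner: ->(argv) { system(*argv) })
  results = knife_deregister_commands(node_name, config: config).map do |argv|
    runner.call(argv)
  end
  results.all?
end
```

Calling `deregister_from_chef('NODE_NAME')` on the instance would then run the same two commands shown above.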

To remove a node from Sensu, a simple cURL command does it:

curl -XDELETE --silent -L http://YOUR_SENSU_SERVER:4567/clients/CLIENT_NAME
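If you'd rather not depend on curl, the same request can be made with Ruby's standard library. This is a rough sketch, not the official Sensu client: the function names are invented for illustration, and it assumes the API listens on plain HTTP on port 4567 without authentication, as in the cURL example.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the DELETE request for /clients/CLIENT_NAME.
# Nothing is sent over the network at this point.
def sensu_delete_request(api_host, client_name, port: 4567)
  uri = URI::HTTP.build(host: api_host, port: port, path: "/clients/#{client_name}")
  [uri, Net::HTTP::Delete.new(uri)]
end

# Sends the request and returns true on any 2xx response.
# The 5-second timeouts are an assumption; tune them for your network.
def deregister_from_sensu(api_host, client_name, port: 4567)
  uri, request = sensu_delete_request(api_host, client_name, port: port)
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 5) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess)
end
```

For example, `deregister_from_sensu('YOUR_SENSU_SERVER', 'CLIENT_NAME')` would issue the same DELETE as the cURL command.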

If you're smart, you'll glue it all together into an ERB template...

#!/bin/sh
### BEGIN INIT INFO
# Provides:          instance_termination
# Required-Start:    $network $named $remote_fs $syslog
# Required-Stop:     $network $named $remote_fs $syslog
# Default-Stop:      0
### END INIT INFO
set -e

case "$1" in
  stop)
    /etc/init.d/sensu-client stop   # stops Sensu service so that it doesn't re-register with server
    /usr/bin/curl -XDELETE --silent -L http://<%= node[:sensu][:api][:host] %>:4567/clients/<%= node.fqdn %>  # remove from Sensu server
    /usr/bin/knife node delete -y -c /etc/chef/knife.rb <%= node.fqdn %>  # remove node from Chef
    /usr/bin/knife client delete -y -c /etc/chef/knife.rb <%= node.fqdn %>  # deletes the client certificate from Chef
    ;;
  start)
    echo "Nothing to do."
    ;;
  *)
    echo "Usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac

exit 0

And use Chef to drop the script into /etc/init.d/instance_termination (or any name you'd like) and link it into the halt runlevel directory, /etc/rc0.d:

template "/etc/init.d/instance_termination" do
    source "instance_termination.erb"
    mode 0755
    owner "root"
    group "root"
end

link "/etc/rc0.d/S01instance_termination" do
    to "/etc/init.d/instance_termination"
end

If all went well, your terminating instances should trigger the script above and remove themselves from your monitoring and configuration management servers, keeping your node lists always consistent, and one step closer to DevOps nirvana! ;)

Footnote: For bonus points, turn knife.rb into an ERB template and deploy it together with the instance_termination script.
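As a rough illustration of that footnote, here is how the knife.rb from earlier might look as an ERB template, rendered with plain Ruby outside of Chef. The function `render_knife_rb` and its two parameters are assumptions made for this sketch; inside a real cookbook you would use a template resource instead and take the values from the node.

```ruby
require 'erb'

# The knife.rb from this post, with the two host-specific values
# turned into ERB placeholders.
KNIFE_RB_TEMPLATE = <<~'TEMPLATE'
  log_level           :info
  log_location        STDOUT
  node_name           '<%= node_name %>'
  client_key          '/etc/chef/client.pem'
  chef_server_url     '<%= chef_server_url %>'
TEMPLATE

# Hypothetical helper: renders the template using the method's local
# variables (node_name and chef_server_url) as the ERB binding.
def render_knife_rb(node_name:, chef_server_url:)
  ERB.new(KNIFE_RB_TEMPLATE).result(binding)
end

# Example: print a knife.rb for a made-up node.
# puts render_knife_rb(node_name: 'web-1.example.com',
#                      chef_server_url: 'https://YOUR_CHEF_SERVER_URL')
```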
