Alexa on Rails – how to develop and test Alexa skills using Rails


Alexa is awesome and I think that conversational software is the future. This post documents what I set myself as a technical learning challenge:

  • To host the skill locally, allowing a fast development feedback cycle prior to pushing code.
  • To find a way to automate tests (unit, functional and end-to-end), as most demos refer to manual testing.
  • To use something other than JS (which most of the demos use).
  • To write an Alexa skill that’s backed by a data store.
  • To be able to handle conversations.

The way Alexa services interact with apps is the following:

User->Echo: “Alexa, …”
Note right of Echo: Wakes on ‘Alexa’
Echo->Amazon: Streams data spoken
Amazon->Rails: OfficeIntent
Rails->SkillsController: POST
SkillsController->Amazon: reply (text)
Amazon->Echo: reply (voice)
Echo->User: Speaks

The skill

The skill is a data retrieval one, giving information about the company’s offices and the workers there.

Alexa, Rails, git, ngrok and an Amazon account

I bought a dot and set up an Amazon account to register the skill on.

Install Rails and git for your OS. You’ll also need a data store; the sqlite or mysql gems are the easiest way to get one.

ngrok is a nifty tool that will tunnel Alexa calls in to our local server.

Get the code

Fork or clone the repo for a head-start, or read along taking only pieces you need from this post.

Set up the app

  • Setting some environment variables

The database connection uses the following environment variables:

  • Setting up the database
rake db:create db:migrate db:seed spec

This will create and setup the database tables, seed the development tables and run the unit and integration tests.

  • Running tests

Will run all tests excluding the audio tests, which I’ll describe below. Make sure all tests pass.

Connecting to the real thing

When a user invokes your skill, Amazon will route requests to an endpoint listed on the Alexa site. In order for this to function, you must first configure the skill there. The configuration is straightforward, but it must be uploaded manually to the skill’s configuration page on Amazon’s site.

Intent schema

This is where you define the intents the user can express to your skill. I think of ‘intents’ as the skill’s ‘methods’, if you think of the skill as an object.


Permutations on the intent’s syntax. For example:

Bookit for vacant rooms between {StartDate} and {EndDate}
OfficeWorkers who the {Staff} from {Office} are

Slot types

Here are the slot types for our skill, defining synonyms for our slots, being the parameters for intents. If you think this is complex, please remember that I am only the messenger here…


Now that you have configured the skill’s interfaces, we need to route communications from Amazon to our local server running Rails as we develop and debug. This is easily done using ngrok, explained below.


ngrok is a service, with a free tier, that will redirect traffic from outside your home/office’s firewall into your network. Once configured, it will route traffic from Amazon to our http://localhost:3000, which is essential for the fast development cycle we are after.

Run it using:

ngrok http 3000

Your configuration may vary, depending on whether you are a paying customer or not, so change ‘endpoint’ accordingly.

You’ll see something like this once you run it:


Add your endpoint to Amazon’s skill page under configuration:


Generating a certificate

Once you’ve settled on the endpoint URL, you’ll need to create or reuse a certificate for Amazon to use when communicating with your server process.

openssl genrsa 2048 > private-key.pem
openssl req -new -key private-key.pem -out csr.pem
openssl req -new -x509 -days 365 -key private-key.pem -config cert.cnf -out certificate.pem

Copy the contents of ‘certificate.pem’ to the skill’s page on Amazon:


Toggle the test switch to ‘on’, otherwise Amazon will think you’re trying to publish the skill on their Skills store:


Last but not least, enable the skill on your iPhone or Android by launching the Alexa app and verifying that the skill exists in ‘Your skills’ tab.

Amazon recap

We uploaded the skill info, including:

  • The Interaction model, uploading the ‘intent schema’, ‘Custom slot types’, and ‘Sample utterances’.
  • Configured the end-point
  • Uploaded the SSL cert
  • Enabled the test flag
  • Verified that the skill is enabled by using your Alexa app on your mobile device

The moment we’ve been waiting for

Run your rails app:

rails s

Run ngrok in another terminal window:

ngrok http 3000

Say something to Alexa:

Alexa, tell Buildit to list the offices

If all goes well, you should:

  • See the request being logged in the ngrok terminal (telling you that Amazon connected and passed the request to it)
  • See that the rails controller got the request by looking at the logs
  • Hear the response from your Alexa device

If there was a problem at this stage, please contact me so I can improve the instructions.

Code walkthrough

Route to a single skills controller:

 Rails.application.routes.draw do
   # Amazon comes in with a post request
   post '/' => 'skills#root', :as => :root
 end

Set up that controller:

class SkillsController < ApplicationController
  skip_before_action :verify_authenticity_token

  def root
    case params['request']['type']
      when 'LaunchRequest'
        response =
      when 'IntentRequest'
        response =['request']['intent'])
     render json: response

Handle the requests:

def respond intent_request
  intent_name = intent_request['name']

  Rails.logger.debug { "IntentRequest: #{intent_request.to_json}" }

  case intent_name
    when 'ListOffice'
      speech = prepare_list_office_request
    when 'OfficeWorkers'
      speech = prepare_office_workers_request(intent_request)
    when 'OfficeQuery'
      speech = prepare_office_query_request(intent_request)
    when 'Bookit'
      speech = prepare_bookit_request(intent_request)
    when 'AMAZON.StopIntent'
      speech = 'Peace, out.'
    else
      speech = 'I am going to ignore that.'
  end

  output =
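
To make the request flow concrete, here is a minimal, framework-free sketch of the kind of dispatch the controller performs. The `build_response` and `handle_alexa_request` helpers and the welcome text are hypothetical and not taken from the repo; the envelope shape (`version`, `response.outputSpeech`, `shouldEndSession`) is assumed to follow the JSON format that Alexa custom skills exchange.

```ruby
require 'json'

# Wrap plain text in the JSON envelope an Alexa custom skill replies with.
def build_response(text, end_session: true)
  {
    version: '1.0',
    response: {
      outputSpeech: { type: 'PlainText', text: text },
      shouldEndSession: end_session
    }
  }
end

# Hypothetical dispatcher: pick a reply based on the request type and,
# for intent requests, on the intent name.
def handle_alexa_request(payload)
  request = payload.fetch('request')
  case request['type']
  when 'LaunchRequest'
    build_response('Welcome. Ask me about our offices.', end_session: false)
  when 'IntentRequest'
    case request.dig('intent', 'name')
    when 'AMAZON.StopIntent' then build_response('Peace, out.')
    else build_response('I am going to ignore that.')
    end
  else
    build_response('I am going to ignore that.')
  end
end

payload = JSON.parse('{"request": {"type": "IntentRequest", "intent": {"name": "AMAZON.StopIntent"}}}')
puts JSON.generate(handle_alexa_request(payload))
```

Keeping the dispatch in plain methods like these is one way to make it testable without booting Rails or talking to Amazon.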

Test walkthrough

Unit tests

Really fast, not touching any Alexa or controller code, just making sure that the methods create the correct responses:


require 'rails_helper'

RSpec.describe 'Office' do
  before :all do
    @intent_request =
  end

  describe 'Intents' do
    it 'handles no offices' do
      expect(@intent_request.handle_list_office_request([])).to match(/We don't have any offices/)
    end

    it 'handles a single office' do
      expect(@intent_request.handle_list_office_request(['NY'])).to match(/NY is the only office./)
    end

    it 'handles multiple offices' do
      expect(@intent_request.handle_list_office_request(['NY', 'London'])).to match(/Our offices are in NY, and last but not least is the office in London./)
    end
  end
end
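
The post does not show the method under test, so the following is only a sketch of what a formatter satisfying these three expectations could look like. It is written as a standalone method rather than on the request object, and the phrasing for three or more offices is an assumption.

```ruby
# Hypothetical implementation: turn a list of office names into a spoken sentence.
def handle_list_office_request(offices)
  case offices.size
  when 0
    "We don't have any offices"
  when 1
    "#{offices.first} is the only office."
  else
    # Everything but the last office is comma-separated; the last gets its own clause.
    *rest, last = offices
    "Our offices are in #{rest.join(', ')}, and last but not least is the office in #{last}."
  end
end

puts handle_list_office_request([])
puts handle_list_office_request(['NY'])
puts handle_list_office_request(['NY', 'London'])
```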

Integration tests

Mocking out Alexa calls, ensure that the JSON coming in and out is correct:

describe 'Intents' do
  describe 'Office IntentRequest' do
    it 'reports no offices' do
      request = JSON.parse(File.read('spec/fixtures/list_offices.json'))
      post :root, params: request, format: :json
      expect(response.body).to match(/We don't have any offices/)
    end

    it 'reports a single office' do
      request = JSON.parse(File.read('spec/fixtures/list_offices.json'))
      Office.create name: 'London'
      post :root, params: request, format: :json
      expect(response.body).to match(/London is the only office/)
    end

    it 'reports multiple offices' do
      request = JSON.parse(File.read('spec/fixtures/list_offices.json'))
      Office.create [{name: 'London'}, {name: 'Tel Aviv'}]
      post :root, params: request, format: :json
      expect(response.body).to match(/Our offices are in London, and last but not least is the office in Tel Aviv./)
    end
  end
end

Audio tests

I was keen on finding a way to simulate what would otherwise be an end-to-end user-acceptance test, like a Selenium session for a web-based app.

The audio test I came up with has the following flow:

describe 'audio tests', :audio do
  it 'responds to ListOffice intent' do
    london = 'Paris'
    aviv = 'Tel Aviv'

    Office.create [{ name: london }, { name: aviv }]

    pid = play_audio 'spec/fixtures/list-office.m4a'

    client, data = start_server

    post :root, params: JSON.parse(data), format: :json
    result = !(response.body =~ /(?=#{london})(?=.*#{aviv})/).nil?

    reply client, 'The list offices intent test ' + (result ? 'passed' : 'failed')
    expect(result).to be true
  end
end


Line 6: Creates some offices.
Line 8: Plays an audio file that asks Alexa to list the offices
Line 10: Starts an HTTP server listening on port 80. Make sure that rails is not running, but keep ngrok up to direct traffic to the test.
Line 12: Will direct the intent request from Alexa to the controller
Line 13: Makes sure that both office names are present in the response
Line 15: Replaces the response that would have been sent back to Alexa with a curt message about the test passing or not.
Line 16: Relays the test status back to RSpec for auditing.
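
The `start_server` and `reply` helpers are not shown in this post. As a rough sketch of what such helpers need to do, the following reads the body of one HTTP POST from an IO object and writes a JSON reply back. The names `read_post_body` and `write_json_reply` are hypothetical; in a real test the IO would be a socket accepted from a `TCPServer`, while a `StringIO` stands in for it here.

```ruby
require 'stringio'

# Read one HTTP request from an IO and return its body.
def read_post_body(io)
  content_length = 0
  # Headers (and the request line) end at the first blank line.
  while (line = io.gets) && !line.strip.empty?
    name, value = line.split(':', 2)
    content_length = value.to_i if value && name.casecmp('content-length').zero?
  end
  io.read(content_length)
end

# Write a minimal HTTP response carrying a JSON payload.
def write_json_reply(io, json)
  io.write("HTTP/1.1 200 OK\r\n" \
           "Content-Type: application/json\r\n" \
           "Content-Length: #{json.bytesize}\r\n" \
           "Connection: close\r\n\r\n" \
           "#{json}")
end

raw = "POST / HTTP/1.1\r\nContent-Type: application/json\r\nContent-Length: 13\r\n\r\n{\"a\":\"hello\"}"
puts read_post_body(StringIO.new(raw))

out = StringIO.new
write_json_reply(out, '{"ok":true}')
puts out.string
```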

This is as close as I got to an end-to-end test (audio and controller). Please let me know if you have other ways of achieving the same!


What was technically done here?

  • We registered an Alexa skill
  • We have a mechanism to direct traffic to our server
  • We have a mechanism to unit-test, integration-test and acceptance-test our skill
  • We have a mechanism that allows for a fast development cycle, running the skill locally till we’re ready to deploy it publicly.

My main learning, however, was not a technical one (despite my thinking that the audio test is nifty!). Being an advocate for TDD and BDD, I realise that now there’s a new way of thinking about intents, whether the app is a voice-enabled one or not.

We may call it CDD, being Conversation Driven Development.

The classic “As a..”, “I want to…”, “So that…” manner of describing intent seems so static compared to imagining a conversation with your product, whether it’s voice-enabled or not. In our case, try to imagine what a conversation with an office application would be like.

It might start with “Alexa, walk me through onboarding” and continue through booking time, booking conference rooms, asking where office-mates are, what everyone is working on, and so on.

If the app happens to be a voice-enabled one, just make audio recordings of the prompts, and employ TDD using them. If it’s a classic app, use those conversations to create BDD scripts to help you implement the intents.


Arduino programming using Ruby, Cucumber & rSpec

The project

This project serves as a sanity check that all is in order with the hardware, without the need to write on-board code using the IDE nor use the avr toolchain. What better tool than Ruby to do so?

The first thing we’ll do is to assure that the board and its built-in LED are responsive. Let’s define the behaviour we would like, and implement it using Cucumber, in true BDD fashion:

Feature: Assure board led is responsive

  Background:
    Given the board is connected

  Scenario: Turn led on
    When I issue the led "On" command
    Then the led is "On"

  Scenario: Turn led off
    When I issue the led "Off" command
    Then the led is "Off"

The step implementation follows:

require 'driver'

Given(/^the board is connected$/) do
  @driver ||= Driver.new
end

When(/^I issue the led "([^"]*)" command$/) do |command|
  value = string_to_val command
  expect(@driver.set_led_state value).to be value
end

Then(/^the led is "([^"]*)"$/) do |state|
  expect(@driver.get_led_state).to eq string_to_val state
end

# Translate the user's "on"/"off" wording to the internal ON/OFF constants.
def string_to_val state
  case state.downcase
    when 'on'
      ON
    when 'off'
      OFF
  end
end

Some things to note:

  • We don’t have an assertion on @driver ||= because the driver will simulate a connection in case the physical board is disconnected or unavailable due to disrupted communications.
  • The user communicates using the words “on” and “off”, which are translated to ON and OFF for internal use.

This test will fail, of course, as we have yet to define the Driver class and we drop to rSpec, in TDD fashion:

require 'driver'

describe "led functions" do
  before(:each) do
    @driver = Driver.new
  end

  it "turns the led on" do
    expect(@driver.set_led_state ON).to eq ON
  end

  it "turns the led off" do
    expect(@driver.set_led_state OFF).to eq OFF
  end

  it "blinks" do
    @driver.blink 3
  end
end

This too fails, of course, and we implement Driver thus:

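class Driver
  def initialize
    @arduino ||= ArduinoFirmata.connect nil, :bps => 57600
  rescue Exception => ex
    # No board available: fall back to simulation.
    puts "Simulating. #{ex.message}" if @arduino.nil?
  end

  def set_led_state state
    result = @arduino.digital_write(LED_PIN, state)
  rescue Exception => ex
    # Simulated: remember the requested state.
    @state = state
  end

  def get_led_state
    @arduino.output_digital_read(LED_PIN)
  rescue Exception => ex
    # Simulated: report the state remembered by set_led_state (assumed).
    @state
  end

  def blink num
    # Blink exactly num times; (0..num) would blink once too often.
    num.times do
      set_led_state ON
      sleep 0.5
      set_led_state OFF
      sleep 0.5
    end
  end
end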


Some things to note:

  • I am using the arduino_firmata gem, please see the Gemfile for details.
  • The initialize method catches the exception thrown when the Arduino is not connected, as the other methods do, in order to simulate the board in such circumstances. The simulation always succeeds, by the way, and was coded to allow development without the board connected.
  • arduino.output_digital_read is a monkey-patch to the gem, as I could not find a way to query the board if an output pin was on or off:
module ArduinoFirmata
  class Arduino
    # Read back the last value written to an output pin from the cached port data.
    def output_digital_read(pin)
      raise ArgumentError, "invalid pin number (#{pin})" unless pin.is_a?(Integer) && pin >= 0
      (@digital_output_data[pin >> 3] >> (pin & 0x07)) & 0x01 > 0 ? ON : OFF
    end
  end
end
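
The shifts in the patch are easier to follow in isolation. The sketch below assumes, as the patch implies, that the cached output data holds one byte per group of eight pins: `pin >> 3` selects the byte and `pin & 0x07` selects the bit within it. The `output_bit` helper and the sample data are hypothetical.

```ruby
# Return 1 or 0 for the given pin from an array of 8-bit port values.
def output_bit(port_data, pin)
  port = pin >> 3    # integer division by 8: which byte holds the pin
  bit  = pin & 0x07  # remainder: which bit inside that byte
  (port_data[port] >> bit) & 0x01
end

# Pin 13 lives in port 1 (pins 8..15) at bit 5.
port_data = [0b0000_0000, 0b0010_0000]
puts output_bit(port_data, 13) # prints 1
puts output_bit(port_data, 12) # prints 0
```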

All green

Having implemented the code, the tests should now pass and running rake again will run both Cucumber and rSpec, yielding:

~/Documents/projects/arduino (master)$ rake
/Users/ThoughtWorks/.rvm/rubies/ruby-2.2.1/bin/ruby -I/Users/ThoughtWorks/.rvm/gems/ruby-2.2.1/gems/rspec-support-3.3.0/lib:/Users/ThoughtWorks/.rvm/gems/ruby-2.2.1/gems/rspec-core-3.3.1/lib /Users/ThoughtWorks/.rvm/gems/ruby-2.2.1/gems/rspec-core-3.3.1/exe/rspec --pattern spec/\*\*\{,/\*/\*\*\}/\*_spec.rb

Finished in 7.56 seconds (files took 0.27749 seconds to load)
3 examples, 0 failures

/Users/ThoughtWorks/.rvm/rubies/ruby-2.2.1/bin/ruby -S bundle exec cucumber 
Feature: Assure board led is responsive

  Background:                    # features/initial.feature:4
    Given the board is connected # features/step_definitions/initial_steps.rb:3

  Scenario: Turn led on               # features/initial.feature:7
    When I issue the led "On" command # features/step_definitions/initial_steps.rb:7
    Then the led is "On"              # features/step_definitions/initial_steps.rb:12

  Scenario: Turn led off               # features/initial.feature:11
    When I issue the led "Off" command # features/step_definitions/initial_steps.rb:7
    Then the led is "Off"              # features/step_definitions/initial_steps.rb:12

2 scenarios (2 passed)
6 steps (6 passed)


Make this better!

The project is here. Please feel free to fork and contribute.


How much is “good enough”? If you notice, the assertions are implemented using the data structure exposed by arduino_firmata, not with a call to the board itself. This is always a tradeoff in testing. How far should we go? For this project, testing via data structure is “good enough”. For a medical application, or something that flies a plane, it’s obviously not good enough and we would have to assert on an electric current flowing to the LED. And again, who is to assure us that the LED is actually emitting light?

There’s not much else we can do with a standalone Arduino without any peripherals connected, but it’s enough to make sure that everything is set up correctly for future development.


This installment was to show a quick-and-dirty sanity check without bothering to flash the device.


The testing and writing of this installment were made while flying to Barcelona, hoping that fellow passengers would not freak out seeing wires and blinking lights mid-flight.

Happy Arduinoing!

What is the difference between TDD and BDD?

The short answer is: none.

All variants of Driven Development (henceforth the ‘xDDs’) strive to attain focused, minimalistic coding. The premise of lean development is that we should write the minimal amount of code to satisfy our goals. This principle can be applied to any development management methodology the team has, whether it be Waterfall, Agile or any other.

A way to ensure that code is solving a given problem over time and change is to articulate the problem in machine-readable form. This allows us to programmatically validate its correctness.

For this reason, xDD is mainly used in the context of testing frameworks. Goals, as well as the code to fulfil them, are run by a framework as a series of tests. In turn, these tests may be used in collaboration with other tools, such as continuous integration, as part of the software development cycle.

We’ve now established that writing tests is a Good Thing(tm). We now turn to answer “when”, “which” and “how” tests should be written, as we strive to achieve a Better Thing(tm).

Defining goals in machine-readable form in itself does not assure the imperative of minimalistic development. To solve this, someone had a stroke of genius: The goals, now viewed as tests, are to be written prior to writing their solutions. Lean and minimalistic development is attained as we write just enough code to satisfy a failing test. As a developer I know, from past experience, that anything I write will ultimately be held against me. It will be criticised by countless people in different roles over a long period of time, until it will ultimately be discarded and rewritten. Hence, I strive to write as little code as possible, Vanitas vanitatum et omnia vanitas.

However, the shortcomings of this methodology are that we need a broad test suite to cover all the goals of the product along with a way to ensure that we have implemented the strict minimum that the test required. I’ll be visiting these two points later, but would like to primarily describe the testing pyramid and enumerate the variants of DD and their application to the different layers.

Having established when to write the tests (prior to writing code), we now turn to discuss “which” tests we should write, and “how” we should write them.

The Testing Pyramid

The testing pyramid depicts the different kinds of tests that are used when developing software.

A graphically wider tier depicts a quantitatively larger set of tests than the tier above it, although some projects may be depicted as rectangles when there is high complexity and the testing technology allows for them.

The testing pyramid


Unit Tests

Although people use the term loosely to denote tests in general, Unit Tests are very focused, isolated and scoped to single functions or methods within an object. Dependencies on external resources are discounted using mocks and stubs.


Using rSpec, a testing framework available for Ruby, this test assures that a keyword object has a value:

it "should not be null" do
  k1 = => ”)
  k1.should_not be_valid
end

This example shows the use of mocks, which are programmed to return arbitrary values when their methods are called:

it "returns a newssource" do
  news_source = NewsSource.get_news_source
  news_source.should_not == nil
end

NewsSource is mocked out to return an empty set of active news sources, yet the test assures that one will be created in this scenario.

By virtue of being at the lowest level of the pyramid, Unit Tests serve as a gatekeeper to the source control management system used by the project: These tests run on the developer’s local machine and should prevent code that causes failing tests from being committed to source control. A counter-measure to developers having committed such code is to have a continuous integration service revert those commits when the tests fail in its environment. When practicing TDD (as a generic term), developers would write Unit Tests prior to implementing any function or method.
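
To illustrate how a stub isolates a unit from its external resources, here is a small sketch in plain Ruby. All names in it (`NewsSourceFinder`, `StubRepository`, the `'default'` source) are hypothetical and not the project’s code; the point is only that the unit under test never touches a database.

```ruby
# Hypothetical stand-in for a persisted model.
NewsSource = Struct.new(:name)

class NewsSourceFinder
  def initialize(repository)
    @repository = repository
  end

  # Return the first active source, creating a default one when none exists.
  def get_news_source
    @repository.active_sources.first || @repository.create('default')
  end
end

# Hand-rolled stub: canned answers instead of a database.
class StubRepository
  attr_reader :created

  def initialize
    @created = []
  end

  # Simulate "no active news sources".
  def active_sources
    []
  end

  def create(name)
    source = NewsSource.new(name)
    @created << source
    source
  end
end

repository = StubRepository.new
source = NewsSourceFinder.new(repository).get_news_source
puts source.name
puts repository.created.size
```

Mocking libraries such as the one bundled with rSpec generate this kind of object for you; writing one by hand shows that there is no magic involved.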

Functional or Integration Tests

Functional or integration tests span a single end-to-end functional thread. These represent the total interaction of internal and external objects expected to achieve a portion of the application’s desired functionality.
These tests too serve as gatekeepers, but of the promotion model. By definition, passing tests represent allegedly functioning software, hence failures represent software that does not deliver working functionality. As such, code with failing tests may be allowed into source control yet will be prevented from being promoted to higher levels of acceptance.


Here we are assuring that Subscribers, Articles and Notifications work as expected. Real objects are used, not mocks.

it "should notify even out of hours if times are not enabled" do
  @sub.times_enabled = 0
  @notification = Notification.create!(:subscriber_id =>, :article_id =>
  @notification.subscriber.should_notify(Time.parse(@sub.time2).getutc + 1.hour).should be_true
end

A “BDD” example is:

Feature: NewsAlert user is able to see and manage her notifications

Background:
  Given I have subscriptions such as “Obama” and “Putin”
  And “Obama” and “Putin” have notifications
  And I navigate to the NewsAlert web site
  And I choose to log in and enter my RID and the correct password

Scenario: Seeing notifications
  When I see the “Obama” subscription page
  Then I see the notifications for “Obama”

This is language a BA or Product Owner can understand and write. If the BAs or POs on your project cannot write these scenarios, then you can “drop down” to rSpec instead, if you think the above is too chatty.

Performance and Penetration Tests

Performance and penetration tests are cross-functional and without context. These test performance and security across different scopes of the code by applying expected thresholds to unit and functional threads. At the unit level, they will surface problems with poorly performing methods. At the functional level, poorly performing system interfaces will be highlighted. At the application level, load/stress tests will be applied to selected user flows.


A “BDD” example is:

Scenario: Measuring notification deletion
When I decide to remove all “1000” notifications for “Obama”
Then it takes no longer than 10 milliseconds
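
A step such as “it takes no longer than 10 milliseconds” boils down to timing a block and comparing the result with a threshold. The following is a minimal sketch of that idea; the `elapsed_ms` helper and the in-memory array standing in for the notifications are assumptions made for illustration.

```ruby
# Run the block and return how long it took, in milliseconds.
def elapsed_ms
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
end

# In-memory stand-in for 1000 stored notifications.
notifications = Array.new(1000) { |i| "notification #{i}" }

duration = elapsed_ms { notifications.clear }
puts(duration <= 10 ? 'within budget' : 'too slow')
```

A monotonic clock is used so that adjustments to the system clock cannot distort the measurement.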

User Interface and User Experience Tests

UI/UX tests validate the user’s experience as she uses the system to achieve her business goals for which the application was originally written.
These tests may also validate grammar, layout, style and other standards.
Of all the tests in the pyramid, these are the most fragile. One reason is that their authors do not separate essence from implementation. The UI will likely have the greatest rate of change in a given project as product owners are exposed to working software iteratively. Having UI tests that rely heavily on the UI’s physical layout will lead to their rework as the system undergoes change. Being able to express the essence, or desired behaviour, of the thread under test is key to writing maintainable UI tests.


Feature: NewsAlert user is able to log in

Background:
  Given I am a Mobile NewsAlert customer
  And I navigate to the NewsAlert web site
  Then I am taken to the home page which has “Log in” and “Activate” buttons

Scenario: Login
  When I am signed up
  When I choose to log in and enter my ID and the correct password
  Then I am logged in to my account settings page

BDD or ATDD may be used for all these layers, as it is more convenient to use User Story format for integration tests in some instances than low-level Unit Test syntax. ATDD is put to full use if the project is staffed with Product Owners that are comfortable using the English-like syntax of Gherkin (see example below). In their absence, or if they are unwilling, BAs may take on this task. If neither is available or willing, developers would usually “drop down” to a more technical syntax such as used in rSpec, in order to remove what they refer to as “fluff”. I would recommend writing Gherkin as it serves as functional specifications that can be readily communicated to non-technical people as a reminder of how they intended the system to function.

Exploratory Testing

Above “UI Tests”, at the apex of the pyramid, we find “Exploratory Testing”, where team members perform unscripted tests to find corners of functionality not covered by other tests. Successful ones are then redistributed down to the lower tiers as formally scripted tests. Since these are unscripted, we’ll not cover them further here.

Flavours of Test Driven Development

This author thinks that all xDDs are basically the same, deriving from the generic term of “Test Driven Development”, or TDD. When thinking of TDD and all other xDDs, please bear in mind the introductory section above: we develop the goals (tests) prior to developing the code that will satisfy them. Hence the “driven” suffix: nothing but the tests drives our development efforts. Given a testing framework and a methodology, we can implement a system purely by writing code that satisfies the sum of its tests.

The dichotomy of the different xDDs can be explained by their function and target audience. Generically, and falsely, TDD would most probably denote the writing of Unit Tests by developers as they implement objects and need to justify methods therein and their implementation. Applied to non-object oriented development, Unit Tests would be written to test single functions.

The reader may contest that this is the first step in a “driven” system. To have methods under test, one must have their encapsulating object, itself borne of an analysis yet unexpressed. Subscribing to this logic, I usually recommend development using BDD. Behaviour-driven development documents the system’s specification by example (a must-read book), regardless of its implementation details. This allows us to distinguish and isolate the specification of the application by describing value to its consumer, with the goal of ignoring implementation and user interactions.

This has great consequences in software development. As one writes BDD scripts, one shows commitment and rationale to their inherent business value. Nonsensical requirements may be promptly pruned from the test suite and thus from the product, establishing a way to develop lean products, not only their lean implementation.

A more technical term is Acceptance Test Driven Development (ATDD). This flavour is the same as BDD, but implies that Agile story cards’ tests are being expressed programmatically. Here, the acceptance criteria for stories are translated to machine-readable acceptance tests.

As software development grows to encompass Infrastructure as Code (IaC), there are now ways to express hardware expectations using Monitor-driven Development (MDD). MDD applies the same principles of lean development to code that represents machines (virtual or otherwise).


This example will actually provision a VM, configure it to install mySQL and drop the VM at the end of the test.

Feature: App deploys to a VM

Background:
  Given I have a vm with ip “”

Scenario: Installing mySQL
  When I provision users on it
  When I run the “dbserver” ansible playbook
  Then I log on as “deployer”, then “mysql” is installed
  And I can log on as “deployer” to mysql
  Then I remove the VM

The full example can be viewed here.

ServerSpec gives us bliss:

describe service('apache2') do
  it { should be_enabled }
  it { should be_running }
end

describe port(80) do
  it { should be_listening }
end
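
To give a feel for the kind of check that sits behind an expectation like `be_listening`, here is a plain-Ruby sketch that simply tries to open a TCP connection. It is not how ServerSpec is implemented; the `port_listening?` helper is hypothetical.

```ruby
require 'socket'

# True when something accepts TCP connections on host:port within the timeout.
def port_listening?(host, port, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or the host could not be resolved.
  false
end

puts port_listening?('127.0.0.1', 80)
```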

For a more extreme example of xDD, please refer to my blog entry regarding Returns-driven Development (RDD) for writing tests from a business-goal perspective.

Tools of the trade

.net: nUnit | SpecFlow

Java: jUnit | jBehave

Ruby: rSpec | Cucumber | ServerSpec


The quality of the tests is measured by how precisely they test the code at their respective levels, by how much code or responsibility each of them spans, and by the quality of their assertions.

Unit tests that do not use stubs or mocks when accessing external services of all kinds are probably testing too much and will be slow to execute. Slow test-suites will, eventually, become a bottleneck and may be destined to be abandoned by the team. Conversely, testing basic compiler functions will lead to a huge test-suite, giving false indication of the breadth of the safety-net it provides.

Similarly, tests that lack correct assertions or have too many of them, are either testing nothing at all, or testing too much.

Yet there is a paradox: the tests’ importance and impact are inversely proportional to their robustness in the pyramid. In other words, as we climb the tiers, the more important the tests become, yet they become less robust and trustworthy at the same time. A major pitfall at the upper levels is the lack of application or business-logic coverage. I was on a project that had hundreds of passing tests, yet the product failed in production as external interfaces were not mocked, simulated nor tested adequately. Our pyramid’s peak was bare, and the product’s shortcomings were immediately visible in production. Such may be the fate of any system that interacts with external systems across different companies; lacking dedicated environments, one must resort to simulating their interfaces, something that comes with its own risks.

In summary, we quickly found that the art and science of software development is no different than the art and science of contriving its tests. It is for this reason that I rely on the “driven” methodologies to save me from my own misdoings.


Write as little code as you can, using TDD.

Happy driving!

From Zero to Deployment: Vagrant, Ansible, Capistrano 3 to deploy your Rails Apps to DigitalOcean automatically (part 0)


Use Cucumber to start us off on our Infrastructure as code journey.



Part 1 of this blog series demonstrates some Ansible playbooks to create a VM ready for Rails deployment using Vagrant. This is a prequel in the sense that, as a staunch believer in all that’s xDD, I should have started this blog with some Cucumber BDD!
Please forgive my misbehaving and accept my apologies with a few Cucumber scenarios as penance. Hey, it’s never too late to write tests…

The Cucumber Scenarios

As BDD artefacts, they should speak for themselves; write to me if they don’t as it means they were not clear enough!


Feature: App deploys to a VM

Background:
  Given I have a vm with ip ""

Scenario: Building the VM
  When I provision users on it
  Then I can log on to it as the "deploy" user
  And I can log on to it as the "root" user
  And I can log on to it as the "vagrant" user
  Then I remove the VM

Scenario: Adding Linux dependencies
  When I provision users on it
  When I run the "webserver" ansible playbook
  And I log on as "deploy", there is no "ruby"
  But "gcc" is present
  Then I remove the VM

Scenario: Installing mySQL
  When I provision users on it
  When I run the "dbserver" ansible playbook
  Then I log on as "deploy", then "mysql" is installed
  And I can log on as "deploy" to mysql
  Then I remove the VM

The Cucumber Steps

# Each step shells out and asserts on the exit status of the last command ($?).
Given(/^I have a vm with ip "(.*?)"$/) do |ip|
  @ip = ip
  output = `vagrant up`
  assert $?.success?
end

When(/^I provision users on it$/) do
  output = `vagrant provision web`
  assert $?.success?
end

Then(/^I can log on to it as the "(.*?)" user$/) do |user|
  output = `ssh "#{user}@#{@ip}" exit`
  assert $?.success?
end

When(/^I run the "(.*?)" ansible playbook$/) do |playbook|
  output = `ansible-playbook devops/"#{playbook}".yml -i devops/webhosts`
  assert $?.success?
end

When(/^I log on as "(.*?)", there is no "(.*?)"$/) do |user, program|
  @user = user # remembered for the following "is present" step
  output = run_remote(user, program)
  assert !$?.success?
end

When(/^"(.*?)" is present$/) do |program|
  output = run_remote(@user, program)
  assert $?.success?
end

Then(/^I log on as "(.*?)", then "(.*?)" is installed$/) do |user, program|
  output = run_remote(user, program)
  assert $?.success?
end

Then(/^I remove the VM$/) do
  output = `vagrant destroy -f`
  assert $?.success?
end

Then(/^I can log on as "(.*?)" to mysql$/) do |user|
  `ssh "#{user}@#{@ip}" 'echo "show databases;" | mysql -u "#{user}" -praindrop'`
  assert $?.success? # without this check the step could never fail
end

# Runs "<program> --version" on the VM over ssh; callers inspect $? afterwards.
def run_remote(user, program)
  `ssh "#{user}@#{@ip}" '"#{program}" --version'`
end
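Most of the steps above repeat the same two lines: run a command with backticks, then assert on $?. One way to factor that out is a small helper built on Open3 from the Ruby standard library. This is only a sketch: the name run_command is hypothetical and is not part of the original step file, and the example runs the local Ruby interpreter rather than vagrant or ssh so that it works anywhere.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper (not in the original steps): runs a command without a
# shell and returns its combined stdout/stderr together with a success flag.
def run_command(*argv)
  output, status = Open3.capture2e(*argv)
  [output, status.success?]
end

# Example: run the current Ruby interpreter as a stand-in for a real command.
output, ok = run_command(RbConfig.ruby, "-e", "puts 'hello'")
puts output
puts ok
```

A step could then read `output, ok = run_command("vagrant", "up")` followed by `assert ok, output`, so a failing step shows what the command printed. Note that Open3.capture2e raises Errno::ENOENT if the executable itself cannot be found.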

Returns Driven Development

The premise of all the “DD” acronyms is to minimise technical debt in one way or another and otherwise drive us to being lean.

The motivation for this article is “writing the minimum amount of code” in the spirit of Agile in general and TDD/BDD specifically. As someone who has developed code for more than a quarter of a century, I have learned that anything I write as code will be used against me as long as the software is in use. I don’t want to write more code than I need to in order to justify my reward. In this case, my reward is to have the RDD monitor set off an alert that serves as feedback to knowledgeable people, who can then make decisions about the product such that I will continue to be rewarded.

So, what is RDD?

TDD instructs us to write as little code as we can to assure a passing set of tests.
BDD instructs us to write as little code as we can to assure a valuable set of features.
I’d like to extend these guides to a methodology that instructs us to write as little code as we can to assure a specific level of business returns (i.e. ROI). I’ll call it RDD for fun. Returns Driven Development (thanks to my fellow ThoughtWorker Kyle for coming up with the name!).

In most cases, there is an underlying business case for creating or modifying software. Of those, some are justified by a business plan that shows how much more money the business would make if only the requested features were implemented. Of those, only a few are borne of a real market analysis. In the rest of the cases, the primary motivation is the product manager’s intuition that it would be nice to have these new features.

I wanted a way for the product owner to convey her ideas about the modifications, without regard to her motivation. RDD is a way to describe software feature requests without having to make up financial data to justify the requests. It’s also a way to validate the intuition of the product team.

Some examples:

Our customer acquisition rate will increase if we make signup easier.

Our salespeople will sell more licenses if they can demonstrate the software at trade shows with preloaded customer accounts.

Our sales will increase if we expose our B2B services to the public Internet.

All of these sound like valid points for a product manager to present as justification for a technical investment in creating new software or modifying existing software.

The only change I would make to the above examples is to add a quantity. Acquisition rate will increase by 30%; we’d have 25% more sales etc.

This is the starting point of RDD: in order to assure the growth of this business, we need to increase sales by X%.

Now that we have that statement, it will be scrutinised by the company’s board and a decision will be made regarding its implementation. If action is to be taken, RDD is now charged with proving those statements.

RDD validates the statement by giving the product owner business feedback on whether the charted course is indeed driving towards the stated goal. The sooner and more precise the feedback, the better the decisions that can be made to adjust the statement or the course of action.

RDD proposes to set up the monitors first and develop minimal software to satisfy them. The monitors will provide a baseline of the current situation and, prior to development, will indicate whether the premise was indeed factual and worthwhile.

As an example, an RDD monitor will state:

Generate an alert if the number of the daily sales of licenses is below 30 or is in decline more than 5% week over week.

Generate an alert if B2B API calls originate from more than 10% of our customers.
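As a rough sketch, the two example monitors could be expressed as plain Ruby predicates. The thresholds come from the statements above; everything else is an assumption made for illustration: the method names, the input shapes (14 daily counts, oldest first, and two customer totals), and the reading of “week over week” as the last seven days compared with the seven before.

```ruby
# Thresholds taken from the two example monitor statements.
MIN_DAILY_SALES = 30
MAX_WEEKLY_DECLINE = 0.05
MAX_B2B_CUSTOMER_SHARE = 0.10

# daily_sales: license sales for the last 14 days, oldest first (assumed shape).
# Alerts when the latest day is below the floor, or when the last 7 days
# total more than 5% less than the 7 days before them.
def sales_alert?(daily_sales)
  raise ArgumentError, "need 14 days of data" unless daily_sales.size == 14

  previous_week = daily_sales.first(7).sum
  current_week = daily_sales.last(7).sum
  below_floor = daily_sales.last < MIN_DAILY_SALES
  declining = previous_week > 0 &&
              (previous_week - current_week).fdiv(previous_week) > MAX_WEEKLY_DECLINE
  below_floor || declining
end

# Alerts when more than 10% of customers made at least one B2B API call.
def b2b_alert?(customers_calling_api, total_customers)
  return false if total_customers.zero?

  customers_calling_api.fdiv(total_customers) > MAX_B2B_CUSTOMER_SHARE
end

puts sales_alert?([40] * 14)            # steady sales above the floor
puts sales_alert?([50] * 7 + [45] * 7)  # 10% decline week over week
puts b2b_alert?(11, 100)                # 11% of customers use the API
```

Where the inputs come from is deliberately left open. Finding or creating that data source is part of the work the monitor drives.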

Primarily, the alerts will indicate movement in their business domains and will set a baseline of alert frequency. They can also serve as indicators that something is not functioning on a technical level, but that’s secondary as other IT alerts exist for that purpose.

Now comes the fun, DD, part:

The monitor’s premise is dependent on much more than meets the eye at first reading.
The data for the alarm may not exist. The transaction table may or may not exist, depending on the state of the product that the alarm is set to monitor. If this is a new product, a transaction data source may actually have to be created just for the monitor. That alone is an excellent improvement to the organisation.
Following that scenario, a data table without data is not much use; enter BDD. Enter TDD. Soon, you may have developed the app from scratch. A whole system may have been created as the result of a Returns statement made by the product manager, yet we invested the minimal amount of development needed to satisfy the monitor. Feedback is guaranteed as the monitor was implemented up front even before the software was.
RDD is also effective when extending existing software, as it assures that only the minimum code is written to satisfy the monitor’s goal.
The claim that a simple sign up form will boost customer acquisitions will soon be proven right or wrong. The monitor will raise an alert if signups have increased week-over-week. If it does not, we may need another monitor that observes another aspect of the product that questions its value to the users.

So, the next time you are involved in a product’s inception or new feature, start with business monitors!

Try asking for a business returns monitor from the operations group. At first, their mouths will open and close but without words coming out. Soon after, they will realise that it is nothing but another monitor. You then employ DDD/BDD/TDD to develop it and the system that feeds it information. You then sit back and wait for the product owners to request new monitors or features as they attempt to regulate the reported data to prove their original claims either right or wrong, or a little of both.

Weekend warrior – MacRuby and rSpec, Mac OS X Lion, Xcode V4.3.2

Inspired by the recent buzz over RubyMotion, of which I am a proud licensee, I wanted to play a little with MacRuby just to get into the swing of things.

After deciding that doing so was more worthwhile than mowing the lawn, I set out to see what it took to start a project in MacRuby with rSpec support as a basis to start work.

MacRuby’s article got me started, but did not work because the test target could not find the framework that I wanted tested. I don’t know why, since I (sort of) followed the instructions there. I say “sort of” since the article shows screen-shots of an older Xcode, and even though I thought I set things correctly in my version (Xcode V4.3.2), it still would not build. Also, I am on Mac OS X Lion and that may have had something to do with it.

After realising that if I did not continue trying, a certain member of the household would make me mow that lawn, Google found another article here by Steve Madsen.

It too looked promising, but again, needed tweaking to get working in my environment. It’s thanks to Steve’s post that I managed to get it working.

Here were my steps:
a. Create a new project in Xcode (or use an existing one that you want to rSpec)
b. Install MacRuby
c. Follow Steve Madsen’s instructions

At that stage it still did not work for me, but that was because of a misunderstanding that was clarified quickly enough:

Steve’s screen-shot for the scheme settings on the Specs framework is cut off and does not show the “Expand Variables Based On” setting, so $(SRCROOT) was never expanded for me. I replaced it with an absolute path (ugh) and it worked, so I knew something was not picking up that macro. The solution was to give a value to that drop-down, as shown in the screen-shot below.

If, like me, you’re on Xcode V4.3.2, you might find the following screen-shots useful (just refer to them as you follow Steve’s post):

a. Build settings:

b. Scheme settings:


You cannot imagine the joy of seeing Ruby code drive an Objective-C framework testing session using rSpec in Xcode.

Now to that mower…

DDD – Document Driven Development

We rarely document. We are used to being handed a set of PowerPoint slides that describe, on a very high level, the business need for software. We roll our eyes at the slides and get to work: asking questions, clarifying the needs, hoping we understand them, and imagining features and how we can deliver the implementation within the requested timeline.

If we follow the Agile framework, we’ll translate the transformed slides into stories. We do so and derive tasks from them. If we’re lucky, we might be able to condition the business to accept deliverable milestones that are aligned with those stories.

Using BDD, we’ll transcribe the stories into Gherkin and using TDD, we’ll start coding tests at that time (rSpec, Cucumber).

As development gets under way, we cycle through iterations and we deliver collaboratively.

After the celebrations, all the good things mentioned above (stories, milestones, BDD, TDD) evaporate as the project starts gliding at low altitude as the business moves to new territories. We’re left with mundane maintenance and tickets are opened for small bug fixes and minor enhancements. Stories are no longer written as “it’s not worth it” and small changes are never fully documented.

The project stops being documented and over time, as the team members rotate and business rules change, people no longer remember why we check-off the ‘accept contract’ terms after signup and not on the page where the user enters their email address. It so happens that there will be a major impact on the back-end provisioning system if we change that.

I think the pattern is clear – If we don’t use our documents, the whole eco-system of our product degrades to entropy and will ultimately lead us to revival by rewrite, or at least by going through the analysis again and likely to some re-engineering. Time wasted.

What I would love to see is a system whereby the development and maintenance is driven by documentation and that the documentation drives the deliverables.

The pieces are there, we just need to use them:
Participate in the requirements phases, translate them to stories, deliver story implementations. Always, repeatedly. Never stop this cycle.

Months from now, anyone reading your stories will fully understand why the system behaves the way it does. People like to read stories and will understand the system on their own terms. New hires in the business will use them as a guideline on how to perform their jobs. New developers to the team will have a standard to meet when fixing bugs or evaluating new or changed requirements.

We will end up with a document-driven system, accumulating a library of living documents that drove our software development effort. Any new contradictory story will violate the automated validations for previous generations of stories and will stop us in our tracks, showing us exactly where the business flow will break if we add that new feature. No one actually needs to know this in advance: Let the business tell new stories and see how the system reacts. It’ll tell us whether we’re in violation of any existing processes and alert us automatically.

If you’re using Gherkin and Cucumber already, put them front and center of your development workflow and don’t let go of them!