Maciej Ciemborowicz


Randomly Failing Cucumber Scenarios in the Continuous Integration Systems

Make Your Specs 🚀 Faster with Poltergeist. And ❌ Fail.

Some time ago we decided to make our acceptance tests faster. We were using Cucumber with Selenium, and we replaced Selenium with the Poltergeist driver. Poltergeist uses the PhantomJS engine, and thanks to that our tests run around three times faster than they did before. Everything works smoothly on our machines, but there is one small problem: sometimes, in some steps, PhantomJS crashes on CircleCI :).

Scenario: xxx # features/xxx.feature:19
PhantomJS has crashed. Please read the crash reporting guide at https://github.com/ariya/phantomjs/wiki/Crash-Reporting and file a bug report at https://github.com/ariya/phantomjs/issues/new with the crash dump file attached: /tmp/1a24bbe0-87e2-2753-3444a3a4-65f405d9.dmp
  [1.78] Given Logged in User # features/step_definitions/support_steps.rb:9
    PhantomJS client died while processing {"name":"visit","args":["http://127.0.0.1:39185/users/sign_in"]} (Capybara::Poltergeist::DeadClient)
    ./features/step_definitions/support_steps.rb:13:in `/^Logged in User$/'
    features/xxx.feature:20:in `Given Logged in User'

This forces us to click “rebuild” a few times in a row. That doesn’t make our tests any faster, but at least the direction is good. So what could we do? We could:

  1. Connect with CircleCI VM using SSH.
  2. Download the crash dump.
  3. Notice that it contains sensitive data.
  4. Report the crash by creating an issue on GitHub. (I surrender)
  5. Wait for someone to fix it, or fix it ourselves.
  6. Wait for a new version of Poltergeist.
  7. Wait for CircleCI to update their Poltergeist version.

Or maybe…

🔄 Rerun Failing Specs

Cucumber, like most testing tools out there, allows you to choose an output format. What’s more, it has one specific format called rerun which writes a list of failing scenarios to a specified file.

cucumber -f rerun --out failing_scenarios.txt

Once you have this file, you can run these scenarios again:

cucumber @failing_scenarios.txt
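
The rerun file itself is plain text: each entry is a feature path followed by the line numbers of its failing scenarios. For the crash above it would look roughly like this (the second entry is a made-up example of a feature with two failing scenarios):

```
features/xxx.feature:19
features/another.feature:7:42
```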

It’s as easy as that! Let’s write Rake tasks that do this:

namespace :failing_cucumber_specs do
  FAILING_SCENARIOS_FILENAME = 'failing_scenarios.txt'

  task :record do
    # run cucumber and record failing scenarios to the file
    # force exit 0 -- we don't want to fail the build at this stage
    exec("bundle exec cucumber -f rerun --out #{FAILING_SCENARIOS_FILENAME}; exit 0")
  end

  task :rerun do
    # we don't need to run cucumber again if all scenarios passed
    unless File.zero?(FAILING_SCENARIOS_FILENAME)
      # run cucumber with the failing scenarios only
      exec("bundle exec cucumber @#{FAILING_SCENARIOS_FILENAME}")
    end
  end
end

At first I was afraid that this would not work with parallel nodes; failing_scenarios.txt shouldn’t be shared between them. But every CircleCI node is an independent virtual machine with its own filesystem, so each node gets its own copy of the file.

Now you can type:

rake failing_cucumber_specs:record
rake failing_cucumber_specs:rerun

I also updated the test section of circle.yml:

test:
  override:
    - bundle exec rake failing_cucumber_specs:record:
        parallel: true
    - bundle exec rake failing_cucumber_specs:rerun:
        parallel: true

It’s a good idea to add failing_scenarios.txt to your .gitignore file before committing these changes.
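
For completeness, that’s a single entry in .gitignore:

```
failing_scenarios.txt
```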

🎒 Usage with Knapsack

We use Knapsack (written by Artur Trzop), which splits tests among multiple nodes. Knapsack has its own adapter for Cucumber, so I had to modify the failing_cucumber_specs:record task. Here is a version for Knapsack:

namespace :failing_cucumber_specs do
  FAILING_SCENARIOS_FILENAME = 'failing_scenarios.txt'

  task :record do
    # run cucumber and record failing scenarios to the file
    begin
      Rake::Task['knapsack:cucumber'].invoke("-f rerun --out #{FAILING_SCENARIOS_FILENAME}")
    rescue SystemExit => e
      # force exit 0 -- we don't want a failed build because of this task
      puts "#{e.class}: #{e.message}"
      exit 0
    end
  end

  task :rerun do
    # we don't need to run cucumber again if all scenarios passed
    unless File.zero?(FAILING_SCENARIOS_FILENAME)
      # run cucumber with the failing scenarios only
      exec("bundle exec cucumber @#{FAILING_SCENARIOS_FILENAME}")
    end
  end
end

🤔 Possible Troubles

🏃🚪 Exit 0 Is Not a Perfect Solution

If you look closely at the record task, you can see exit 0 after running Cucumber. We must return a successful exit code because we don’t want our build to be interrupted while recording failing scenarios. The problem with Cucumber is that it returns 1 both when some scenarios fail and when it fails itself for any reason. Imagine the following situation:

  1. Cucumber doesn’t run the specs, creates an empty failing-scenarios file, and crashes.
  2. CircleCI doesn’t notice, because we force exit 0.
  3. The second Cucumber run executes scenarios from an empty file. No scenarios, so it returns 0.
  4. The build is green.

Fortunately, the first point seems very unlikely. Even if Cucumber fails for a reason other than red specs (which is already unlikely), it doesn’t create an empty file, so the second Cucumber run fails. However, there was a feature request regarding Cucumber exit status codes. It has been implemented and merged into the master branch, so in future releases we will be able to determine whether scenarios failed (exit status 1) or the application returned an error (exit status 2).
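
Once those distinct exit statuses ship, the blanket exit 0 could be replaced with something more selective. A minimal sketch, assuming the new codes (1 = failing scenarios, 2 = internal error); record_exit_status is a hypothetical helper, not part of Cucumber or our Rakefile:

```ruby
# Hypothetical helper: assumes a future Cucumber release where exit
# status 1 means "scenarios failed" and 2 means "Cucumber crashed".
def record_exit_status(cucumber_status)
  # Plain scenario failures (1) are expected while recording -- the
  # rerun task retries them -- so map 1 to success. Any other non-zero
  # status means Cucumber itself broke and should fail the build.
  cucumber_status == 1 ? 0 : cucumber_status
end

# The record task could then use `system` instead of `exec`, e.g.:
#   system("bundle exec cucumber -f rerun --out #{FAILING_SCENARIOS_FILENAME}")
#   exit record_exit_status($?.exitstatus)
```

This way a genuine Cucumber crash (status 2) would still turn the build red during recording, closing the loophole described above.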

📉 Less Trust in Specs

Imagine some functionality which doesn’t work as expected from time to time, let’s say, because of a race condition. This problem could be noticed when it’s test fail. Rerunning failing tests decreases the probability of detecting this issue. I don’t think it’s a huge problem in our case as I’ve never encountered this in any project I was working on at our company, but I feel obligated to mention this.