Fixing Meteor test CI timeouts

At the beginning of this year I started helping Openki integrate their test suite with GitLab CI.

We built on their existing test script, which is used for local testing. This is the final script that runs inside the CI infrastructure:

#!/bin/bash

set -xe

export METEOR_ALLOW_SUPERUSER='true'

apt-get update -qq && apt-get upgrade -qqy
apt-get install -qq build-essential python git libxss1 libappindicator1 libindicator7 curl wget xvfb libxtst6 libgconf2-4 libnss3 libgtk2.0-0 libgtk-3-0 libasound2
curl -sL https://deb.nodesource.com/setup_8.x | bash -

export DISPLAY=':99.0'
Xvfb "${DISPLAY}" -screen 0 1024x768x24 > /dev/null 2>&1 &

# Download Meteor
PATH=$PATH:$HOME/.meteor
mkdir -p .meteor/ .npm/ node_modules/
if [ ! -e $HOME/.meteor/meteor ]; then curl -k https://install.meteor.com | sh; fi

#install google-chrome
#wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
#sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
apt-get install -qqy ./google-chrome*.deb
rm google-chrome-stable_current_amd64.deb


mv /usr/bin/google-chrome /usr/bin/google-chrome-original

printf  '#!/bin/bash\n\ngoogle-chrome-original --no-sandbox --headless --disable-setuid-sandbox --disable-gpu --enable-debugging "$@"\n' > /usr/bin/google-chrome

chmod +x /usr/bin/google-chrome

export CHROME_BIN=/usr/bin/google-chrome

meteor update --patch
meteor npm i
meteor node -v && meteor npm version


meteor npm run app-test
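Because `meteor npm run app-test` launches Chrome through `CHROME_BIN`, every invocation has to pick up the headless flags from the wrapper. A minimal sketch for checking the generated wrapper (recreated in a temp file here so it can run outside the CI image; the temp path is ours, not part of the original script):

```shell
# Recreate the wrapper that the CI script writes to /usr/bin/google-chrome,
# but in a temp file, and verify it forwards all arguments with the flags.
wrapper=$(mktemp)
printf '#!/bin/bash\n\ngoogle-chrome-original --no-sandbox --headless --disable-setuid-sandbox --disable-gpu --enable-debugging "$@"\n' > "$wrapper"
chmod +x "$wrapper"

grep -qF -- '--headless' "$wrapper"   # headless flag is present
grep -qF '"$@"' "$wrapper"            # caller arguments are forwarded
echo "wrapper looks good"
```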

We tried many different containers and configurations. However, the tests would time out and fail on most systems, including the CI.

So we started comparing the systems to find differences in behaviour. I started with the obvious and found some lower bounds:

  • Available memory: at least 512 MB must be free
  • Number of CPUs: at least two
  • Storage
    • Space: at least 2 GB free before the job starts
    • HDD vs. SSD: both would fail most of the time
  • Kernel: old or new, no pattern

Months later ...

After months of randomly changing parameters, I had an idea this week: what if the response time was the problem?

  • What if our old testing hardware was too slow?
  • What if the CI instance running on GCE didn't have enough I/O bandwidth?

Luckily for us, GitLab offers internal runners that can be enabled by tagging the jobs with gitlab-org:

  tags:
  - gitlab-org

This fixed our timeouts. In addition, the tests now run 3-4 times faster, which is great.

Here is the complete .gitlab-ci.yml:

image: ubuntu:latest

stages:
  - cleanup
  - test
  - deploy

cache:
  paths:
  - node_modules/
  - .npm/
  - .meteor/

variables:
  # DEBUG: "selenium*, chromedriver*"
  DISPLAY: ":99.0"
  # Reduce git traffic
  GIT_DEPTH: "10"
  METEOR_ALLOW_SUPERUSER: "true"
  TOOL_NODE_FLAGS: '--max_old_space_size=4096'

cleanup:
  stage: cleanup
  script:
  - rm -rf .meteor/ .npm/ node_modules/
  only:
  - staging
  - master

test:
  stage: test
  tags:
  - gitlab-org
  script: ./ci/test2.sh
