Fun with WSL and local GitLab Runner

I was looking for a way to run a GitLab pipeline locally. I haven't found an easy way to run a whole pipeline, but the gitlab-runner tool can at least run individual jobs. Although all the tools can be installed on Windows directly, I wanted to keep them a bit more isolated, so I decided to use WSL.

This is what I had to do!

  • install Ubuntu distribution
  • install gitlab-runner tools
  • install docker
  • run the gitlab commands

The list is quite short, but I spent quite some time figuring out how to make caching work.

In a nutshell, I run an Ubuntu VM using WSL in which I can execute my pipeline jobs using gitlab-runner. The runner spins up Docker containers to execute the jobs as declared in .gitlab-ci.yml.
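For context, this is roughly the kind of job definition the runner works with; a minimal, hypothetical .gitlab-ci.yml (image and script are just placeholders, not my real pipeline):

build-web:
  stage: build
  image: node:14-alpine       # the runner starts a container from this image
  script:
    - npm ci                  # the script runs inside that container
    - npm run build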

Ubuntu / WSL

First I had to install the Ubuntu WSL distro. Although the command line tells you where to find the distros (i.e. the Microsoft Store), I had a bit of a hard time finding the right one. The page WSL | Ubuntu helped me out, as it links directly to the proper distro.

I have a complete Ubuntu environment ready in seconds and the integration with Windows works really well. I start WSL by typing wsl -d Ubuntu in my command line.

Ubuntu, ready in seconds
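If you prefer the command line over the Store, recent Windows builds can install and start the distro directly via wsl (a sketch, assuming wsl --install is available on your system):

wsl --install -d Ubuntu     # install the Ubuntu distro
wsl -d Ubuntu               # start the Ubuntu VM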

Install the tools

First of all I installed gitlab-runner:

sudo apt install gitlab-runner
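The gitlab-runner package in the Ubuntu repositories can be quite dated. If that turns out to be a problem, GitLab also offers its own apt repository; roughly this (taken from GitLab's install instructions, review the script before piping it into bash):

curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo apt install gitlab-runner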

Then I installed docker, which is a bit of a pain if you just want to get started quickly. I basically followed this guide and it worked well: How To Install and Use Docker on Ubuntu 20.04 | DigitalOcean
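In short, the guide adds Docker's own apt repository and installs docker-ce from there; roughly these steps (double-check the guide for the current key and release name):

sudo apt update
sudo apt install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
sudo apt update
sudo apt install docker-ce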

WSL 2

I first tried to run docker on my VM, but it failed. I had to upgrade my distro to WSL 2 by invoking this command:

wsl --set-version Ubuntu 2

After launching the VM again, I was able to run docker commands.
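You can verify the conversion from the Windows side; the Ubuntu entry should now report version 2:

wsl -l -v                   # list installed distros and their WSL version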

Docker

When I run my GitLab pipeline, I want to use Docker as the executor. The GitLab runner spins up a container (using the image defined in the .gitlab-ci.yml) and executes the job in it. The Docker daemon doesn't start automatically inside WSL. This is not hard to configure, but to test my setup I first started it manually by invoking sudo service docker start
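To avoid starting the daemon by hand every time, newer WSL releases can run a command at VM boot via /etc/wsl.conf; a minimal sketch (assuming your WSL version supports the boot section):

[boot]
command = service docker start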

I verified my setup by running docker run hello-world. If it works, it will print something like:

running a container in a VM running on Windows. Cool!

Running GitLab

Although it reads pretty simple, I spent quite some time understanding how to use the gitlab-runner tool. My main issue was to make the cache work between job executions. All builds run in containers, and my initial assumption that caching just works was wrong. The tool tells me that a local cache is used instead of a distributed cache, but it never worked out of the box.

The trick is to mount a volume, so that the cache created inside the container is persisted on the host.

So, to actually run a job from my pipeline I navigated to a project with the .gitlab-ci.yml in it and executed the following command:

sudo gitlab-runner exec docker build-web --docker-volumes /home/gitlab-runner/cache/:/cache

Where build-web is the job I want to run and /home/gitlab-runner/cache is the directory on the host system where the cache should be stored. By default the runner puts the cache in the /cache directory inside the container.
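The volume only pays off if the job actually declares a cache in .gitlab-ci.yml; a hypothetical example of what such a declaration could look like for build-web (key and paths are placeholders):

build-web:
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/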

Final Thoughts

I was hoping I could execute the whole pipeline from the command line. It seems that with gitlab-runner I can only run a single job at a time. It is still good for testing things, and definitely good to learn more about how GitLab runners work. And maybe this guide helps someone setting up their local GitLab runner.

Speed up builds and separating infrastructure (update on becoming an Azure Solution Architect)

It has been a while since I last posted an update on becoming an Azure Solution Architect. When I started this journey in 2020, I didn't have a lot of hands-on experience with Azure. One year later I still feel like I'm learning new things every day. 🙂

Working on a real project helped me a lot in understanding things better, and automating the whole setup with Terraform and GitLab was a great experience. I really recommend thinking about CI/CD first when starting a new project, although it isn't easy.

But it pays off very soon, as you no longer have to care about infrastructure and you can recreate your resources at any time. Just run terraform apply when starting to work on the project and terraform destroy at the end of the coding session to avoid unnecessary costs during development. It is pretty cool to watch Terraform setting up and tearing down all the resources.
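My session routine then boils down to a few commands (assuming the Azure credentials are already configured for Terraform):

terraform init        # once per fresh checkout or after provider changes
terraform apply       # create / update all resources before starting to code
terraform destroy     # tear everything down again at the end of the session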

Terraform supports Azure quite well, although I encountered some limitations. The documentation is really good!

Separating Infrastructure and App Deployment (and sharing data)

One lesson I had to learn (thanks to the guidance of a colleague at work): it is better to separate the cloud infrastructure from the application build and deployment. It may sound tempting to put it all together, but it grows in complexity quite fast. I ended up with two projects and two pipelines:

  • my-project
  • my-project-infra

The infra project contains the Terraform declarations and a simple pipeline to run the Terraform commands. The client ID and client secret I provide via GitLab variables. This works very well, but you will typically need some keys, URLs, connection strings or the like when deploying the application. Terraform allows you to store and access the required attributes by declaring outputs:

output "storage_connection_string" {
  description = "Connection string for storage account"
  value       = azurerm_storage_account.my_storage.primary_connection_string
  sensitive   = true
}

Terraform allows us to access the connection string at any time later by invoking terraform output, as the data is kept together with the state. This is where the concept clicked for me. I use the outputs in the pipeline like so, exporting them via dotenv:

terraform_output:
  stage: terraform_output
  image:
    name: hashicorp/terraform:1.0.8    
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'    
  script:
    - terraform init  
    - echo "AZURE_STORAGE_CONNECTION_STRING=$(terraform output --raw storage_connection_string)" >> build.env        
  artifacts:
    reports:
      dotenv: build.env
  only:
    - master

When deploying the web app, I could then just access the connection string. For me this was not very intuitive; I think tools could support such use cases better, unless I'm just doing it wrong. 🙂 Happy to hear about better ways. But essentially this is how I could access the connection string as an environment variable in a later stage, using a different image:

deploy-web:
  stage: deploy
  image: mcr.microsoft.com/azure-functions/node:3.0-node14-core-tools	
  script:
   - az storage blob delete-batch --connection-string $AZURE_STORAGE_CONNECTION_STRING -s "\$web"
   - az storage blob upload-batch --connection-string $AZURE_STORAGE_CONNECTION_STRING -d "\$web" -s ./dist/my-app
  only:
    refs:
      - master        
  dependencies:
    - terraform_output
    - build-web

Optimize the build

A downside of the way we are building software today: there is no built-in incremental build support. At least my pipelines tend to be quite slow without optimization and proper caching, and it takes minutes to build and redeploy everything, even if the project is rather simple. Knowing which parts of the build you can cache can save you a lot of time and money, but it may not be super intuitive.

That’s why I would like to share one pattern that I use for my Angular applications (and it should work for any node / npm based project).

Who doesn’t get sleepy waiting for npm to install all the project dependencies?

I have split the build into two jobs so that the dependencies are only installed when really required, i.e. when something in package-lock.json changes, and the result is cached for the next stage (and subsequent runs).

install_dependencies:
  stage: install_dependencies
  image: node:14-alpine
  cache: 
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - ./node_modules/
  script:
    - npm ci
  only:
    changes:
      - ./package-lock.json

only/changes will ensure the job only runs if the package-lock.json has been changed, for example when you add or upgrade a dependency.

The cache configuration then keeps the node_modules handy for the next job or stage:

build-web:
  stage: build
  image: node:14-alpine
  cache: 
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - ./node_modules
    policy: pull-push 
  script:
    - npm install -g @angular/cli
    - ng build
  artifacts:
    paths:
      - ./dist/my-app

Have fun speeding up your pipelines!