Speed up builds and separating infrastructure (update on becoming an Azure Solution Architect)

It has been a while since I last posted an update on becoming an Azure Solution Architect. When I started this journey in 2020 I didn’t have a lot of hands on experience with Azure. One year later I still feel like I’m learning new things every day. 🙂

Working on a real project helped me a lot in understanding things better and automating the whole setup with Terraform and GitLab was a great experience. I really recommend to think about CI/CD first when starting a new project, altough it isn’t easy.

But it pays off very soon, as you just dont have to care anymore about infrastructure and you can recreate your resources any time. Just run terraform apply when starting to work on the project and run terraform destroy at the end of the coding session to avoid unnecessary costs during development. It is pretty cool watching terraform setting up and tearing down all the resources.

Terraform supports Azure quite well, altough I encountered some limitations. The documentation is really good!

Separating Infrastructure and App Deployment (and sharing data)

One lesson I had to learn (thanks to the guidance from a colleague at work): it is better to separate the cloud infrastructure and the application build and deployment. I may sound tempting to put it all together, but it grows in complexity quite fast. I ended up having two projects with two pipelines:

my-project
my-project-infra

The infra project contains the terraform declarations and a simple pipeline to run the terraform commands. The client and client secret I provide via GitLab variables. This works very well, but you will typically require some keys, URLs, connection strings or the like when deploying the application. Terraform allows to store and access the required attributes by declaring outputs

output "storage_connection_string" {
  description = "Connection string for storage account"
  value       = azurerm_storage_account.my_storage.primary_connection_string
  sensitive = true
}

Terraform allows us to access the connection string any time later by invoking terraform commands, as the data is kept together with the state. This is where the concept clicked for me. I use them in the pipeline like so, exporting them via dotenv

terraform_output:
  stage: terraform_output
  image:
    name: hashicorp/terraform:1.0.8    
    entrypoint:
      - '/usr/bin/env'
      - 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'    
  script:
    - terraform init  
    - echo "AZURE_STORAGE_CONNECTION_STRING=$(terraform output --raw storage_connection_string)" >> build.env        
  artifacts:
    reports:
      dotenv: build.env
  only:
    - master

When deploying the web app, I could then just access the connection string. For me this was not very intuitive, I think tools could support such use cases better, unless I’m just doing it wrong. 🙂 Happy to hear about better ways. But essentially this is the way I could access the connetion string as an environment variable in a later stage, using a different image.

deploy-web:
  stage: deploy
  image: mcr.microsoft.com/azure-functions/node:3.0-node14-core-tools	
  script:
   - az storage blob delete-batch --connection-string $AZURE_STORAGE_CONNECTION_STRING -s "\$web"
   - az storage blob upload-batch --connection-string $AZURE_STORAGE_CONNECTION_STRING -d "\$web" -s ./dist/my-app
  only:
    refs:
      - master        
  dependencies:
    - terraform_output
    - build-web

Optimize the build

A downside of the way we are building software today: there is no built in incremental build support. At least my pipelines tend to be quite slow without optimization and proper caching and it takes minutes to build and redeploy everything, even if the project is rather simple. So, knowing which parts of the build you can cache can save you a lot of time and money, but it may also not be super intuitive.

That’s why I would like to share one pattern that I use for my Angular applications (and it should work for any node / npm based project).

Who doesn’t get sleepy waiting for npm to install all the project dependencies?

I have split up the job into two parts to only run npm install when really required, i.e. when something in the package-lock.json changes – and then cache the result for the next stage (and subsequent runs).

install_dependencies:
  stage: install_dependencies
  image: node:14-alpine
  cache: 
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - ./node_modules/
  script:
    - npm ci
  only:
    changes:
      - ./package-lock.json

only/changes will ensure the job only runs if the package-lock.json has been changed, for example when you add or upgrade a dependency.

The cache configuration then keeps the node_modules handy for the next job or stage:

build-web:
  stage: build
  image: node:14-alpine
  cache: 
    key: $CI_COMMIT_REF_SLUG-$CI_PROJECT_DIR
    paths:
      - ./node_modules
    policy: pull-push 
  script:
    - npm install -g @angular/cli
    - ng build
  artifacts:
    paths:
      - ./dist/my-app

Have fun speeding up your pipelines!

Separating Infrastructure and App Deployment (and sharing data)

Optimize the build

Leave a comment Cancel reply