Cloud Resume Challenge, Steps 14 & 15, CI/CD

I’m nearly on the home stretch of the Cloud Resume Challenge! In this penultimate step (technically two steps, but I’ll discuss them together here), you’re required to set up CI/CD pipelines for your front and back ends, so that when commits are pushed to GitHub your code is tested and, following a manual check, merged into your main branch, deployed to production, then tested again to make sure everything’s working OK, all as one automatic process.

CI/CD stands for Continuous Integration / Continuous Delivery (or Deployment). It’s a practice intended to accelerate software development by automating tasks such as testing, merging and deployment to production. Rather than large, infrequent code merges, the idea is that small changes are merged regularly and tested automatically before being reviewed for merging into the stable code base. Because these changes are smaller, reviews should be quicker and bugs should be caught earlier and more often. That’s the theory, so let’s put it into practice!

There are various CI/CD tools available (including GCP’s own Cloud Build). I’ve tried to keep this project as GCP-native as possible, but I also wanted to get more experience with GitHub Actions (GitHub’s own automation solution), as I’d heard great things about its integration with GitHub itself (and it’s free 🙂).

As with many CI/CD tools, GitHub Actions uses YAML files to define workflows, which run jobs on separate runner VMs. Essentially, you’re spinning up a VM for the duration of each job and giving it a list of tasks to do. As these tasks run, GitHub shows a nice visual representation so you can see how it’s going.

GitHub Actions will automatically try to run any valid workflow YAML file it finds in the ‘.github/workflows/’ folder at the root of your repo. Workflows are triggered by events: they can be started manually, run on a schedule, or kick off in response to repository activity such as pushes and merges, or to other workflows completing. I’m kicking off my QA workflow whenever code is pushed to the dev branch.
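
As a rough sketch (not a copy of my actual file), the trigger section for a QA workflow like this could look as follows. The workflow name ‘qa’ is a hypothetical placeholder, and workflow_dispatch is added purely to illustrate the manual trigger:

```yaml
# Hypothetical workflow name; the file would sit somewhere under .github/workflows/
name: qa

on:
  push:
    branches:
      - dev            # run whenever commits are pushed to the dev branch
  workflow_dispatch:   # also allow starting the workflow manually from the Actions tab
```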

You can then start adding jobs to your workflow. My plan here is to use GitHub Actions to:

  • Build the container image for the latest revision of my Python app
  • Push it to Artifact Registry
  • Provision my QA infrastructure with Terraform
  • Deploy the container to Cloud Run
  • Run all my automated tests in the QA environment
  • If the tests pass, raise a Pull Request to merge dev into main
  • Create a Terraform plan and add it to the Pull Request for approval
  • On approval of the Pull Request, provision the resources on the prod project
  • Push the new container to Cloud Run
  • Run tests on the updated live environment
  • (this is going to be a long post!)

I’ve started by adding steps to a job called ‘build_and_artifact’. Each job can have multiple steps, each of which either uses ‘run’ to execute a CLI command or ‘uses’ to call a pre-built action from the GitHub Marketplace. I’m using one such action to check out my repo on the job’s runner VM.
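
A minimal sketch of a job like this, assuming a GitHub-hosted Ubuntu runner. actions/checkout is the standard checkout action, but the version pin and the illustrative ‘run’ step are my own assumptions rather than what’s in my workflow:

```yaml
jobs:
  build_and_artifact:
    runs-on: ubuntu-latest   # GitHub-hosted runner VM, created fresh for this job
    steps:
      # 'uses' calls a pre-built Marketplace action: here, checking out the repo
      - uses: actions/checkout@v4

      # 'run' executes a shell command on the runner (illustrative only)
      - name: List the checked-out files
        run: ls -la
```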

Next, I’ll use a Google-owned action to authenticate with my GCP project.
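
For reference, here’s a hedged sketch of what a Workload Identity Federation login typically looks like with Google’s google-github-actions/auth action. The secret names WIF_PROVIDER and WIF_SERVICE_ACCOUNT are hypothetical placeholders. The id-token: write permission matters: without it, the job can’t request the OIDC token that Workload Identity Federation exchanges for short-lived GCP credentials.

```yaml
jobs:
  build_and_artifact:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write   # lets the job request an OIDC token for Workload Identity Federation
    steps:
      - uses: actions/checkout@v4

      - id: auth
        uses: google-github-actions/auth@v2
        with:
          # Hypothetical secret names: the full provider resource name
          # and the email of the service account to impersonate
          workload_identity_provider: ${{ secrets.WIF_PROVIDER }}
          service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}

      # Sets up gcloud, which then uses the credentials created by the auth step
      - uses: google-github-actions/setup-gcloud@v2
```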

(A quick side note: I’m following best practice by authenticating with GCP via Workload Identity Federation, rather than the less secure JSON key method. There’s a great guide about this here, and here’s the Terraform code I wrote to set this up in my project.)

The next step is to build my container and push it to Artifact Registry. I was expecting to need a Marketplace action to run Docker here, but it works out of the box with the usual CLI commands, so happy days! One thing to mention is that even though I’ve already authenticated with my GCP project, I also have to authenticate Docker with Artifact Registry before I can push to it.
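
A sketch of these two steps, assuming they follow checkout and authentication in the same job. The region (europe-west2), project, repository and image names are hypothetical, and tagging with the commit SHA is my own choice for illustration:

```yaml
# Steps appended to the build_and_artifact job, after checkout and auth
- name: Configure Docker for Artifact Registry
  # Registers gcloud as a Docker credential helper for this registry host
  run: gcloud auth configure-docker europe-west2-docker.pkg.dev --quiet

- name: Build and push the image
  env:
    # Hypothetical region/project/repository/image path
    IMAGE: europe-west2-docker.pkg.dev/my-project/my-repo/resume-api
  run: |
    # Tag with the commit SHA so each revision gets a unique, traceable image
    docker build -t "$IMAGE:${{ github.sha }}" .
    docker push "$IMAGE:${{ github.sha }}"
```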

Next, I’m going to begin a new job called ‘deploy’. Because I’ve stated needs: [build_and_artifact], this job will only run on successful completion of the previous one.

A few things to note here. Firstly, as this is a new job, it runs on a new VM, so I have to check out my repo and authenticate with GCP again. Secondly, I avoid having to install Terraform on the runner myself by using the hashicorp/setup-terraform Marketplace action (uses: hashicorp/setup-terraform).
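
Roughly, the start of such a job might look like this. It’s a sketch under assumptions: the secret names and the terraform/qa directory are hypothetical, the build_and_artifact job itself is omitted, and the version pins are illustrative:

```yaml
jobs:
  # build_and_artifact job omitted here for brevity
  deploy:
    needs: [build_and_artifact]   # only starts if build_and_artifact succeeded
    runs-on: ubuntu-latest        # a fresh VM, so checkout and auth are repeated
    permissions:
      contents: read
      id-token: write             # needed again for Workload Identity Federation
    steps:
      - uses: actions/checkout@v4

      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.WIF_PROVIDER }}     # hypothetical
          service_account: ${{ secrets.WIF_SERVICE_ACCOUNT }}         # hypothetical

      # Installs the Terraform CLI on the runner
      - uses: hashicorp/setup-terraform@v3

      - name: Provision QA infrastructure
        working-directory: terraform/qa   # hypothetical path to the QA config
        run: |
          terraform init -input=false
          terraform apply -auto-approve -input=false
```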

The final thing of note is that, although the Terraform code that provisions my Cloud Run service specifies a container image, the service won’t pick up the newest image unless I manually change the URL in my Terraform config. For this reason, I’m using gcloud to deploy the new image instead. This does feel a little hacky, but it keeps things automated, so I’m sticking with it for now!
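
The gcloud deploy step could be sketched like this, assuming images are tagged with the commit SHA at build time. The service name, project, region and image path are all hypothetical placeholders:

```yaml
# Step appended to the deploy job, after terraform apply
- name: Deploy the new image to Cloud Run
  env:
    # Hypothetical service, project, region and image path
    SERVICE: resume-api-qa
    PROJECT_ID: my-project
    REGION: europe-west2
    IMAGE: europe-west2-docker.pkg.dev/my-project/my-repo/resume-api
  run: |
    # Roll out a new revision pointing at the image built for this commit
    gcloud run deploy "$SERVICE" \
      --image "$IMAGE:${{ github.sha }}" \
      --project "$PROJECT_ID" \
      --region "$REGION" \
      --quiet
```

One thing to watch with this approach: because the image is changed outside Terraform, a later terraform apply may see the difference and try to revert the service to the image in the config. A lifecycle ignore_changes rule on the image attribute, or passing the tag into Terraform as a variable, are two common ways around that.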

Next, I want to run all my Cypress tests to make sure everything in the newly deployed QA environment is working as intended. I had a bit of trouble getting Cypress to play nicely with the environment variables I needed to pass to it; in the end, setting them under ‘env:’ in the step itself did the trick.
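
A sketch of that test step, assuming Cypress is a dev dependency in a package.json at the repo root. The secret and variable names are hypothetical. The detail worth knowing is that Cypress automatically picks up environment variables prefixed with CYPRESS_:

```yaml
# Step appended to the deploy job once the QA deployment has finished
- name: Run Cypress tests against QA
  env:
    # CYPRESS_-prefixed variables are exposed to specs via Cypress.env()
    # with the prefix stripped: CYPRESS_API_URL -> Cypress.env('API_URL')
    CYPRESS_API_URL: ${{ secrets.QA_API_URL }}   # hypothetical secret name
    # CYPRESS_BASE_URL overrides the baseUrl configuration option
    CYPRESS_BASE_URL: ${{ vars.QA_SITE_URL }}    # hypothetical variable name
  run: |
    npm ci
    npx cypress run
```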

Finally in the QA workflow, assuming all the tests pass, I wanted to create a pull request to merge dev into main. As you might expect from GitHub, the gh CLI is pre-installed on the runner, so this works out of the box with the usual command (you do need to pass it a personal access token, though).
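
Something along these lines should work; PR_TOKEN is a hypothetical secret holding the personal access token. One likely reason a PAT is needed rather than the built-in GITHUB_TOKEN is that events created with GITHUB_TOKEN don’t trigger new workflow runs, so a PR opened that way wouldn’t kick off a workflow listening for pull_request events.

```yaml
# Final step of the QA workflow, reached only if every earlier step succeeded
- name: Open a pull request from dev to main
  env:
    # gh reads its token from GH_TOKEN; PR_TOKEN is a hypothetical secret holding a PAT
    GH_TOKEN: ${{ secrets.PR_TOKEN }}
  run: |
    gh pr create \
      --repo "${{ github.repository }}" \
      --base main \
      --head dev \
      --title "Merge dev into main (${{ github.sha }})" \
      --body "Automated PR raised after the QA tests passed."
```

Bear in mind that gh pr create exits with an error if an open PR from dev to main already exists, so repeated pushes to dev may need that case handled.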

All being well, this will have created a fresh pull request. However, I don’t want to deploy straight to production without first checking what Terraform intends to do in that environment. Fortunately, HashiCorp suggests a great workflow for this that uses a few clever GitHub Actions tricks. The first step is a workflow that starts in response to either a PR or a push to main (i.e. an approved PR that has been merged).
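
A sketch of that trigger section; the workflow name is a hypothetical placeholder. Note that for pull_request, the branches filter matches the PR’s base (target) branch:

```yaml
name: prod   # hypothetical workflow name

on:
  push:
    branches:
      - main   # a merged PR lands on main as a push
  pull_request:
    branches:
      - main   # runs when a PR targeting main is opened or updated
```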

The rest of the workflow pans out much the same as the QA one, except for the Terraform section.

You’ll see that the ‘plan’ step contains an if condition (if: github.event_name == 'pull_request'), which means it only runs in response to a PR. The following GitHub script step then uses a handful of native variables and methods to update the PR with the output of the Terraform plan (note also the step that exits the workflow if the plan fails). The tf_apply step, meanwhile, only runs in response to a push to main, the assumption being that someone has reviewed the updated PR and approved it.
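
This is my own sketch of the pattern, following the general shape of HashiCorp’s tutorial rather than copied from my file, so treat the job name, action versions and comment format as assumptions (the auth step is omitted for brevity). It relies on setup-terraform’s wrapper, which is on by default, to expose the plan output as steps.plan.outputs.stdout, and the job’s token needs permission to write to pull requests:

```yaml
jobs:
  terraform:   # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write        # for Workload Identity Federation
      pull-requests: write   # so the plan can be posted as a PR comment
    steps:
      - uses: actions/checkout@v4
      # (google-github-actions/auth step omitted for brevity)
      - uses: hashicorp/setup-terraform@v3   # wrapper exposes stdout/stderr/exitcode outputs

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Plan
        id: plan
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color -input=false
        continue-on-error: true   # let the next step post the output even if the plan fails

      - name: Post the plan to the PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        env:
          PLAN: ${{ steps.plan.outputs.stdout }}
          OUTCOME: ${{ steps.plan.outcome }}
        with:
          script: |
            const body = [
              `#### Terraform Plan: ${process.env.OUTCOME}`,
              '<details><summary>Show plan</summary>',
              '',
              '```',
              process.env.PLAN,
              '```',
              '',
              '</details>',
            ].join('\n');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body,
            });

      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1   # stop here so a failed plan can't go any further

      - name: Terraform Apply
        id: tf_apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
```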

Once everything is deployed, the tests run against prod to confirm that the live site is working as it should.
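
A sketch of how the prod tests might be wired up, assuming the deployment happens in a job with the hypothetical name deploy_prod and the site URL lives in a hypothetical repository variable. The if condition skips the tests on PR runs, since nothing is deployed then:

```yaml
jobs:
  # deploy_prod job (Terraform apply + Cloud Run deploy) omitted for brevity
  test_prod:
    needs: [deploy_prod]              # hypothetical name of the prod deploy job
    if: github.event_name == 'push'   # PR runs only plan, so there is nothing new to test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Cypress tests against prod
        env:
          CYPRESS_BASE_URL: ${{ vars.PROD_SITE_URL }}   # hypothetical variable name
        run: |
          npm ci
          npx cypress run
```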

The full workflow files for the above are here, and I built a similar process for my front end site, which you can see here.

And with that, unless I’m very much mistaken, I think I just finished the Cloud Resume Challenge! The last step is a summary blog post, so I’ll do that next. But phew! 🙂