Scheduling dbt with GitHub Actions: Pros & Cons
![Airport departures timetable](https://miro.medium.com/v2/resize:fit:700/1*VTzXC8qPqVW9XlLgtD1bpA.jpeg)
Introduction

GitHub Actions is a powerful CI/CD automation tool for software development workflows, and it is fast becoming one of the go-to choices for pipeline orchestration, alongside alternatives such as Airflow and dbt Cloud's browser-based IDE. In this article, we will look at why GitHub Actions is one of the best pipeline orchestrators available today, walk through a complete workflow for deploying a dbt project on a schedule, and cover its main downsides.

Why choose GitHub Actions?

First, GitHub Actions is tightly integrated with the rest of the GitHub ecosystem, which makes pipelines easier to build and manage. You can create workflows triggered by events such as commits, pull requests, and releases, so your pipelines stay tightly linked to your development process, reducing errors and increasing efficiency.

Second, GitHub Actions is built on open standards, which means it can be easily integrated with other tools and services. This also makes it easy to create pipelines that span multiple stages, from development and testing to deployment and monitoring, and you can extend them with third-party actions and tools such as sqlfluff (illustrated in the sketch below).

Third, GitHub Actions has a large and active community, so you will find a wealth of tutorials, sample workflows, and best practices to help you get started and optimize your pipeline.

Finally, there is cost efficiency: GitHub Actions offers a free tier with a generous number of free minutes per month, so you can use it for small or personal projects (knock knock.. entrepreneurs!). And, thanks to GitHub Actions Importer, you can now plan and automatically migrate your existing CI/CD pipelines to GitHub Actions from Azure DevOps, CircleCI, GitLab, Jenkins, and Travis CI.
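As a minimal sketch of event-based triggers and third-party tooling (this workflow is an illustration, not from the original article; the branch name, Python version, and models/ path are assumptions to adjust for your project), here is a job that lints a dbt project's SQL with sqlfluff on every pull request:

```yaml
# Sketch: lint SQL on pull requests, showing an event trigger plus a
# community tool (sqlfluff) installed and run as an ordinary step.
name: Lint SQL
on:
  pull_request:
    branches: [ main ]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install and run sqlfluff
        run: |
          pip install sqlfluff
          sqlfluff lint models/ --dialect ansi
```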
How:

Here is a simple illustration of how to use GitHub Actions to deploy a dbt project. A complete workflow that deploys a dbt project on a schedule can look like this:
```yaml
name: Deploy dbt
on:
  schedule:
    # run every day at 04:00 UTC
    - cron: '0 4 * * *'
env:
  DB_HOST: ${{ secrets.DB_HOST }}
  DB_PORT: ${{ secrets.DB_PORT }}
  DB_USER: ${{ secrets.DB_USER }}
  DB_PASS: ${{ secrets.DB_PASS }}
  DB_NAME: ${{ secrets.DB_NAME }}
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.8'
      - name: Install dependencies
        run: |
          # install dbt-core plus the adapter for your warehouse
          pip install dbt-core dbt-postgres
          dbt deps
      - name: Run dbt
        run: |
          dbt run --target prod  # replace 'prod' with your dbt target name
      - name: Test using dbt
        run: |
          dbt test --target prod
```

This workflow runs on a schedule set by cron: it checks out the repository, sets up Python, installs dbt, and runs and tests the dbt project. You can also see how environment variables are used to pass database connection information stored as GitHub secrets. As you may have noticed, before using GitHub Actions you need to create service credentials to access your data warehouse. With BigQuery, for example, you can do this using a dedicated service account.
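As a hedged illustration of the credentials setup (not part of the original article's workflow), the two steps below could slot into the steps: list above. GCP_SA_KEY is a hypothetical secret holding the service account's JSON keyfile, and profiles.yml is assumed to read the keyfile path via env_var():

```yaml
# Sketch only: GCP_SA_KEY and DBT_GOOGLE_KEYFILE are assumed names. The
# keyfile entry in profiles.yml would read "{{ env_var('DBT_GOOGLE_KEYFILE') }}".
- name: Write BigQuery keyfile
  run: echo '${{ secrets.GCP_SA_KEY }}' > ${{ runner.temp }}/keyfile.json
- name: Run dbt against BigQuery
  run: dbt run --target prod
  env:
    DBT_GOOGLE_KEYFILE: ${{ runner.temp }}/keyfile.json
```

Secrets such as GCP_SA_KEY can be created in the repository under Settings → Secrets and variables → Actions, or with the GitHub CLI, e.g. `gh secret set GCP_SA_KEY < keyfile.json`.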
Additionally, you can use GitHub Actions in conjunction with the re_data open-source library. Here's an example from re_data's repo:

```yaml
name: analysis
on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
env:
  DBT_PROFILES_DIR: ${{ github.workspace }}
  RE_DATA_CONFIG_DIR: ${{ github.workspace }}
  RE_DATA_SEND_ANONYMOUS_USAGE_STATS: 0
  RE_CLOUD_API_KEY: ${{ secrets.RE_CLOUD_API_KEY }}
  SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
  SNOWFLAKE_DBT_PASSWORD: ${{ secrets.SNOWFLAKE_DBT_PASSWORD }}
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v1
        with:
          python-version: "3.7.x"
      - name: Install re_data and python dependencies
        run: |
          pip install dbt-core==1.3.0
          pip install dbt-snowflake==1.3.0
          pip install re_data
          pip install re_cloud
      - name: Run dbt
        run: |
          dbt --version
          dbt deps
          dbt run
          re_data run
      - name: Generate dbt docs
        run: |
          dbt docs generate
      - name: Generate re_data overview
        run: |
          re_data overview generate
      - name: Configure re_data.yml for re_cloud production
        uses: mikefarah/yq@master
        with:
          cmd: |
            yq -i '.re_cloud.api_key = "${{secrets.RE_CLOUD_API_KEY}}"' re_data.yml
      - name: Upload dbt docs and re_data to production cloud
        run: |
          re_cloud upload dbt-docs --name analytics-docs
          re_cloud upload re-data --name analytics-report
      - name: Configure re_data.yml for re_cloud testing
        uses: mikefarah/yq@master
        with:
          cmd: |
            yq -i '.re_cloud.api_key = "${{secrets.RE_CLOUD_TEST_API_KEY}}"' re_data.yml
            yq -i '.re_cloud.base_url = "https://test.getre.io"' re_data.yml
      - name: Upload dbt docs to testing cloud
        run: |
          re_cloud upload dbt-docs --name analytics-docs
          re_cloud upload re-data --name analytics-report
```

This is a GitHub Actions workflow written in YAML that drives the data analysis process.

Trigger: the workflow runs on every push to the main branch, on a daily cron schedule ('0 0 * * *', i.e. midnight UTC), and on demand via workflow_dispatch (for example with `gh workflow run analysis`).

Environment variables: the workflow sets several environment variables, including the dbt profiles and re_data config directories (both pointed at the workspace), an opt-out flag for anonymous usage statistics, and the re_cloud and Snowflake credentials stored as GitHub secrets.

Job: the build job consists of several steps: it checks out the repository, sets up Python, installs dbt-core, dbt-snowflake, re_data, and re_cloud, runs dbt and re_data, generates the dbt docs and the re_data overview, and finally uses yq to point re_data.yml at the production and testing re_cloud environments and uploads both reports to each.
GitHub Actions' Downsides:

I think the free tier limitation is the main downside of GitHub Actions. The free tier has usage limits, such as a cap on the number of concurrent jobs and on the number of build minutes per month (at the time of writing, 2,000 minutes per month on the free plan). As of now, if you need to run workflows that exceed these limits, you should consider using a paid plan or contact GitHub support for more information. You can also help keep scheduled workflows inside the limits by capping runtimes and preventing overlapping runs, as in the sketch below.
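Here is a hedged sketch of that mitigation (timeout-minutes and concurrency are standard workflow syntax; the specific values and names are illustrative assumptions):

```yaml
# Sketch with assumed values: cap runtime and cancel overlapping scheduled
# runs so a hung dbt invocation does not burn through free-tier minutes.
name: Deploy dbt (capped)
on:
  schedule:
    - cron: '0 4 * * *'
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 30        # fail fast instead of running for hours
    concurrency:
      group: dbt-deploy
      cancel-in-progress: true # at most one scheduled deploy at a time
    steps:
      - run: echo "checkout, install dbt, dbt run ..."
```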
Conclusion:

In summary, GitHub Actions is a powerful CI/CD pipeline automation tool that offers ease of use, tight integration with the rest of the GitHub ecosystem, cost-effectiveness, and a large and active community. By creating a workflow and following the steps described in this article, you can use it to deploy your dbt project on a schedule. The free tier does have usage limits, such as the caps on concurrent jobs and monthly build minutes, but GitHub Actions is still a great option for those looking to automate their pipelines and streamline their software development workflows.