diff --git a/content/getting-started/github-flow.md b/content/getting-started/github-flow.md
new file mode 100644
index 00000000..43db3598
--- /dev/null
+++ b/content/getting-started/github-flow.md
@@ -0,0 +1,308 @@
+---
+uid: github-flow
+title: GitHub Flow and the Octopus Merge pattern
+author: Just Blindbæk
+updated: 2026-07-03
+applies_to:
+  products:
+    - product: Tabular Editor 2
+      full: true
+    - product: Tabular Editor 3
+      editions:
+        - edition: Desktop
+          none: true
+        - edition: Business
+          full: true
+        - edition: Enterprise
+          full: true
+---
+
+# GitHub Flow and the Octopus Merge pattern
+
+This article covers the day-to-day **GitHub Flow** workflow recommended in [Enabling parallel development using Git and Save to Folder](xref:parallel-development), and the **Octopus Merge** pattern that supports it: a way of keeping a shared test environment continuously up to date with everything currently in progress. The second half of the article walks through a complete reference pipeline that implements this and, as you'll see, covers considerably more than the merge step alone.
+
+## GitHub Flow in daily use
+
+GitHub Flow's rule is simple (`main` is always deployable, and all work happens on short-lived branches off `main`), but a few details are worth making explicit for a semantic model team.
+
+**Creating a feature branch:**
+
+```cmd
+git checkout main
+git pull
+git checkout -b feature/add-tax-calculation
+```
+
+**Local development.** The developer works in Tabular Editor 3. Two things happen every time they hit **Ctrl+S**:
+
+- The model metadata is saved to disk in [Save to Folder (database.json) format](xref:parallel-development#what-is-save-to-folder), where Git sees it as uncommitted changes on the feature branch. The developer commits and pushes those changes as a separate step.
+- If [Workspace Mode](xref:workspace-mode) is enabled, the model is simultaneously synced to the developer's personal workspace database in a shared dev workspace — allowing live testing in Tabular Editor, and letting Power BI Desktop connect [directly to the workspace database](xref:workspace-mode#advantages-of-workspace-mode) for report-side validation. + +```cmd +git add . +git commit -m "Add tax calculation measure and supporting columns" +git push +``` + +> [!WARNING] +> Do not enable Fabric Git integration on the workspace hosting your workspace databases. Tabular Editor writes to workspace databases directly through the XMLA endpoint, and those writes have no relationship to your Git branches — enabling Git integration on the same workspace creates conflicting, out-of-band changes to the same database. This is also called out in the [Workspace Mode documentation](xref:workspace-mode). + +**Opening a pull request.** When the developer is ready for broader testing, they open a pull request targeting `main`. This is the point where GitHub Flow, on its own, leaves a question open for BI teams: with several developers each having an open PR at once, what should the shared test environment actually reflect? That's what Octopus Merge answers — see below. + +**Approval and merge.** Once technical and business reviewers have signed off using the shared test environment, the feature branch is merged into `main` and deleted. + +**Deploy to UAT / production.** Either every merge to `main` triggers deployment automatically, or merges accumulate and are deployed on a scheduled cadence (for example, weekly). Both are compatible with GitHub Flow — the branch structure is the same either way, only the release trigger differs. + +## Octopus Merge: keeping the test environment current + +##### Note — naming disambiguation + +"Octopus merge" is used in the Git ecosystem to mean three related but distinct things. It's worth being precise about which one we mean here: + +1. 
**Git's native octopus merge strategy** — the merge strategy Git automatically uses when you run `git merge branch-a branch-b branch-c`, combining more than two branch heads into a single merge commit *as long as there are no conflicts*. If any branch conflicts with the merge in progress, the whole command fails — Git makes no attempt to resolve or isolate conflicts across more than two branches. This is a low-level Git mechanism, not a workflow. +2. **`lesfurets/git-octopus`** — a now-archived open-source command-line tool that wrapped this native strategy into a "continuous merge" workflow: resolve a set of branches by naming pattern, merge them, push the result to a disposable branch, and repeat on every push. It also included tooling to iterate through branches one-by-one to pinpoint which one caused a conflict. The tool itself is no longer maintained and isn't what we recommend implementing directly, but the workflow it pioneered is exactly the pattern described below. +3. **The Octopus Merge pattern described in this article** — a custom CI/CD pipeline that discovers all currently open (non-draft) pull requests targeting `main`, merges their source branches together using Git's native octopus strategy from (1), pushes the result to a disposable branch, and deploys that branch to a shared test environment. The pattern is the same idea as (2), reimplemented as a pipeline script you own — for example a GitHub Actions workflow, or an Azure Pipelines script calling the Azure DevOps REST API — rather than a standalone third-party tool. + +When this article says "Octopus Merge," it means (3). Note that (3) *uses* the native strategy from (1) as its actual merge mechanism — the value it adds is the automation and branch lifecycle around that merge, not an alternative way of merging. + +The pattern, in short: **your test environment always reflects the combination of everything currently in progress** — not just one feature in isolation. 
Every time a developer pushes to any open, non-draft pull request, the pipeline rebuilds the combined branch from scratch and redeploys it. + +```mermaid +flowchart LR + main(["main"]) -.->|"deleted + recreated"| temp + prA["feature/A
(open PR)"] --> temp + prB["feature/B
(open PR)"] --> temp + prC["feature/C
(draft PR — excluded)"]:::draft + temp[["♻️ octopus/temp
deleted + recreated from main
on every run"]] --> test(["Test workspace"]) + + classDef draft stroke-dasharray: 5 5,opacity:0.55; +``` + +> [!NOTE] +> Tabular Editor now has a cross-platform CLI (`te`) in Limited Public Preview, purpose-built for CI/CD use — non-interactive mode, native GitHub Actions/Azure DevOps annotations, VSTEST output, and a `te test run` command for running regression tests as part of a pipeline. It's a natural fit for the kind of pipeline described below, and worth watching. As of this writing, Tabular Editor's own documentation advises against using it in production pipelines during preview (the preview build is stated to expire 2026-09-30), so the reference implementation in this article uses the established `TabularEditor.exe` CLI instead. See [CI/CD Integration](xref:te-cli-cicd) for the new CLI's current capabilities and examples. + + + +## Reference implementation + +What follows is a complete, working pipeline that implements Octopus Merge — but it's worth being upfront that it does considerably more than the merge step alone. A full run also downloads Tabular Editor, and validates the merged model against your best-practice rules and live data source schema before deploying it to the shared test workspace. Octopus Merge is job 1 of 5; the rest is a general-purpose CI/CD pipeline for semantic models that happens to consume Octopus Merge's output. Deploying reports on top of that model — a separate concern with its own variability by organization — is addressed briefly at the end. + +The examples below show both **Azure Pipelines** (calling the Azure DevOps REST API) and **GitHub Actions** (calling the GitHub REST API) for the merge job, since the two platforms differ mainly in how they authenticate and query pull requests — the underlying Git operations and Tabular Editor CLI invocations are identical either way. 
+ +### Overview of the pipeline + +A full run consists of several jobs, each with explicit dependencies on the ones before it: + +```mermaid +flowchart TD + merge["Octopus merge"] --> dl["Download Tabular Editor"] + dl --> bpa["BPA verification"] + dl --> schema["Schema validation"] + bpa --> deploy["Model deployment"] + schema --> deploy +``` + +Running each stage as its own job — rather than one long script — gives you independent pass/fail signal for each concern (merge conflicts vs. BPA violations vs. schema drift vs. deployment failures), which makes it much faster to diagnose what actually went wrong when a run fails. + +##### Note — pipeline agent requirements + +Since `TabularEditor.exe` only runs on Windows, every job that invokes it needs a Windows-based agent/runner — this includes the BPA verification, schema validation, and model deployment jobs. A cloud-hosted Windows agent works fine as long as it can reach your test workspace and data source over the network; a self-hosted agent is only necessary if those endpoints aren't reachable from outside your network (an on-premises data source, for example). The Octopus merge job itself has no such constraint, since it only needs Git. + +### Triggering the pipeline + +The pipeline isn't triggered by a normal Git push trigger. Since it needs to merge *all* currently open pull requests — not just the one that changed — it's typically set up with no automatic branch trigger, and instead invoked in one of two ways: + +- **From a pull request pipeline or branch policy**, so it runs whenever a pull request targeting `main` is created, or whenever a new commit is pushed to any branch with an open pull request. +- **On a schedule** (for example, every few minutes), as a simpler alternative if your CI/CD platform makes "run on any open PR's branch update" awkward to configure directly. + +Either approach achieves the same effect: any push to any open pull request causes the combined test environment to be rebuilt. 
+
+### Job 1: Octopus merge
+
+This job is responsible for discovering all currently open pull requests, merging them together, and publishing the result to a disposable branch.
+
+**What it does, step by step:**
+
+1. **Authenticate and query pull requests.** The job calls the source-control platform's REST API for open pull requests targeting `main`, authenticated with a token that has permission to list pull requests (including drafts; the filtering happens next, not at the API level).
+2. **Filter to non-draft pull requests.** Draft pull requests are excluded, which gives developers a way to push work-in-progress commits without pulling them into the shared test build. Only when a PR is marked ready for review does it join the merge.
+3. **Clone the repository fresh.** Rather than reusing a previous checkout, the job clones the repository from scratch on every run, authenticating with the pipeline's own access token. This guarantees the merge always starts from a clean, known state.
+4. **Delete and recreate the disposable branch.** Both the remote and local copies of the disposable output branch (for example, `octopus/temp`) are force-deleted if they exist, then recreated fresh from `main`. The branch is never fast-forwarded or reused between runs; it's always rebuilt from scratch.
+5. **Merge all qualifying pull request branches in a single command.** Passing two or more branches to `git merge` makes Git select its native octopus merge strategy automatically; with a single qualifying branch, Git performs an ordinary two-head merge instead. This is the point where the pattern uses the underlying Git mechanism described above.
+6. **Push the result**, if the merge succeeded.
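
Steps 2 and 5 above can be sketched independently of either CI platform. The following is a minimal Python illustration, not part of the reference pipeline: the function names are hypothetical, and the record shape (`headRefName`, `baseRefName`, `isDraft`) is an assumption modelled on the JSON fields that `gh pr list --json` returns. The Azure DevOps API uses different field names (`sourceRefName`, `targetRefName`), so its records would need mapping first.

```python
def select_merge_refs(pull_requests, target="main", remote="origin"):
    """Return remote-tracking refs for open, non-draft PRs that target `target`."""
    refs = []
    for pr in pull_requests:
        if pr.get("isDraft"):
            continue  # step 2: drafts stay out of the shared test build
        if pr.get("baseRefName") != target:
            continue  # only PRs aimed at the target branch qualify
        ref = f"{remote}/{pr['headRefName']}"
        if ref not in refs:  # keep each branch once, in listing order
            refs.append(ref)
    return refs


def build_merge_command(refs):
    """Build the single `git merge` call from step 5, or None if nothing qualifies."""
    if not refs:
        return None
    return ["git", "merge", "--quiet", *refs]


# Hypothetical input mirroring the diagram: two ready PRs and one draft.
prs = [
    {"headRefName": "feature/A", "baseRefName": "main", "isDraft": False},
    {"headRefName": "feature/B", "baseRefName": "main", "isDraft": False},
    {"headRefName": "feature/C", "baseRefName": "main", "isDraft": True},
]
print(build_merge_command(select_merge_refs(prs)))
# → ['git', 'merge', '--quiet', 'origin/feature/A', 'origin/feature/B']
```

Returning `None` for an empty list keeps the "no open pull requests" case explicit, so the caller can skip the merge and still push a freshly recreated branch that matches `main`.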
+ +**Azure Pipelines**, calling the Azure DevOps REST API: + +```yaml +- task: PowerShell@2 + displayName: Git octopus merge + inputs: + targetType: 'inline' + script: | + $prs = Invoke-RestMethod -Uri "https://dev.azure.com/$(Org)/$(Project)/_apis/git/repositories/$(Repo)/pullrequests?api-version=7.0" ` + -Headers @{ Authorization = "Bearer $(System.AccessToken)" } + $branches = $prs.value | Where-Object { $_.isDraft -eq $false -and $_.targetRefName -eq "refs/heads/main" } | + ForEach-Object { $_.sourceRefName -replace 'refs/heads', 'origin' } + + git clone $(Build.Repository.Uri) repo --quiet + cd repo + git checkout main --quiet + git push origin --delete octopus/temp --quiet 2>$null + git checkout -b octopus/temp --quiet + if ($branches.Count -gt 0) { + git merge --quiet $branches + } + git push --set-upstream origin octopus/temp --quiet +``` + +**GitHub Actions**, calling the GitHub REST API via the `gh` CLI: + +```yaml +- name: Git octopus merge + env: + GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} + run: | + branches=$(gh pr list --base main --state open --json isDraft,headRefName \ + --jq '.[] | select(.isDraft == false) | .headRefName') + + git clone "$GITHUB_SERVER_URL/$GITHUB_REPOSITORY" repo --quiet + cd repo + git checkout main --quiet + git push origin --delete octopus/temp --quiet || true + git checkout -b octopus/temp --quiet + if [ -n "$branches" ]; then + git merge --quiet $(echo "$branches" | sed 's/^/origin\//') + fi + git push --set-upstream origin octopus/temp --quiet +``` + +Both versions do the same thing: list open, non-draft PRs targeting `main`, resolve them to branch references, and merge them into a freshly recreated `octopus/temp` branch. + +**Handling a failed merge:** + +If the merge fails — most likely due to a conflict between two or more of the open pull requests — don't just log an error and stop. 
A well-behaved implementation should reset the working directory and push an **empty placeholder commit** to the disposable branch before failing the pipeline run: + +``` +git reset --hard --quiet +git checkout main --quiet +git branch -D octopus/temp --quiet +git checkout -b octopus/temp --quiet +git commit --allow-empty -m "init" --quiet +git push origin octopus/temp --quiet +``` + +This matters because downstream jobs (BPA verification, schema validation, deployment) may depend on the disposable branch existing in *some* well-defined state. Without this step, a failed merge could leave the branch missing or half-merged, causing confusing secondary failures in later jobs rather than a single clear error at the merge step. + +##### Note — diagnosing which branch caused the conflict + +A straightforward implementation of this pattern does not automatically identify which pull request caused a merge conflict — it only reports that the merge failed. This is a real limitation compared to the archived `lesfurets/git-octopus` tool, which included tooling to iterate through branches one-by-one to isolate the culprit. In practice, most teams resolve this manually: temporarily unpublish (convert back to draft, or close) the pull requests you suspect, and re-run the pipeline until the merge succeeds again, to narrow down which branch was responsible. If this trial-and-error process becomes a bottleneck for your team, it's worth building an automated one-by-one bisection step into your own pipeline. + +### Job 2: Download Tabular Editor + +Since the jobs that follow need to invoke the Tabular Editor CLI, and build agents can't be assumed to have it pre-installed, a separate job downloads a portable copy of Tabular Editor at the start of every run: + +- Fetches the latest release directly (for example, from Tabular Editor's GitHub releases). +- Unzips it and discards the downloaded archive. 
+- Makes the extracted `TabularEditor.exe` available to subsequent jobs on the same agent/runner.
+
+Downloading the latest version fresh on every run keeps the pipeline current automatically, without needing to track and update a pinned version number. If your team wants deterministic, reproducible builds, though, pinning to a specific release and updating it deliberately is worth considering as an alternative.
+
+### Job 3: BPA verification
+
+This job runs Tabular Editor's [Best Practices Analyzer](xref:best-practice-analyzer) against every semantic model produced by the merge, validating it against your team's central quality rules.
+
+If your repository contains more than one semantic model (common for BI teams serving multiple business areas), each model typically lives in its own subfolder, and the job loops over each one:
+
+```
+TabularEditor.exe "" -A "" -V
+```
+
+- `-A` runs the Best Practice Analyzer against the model, using the rules file given as its argument.
+- `-V` writes the results as Azure DevOps logging commands, so rule violations surface as warnings and errors on the pipeline run.
+
+> [!NOTE]
+> Decide up front whether a BPA violation should **fail** the pipeline or only **warn**. It's tempting to start with warnings while your rule set is still being tuned, but if that's left in place long-term, violations can silently accumulate without ever blocking a deployment. Treat a warn-only BPA step as a temporary state to graduate out of, not a permanent configuration.
+
+### Job 4: Schema validation
+
+This job compares each model's expected schema against its real, live data source, catching a renamed or missing column, for example, before it causes a broken refresh in the test environment.
+
+```
+TabularEditor.exe "" -S ".cs" -SC -V -W
+```
+
+- `-S` runs a C# script that sets the model's data source connection string, typically reading it from a pipeline environment variable or secret, so the real connection details never need to be committed to source control.
+- `-SC` performs the schema check itself, comparing the model's metadata against the live source.
+- `-V` writes the findings as Azure DevOps logging commands, so schema mismatches surface as warnings or errors on the pipeline run. `-W` is documented as a deployment option that reports unprocessed objects as warnings, so check the [Tabular Editor CLI reference](xref:command-line-options) for whether it has any effect on a validation-only run like this one.
+
+If your models depend on database objects that are themselves deployed as part of your pipeline (for example, SQL views published from source control), make sure that deployment step runs *before* schema validation, so the check runs against the exact objects the model will see once everything is deployed to the test environment. This ordering dependency is easy to miss if the two jobs are written independently of each other.
+
+> [!NOTE]
+> The mechanism for deploying upstream data objects (SQL views, other database artifacts) depends on your organization's data platform, and isn't part of the Octopus Merge pattern itself. What matters for this pattern is only that schema validation happens after your data source is in its expected state for the test environment; whatever populates that state is up to you.
+
+### Job 5: Model deployment
+
+Once BPA verification and schema validation have both succeeded, this job deploys the merged model to the shared test workspace, using Tabular Editor's Save to Folder (`database.json`) format deployed directly over the XMLA endpoint:
+
+```
+TabularEditor.exe "\database.json" -D "Provider=MSOLAP;Data Source=;User ID=app:@;Password=;LocaleIdentifier=1033" "" -O -P -R -W -V -E
+```
+
+A few things worth calling out:
+
+- Authentication is via a **service principal** (a Microsoft Entra ID, formerly Azure AD, app registration), not a user account. This is appropriate for an unattended pipeline and avoids the need to keep a real user's credentials in your pipeline secrets.
+- The model name passed to Tabular Editor typically matches the folder name, so that a repository containing multiple models deploys each one to a correspondingly named dataset.
+- The flags after the database name control the deployment: `-O` allows an existing database to be overwritten, `-P` and `-R` extend the overwrite to partitions and roles, `-W` reports unprocessed objects as warnings, `-V` writes output as Azure DevOps logging commands, and `-E` returns a non-zero exit code if the server reports errors after deployment. See the [Tabular Editor CLI reference](xref:command-line-options) for the full flag list if you need to adjust any of these for your own setup.
+
+> [!NOTE]
+> Business reviewers signing off in the shared test environment are validating a report, not a raw XMLA connection. In practice, something still needs to deploy and bind Power BI reports to the freshly deployed test model (and, optionally, update any published Power BI Apps) before that sign-off can happen. Whether every report is redeployed on every run, or only the ones affected by the current changes, is the kind of decision that varies enough by organization to be out of scope here; see [Power BI CI/CD with Azure DevOps and Tabular Editor](xref:powerbi-cicd) for that part of the pipeline.
+
+### Full workflow diagram
+
+```mermaid
+flowchart TB
+    subgraph dev["Developer"]
+        direction LR
+        d1["Create feature branch"] --> d2["Local commits"] --> d3["Open PR
(ready for review)"] + end + + subgraph ci["CI Pipeline"] + direction LR + c1["Octopus merge"] -->|"success"| c2["Download Tabular Editor"] + c1 -.->|"conflict"| c1f["Empty placeholder commit"] + c2 --> c3["BPA verification"] + c2 --> c4["Schema validation"] + c3 --> c5["Model deployment"] + c4 --> c5 + end + + subgraph review["Review"] + direction LR + r1["Technical sign-off"] --> r2["Business sign-off"] + end + + subgraph release["Release"] + direction LR + rel1["Merge to main"] --> rel2["Deploy to UAT / production"] + end + + d3 --> c1 + c5 --> r1 + r2 --> rel1 + c1f -.->|"developer resolves conflict, pushes fix"| d2 + r1 -.->|"changes requested, developer pushes fix"| d2 +``` + +## Key principles + +- `main` is always in a deployable state; feature branches are short-lived and independent. +- The disposable branch is deleted and recreated from `main` on every run — never fast-forwarded or reused. +- A failed merge should leave the disposable branch in a well-defined (even if empty) state, not a missing or half-merged one. +- Each validation stage (BPA, schema) should be a distinct pipeline job with its own pass/fail signal, not folded into one script. +- Organization-specific steps (like a SQL views deployment) should be clearly separated from the generic pattern, both in your pipeline code and in how you document it internally — so the pattern remains portable if you need to apply it to a different project. + +## Next steps + +- [Enabling parallel development using Git and Save to Folder](xref:parallel-development) — the branching strategy this pipeline supports. +- [CI/CD Integration](xref:te-cli-cicd) — the new Tabular Editor CLI's CI/CD patterns, currently in Limited Public Preview. 
+- @powerbi-cicd +- @as-cicd diff --git a/content/getting-started/parallel-development.md b/content/getting-started/parallel-development.md index 452cc79d..eefa9de5 100644 --- a/content/getting-started/parallel-development.md +++ b/content/getting-started/parallel-development.md @@ -2,7 +2,7 @@ uid: parallel-development title: Enabling parallel development using Git and Save to Folder author: Daniel Otykier -updated: 2026-06-11 +updated: 2026-07-03 applies_to: products: - product: Tabular Editor 2 @@ -125,66 +125,128 @@ What follows is a discussion of branching strategies to employ when developing t The branching strategy will dictate what the daily development workflow will be like, and in many cases, branches will tie directly into the project methods used by your team. For example, using the [agile process within Azure DevOps](https://docs.microsoft.com/en-us/azure/devops/boards/work-items/guidance/agile-process-workflow?view=azure-devops), your backlog would consist of **Epics**, **Features**, **User Stories**, **Tasks** and **Bugs**. -In the agile terminology, a **User Story** is a deliverable, testable piece of work. The User Story may consist of several **Tasks**, that are smaller pieces of work that need to be performed, typically by a developer, before the User Story may be delivered. In the ideal world, all User Stories have been broken down into manageable tasks, each taking only a couple of hours to complete, adding up to no more than a handful of days for the entire User Story. This would make a User Story an ideal candidate for a so-called Topic Branch, where the developer could make one or more commits for each of the tasks within the User Story. Once all tasks are done, you want to deliver the User Story to the client, at which time the topic branch is merged into a delivery branch (for example, a "Test" branch), and the code deployed to a testing environment. +In the agile terminology, a **User Story** is a deliverable, testable piece of work. 
The User Story may consist of several **Tasks** — smaller pieces of work performed by a developer before the User Story can be delivered. In an ideal world, all User Stories are broken down into manageable tasks, each taking only a couple of hours to complete, adding up to no more than a handful of days for the entire User Story. This makes a User Story an ideal candidate for a short-lived feature branch, where the developer makes one or more commits per task before the branch is merged and the code deployed for testing. -Determining a suitable branching strategy depends on many different factors. In general, Microsoft recommends the [Trunk-based Development](https://docs.microsoft.com/en-us/azure/devops/repos/git/git-branching-guidance?view=azure-devops) ([video](https://youtu.be/t_4lLR6F_yk?t=232)) strategy, for agile and continuous delivery of small increments. The main idea is to create branches off the "Main" branch for every new feature or bugfix (see image below). Code review processes are enforced through pull requests from feature branches into Main, and using the Branch Policy feature of Azure DevOps, we can set up rules that require code to build cleanly before a pull request can be completed. +Determining a suitable branching strategy depends on many different factors: team size, release cadence, regulatory constraints, how many semantic models you maintain, and how mature your CI/CD setup already is. This article presents three strategies: -![Trunk Based Development](~/content/assets/images/trunk-based-development.png) +- **[GitHub Flow + Octopus Merge](#github-flow--octopus-merge)** — our recommended approach for most semantic model teams, and the primary focus of this article. +- **[GitFlow](#gitflow-branching-and-deployment-environments)** — a valid alternative, particularly suited to teams with formal, infrequent release cycles or regulatory sign-off requirements. 
+- **[Plain trunk-based development](#trunk-based-development)** — the simplest approach, worth understanding as a baseline even if most BI teams will want the additional structure GitHub Flow provides. + +> [!NOTE] +> Tabular Editor is agnostic to branching strategy. Save to Folder and Workspace Mode work identically regardless of which of the strategies below you choose — the recommendation in this article is based on patterns we've seen succeed across enterprise engagements, not a constraint imposed by the tool. + +## GitHub Flow + Octopus Merge + +For teams building semantic models with Tabular Editor and Power BI, we recommend **[GitHub Flow](https://docs.github.com/en/get-started/using-github/github-flow)** combined with an **Octopus Merge** pattern for continuous integration testing. + +GitHub Flow is a lightweight branching model with a single hard rule: **`main` is always deployable.** All work happens on a short-lived feature branch created off `main`; nobody commits directly to `main`; branches are merged back via pull request after review and automated checks pass. Unlike GitFlow, there's no `develop` branch and no separate branch per environment — environment promotion (dev → test → UAT → production) is handled by the deployment pipeline, not by long-lived branches. 
-### Trunk-based Development +```mermaid +gitGraph + commit id: "initial" + branch "feature/add-tax-calculation" + commit id: "add measure" + commit id: "add column" + checkout main + merge "feature/add-tax-calculation" id: "PR merged: tax calculation" + branch "feature/fix-rls" + commit id: "fix role" + checkout main + merge "feature/fix-rls" id: "PR merged: fix RLS" + branch "feature/new-report-page" + commit id: "wip" + checkout main + commit id: "hotfix" + merge "feature/new-report-page" id: "PR merged: new report page" +``` -However, such a strategy might not be feasible in a Business Intelligence development teams, for a number of reasons: +`main` stays on a single line and is always deployable; short feature branches fork off it and merge straight back via pull request. Contrast this with the GitFlow diagram further down the page, which has five parallel, long-lived lines. -- New features often require prolonged testing and validation by business users, which may take several weeks to complete. As such, you will likely need a user-facing test environment. -- BI solutions are multi-tiered, typically consisting of a Data Warehouse tier with ETL, a Master Data Management tier, a semantic layer and reports. Dependencies exist between these layers, that further complicate testing and deployment. -- The BI team may be responsible for developing and maintaining several different semantic models, serving different areas of business (Sales, Inventory, Logistics, Finance, HR, etc.), at different maturity stages and at varying development pace. -- The most important aspect of a BI solution is the data! As a BI developer, you do not have the luxury of simply checking out the code from source control, hitting F5 and having a full solution up and running in the few minutes it takes to compile the code. Your solution needs data, and that data has to be loaded, ETL'ed or processed across several layers to make it to the end user. 
Including data in your DevOps workflows could blow up build and deployment times from minutes to hours or even days. In some scenarios, it might not even be possible, due to resource or economy constraints. +On its own, GitHub Flow doesn't answer a question specific to BI teams: what does your shared test environment reflect at any given moment, when several developers each have an open pull request? **Octopus Merge** answers this: a CI pipeline continuously merges every currently open pull request into a disposable branch and deploys the result to a shared test environment — so business users always validate the combination of everything in progress, not just one feature in isolation. See [GitHub Flow and the Octopus Merge pattern](xref:github-flow) for how the pattern works and how to build it. -There is no doubt that a BI team would benefit from a branching strategy that supports parallel development on any of the layers in the full BI solution, in a way that lets them mix and match features that are ready for testing. But especially due to the last bullet point above, we need to think carefully about how we are going to handle the data. If we add a new attribute to a dimension, for example, do we want to automatically load the dimension as part of our build and deployment pipelines? If it only takes a few minutes to load such a dimension, that would probably be fine, but what if we are adding a new column to a multi-billion row fact table? And if developers are working on new features in parallel, should each developer have their own development database, or how do we otherwise prevent them from stepping on each others toes in a shared database? +A few reasons this combination fits semantic model development particularly well: -There is no easy answer to the questions above - especially when considering all the tiers of a BI solution, and the different constellations and preferred workflows of BI teams across the planet. 
Also, when we dive into actual build, deployment and test automation, we are going to focus mostly on Analysis Services. The ETL- and database tiers have their own challenges from a DevOps perspective, which are outside the scope of this article. But before we move on, let us take a look at another branching strategy, and how it could potentially be adopted to BI workflows. +- **Simpler mental model.** Two branch concepts instead of GitFlow's five means less onboarding overhead, particularly on teams that include report authors and business analysts alongside model developers. +- **`main` is always deployable.** If you need to ship an urgent fix — a broken measure, a security-related RLS change — you don't need to reason about which of several long-lived branches currently reflects production. +- **Environment promotion lives in the pipeline, not the branch structure.** Adding a new environment is a pipeline change, not a new permanent branch every developer has to remember to merge into. +- **Short-lived branches reduce merge conflicts** — important for Octopus Merge, since it merges every open branch together for integration testing. The shorter each branch lives, the smaller the surface area for conflicts. +- **Better fit for continuous delivery of data products** than GitFlow's versioned release-train model, since semantic models tend to evolve incrementally rather than ship in discrete releases. -### GitFlow branching and deployment environments +None of this means GitFlow is wrong — see [GitFlow branching and deployment environments](#gitflow-branching-and-deployment-environments) below for when it's still a good fit. + +### Key principles + +- `main` is always in a deployable state. +- Feature branches are short-lived and independent. +- The test environment always reflects the combination of everything currently in progress — not just one feature in isolation. See [GitHub Flow and the Octopus Merge pattern](xref:github-flow) for how. 
+- Fabric Git integration should **not** be enabled on any workspace used for Tabular Editor workspace databases — Tabular Editor writes to workspace databases directly through the XMLA endpoint, and those writes have no relationship to your Git branches. This is also called out in the [Workspace Mode documentation](xref:workspace-mode). + +## GitFlow branching and deployment environments + +GitFlow remains a solid choice for teams with a genuine need for the structure it provides — for example, formal versioned releases, regulatory sign-off gates tied to specific branches, or infrequent (e.g. monthly or quarterly) release cycles where a persistent `develop` branch and release branches map naturally onto your process. If that describes your team, the approach below is well worth using. The strategy described below is based on [GitFlow by Vincent Driessen](https://nvie.com/posts/a-successful-git-branching-model/). ![Gitflow](~/content/assets/images/gitflow.png) -Implementing a branching strategy similar to this, can help solve some of the DevOps problems typically encountered by BI teams, provided you put some thought into how the branches correlate to your deployment environments. In an ideal world, you would need at least 4 different environments to fully support GitFlow: +Implementing a branching strategy similar to this can help solve some of the DevOps problems typically encountered by BI teams, provided you put some thought into how the branches correlate to your deployment environments. In an ideal world, you would need at least 4 different environments to fully support GitFlow: - The **production** environment, which should always contain the code at the HEAD of the master branch. - A **canary** environment, which should always contain the code at the HEAD of the develop branch. 
This is where you typically schedule nightly deployments and run your integration testing, to make sure that the features going into the next release to production play nicely together. - One or more **UAT** environments where you and your business users test and validate new features. Deployment happens directly from the feature branch containing the code that needs to be tested. You will need multiple test environments if you want to test multiple new features in parallel. With some coordination effort, a single test environment is usually enough, as long as you carefully consider the dependencies between your BI tiers. - One or more **sandbox** environments where you and your team can develop new features, without impacting any of the environments above. As with the test environment, it is usually enough to have a single, shared, sandbox environment. -We must emphasize that there is really no "one-size-fits-all" solution to these considerations. Maybe you are not building your solution in the Cloud, and therefore do not have the scalability or flexibility to spin up new resources in seconds or minutes. Or maybe your data volumes are very large, making it impractical to replicate environments due to resource/economy/time constraints. Before moving on, also make sure to ask yourself the question of whether you truly need to support parallel development and testing. This is rarely the case for small teams with only a few stakeholders, in which case you can still benefit from CI/CD, but where GitFlow branching might be overkill. +We must emphasize that there is really no "one-size-fits-all" solution to these considerations. Maybe you are not building your solution in the cloud, and therefore do not have the scalability or flexibility to spin up new resources in seconds or minutes. Or maybe your data volumes are very large, making it impractical to replicate environments due to resource/economy/time constraints.
+ +Even if you do need to support parallel development, you may find that multiple developers can easily share the same development or sandbox environment, without encountering too much trouble. Specifically for tabular models, though, we recommend that developers still use individual [workspace databases](xref:workspace-mode) to avoid "stepping on each other's toes." -Even if you do need to support parallel development, you may find that multiple developers can easily share the same development or sandbox environment, without encountering too much trouble. Specifically for tabular models, though, we recommend that developers still use individual [workspace databases](xref:workspace-mode) to avoid "stepping over each others toes". +> [!NOTE] +> If you're evaluating GitFlow primarily because you need a shared, always-current test environment reflecting in-progress work, consider whether [GitHub Flow + Octopus Merge](#github-flow--octopus-merge) might achieve the same outcome with less branch-management overhead. GitFlow's `develop`/canary branch and Octopus Merge's disposable test branch solve a similar problem in different ways. + +## Trunk-based development + +Trunk-based development is the simplest possible branching model: developers commit small, frequent changes either directly to `main`, or via very short-lived feature branches that are merged back within hours. Microsoft recommends [trunk-based development](https://docs.microsoft.com/en-us/azure/devops/repos/git/git-branching-guidance?view=azure-devops) ([video](https://youtu.be/t_4lLR6F_yk?t=232)) generally for agile, continuous delivery of small increments.
+ +![Trunk Based Development](~/content/assets/images/trunk-based-development.png) + +In its purest form, trunk-based development can run into real friction for BI teams: + +- New features often require prolonged testing and validation by business users, which may take several weeks — so you need somewhere for in-progress work to be validated that isn't `main` itself. +- BI solutions are multi-tiered (Data Warehouse/ETL, Master Data Management, semantic layer, reports), with dependencies between layers that complicate testing and deployment. +- A BI team may maintain several semantic models at different maturity stages and paces. +- Data — not just code — has to be loaded, ETL'd, and processed to make a change testable. Including full data refreshes in every build could blow up pipeline runtimes from minutes to hours, and isn't always feasible at all for very large fact tables. + +**GitHub Flow + Octopus Merge, described above, is best understood as a refinement of trunk-based development that directly addresses these concerns** — rather than a departure from it. It keeps trunk-based development's core simplicity (one long-lived branch, short-lived feature branches, no release trains) while adding exactly the missing piece BI teams need: a shared test environment, populated by the pipeline rather than by a long-lived branch, that always reflects the current combined state of in-progress work. If you're choosing between the three strategies on this page, GitHub Flow + Octopus Merge is generally where we'd point a team that likes the simplicity of trunk-based development but has run into the limitations above. ## Common workflow -Assuming you already have a git repository set up and aligned to your branching strategy, adding your tabular model "source code" to the repository is simply a matter of using Tabular Editor to save the metadata to a new branch in a local repository. 
Then, you stage and commit the new files, push your branch to the remote repository and create a pull request to get your branch merged into the main branch. +Assuming you already have a git repository set up and aligned to your branching strategy, adding your tabular model "source code" to the repository is simply a matter of using Tabular Editor to save the metadata to a new branch in a local repository. Then, you stage and commit the new files, push your branch to the remote repository, and create a pull request to get your branch merged into the main branch. + +The exact commands are the same regardless of which strategy above you choose — what differs is what happens *after* the pull request is opened (see [GitHub Flow and the Octopus Merge pattern](xref:github-flow) for the GitHub Flow case, or your release/canary process for GitFlow). In general, the workflow looks like this: + +1. Before starting work on a new feature, create a new feature branch in git: -The exact workflow depends on your branching strategy and how your git repositories have been set up. In general, the workflow would look something like this: +```cmd +git checkout main +git pull +git checkout -b feature/add-tax-calculation +``` -1. Before starting work on a new feature, create a new feature branch in git. In a trunk-based development scenario, you would need the following git commands to checkout the main branch, get the latest version of the code, and create the feature branch from there: - ```cmd - git checkout main - git pull - git checkout -b "feature\AddTaxCalculation" - ``` 2. Open your model metadata from the local git repository in Tabular Editor. Ideally, use a [workspace database](xref:workspace-mode), to make it easier to test and debug DAX code. 3. Make the necessary changes to your model using Tabular Editor. Continuously save the changes (CTRL+S). 
Regularly commit code changes to git after you save, to avoid losing work and to keep a full history of all changes that were made: - ```cmd - git add . - git commit -m "Description of what was changed and why since last commit" - git push - ``` + +```cmd +git add . +git commit -m "Description of what was changed and why since last commit" +git push +``` + 4. If you are not using a workspace database, use Tabular Editor's **Model > Deploy...** option to deploy to a sandbox/development environment, in order to test the changes made to the model metadata. -6. When done, and all code has been committed and pushed to the remote repository, you submit a pull request in order to get your code integrated with the main branch. If a merge conflict is encountered, you will have to resolve it locally, using for example the Visual Studio Team Explorer or by simply opening the .json files in a text editor to resolve the conflicts (git inserts conflict markers to indicate which part of the code has conflicts). -7. Once all conflicts are resolved, there may be a process of code review, automated build/test execution based on branch policies, etc. to get the pull request completed. This, however, depends on your branching strategy and overall setup. +5. When done, and all code has been committed and pushed to the remote repository, you submit a pull request in order to get your code integrated with the main branch. If a merge conflict is encountered, you will have to resolve it locally, using for example the Visual Studio Team Explorer or by simply opening the .json files in a text editor to resolve the conflicts (git inserts conflict markers to indicate which part of the code has conflicts). +6. Once all conflicts are resolved, there may be a process of code review and automated build/test execution — including, if you're using the GitHub Flow approach above, the Octopus Merge test deployment — before the pull request can be completed. 
-We present more details about how to configure git branch policies, set up automated build and deployment pipelines, etc. using Azure DevOps in the following articles. Similar techniques can be used in other automated build and git hosting environments, such as TeamCity, GitHub, etc. +We present more details about how to configure git branch policies, set up automated build and deployment pipelines, etc. using Azure DevOps and GitHub Actions in the following articles. Similar techniques can be used in other automated build and git hosting environments, such as TeamCity, GitLab, etc. ## Next steps