Purpose of this Blog:
Create a feature branch based on the main branch and link a work item to it.
Log in to your Azure Databricks Dev/Sandbox workspace, click the user icon (top right), and open User Settings.
Click the Git Integration tab and make sure Azure DevOps Services is selected as the Git provider.
There are two ways to check in code from the Databricks UI (described below):
1. Using Revision History after opening a notebook
2. Working with notebooks and folders in an Azure Databricks repo (Repos, a recent addition released on 13th May)
Go to the notebook you want to change and deploy to another environment.
Note: Developers should maintain a shared/common folder for all notebooks. Make all required changes in your personal folder, then move them to the shared/common folder; the CI process builds the artifact from this shared/common folder.
Click Revision history at the top right.
If it is a new notebook, you will notice that Git is not linked to it; alternatively, Git might be linked to an older branch that no longer exists.
Click 'Git: Not Linked' (new notebook) or 'Git Synced' (existing notebook).
Next, configure the Git repository (screenshot below) with the repo link and the path in the repo:
https://dev.azure.com/<organisationname>/<ProjectName>/_git/<Repo>
src/Databricks/ITDataEngineerADBDev2/notebooks/<folder name>/<notebook.py>
Click Save Notebook, adding a comment, to commit the code to the repository.
Create a PR once the changes are reflected in the feature branch.
After the PR is approved and the code is merged into the main branch, the CI/CD process starts.
Note: Linking individual notebooks has the following limitation
Repos Check-in Process:
Click the Repos tab, right-click the folder you want to work in, and select "Add Repo".
Fill in the repo URL from Azure DevOps, select "Azure DevOps Services" as the Git provider, and click Create.
The repo is added with the repo name as its folder name (Data in the screenshot below), along with a branch selector (the branch symbol with feature/ and a down arrow in the screenshot below). Click the down arrow beside the branch name.
After clicking the down arrow (previous screenshot), search for and select your existing feature branch, or create a new feature branch (as shown in the screenshot below).
All the folders in the branch are now visible (refer to the screenshot below).
Open the folder that contains the notebooks (refer to the screenshot below). Create a new notebook and write code (right-click the folder and select "Create" → "Notebook", as in the screenshot below), or edit an existing notebook in the folder.
After creating or editing a notebook, click the feature branch name at the top left of the notebook. A new window appears showing the changes. Add a Summary (mandatory) and a Description (optional), then click "Commit and Push".
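The branch switching above is done through the UI, but pointing a Databricks Repo at a feature branch can also be scripted against the Repos API. A minimal, hedged sketch in Python using `requests`; the host, PAT, and repo ID are placeholders you would look up in your own workspace:

```python
import requests


def repo_update_url(host: str, repo_id: str) -> str:
    """URL for updating a Databricks Repo (PATCH /api/2.0/repos/{repo_id})."""
    return f"https://{host}/api/2.0/repos/{repo_id}"


def checkout_branch(host: str, token: str, repo_id: str, branch: str) -> None:
    """Point an existing Databricks Repo at the given branch.

    Assumes a reachable workspace and a valid personal access token.
    """
    resp = requests.patch(
        repo_update_url(host, repo_id),
        headers={"Authorization": f"Bearer {token}"},
        json={"branch": branch},
    )
    resp.raise_for_status()
```

This is the programmatic equivalent of the branch selector in the Repos UI and can be handy when many repos must be moved to the same feature branch.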
The CI pipeline builds the artifact by copying the notebooks from the main branch to a staging directory.
It has two tasks:
Copy Files task - copies from the main branch checkout to the staging directory.
Publish Build Artifacts task - publishes the artifact from $(build.stagingdirectory).
name: Release-$(rev:r)
trigger: none

variables:
  workingDirectory: '$(System.DefaultWorkingDirectory)/<path>'

stages:
- stage: Build
  displayName: Build stage
  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: CopyFiles@2
      displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
      inputs:
        SourceFolder: '$(workingDirectory)'
        TargetFolder: '$(build.artifactstagingdirectory)'
    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: notebooks'
      inputs:
        PathtoPublish: '$(build.artifactstagingdirectory)'
        ArtifactName: dev_release
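The copy step simply mirrors the source tree into the staging directory before it is published. What CopyFiles@2 does for us can be sketched locally in Python (directory paths here are placeholders, not part of the pipeline):

```python
import shutil
from pathlib import Path


def stage_notebooks(source_dir: str, staging_dir: str) -> list[str]:
    """Mirror the CopyFiles@2 behaviour: copy the whole source tree
    into the staging directory, preserving relative folder structure.

    Returns the relative paths of the files that were staged.
    """
    src, dst = Path(source_dir), Path(staging_dir)
    copied = []
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy contents and metadata
            copied.append(str(target.relative_to(dst)))
    return sorted(copied)
```

The published artifact (`dev_release`) then contains exactly this staged tree, which the release stage later downloads.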
Deployment with a secure self-hosted agent
a. Run the release pipeline for the specified target environment.
This downloads the previously generated build artifacts and retrieves secure connection strings from Azure Key Vault. Make sure your self-hosted agent is configured properly as per Self-Hosted Agent. The pipeline then deploys the notebooks to your target Azure Databricks workspace.
The snippet below shows how to run your release on a specific self-hosted agent; take note of the pool and demands configuration.
stages:
- stage: Release
  displayName: Release stage
  jobs:
  - deployment: DeployDatabricks
    displayName: Deploy Databricks Notebooks
    pool:
      name: OTT-DEV-FanDataPool
      demands:
      - agent.name -equals <agent-name>
    environment: <env-name>
We have two steps for the deployment:
1. Get the Key Vault secrets: the PAT (Personal Access Token) and the target Databricks workspace URL.
2. Import the notebooks into the target Databricks workspace using the workspace import REST API (a PowerShell task with an inline script).
This script can deploy notebooks from multiple folders. It creates a folder in the target (dev/QA/UAT/Prod) Databricks workspace (mirroring the folder in the repository, as in the Sandbox/Dev environment) if it does not exist, and then imports the notebooks into that folder.
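Before the full YAML template, here is the core of that import flow sketched in Python with `requests`, for orientation only; `host` and `token` stand in for the `databricks-url` and `databricks-pat` Key Vault secrets used below, and the helper names are mine, not part of the pipeline:

```python
import base64

import requests


def parent_folder(workspace_path: str) -> str:
    """Folder to pre-create via /api/2.0/workspace/mkdirs."""
    return workspace_path.rsplit("/", 1)[0] or "/"


def build_import_body(workspace_path: str, source: bytes) -> dict:
    """Request body for /api/2.0/workspace/import (SOURCE format, Python)."""
    return {
        "path": workspace_path,
        "format": "SOURCE",
        "language": "PYTHON",
        "overwrite": True,
        "content": base64.b64encode(source).decode("ascii"),
    }


def import_notebook(host: str, token: str, local_path: str, workspace_path: str) -> None:
    """Create the parent folder (mkdirs is idempotent), then import one notebook.

    Assumes a reachable workspace and a valid PAT.
    """
    headers = {"Authorization": f"Bearer {token}"}
    requests.post(
        f"https://{host}/api/2.0/workspace/mkdirs",
        headers=headers,
        json={"path": parent_folder(workspace_path)},
    ).raise_for_status()
    with open(local_path, "rb") as f:
        body = build_import_body(workspace_path, f.read())
    requests.post(
        f"https://{host}/api/2.0/workspace/import",
        headers=headers,
        json=body,
    ).raise_for_status()
```

The inline PowerShell in the template performs these same calls (plus an initial recursive delete) per `.py` file found in the artifact.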
YAML template
name: Release-$(rev:r)
trigger: none

resources:
  pipelines:
  - pipeline: notebooks
    source: Databricks-CI
    trigger:
      branches:
      - main

variables:
- name: azureSubscription
  value: '<serviceConnectionName>'
- name: workingDirectory_shared
  value: '$(Pipeline.Workspace)/<path>/'

stages:
- stage: Release
  displayName: Release stage
  jobs:
  - deployment: DeployDatabricks
    displayName: Deploy Databricks Notebooks
    environment: DEV
    strategy:
      runOnce:
        deploy:
          steps:
          - task: AzureKeyVault@1
            inputs:
              azureSubscription: '$(azureSubscription)'
              KeyVaultName: $(dev_keyvault)
              SecretsFilter: 'databricks-pat,databricks-url'
              RunAsPreJob: true
          - task: AzurePowerShell@5
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'InlineScript'
              Inline: |
                # Build the auth header from the Key Vault PAT
                $Secret = "Bearer " + "$(databricks-pat)";
                $headers = @{
                  Authorization = $Secret
                }
                # Clean workspace: delete existing folders
                $folderNames = Get-ChildItem $(setVars.notebooksDirectory) -Directory
                $folderNames | ForEach-Object {
                  # Build the delete-folder request
                  $folderpath = "/" + $_.Name + "/";
                  $DeleteFolderBody = @{
                    path      = $folderpath
                    recursive = $true
                  }
                  $DeleteFolderBodyText = $DeleteFolderBody | ConvertTo-Json
                  $FolderDeleteAPI = "https://" + "$(databricks-url)" + "/api/2.0/workspace/delete"
                  # Delete the folder; ignore the error if it does not exist
                  try {
                    $DeleteFolder = Invoke-RestMethod -Uri $FolderDeleteAPI -Method Post -Headers $headers -Body $DeleteFolderBodyText
                  }
                  catch [System.Net.WebException] {
                    Write-Host "Folder does not exist";
                  }
                }
                # Upload the notebook files/folders from the artifact
                $filenames = Get-ChildItem $(setVars.notebooksDirectory) -Recurse | Where-Object { $_.Extension -eq ".py" };
                $filenames | ForEach-Object {
                  # API endpoints
                  $ImportNoteBookAPI = "https://" + "$(databricks-url)" + "/api/2.0/workspace/import";
                  $FolderCheckAPI = "https://" + "$(databricks-url)" + "/api/2.0/workspace/get-status";
                  $FolderCreateAPI = "https://" + "$(databricks-url)" + "/api/2.0/workspace/mkdirs"
                  # Read the notebook and base64-encode it for the import API
                  $BinaryContents = [System.IO.File]::ReadAllBytes($_.FullName);
                  $EncodedContents = [System.Convert]::ToBase64String($BinaryContents);
                  # Derive the workspace path from the local file path
                  $notebookDirectory = "$(setVars.notebooksDirectory)".Replace('/','\');
                  $pathIndex = $_.FullName.IndexOf($notebookDirectory);
                  $notebookpath = "/" + $_.FullName.Substring($pathIndex + $notebookDirectory.Length).Replace('\','/');
                  $folderIndex = $notebookpath.IndexOf($_.Name);
                  $folderpath = $notebookpath.Substring(0, $folderIndex);
                  # Request body for importing the notebook
                  $ImportNoteBookBody = @{
                    content   = "$EncodedContents"
                    language  = "PYTHON"
                    overwrite = $true
                    format    = "SOURCE"
                    path      = $notebookpath
                  }
                  $ImportNoteBookBodyText = $ImportNoteBookBody | ConvertTo-Json
                  # Request body for creating the folder
                  $CreateFolderBody = @{
                    path = $folderpath
                  }
                  $CreateFolderBodyText = $CreateFolderBody | ConvertTo-Json
                  $CheckPath = $FolderCheckAPI + "?path=" + $folderpath;
                  # Check if the folder exists; if not, create it
                  try {
                    $CheckFolder = Invoke-RestMethod -Uri $CheckPath -Method Get -Headers $headers;
                  }
                  catch [System.Net.WebException] {
                    Write-Host "Creating Folder $folderpath";
                    Invoke-RestMethod -Uri $FolderCreateAPI -Method Post -Headers $headers -Body $CreateFolderBodyText
                  }
                  # Import the notebook into the folder in the target Databricks workspace
                  Write-Host "Creating Notebook $notebookpath";
                  Invoke-RestMethod -Uri $ImportNoteBookAPI -Method Post -Headers $headers -Body $ImportNoteBookBodyText
                }
              azurePowerShellVersion: 'LatestVersion'
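After a release run, the deployment can be spot-checked by listing the target folder through the workspace list API. A hedged Python sketch, where `host` and `token` again stand in for the `databricks-url` and `databricks-pat` secrets:

```python
import requests


def notebook_paths(payload: dict) -> list:
    """Extract notebook paths from a /api/2.0/workspace/list response body."""
    return [
        obj["path"]
        for obj in payload.get("objects", [])
        if obj.get("object_type") == "NOTEBOOK"
    ]


def list_notebooks(host: str, token: str, folder: str) -> list:
    """List the notebooks directly under `folder` in the target workspace."""
    resp = requests.get(
        f"https://{host}/api/2.0/workspace/list",
        headers={"Authorization": f"Bearer {token}"},
        params={"path": folder},
    )
    resp.raise_for_status()
    return notebook_paths(resp.json())
```

If the expected notebook paths come back, the import step in the pipeline did its job.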