Overview:
This article describes how to deploy Jar files, XML files, JSON files, Python wheel files, and global init scripts to a Databricks workspace using Azure DevOps CI/CD pipelines.
Purpose of this Pipeline:
Deploying these artifacts by hand is repetitive and error-prone. The pipelines below automate the process end to end: a CI pipeline packages the artifacts, and a CD pipeline pushes them to the Databricks File System (DBFS) and registers shell scripts as global init scripts.
Pre-Requisites:
An Azure DevOps project with a service connection to the target Azure subscription, an Azure Key Vault that stores the Databricks PAT token and workspace URL (as the secrets databricks-pat and databricks-url), and a target Databricks workspace.
Continuous Integration (CI) pipeline:
The CI pipeline installs Python, builds the wheel from the repository's Artifacts folder, and publishes the build output (including any Jar and JSON config files alongside the wheel) as a pipeline artifact named DatabricksArtifacts.
CI Pipeline YAML Code:
name: Release-$(rev:r)

trigger: none

variables:
  workingDirectory: '$(System.DefaultWorkingDirectory)/Artifacts'
  pythonVersion: '3.7'

stages:
- stage: Build
  displayName: Build stage
  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: UsePythonVersion@0
      displayName: 'Use Python version'
      inputs:
        versionSpec: $(pythonVersion)
    - task: CmdLine@2
      displayName: 'Upgrade Pip'
      inputs:
        script: 'python -m pip install --upgrade pip'
    - task: CmdLine@2
      displayName: 'Install wheel'
      inputs:
        script: 'python -m pip install wheel'
    - task: CmdLine@2
      displayName: 'Build wheel'
      inputs:
        script: 'python setup.py sdist bdist_wheel'
        workingDirectory: '$(workingDirectory)'
    - task: CopyFiles@2
      displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
      inputs:
        SourceFolder: '$(workingDirectory)'
        TargetFolder: '$(build.artifactstagingdirectory)'
    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: DatabricksArtifacts'
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: DatabricksArtifacts
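The Build wheel step assumes a setup.py at the root of the Artifacts folder. Before pushing a change, the same steps can be exercised locally to confirm the wheel builds. A minimal sketch, assuming Python and pip are on the PATH and the repository layout matches the pipeline's workingDirectory:

# Local dry run of the CI build steps (assumes python/pip on PATH and a
# setup.py under ./Artifacts, matching the pipeline's workingDirectory).
python -m pip install --upgrade pip
python -m pip install wheel
Set-Location ./Artifacts
python setup.py sdist bdist_wheel
# The .whl (and .tar.gz) should now appear under ./Artifacts/dist
Get-ChildItem ./dist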
Continuous Deployment (CD) pipeline:
The CD pipeline uploads all the artifacts (Jar, JSON config, Whl files) built by the CI pipeline into the Databricks File System (DBFS). It also creates or updates any shell (.sh) files from the build artifact as global init scripts for the Databricks workspace.
It has the following tasks:
1. Azure Key Vault: fetches the Databricks PAT token and workspace URL secrets used by the upload scripts (see the one-time setup sketch after this list).
2. Upload Databricks Artifacts
Arguments:
Databricks PAT token to access the Databricks workspace
Databricks workspace URL
Pipeline working directory path where the files (Jar, JSON config, Whl) are present
3. Upload Global Init Scripts
Arguments:
Databricks PAT token to access the Databricks workspace
Databricks workspace URL
Pipeline working directory path where the global init scripts are present
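The Key Vault secrets referenced above must exist before the first run. A minimal one-time setup sketch using the Az PowerShell module; the vault name and secret values are placeholders to adjust for your environment:

# One-time setup: store the secrets the AzureKeyVault@1 task expects.
# <keyvault_name>, <pat> and the workspace URL below are placeholders.
$vault = '<keyvault_name>'
Set-AzKeyVaultSecret -VaultName $vault -Name 'databricks-pat' `
    -SecretValue (ConvertTo-SecureString '<pat>' -AsPlainText -Force)
Set-AzKeyVaultSecret -VaultName $vault -Name 'databricks-url' `
    -SecretValue (ConvertTo-SecureString 'https://<workspace-instance>.azuredatabricks.net' -AsPlainText -Force)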
CD Pipeline YAML Code:
name: Release-$(rev:r)

trigger: none

resources:
  pipelines:
  - pipeline: DatabricksArtifacts
    source: DatabricksArtifacts-CI
    trigger:
      branches:
      - main

variables:
- group: Sample-Variable-Group   # expected to supply keyvault_name
- name: azureSubscription
  value: 'Sample-Azure-Service-Connection'
- name: workingDirectory_utilities
  value: '$(Pipeline.Workspace)/DatabricksArtifacts/DatabricksArtifacts'

stages:
- stage: Release
  displayName: Release stage
  jobs:
  - deployment: DeployDatabricksArtifacts
    displayName: Deploy Databricks Artifacts
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self
          - task: AzureKeyVault@1
            inputs:
              azureSubscription: '$(azureSubscription)'
              KeyVaultName: $(keyvault_name)
              SecretsFilter: 'databricks-pat,databricks-url'
              RunAsPreJob: true
          - task: AzurePowerShell@5
            displayName: Upload Databricks Artifacts
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'FilePath'
              ScriptPath: '$(System.DefaultWorkingDirectory)/Pipelines/Scripts/DatabricksArtifactsUpload.ps1'
              ScriptArguments: '-databricksPat $(databricks-pat) -databricksUrl $(databricks-url) -workingDirectory $(workingDirectory_utilities)'
              azurePowerShellVersion: 'LatestVersion'
          - task: AzurePowerShell@5
            displayName: Upload Global Init Scripts
            inputs:
              azureSubscription: '$(azureSubscription)'
              ScriptType: 'FilePath'
              ScriptPath: '$(System.DefaultWorkingDirectory)/Pipelines/Scripts/DatabricksGlobalInitScriptUpload.ps1'
              ScriptArguments: '-databricksPat $(databricks-pat) -databricksUrl $(databricks-url) -workingDirectory $(workingDirectory_utilities)'
              azurePowerShellVersion: 'LatestVersion'
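Before wiring them into the pipeline, both upload scripts (shown next) can be exercised locally against a test workspace. A minimal sketch with placeholder values; in the pipeline these values come from Key Vault, so never hard-code a real PAT:

# Hypothetical local invocation for testing outside the pipeline.
$pat = '<databricks-pat>'
$url = 'https://<workspace-instance>.azuredatabricks.net'
./Pipelines/Scripts/DatabricksArtifactsUpload.ps1 `
    -databricksPat $pat -databricksUrl $url -workingDirectory './Artifacts'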
DatabricksArtifactsUpload.ps1
param(
    [String] [Parameter(Mandatory = $true)] $databricksPat,
    [String] [Parameter(Mandatory = $true)] $databricksUrl,
    [String] [Parameter(Mandatory = $true)] $workingDirectory
)

Function UploadFile {
    param (
        [String] [Parameter(Mandatory = $true)] $sourceFilePath,
        [String] [Parameter(Mandatory = $true)] $fileName,
        [String] [Parameter(Mandatory = $true)] $targetFilePath
    )
    # Grab the bytes of the source file and encode them so they survive inside the multipart body
    $BinaryContents = [System.IO.File]::ReadAllBytes($sourceFilePath);
    $enc = [System.Text.Encoding]::GetEncoding("ISO-8859-1");
    $fileEnc = $enc.GetString($BinaryContents);
    # Create the multipart/form-data body expected by the DBFS put API
    $LF = "`r`n";
    $boundary = [System.Guid]::NewGuid().ToString();
    $bodyLines = (
        "--$boundary",
        "Content-Disposition: form-data; name=`"path`"$LF",
        $targetFilePath,
        "--$boundary",
        "Content-Disposition: form-data; name=`"contents`";filename=`"$fileName`"",
        "Content-Type: application/octet-stream$LF",
        $fileEnc,
        "--$boundary",
        "Content-Disposition: form-data; name=`"overwrite`"$LF",
        "true",
        "--$boundary--$LF"
    ) -join $LF;
    # Create request
    $params = @{
        Uri         = "$databricksUrl/api/2.0/dbfs/put"
        Body        = $bodyLines
        Method      = 'Post'
        Headers     = @{
            Authorization = "Bearer $databricksPat"
        }
        ContentType = "multipart/form-data; boundary=$boundary"
    }
    Invoke-RestMethod @params;
}

Function GetTargetFilePath {
    param (
        [System.IO.FileInfo] [Parameter(Mandatory = $true)] $sourceFile
    )
    # Map each artifact type to its DBFS folder; extend this switch (e.g. for ".xml") to deploy other types
    switch ($sourceFile.Extension) {
        ".json" { return "/FileStore/config/$($sourceFile.Name)" }
        ".jar"  { return "/FileStore/jar/$($sourceFile.Name)" }
        ".whl"  { return "/FileStore/whl/$($sourceFile.Name)" }
    }
}

# Loop through all files and upload the supported types to DBFS
$filenames = Get-ChildItem $workingDirectory -Recurse;
$filenames | ForEach-Object {
    if ($_.Extension -in ".json", ".whl", ".jar") {
        $targetFilePath = GetTargetFilePath -sourceFile $_;
        Write-Host "Uploading $($_.FullName) to dbfs at $targetFilePath.";
        UploadFile -sourceFilePath $_.FullName -fileName $_.Name -targetFilePath $targetFilePath;
    }
}
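To confirm the uploads landed where expected, the DBFS list API can be queried with the same PAT. A small verification sketch; the /FileStore/whl path matches the mapping in GetTargetFilePath above:

# Verify uploaded files by listing one of the target DBFS folders
$params = @{
    Uri     = "$databricksUrl/api/2.0/dbfs/list?path=/FileStore/whl"
    Method  = 'Get'
    Headers = @{ Authorization = "Bearer $databricksPat" }
}
(Invoke-RestMethod @params).files | Select-Object path, file_size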
DatabricksGlobalInitScriptUpload.ps1
param(
    [String] [Parameter(Mandatory = $true)] $databricksPat,
    [String] [Parameter(Mandatory = $true)] $databricksUrl,
    [String] [Parameter(Mandatory = $true)] $workingDirectory
)

Function UploadFile {
    param (
        [String] [Parameter(Mandatory = $true)] $uri,
        [String] [Parameter(Mandatory = $true)] $restMethod,
        [String] [Parameter(Mandatory = $true)] $sourceFilePath,
        [String] [Parameter(Mandatory = $true)] $fileName
    )
    # The global init scripts API expects the script contents as a base64 string
    $base64string = [Convert]::ToBase64String([IO.File]::ReadAllBytes($sourceFilePath))
    # Create body of request; scripts are deployed disabled so they can be enabled after review
    $body = @{
        name     = $fileName
        script   = $base64string
        position = 1
        enabled  = $false
    }
    # Create request
    $params = @{
        Uri         = $uri
        Body        = $body | ConvertTo-Json
        Method      = $restMethod
        Headers     = @{
            Authorization = "Bearer $databricksPat"
        }
        ContentType = "application/json"
    }
    Invoke-RestMethod @params;
}

Function GetAllScripts {
    # List the global init scripts already registered in the workspace
    $params = @{
        Uri         = "$databricksUrl/api/2.0/global-init-scripts"
        Method      = "GET"
        Headers     = @{
            Authorization = "Bearer $databricksPat"
        }
        ContentType = "application/json"
    }
    return Invoke-RestMethod @params;
}

# Loop through all files and create or update each .sh file as a global init script
$scripts = GetAllScripts
$filenames = Get-ChildItem $workingDirectory -Recurse;
$filenames | ForEach-Object {
    if ($_.Extension -eq ".sh") {
        # Check whether a script with this exact name already exists in Databricks
        $fileName = $_.Name
        $scriptId = ($scripts.scripts | Where-Object { $_.name -eq $fileName }).script_id
        if (!$scriptId) {
            # Create global init script
            Write-Host "Uploading $($_.FullName) as a global init script with name $fileName to databricks";
            UploadFile -uri "$databricksUrl/api/2.0/global-init-scripts" -restMethod "POST" -sourceFilePath $_.FullName -fileName $fileName;
        }
        else {
            # Update existing global init script
            Write-Host "Updating global init script with name $fileName in databricks";
            UploadFile -uri "$databricksUrl/api/2.0/global-init-scripts/$scriptId" -restMethod "PATCH" -sourceFilePath $_.FullName -fileName $fileName;
        }
    }
}
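Because the scripts are created and updated with enabled set to false, they still need to be enabled (for example, from the workspace admin console) before new clusters run them. A quick sketch to review what was deployed and its current state:

# List all global init scripts and their state after the run
$params = @{
    Uri     = "$databricksUrl/api/2.0/global-init-scripts"
    Method  = 'Get'
    Headers = @{ Authorization = "Bearer $databricksPat" }
}
(Invoke-RestMethod @params).scripts | Select-Object script_id, name, position, enabled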
Using this CI/CD approach, we were able to automatically upload the build artifacts to the Databricks File System and manage the workspace's global init scripts.