Microsoft Azure
AzureAILanguageClient
Basic wrapper around the Azure TextAnalyticsClient.
__init__
__init__(api_key=os.getenv('AZURE_AI_LANGUAGE_KEY'), azure_endpoint=os.getenv('AZURE_AI_LANGUAGE_ENDPOINT'))
Initialize client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
str | None
|
Credential for using the service. Defaults to environment variable |
getenv('AZURE_AI_LANGUAGE_KEY')
|
azure_endpoint
|
str | None
|
URL AI Language resource, for example https://your-resource.cognitiveservices.azure.com/. Defaults to environment variable |
getenv('AZURE_AI_LANGUAGE_ENDPOINT')
|
AzureBlobStorageClient
Wrapper around the Azure BlobServiceClient that implements additional specialist methods.
https://learn.microsoft.com/en-us/python/api/overview/azure/storage-blob-readme?view=azure-python
Attributes:
| Name | Type | Description |
|---|---|---|
organization |
str
|
Organization containing the workspace session to connect to. Must match an Azure Blob Storage container name. |
workspace |
str
|
The workspace containing the session. |
session_id |
str
|
The id of the session. |
directory |
str
|
Default directory where new blobs will be created. Concatenation of workspace and session_id. |
connection_string |
str
|
Credential for connecting to the Azure Blob Storage resource. |
url_prefix |
str
|
The root URL to the Azure Blob Storage resource. |
client |
BlobServiceClient
|
Blob service client. Use to access all sub methods. |
__init__
__init__(organization, workspace, session_id, connection_string=os.environ['BLOB_CONNECTION_STRING'], url_prefix=os.environ['BLOB_URL_PREFIX'])
Initialize client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
organization
|
str
|
Organization containing the workspace session to connect to. Must match an Azure Blob Storage container name. |
required |
workspace
|
str
|
The workspace containing the session. Must match a folder in the parent container. |
required |
session_id
|
str
|
The id of the session. Must match a folder in the parent workspace folder in the organization container. |
required |
connection_string
|
str
|
Credential for connecting to the Azure Blob Storage resource. Defaults to environment variable |
environ['BLOB_CONNECTION_STRING']
|
url_prefix
|
str
|
The root URL to the Azure Blob Storage resource. Defaults to environment variable |
environ['BLOB_URL_PREFIX']
|
download_blob_json
download_blob_json(blob_name)
Used to download and access JSON files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
blob_name
|
str
|
The path to the blob in the container. |
required |
Returns:
| Type | Description |
|---|---|
Any
|
The JSON content of the blob, likely a list or dictionary. |
get_blob_client
get_blob_client(blob_name)
Return a client for interacting with blob objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
blob_name
|
str
|
Name of the blob. |
required |
Returns:
| Type | Description |
|---|---|
BlobClient
|
Blob client. |
get_signed_url
get_signed_url(blob_name, minutes=720)
Generate a signed URL for an Azure Blob Storage object, valid for specified duration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
blob_name
|
str
|
Name of the blob. |
required |
minutes
|
int
|
Time duration for validity of URL. |
720
|
Returns:
| Type | Description |
|---|---|
str
|
The signed URL. |
list_blobs_in_directory
list_blobs_in_directory(ignore_files=None)
Lists all blobs in the client directory within an Azure Blob container.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ignore_files
|
list[str] | None
|
Blob names containing any string in this list will not be included in the outputted list. |
None
|
Returns:
| Type | Description |
|---|---|
list[BlobProperties]
|
A list of blobs in the client directory. |
list_blobs_with_prefix
list_blobs_with_prefix(prefix, ignore_files=None)
Lists all blobs in the Azure Blob container with the supplied prefix.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
prefix
|
str
|
The prefix to search for |
required |
ignore_files
|
list[str] | None
|
Blob names containing any string in this list will not be included in the outputted list |
None
|
Returns:
| Type | Description |
|---|---|
list[BlobProperties]
|
A list of blobs with the matching prefix. |
upload_blob_json
upload_blob_json(blob_name, blob_content)
Used to upload JSON files to a blob in the container.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
blob_name
|
str
|
The path to the blob in the container. |
required |
blob_content
|
str
|
The JSON-formatted content to be uploaded. |
required |
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Blob updated property dictionary. |
AzureKeyVaultClient
Wrapper around the Azure SecretClient.
Uses DefaultAzureCredential for credential, and therefore expects either a managed identity or an identity currently logged into Azure CLI.
https://learn.microsoft.com/en-us/python/api/overview/azure/key-vault?view=azure-python
__init__
__init__(vault_url=os.environ['KEY_VAULT_NAME'])
Initialize client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
vault_url
|
str
|
URL of the Azure Key Vault resource e.g. https://your-resource.vault.azure.net/. Defaults to environment variable |
environ['KEY_VAULT_NAME']
|
get_secret
get_secret(secret_name)
Retrieve a secret from the Key Vault.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
secret_name
|
str
|
Name of the secret to retrieve |
required |
Returns:
| Type | Description |
|---|---|
str | None
|
Retrieved secret value, or None if not found |
set_secret
set_secret(secret_name, secret_value)
Set a secret in the Key Vault.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
secret_name
|
str
|
Name of the secret to set |
required |
secret_value
|
str
|
Value of the secret |
required |
Returns:
| Type | Description |
|---|---|
bool
|
True if operation successful |
AzureOpenAIClient
A wrapper around the AzureOpenAi class.
https://github.com/openai/openai-python?tab=readme-ov-file#microsoft-azure-openai
Attributes:
| Name | Type | Description |
|---|---|---|
api_key |
str
|
API key for Azure resource. If not provided will default to environment variable |
azure_endpoint |
str
|
Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. if not provided will default to environment variable |
api_version |
str
|
API version for Azure resource. |
model |
str
|
Model deployment name within the Azure resource. If not provided will default to environment variable |
response_format |
dict[Any, Any] | None
|
The type of response to request from the client. For example for JSON: { "type": "json_object" }. |
client |
The |
__del__
__del__()
Destructor to ensure cleanup if close() wasn't called.
__enter__
__enter__()
Context manager entry point.
__exit__
__exit__(exc_type, exc_val, exc_tb)
Context manager exit point - ensures connections are cleaned up.
__init__
__init__(api_key=os.getenv('AZURE_OPENAI_API_KEY'), azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'), api_version=None, model=os.getenv('AZURE_OPENAI_DEPLOYMENT'), response_format=None, max_connections=50, max_keepalive_connections=20, timeout=600.0)
Initialize the client with connection pooling configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
api_key
|
str | None
|
API key for Azure resource. If not provided will default to environment variable |
getenv('AZURE_OPENAI_API_KEY')
|
azure_endpoint
|
str | None
|
Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. if not provided will default to environment variable |
getenv('AZURE_OPENAI_ENDPOINT')
|
api_version
|
str | None
|
API version for Azure resource. |
None
|
model
|
str | None
|
Model deployment name within the Azure resource. If not provided will default to environment variable |
getenv('AZURE_OPENAI_DEPLOYMENT')
|
response_format
|
dict[Any, Any] | None
|
The type of response to request from the client. For example for JSON: { "type": "json_object" }. |
None
|
max_connections
|
int
|
Maximum number of concurrent connections (default: 50). |
50
|
max_keepalive_connections
|
int
|
Maximum number of keepalive connections to maintain (default: 20). |
20
|
timeout
|
float
|
Read/write timeout in seconds (default: 600.0 / 10 minutes, matching OpenAI defaults). |
600.0
|
call_chat
call_chat(messages, max_retries=5, max_completion_tokens=None)
Call the chat completions API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
list[dict[str, Any]]
|
List of dictionary objects specifying the messages to send. Messages must adhere to the prompting standard. |
required |
max_retries
|
int
|
Number of times to call the API before raising an error |
5
|
max_completion_tokens
|
int | None
|
Maximum number of tokens to generate in the response. If None, uses API default. |
None
|
Returns:
| Type | Description |
|---|---|
str
|
Response from the chat API. |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If attempts exceeds |
call_embedding
call_embedding(batch, max_retries=5)
Call the embeddings API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
batch
|
list[str]
|
List of strings to embed |
required |
max_retries
|
int
|
Number of times to call the API before raising an error |
5
|
Returns:
| Type | Description |
|---|---|
list[list[float]]
|
List of embeddings |
Raises:
| Type | Description |
|---|---|
RuntimeError
|
If attempts exceeds |
close
close()
Explicitly close the OpenAI client and release all connections.
Call this method when you're done using the client to ensure connections are properly cleaned up, especially in high-concurrency scenarios.
AzureTableLogHandler
Bases: Handler
Client to write workflow logs to an Azure Table Storage resource, adopting a log key and partition strategy for fast search/filtering across organizations, workspaces and sessions.
https://learn.microsoft.com/en-us/python/api/overview/azure/tables?view=azure-python
Attributes:
| Name | Type | Description |
|---|---|---|
table_name |
Name of the log table. |
|
host |
The hostname of the host running the process. |
|
start_time |
When the logger was initialized. Used to set workflow start time. |
|
parent |
Parent |
|
service_client |
Client for interacting with Azure Table Storage resource. |
|
table_client |
Client for interacting with table in the Azure Table Storage resource. |
__init__
__init__(parent, connection_string=os.environ.get('BLOB_CONNECTION_STRING'), table_name='WorkflowLogs', host=socket.gethostname())
Initialize the client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
parent
|
WorkflowRunner
|
Parent |
required |
connection_string
|
str | None
|
Credential for connecting to Azure Table Storage resource. Defaults to environment variable |
get('BLOB_CONNECTION_STRING')
|
table_name
|
str
|
Name of the log table. |
'WorkflowLogs'
|
host
|
str
|
The hostname of the host running the process. |
gethostname()
|
emit
emit(record)
Emits log messages to Table Storage, duplicating across partitions and indexing chronologicallys:
messages- General log messages
{organization}_{workspace}_messages- General log messages for a workspace
{organization}_{workspace}_{session_id}_messages- General log messages for a session
emit_metrics
emit_metrics(record)
Emits log messages to Table Storage, duplicating across partitions and indexing chronologicallys:
metrics- Metrics on document counts, processing time etc
{organization}_metrics- Metrics for an organization
{organization}_{workspace}_metrics- Metrics for a workspace
AzureVectorStorageClient
Wrapper around the Azure SearchClient with specialist methods for HARDR classification.
This client creates a single SearchClient instance that is reused across all calls, making it thread-safe and efficient for concurrent operations.
__del__
__del__()
Destructor to ensure cleanup if close() wasn't called.
__enter__
__enter__()
Context manager entry point.
__exit__
__exit__(exc_type, exc_val, exc_tb)
Context manager exit point - ensures connections are cleaned up.
__init__
__init__(endpoint=os.environ['AZURE_SEARCH_ENDPOINT'], index_name=os.environ['AZURE_SEARCH_INDEX_NAME'], connection_timeout=10.0, read_timeout=120.0, max_pool_size=50)
Initialize the client with a reusable SearchClient connection and connection pooling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
endpoint
|
str
|
The URL for the Azure AI Search resource. Defaults to environment variable |
environ['AZURE_SEARCH_ENDPOINT']
|
index_name
|
str
|
The vector index to connect to. Defaults to environment variable |
environ['AZURE_SEARCH_INDEX_NAME']
|
connection_timeout
|
float
|
Connection timeout in seconds (default: 10.0). |
10.0
|
read_timeout
|
float
|
Read timeout in seconds (default: 120.0). |
120.0
|
max_pool_size
|
int
|
Maximum number of connections in the pool (default: 50). |
50
|
close
close()
Explicitly close the search client and transport, releasing all connections.
Call this method when you're done using the client to ensure connections are properly cleaned up, especially in high-concurrency scenarios.
neighbours_from_text
neighbours_from_text(text, filter, top=10, vector_fields=None, scoring_profile='default')
Retrieve the top 'n' nearest neighbours to an input text query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
Text to search for |
required |
filter
|
str
|
ODATA filter query to limit the scope of the search. For example for a Uniclass index, to scope to the Materials table use 'subsystem eq 'Materials'. |
required |
top
|
int
|
The number of matches to return. |
10
|
vector_fields
|
list[str] | None
|
The vector fields to include in the search. Must be at least three vector fields. Each vector field is weighted differently in the search results: 1. 2.0 2. 0.5 3. 1.0 |
None
|
scoring_profile
|
str
|
The name of the vector search |
'default'
|
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
Dictionary of nearest neighbours. Items have the following fields:
|
semantic_search
semantic_search(text, filter, top=10, scoring_profile='default', semantic_configuration='default', vector_search=False, max_retries=3, initial_delay=1.0)
Retrieve the top 'n' semantic search matches with exponential backoff retry logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
Text to search for |
required |
filter
|
str
|
ODATA filter query to limit the scope of the search. For example for a Uniclass index, to scope to the Materials table use 'subsystem eq 'Materials'. |
required |
top
|
int
|
The number of matches to return. |
10
|
scoring_profile
|
str
|
Profile for weighting search fields and applying boosting |
'default'
|
semantic_configuration
|
str
|
Describe the title, content, and keywords fields that will be used for semantic ranking, captions, highlights, and answers. |
'default'
|
vector_search
|
bool
|
Whether to include vector search in the query |
False
|
max_retries
|
int
|
Maximum number of retry attempts (default 3) |
3
|
initial_delay
|
float
|
Initial delay in seconds for exponential backoff (default 1.0) |
1.0
|
Returns: Dictionary of nearest neighbours. Items have the following fields:
- code (str): The ID or reference code for the item
- title (str): Plain-text descriptor for the item
- examples (str): Extended description of the item
- similarity (float): Similarity score