Microsoft Azure
AzureBlobSession
Bases: Session
A concrete implementation of the Session class that provides a framework for processing information containers stored in the original Hoppa Azure Blob Storage architecture.
Inherits attributes from its parent class.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| client | `AzureBlobStorageClient` | Establishes a connection to an Azure Blob Storage resource. |
| ignore_files | `List[str]` | List of string patterns / file names that hold special meaning in the Hoppa schema and should be ignored when interacting with source files. |
| include_blobs | `List[str]` | List of string patterns / file names that should be included when indexing document versions. All other files will be ignored. |
| headers | `Dict[str, Any]` | HTTP request headers to include when reading or writing blobs. Defaults to `'block-blob'` for the `"x-ms-blob-type"` header. |
| sharepoint_client | `SharePointClient` | Establishes a connection to Microsoft Graph APIs with delegated permissions to read and write SharePoint site, folder and file resources. |
| autodesk_client | `AutodeskClient` | Establishes a connection to Autodesk APIs with delegated permissions to read and write Autodesk Build / Docs / BIM 360 hub, project, file and folder resources. |
| properties | `Dict[str, Any]` | Metadata about the session. |
| organization | `str` | Organization identifier for the session. |
| workspace | `str` | Workspace identifier within the organization. |
| session_id | `str` | Unique session identifier. |
| directory | `str` | Directory path for the session data. Automatically set to `"{workspace}/{session_id}"`. |
| user_id | `str` | User identifier for authentication and logging. |
| workflow | `Dict \| None` | Parsed workflow configuration. |
| classifiers | `Dict` | Document classifiers configuration. |
| attributes | `List` | Document attributes configuration. |
| tags | `List[str]` | Document tags configuration. |
| prompts | `Dict` | Custom prompts for AI operations. |
| document_versions | `List[DocumentVersion]` | Documents in the session. |
| initialized | `bool` | Whether the session has been initialized (document versions have been indexed). |
__init__
__init__(organization, workspace, session_id, user_id, include_blobs=None)
Initialize the session by getting session properties, information standard and workflow from storage and parsing the contents.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| organization | `str` | Organization containing the workspace session to connect to. Must match an Azure Blob Storage container name. | *required* |
| workspace | `str` | The workspace containing the session. Must match a folder in the parent container. | *required* |
| session_id | `str` | The id of the session. Must match a folder in the parent workspace folder in the organization container. | *required* |
| user_id | `str` | User identifier. Required if the session contains documents from an external source (e.g. SharePoint). | *required* |
| include_blobs | `List[str]` | A list of blobs in the parent container. Provides a mechanism to further scope the session binding to fewer documents, avoiding long initialisation times. | `None` |
excel_from_classifiers
excel_from_classifiers()
Export classifiers to an Excel workbook with:

- Summary sheet with all classifiers and their top-level properties
- Individual sheets for each classifier's picklist options

Returns:

| Type | Description |
| --- | --- |
| `BytesIO` | Excel workbook as bytes that can be downloaded or sent via API. |
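The workbook layout described above (one summary sheet plus one picklist sheet per classifier) might be assembled as in this sketch, which builds plain row lists in place of a real Excel writer; the classifier shape (`description`, `picklist` with `code`/`label`) is an assumption.

```python
from typing import Any, Dict, List

def workbook_layout(classifiers: Dict[str, Dict[str, Any]]) -> Dict[str, List[List[str]]]:
    """Map sheet name -> rows, mirroring the documented Excel export."""
    sheets: Dict[str, List[List[str]]] = {}
    # Summary sheet: one row per classifier with its top-level properties.
    summary = [["Classifier", "Description"]]
    for name, spec in classifiers.items():
        summary.append([name, spec.get("description", "")])
    sheets["Summary"] = summary
    # One sheet per classifier listing its picklist options.
    for name, spec in classifiers.items():
        rows = [["Code", "Label"]]
        for opt in spec.get("picklist", []):
            rows.append([opt["code"], opt["label"]])
        sheets[name] = rows
    return sheets
```

Writing each rows list to a worksheet (e.g. with openpyxl) and saving the workbook into a `BytesIO` buffer would produce the documented return value.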
get_processing_status
get_processing_status()
Check if all files in the session have completed processing.
Returns:

| Type | Description |
| --- | --- |
| `str` | The current processing status. |
parse_files
parse_files()
Get file list and format into a hierarchical list of DocumentVersions, metadata and results by parsing the blob directory and naming structure.
Returns:

| Type | Description |
| --- | --- |
| `List[DocumentVersion]` | A list of DocumentVersions. |
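The documented parsing — deriving a hierarchy of documents and versions from the blob directory and naming structure — can be sketched as follows. The path layout `session/<document>/<version>/<file>` is an illustrative convention only; the actual Hoppa naming schema is not documented here.

```python
from collections import defaultdict
from typing import Dict, List

def group_versions(blob_names: List[str]) -> Dict[str, List[str]]:
    """Group blob paths into {document: [versions...]} by directory structure.

    Assumes a layout like 'session/<document>/<version>/<file>' -- an
    illustrative convention, not the actual Hoppa schema.
    """
    docs: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        parts = name.split("/")
        if len(parts) >= 3:
            document, version = parts[-3], parts[-2]
            if version not in docs[document]:
                docs[document].append(version)
    for versions in docs.values():
        versions.sort()
    return dict(docs)
```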
parse_standard
parse_standard()
Get the user standard and parse it from JSON into Python objects.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| classifiers | `Dict` | A dictionary of classification parts and picklists. |
| attributes | `List` | A list of attributes to search for. |
| tags | `List[str]` | A list of tags. |
| prompts | `Dict` | A dictionary of prompts for each analysis type (classify, search, tag). |
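Splitting a standard document into the four documented return values might look like this sketch; the top-level JSON keys are assumptions, since the real schema is not shown here.

```python
import json
from typing import Any, Dict, List, Tuple

def parse_standard_json(raw: str) -> Tuple[Dict, List, List[str], Dict]:
    """Split a standard document into (classifiers, attributes, tags, prompts).

    The top-level keys are assumed; the real Hoppa schema may differ.
    """
    data: Dict[str, Any] = json.loads(raw)
    classifiers = data.get("classifiers", {})
    attributes = data.get("attributes", [])
    tags = data.get("tags", [])
    prompts = data.get("prompts", {})
    return classifiers, attributes, tags, prompts
```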
parse_workflow
parse_workflow()
Get the workflow and parse it from JSON into a Python object. Looks first for a workflow file in the session directory; if not found, falls back to the workspace directory, then the organization directory, and finally the default Hoppa directory.

Returns:

| Type | Description |
| --- | --- |
| `List[Dict[str, Any]]` | A workflow object of stages and nested steps. |
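The documented fallback order (session, then workspace, then organization, then the default Hoppa directory) can be sketched with a dict standing in for blob storage; the blob path scheme and file name are assumptions.

```python
from typing import Any, Dict, Optional

def find_workflow(store: Dict[str, Any], organization: str, workspace: str,
                  session_id: str) -> Optional[Any]:
    """Return the first workflow found, walking the documented fallback order.

    `store` maps blob paths to parsed workflows (a stand-in for blob storage);
    the path scheme is an assumption.
    """
    candidates = [
        f"{organization}/{workspace}/{session_id}/workflow.json",  # session
        f"{organization}/{workspace}/workflow.json",               # workspace
        f"{organization}/workflow.json",                           # organization
        "hoppa/workflow.json",                                     # default
    ]
    for path in candidates:
        if path in store:
            return store[path]
    return None
```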
results_to_df
results_to_df()
Convert all document version results to a Pandas dataframe by:

- Inserting results into the correct dataframe columns according to their field names
- Merging user edits and AI edits into their correct cell positions

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Each row in the dataframe is a document version. |
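The merge described above — one row per document version, with user edits taking precedence over AI results in the same cell — might be sketched as below using plain dicts. The `name`/`results`/`edits` shape is an illustrative assumption, not the real `DocumentVersion` model.

```python
from typing import Any, Dict, List

def results_to_rows(document_versions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One row per document version; user edits override AI results per field.

    Each version is assumed to carry 'name', 'results' (AI) and 'edits' (user)
    mappings -- an illustrative shape only.
    """
    rows = []
    for dv in document_versions:
        row: Dict[str, Any] = {"document": dv["name"]}
        row.update(dv.get("results", {}))  # AI values first
        row.update(dv.get("edits", {}))    # user edits override
        rows.append(row)
    return rows  # suitable for e.g. pandas.DataFrame(rows)
```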
set_file_count
set_file_count(new_file_count)
Set the file count in the session properties JSON.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| new_file_count | `int` | The count of files. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the operation succeeded, else False. |
set_processing_status
set_processing_status(new_status)
Set the processing status in the session properties JSON.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| new_status | `str` | The status to set. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the operation succeeded, else False. |
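Both setters follow the same pattern: a read-modify-write of one field in the session properties JSON, returning a boolean success flag. A minimal sketch, with a dict standing in for blob storage (the path and helper name are assumptions):

```python
import json
from typing import Any, Dict

def set_property(store: Dict[str, str], path: str, key: str, value: Any) -> bool:
    """Read-modify-write one field of a properties JSON blob.

    `store` stands in for blob storage; returns True on success, False
    otherwise, matching the documented return contract.
    """
    try:
        props = json.loads(store.get(path, "{}"))
        props[key] = value
        store[path] = json.dumps(props)
        return True
    except (TypeError, ValueError):
        # Unreadable JSON or a value json.dumps cannot serialise.
        return False
```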
turtle_from_classifiers
turtle_from_classifiers()
Convert the information standard classifiers dictionary into Turtle, a textual syntax for RDF triples that can be imported into other systems.

Returns:

| Type | Description |
| --- | --- |
| `str` | A string-formatted Turtle definition. |
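A minimal sketch of emitting Turtle from a classifiers dictionary, using a SKOS concept-scheme shape; the namespace, predicate choices and classifier shape are all assumptions, as the real Hoppa vocabulary is not documented here.

```python
from typing import Any, Dict

def classifiers_to_turtle(classifiers: Dict[str, Dict[str, Any]]) -> str:
    """Serialise classifiers as Turtle (RDF) text.

    Uses a hypothetical namespace and SKOS concept-scheme shape; the real
    Hoppa vocabulary will differ.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/standard#> .",
        "",
    ]
    for name, spec in classifiers.items():
        lines.append(f"ex:{name} a skos:ConceptScheme .")
        for opt in spec.get("picklist", []):
            lines.append(f'ex:{opt["code"]} a skos:Concept ;')
            lines.append(f'    skos:prefLabel "{opt["label"]}" ;')
            lines.append(f"    skos:inScheme ex:{name} .")
    return "\n".join(lines)
```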