Microsoft Azure
AzureBlobSession
Bases: Session
A concrete implementation of the Session class that provides a framework for processing information containers stored in the original Hoppa Azure Blob Storage architecture.
Inherits attributes from its parent class.
Attributes:

| Name | Type | Description |
| --- | --- | --- |
| client | `AzureBlobStorageClient` | Establishes a connection to an Azure Blob Storage resource. |
| ignore_files | `List[str]` | List of string patterns / file names that hold special meaning in the Hoppa schema and should be ignored when interacting with source files. |
| include_blobs | `List[str]` | List of string patterns / file names that should be included when indexing document versions. All other files will be ignored. |
| headers | `Dict[str, Any]` | HTTP request headers to include when reading or writing blobs. Defaults to `'block-blob'` for the `"x-ms-blob-type"` header. |
| sharepoint_client | `SharePointClient` | Establishes a connection to Microsoft Graph APIs with delegated permissions to read and write SharePoint site, folder and file resources. |
| autodesk_client | `AutodeskClient` | Establishes a connection to Autodesk APIs with delegated permissions to read and write Autodesk Build / Docs / BIM 360 hub, project, file and folder resources. |
| properties | `Dict[str, Any]` | Metadata about the session. |
| organization | `str` | Organization identifier for the session. |
| workspace | `str` | Workspace identifier within the organization. |
| session_id | `str` | Unique session identifier. |
| directory | `str` | Directory path for the session data. Automatically set to `"{workspace}/{session_id}"`. |
| user_id | `str` | User identifier for authentication and logging. |
| workflow | `Dict \| None` | Parsed workflow configuration. |
| classifiers | `Dict` | Document classifiers configuration. |
| attributes | `List` | Document attributes configuration. |
| tags | `List[str]` | Document tags configuration. |
| prompts | `Dict` | Custom prompts for AI operations. |
| document_versions | `List[DocumentVersion]` | Documents in the session. |
| initialized | `bool` | Whether the session has been initialized (document versions have been indexed). |
__init__
__init__(organization, workspace, session_id, user_id, include_blobs=None)
Initialize the session by getting session properties, information standard and workflow from storage and parsing the contents.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| organization | `str` | Organization containing the workspace session to connect to. Must match an Azure Blob Storage container name. | *required* |
| workspace | `str` | The workspace containing the session. Must match a folder in the parent container. | *required* |
| session_id | `str` | The id of the session. Must match a folder in the parent workspace folder in the organization container. | *required* |
| user_id | `str` | User identifier. Required if the session contains documents from an external source (e.g. SharePoint). | *required* |
| include_blobs | `List[str]` | A list of blobs in the parent container. Provides a mechanism to further scope the session binding to fewer documents, avoiding long initialisation times. | `None` |
excel_from_classifiers
excel_from_classifiers()
Export classifiers to an Excel workbook with:

- Summary sheet with all classifiers and their top-level properties
- Individual sheets for each classifier's picklist options

Returns:

| Type | Description |
| --- | --- |
| `BytesIO` | Excel workbook as bytes that can be downloaded or sent via API. |
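The workbook layout described above (one summary sheet plus one picklist sheet per classifier) might be assembled as in this sketch, which builds plain row lists in place of a real Excel writer; the classifier shape (`description`, `picklist` with `code`/`label`) is an assumption.

```python
from typing import Any, Dict, List

def workbook_layout(classifiers: Dict[str, Dict[str, Any]]) -> Dict[str, List[List[str]]]:
    """Map sheet name -> rows, mirroring the documented Excel export."""
    sheets: Dict[str, List[List[str]]] = {}
    # Summary sheet: one row per classifier with its top-level properties.
    summary = [["Classifier", "Description"]]
    for name, spec in classifiers.items():
        summary.append([name, spec.get("description", "")])
    sheets["Summary"] = summary
    # One sheet per classifier listing its picklist options.
    for name, spec in classifiers.items():
        rows = [["Code", "Label"]]
        for opt in spec.get("picklist", []):
            rows.append([opt["code"], opt["label"]])
        sheets[name] = rows
    return sheets
```

Writing each rows list to a worksheet (e.g. with openpyxl) and saving the workbook into a `BytesIO` buffer would produce the documented return value.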
get_processing_status
get_processing_status()
Check if all files in the session have completed processing.
Returns:

| Type | Description |
| --- | --- |
| `str` | The current processing status. |
parse_files
parse_files()
Get file list and format into a hierarchical list of DocumentVersions, metadata and results by parsing the blob directory and naming structure.
Returns:

| Type | Description |
| --- | --- |
| `List[DocumentVersion]` | A list of DocumentVersions. |
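The documented parsing — deriving a hierarchy of documents and versions from the blob directory and naming structure — can be sketched as follows. The path layout `session/<document>/<version>/<file>` is an illustrative convention only; the actual Hoppa naming schema is not documented here.

```python
from collections import defaultdict
from typing import Dict, List

def group_versions(blob_names: List[str]) -> Dict[str, List[str]]:
    """Group blob paths into {document: [versions...]} by directory structure.

    Assumes a layout like 'session/<document>/<version>/<file>' -- an
    illustrative convention, not the actual Hoppa schema.
    """
    docs: Dict[str, List[str]] = defaultdict(list)
    for name in blob_names:
        parts = name.split("/")
        if len(parts) >= 3:
            document, version = parts[-3], parts[-2]
            if version not in docs[document]:
                docs[document].append(version)
    for versions in docs.values():
        versions.sort()
    return dict(docs)
```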
parse_standard
parse_standard()
Get the user standard and parse it from JSON into Python objects.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| classifiers | `Dict` | A dictionary of classification parts and picklists. |
| attributes | `List` | A list of attributes to search for. |
| tags | `List[str]` | A list of tags. |
| prompts | `Dict` | A dictionary of prompts for each analysis type (classify, search, tag). |
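Splitting a standard document into the four documented return values might look like this sketch; the top-level JSON keys are assumptions, since the real schema is not shown here.

```python
import json
from typing import Any, Dict, List, Tuple

def parse_standard_json(raw: str) -> Tuple[Dict, List, List[str], Dict]:
    """Split a standard document into (classifiers, attributes, tags, prompts).

    The top-level keys are assumed; the real Hoppa schema may differ.
    """
    data: Dict[str, Any] = json.loads(raw)
    classifiers = data.get("classifiers", {})
    attributes = data.get("attributes", [])
    tags = data.get("tags", [])
    prompts = data.get("prompts", {})
    return classifiers, attributes, tags, prompts
```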
parse_workflow
parse_workflow()
Get the workflow and parse it from JSON into a Python object. Looks first for a workflow file in the session directory; if not found, falls back to the workspace directory, then the organization directory, and finally the default Hoppa directory.

Returns:

| Type | Description |
| --- | --- |
| `List[Dict[str, Any]]` | A workflow object of stages and nested steps. |
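The documented fallback order (session, then workspace, then organization, then the default Hoppa directory) can be sketched with a dict standing in for blob storage; the blob path scheme and file name are assumptions.

```python
from typing import Any, Dict, Optional

def find_workflow(store: Dict[str, Any], organization: str, workspace: str,
                  session_id: str) -> Optional[Any]:
    """Return the first workflow found, walking the documented fallback order.

    `store` maps blob paths to parsed workflows (a stand-in for blob storage);
    the path scheme is an assumption.
    """
    candidates = [
        f"{organization}/{workspace}/{session_id}/workflow.json",  # session
        f"{organization}/{workspace}/workflow.json",               # workspace
        f"{organization}/workflow.json",                           # organization
        "hoppa/workflow.json",                                     # default
    ]
    for path in candidates:
        if path in store:
            return store[path]
    return None
```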
results_to_df
results_to_df()
Convert all document version results to a Pandas dataframe by:

- Inserting results into the correct dataframe columns according to their field names
- Merging user edits and AI edits into their correct cell positions

Returns:

| Type | Description |
| --- | --- |
| `DataFrame` | Each row in the dataframe is a document version. |
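The merge described above — one row per document version, with user edits taking precedence over AI results in the same cell — might be sketched as below using plain dicts. The `name`/`results`/`edits` shape is an illustrative assumption, not the real `DocumentVersion` model.

```python
from typing import Any, Dict, List

def results_to_rows(document_versions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """One row per document version; user edits override AI results per field.

    Each version is assumed to carry 'name', 'results' (AI) and 'edits' (user)
    mappings -- an illustrative shape only.
    """
    rows = []
    for dv in document_versions:
        row: Dict[str, Any] = {"document": dv["name"]}
        row.update(dv.get("results", {}))  # AI values first
        row.update(dv.get("edits", {}))    # user edits override
        rows.append(row)
    return rows  # suitable for e.g. pandas.DataFrame(rows)
```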
set_file_count
set_file_count(new_file_count)
Set the file count in the session properties JSON.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| new_file_count | `int` | The count of files. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the operation succeeded, else False. |
set_processing_status
set_processing_status(new_status)
Set the processing status in the session properties JSON.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| new_status | `str` | The status to set. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `bool` | True if the operation succeeded, else False. |
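Both setters follow the same pattern: a read-modify-write of one field in the session properties JSON, returning a boolean success flag. A minimal sketch, with a dict standing in for blob storage (the path and helper name are assumptions):

```python
import json
from typing import Any, Dict

def set_property(store: Dict[str, str], path: str, key: str, value: Any) -> bool:
    """Read-modify-write one field of a properties JSON blob.

    `store` stands in for blob storage; returns True on success, False
    otherwise, matching the documented return contract.
    """
    try:
        props = json.loads(store.get(path, "{}"))
        props[key] = value
        store[path] = json.dumps(props)
        return True
    except (TypeError, ValueError):
        # Unreadable JSON or a value json.dumps cannot serialise.
        return False
```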
turtle_from_classifiers
turtle_from_classifiers()
Convert the information standard classifiers dictionary into Turtle, a textual syntax for RDF triples that can be imported into other systems.

Returns:

| Type | Description |
| --- | --- |
| `str` | A string-formatted Turtle definition. |
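A minimal sketch of emitting Turtle from a classifiers dictionary, using a SKOS concept-scheme shape; the namespace, predicate choices and classifier shape are all assumptions, as the real Hoppa vocabulary is not documented here.

```python
from typing import Any, Dict

def classifiers_to_turtle(classifiers: Dict[str, Dict[str, Any]]) -> str:
    """Serialise classifiers as Turtle (RDF) text.

    Uses a hypothetical namespace and SKOS concept-scheme shape; the real
    Hoppa vocabulary will differ.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix ex: <http://example.org/standard#> .",
        "",
    ]
    for name, spec in classifiers.items():
        lines.append(f"ex:{name} a skos:ConceptScheme .")
        for opt in spec.get("picklist", []):
            lines.append(f'ex:{opt["code"]} a skos:Concept ;')
            lines.append(f'    skos:prefLabel "{opt["label"]}" ;')
            lines.append(f"    skos:inScheme ex:{name} .")
    return "\n".join(lines)
```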