Microsoft Azure

AzureBlobSession

Bases: Session

A concrete implementation of the Session class that constructs a framework for processing information containers stored in the original Hoppa Azure Blob Storage architecture.

Inherits attributes from its parent class.

Attributes:

Name Type Description
client AzureBlobStorageClient

Establishes a connection to an Azure Blob storage resource.

ignore_files List[str]

List of string patterns / file names that hold special meaning in the Hoppa schema and should be ignored when interacting with source files.

include_blobs List[str]

List of string patterns / file names that should be included when indexing document versions. All other files will be ignored.

headers Dict[str, Any]

HTTP request headers to include when reading or writing blobs. Defaults to 'block-blob' for "x-ms-blob-type" header.

sharepoint_client SharePointClient

Establishes a connection to Microsoft Graph APIs with delegated permissions to read and write SharePoint site, folder and file resources.

autodesk_client AutodeskClient

Establishes a connection to Autodesk APIs with delegated permissions to read and write Autodesk Build / Docs / BIM 360 hub, project, file and folder resources.

properties Dict[str, Any]

Metadata about the session.

organization str

Organization identifier for the session

workspace str

Workspace identifier within the organization

session_id str

Unique session identifier

directory str

Directory path for the session data. Automatically set as "{workspace}/{session_id}".

user_id str

User identifier for authentication and logging

workflow Dict | None

Parsed workflow configuration

classifiers Dict

Document classifiers configuration

attributes List

Document attributes configuration

tags List[str]

Document tags configuration

prompts Dict

Custom prompts for AI operations

document_versions List[DocumentVersion]

Documents in the session

initialized bool

Whether the session has been initialized (document versions have been indexed)

__init__

__init__(organization, workspace, session_id, user_id, include_blobs=None)

Initialize the session by getting session properties, information standard and workflow from storage and parsing the contents.

Parameters:

Name Type Description Default
organization str

Organization containing the workspace session to connect to. Must match an Azure Blob Storage container name.

required
workspace str

The workspace containing the session. Must match a folder in the parent container.

required
session_id str

The id of the session. Must match a folder in the parent workspace folder in the organization container.

required
user_id str

User identifier for authentication and logging. Must identify a valid user if the session contains documents from an external source (e.g. SharePoint).

required
include_blobs List[str]

A list of blobs in the parent container. Provides a mechanism to further scope the session binding to fewer documents, avoiding long initialisation times.

None
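
As a rough illustration of how the constructor arguments relate to storage paths, the sketch below derives the session directory in the documented "{workspace}/{session_id}" form and narrows a blob listing with include_blobs. It does not call the real class: the helpers session_directory and scope_blobs, the sample blob names, and the choice of exact-name or fnmatch-style matching are all assumptions made for illustration.

```python
from fnmatch import fnmatch
from typing import List, Optional


def session_directory(workspace: str, session_id: str) -> str:
    """Build the directory path in the documented "{workspace}/{session_id}" form."""
    return f"{workspace}/{session_id}"


def scope_blobs(blob_names: List[str], include_blobs: Optional[List[str]] = None) -> List[str]:
    """Keep only blobs that match include_blobs; None keeps everything.

    Assumption: each entry is either an exact blob name or an fnmatch-style pattern.
    """
    if include_blobs is None:
        return list(blob_names)
    return [
        name
        for name in blob_names
        if any(name == pattern or fnmatch(name, pattern) for pattern in include_blobs)
    ]


# Hypothetical workspace, session and blob names.
directory = session_directory("workspace-a", "session-001")
blobs = [
    f"{directory}/report.pdf",
    f"{directory}/drawing.dwg",
    f"{directory}/notes.txt",
]
scoped = scope_blobs(blobs, include_blobs=["*.pdf", f"{directory}/notes.txt"])
print(directory)
print(scoped)
```

Scoping the listing before any documents are indexed is what keeps initialisation short for large sessions.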

excel_from_classifiers

excel_from_classifiers()

Export classifiers to Excel workbook with:

  1. Summary sheet with all classifiers and their top-level properties
  2. Individual sheets for each classifier's picklist options

Returns:

Type Description
BytesIO

Excel workbook as bytes that can be downloaded or sent via API.

get_processing_status

get_processing_status()

Check if all files in the session have completed processing.

Returns:

Type Description
str

'completed' if all files have status 'completed'; 'error' if any file has status 'failed'; 'processing' if no file has failed and at least one has status 'processing'; otherwise 'draft'.
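
The precedence described above can be sketched as a small standalone function. This is not the library's implementation: aggregate_status is a hypothetical name, and treating a session with no files as 'draft' is an assumption, since the source does not say how an empty session is handled.

```python
from typing import Iterable


def aggregate_status(file_statuses: Iterable[str]) -> str:
    """Collapse per-file statuses into one session status."""
    statuses = list(file_statuses)
    # Assumption: an empty session is reported as "draft", not "completed".
    if statuses and all(status == "completed" for status in statuses):
        return "completed"
    # Any failed file takes priority over files that are still processing.
    if any(status == "failed" for status in statuses):
        return "error"
    if any(status == "processing" for status in statuses):
        return "processing"
    return "draft"


print(aggregate_status(["completed", "completed"]))
print(aggregate_status(["completed", "processing", "failed"]))
print(aggregate_status(["completed", "processing"]))
print(aggregate_status(["completed", "draft"]))
```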

parse_files

parse_files()

Get file list and format into a hierarchical list of DocumentVersions, metadata and results by parsing the blob directory and naming structure.

Returns:

Type Description
List[DocumentVersion]

A list of DocumentVersions.

parse_standard

parse_standard()

Get the user's information standard and parse it from JSON into Python objects.

Returns:

Name Type Description
classifiers Dict

A dictionary of classification parts and picklists.

attributes List

A list of attributes to search for.

tags List[str]

A list of tags.

prompts Dict

A dictionary of prompts for each analysis type (classify, search, tag).

parse_workflow

parse_workflow()

Get the workflow and parse it from JSON into a Python object. The lookup first checks for a workflow file in the session directory. If none is found, it checks the workspace directory, then the organization directory, and finally the default Hoppa directory.

Returns:

Type Description
List[Dict[str, Any]]

A workflow object of stages and nested steps.
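
The fallback order used by parse_workflow (session, then workspace, then organization, then the Hoppa default) can be sketched with an in-memory stand-in for blob storage. Everything named here is hypothetical: the file name "workflow.json", the "hoppa-default" container, the dict-based storage, and the helper functions are assumptions, not the library's actual layout or API.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# In-memory stand-in for blob storage: (container, blob path) -> JSON text.
Storage = Dict[Tuple[str, str], str]

WORKFLOW_FILE = "workflow.json"      # hypothetical file name
DEFAULT_CONTAINER = "hoppa-default"  # hypothetical default location


def workflow_candidates(organization: str, workspace: str, session_id: str) -> List[Tuple[str, str]]:
    """List lookup locations from most specific to least specific."""
    return [
        (organization, f"{workspace}/{session_id}/{WORKFLOW_FILE}"),
        (organization, f"{workspace}/{WORKFLOW_FILE}"),
        (organization, WORKFLOW_FILE),
        (DEFAULT_CONTAINER, WORKFLOW_FILE),
    ]


def load_workflow(storage: Storage, organization: str, workspace: str, session_id: str) -> Optional[Any]:
    """Return the first workflow found along the fallback chain, or None."""
    for key in workflow_candidates(organization, workspace, session_id):
        raw = storage.get(key)
        if raw is not None:
            return json.loads(raw)
    return None


storage: Storage = {
    ("acme", WORKFLOW_FILE): json.dumps([{"stage": "org-level", "steps": []}]),
    (DEFAULT_CONTAINER, WORKFLOW_FILE): json.dumps([{"stage": "default", "steps": []}]),
}
found = load_workflow(storage, "acme", "workspace-a", "session-001")
print(found[0]["stage"])
```

Ordering the candidates from most to least specific lets a session override its workspace, and a workspace override its organization, without any of them having to define a workflow at all.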

results_to_df

results_to_df()

Convert all document version results to a pandas DataFrame by:

  1. Inserting results to the correct dataframe columns according to their field names
  2. Merging user edits and AI edits into their correct cell positions

Returns:

Type Description
DataFrame

Each row in the dataframe is a document version.
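
A minimal sketch of the two steps, using plain dictionaries in place of DocumentVersion objects. The record shape (name, ai_results, user_edits), the helper results_to_rows, and the rule that a user edit overrides the AI value for the same field are assumptions; the source does not state how conflicts are resolved.

```python
from typing import Any, Dict, List


def results_to_rows(document_versions: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Any]]:
    """Build one row per document version with one cell per field name."""
    rows = []
    for version in document_versions:
        ai_results = version.get("ai_results", {})
        user_edits = version.get("user_edits", {})
        row = {"name": version["name"]}
        for column in columns:
            # Assumption: a user edit wins over the AI value for the same field.
            row[column] = user_edits.get(column, ai_results.get(column))
        rows.append(row)
    return rows


# Hypothetical document versions and field names.
versions = [
    {
        "name": "report.pdf",
        "ai_results": {"discipline": "Structural", "status": "Draft"},
        "user_edits": {"status": "Approved"},
    },
    {"name": "drawing.dwg", "ai_results": {"discipline": "Architecture"}},
]
rows = results_to_rows(versions, columns=["discipline", "status"])
for row in rows:
    print(row)
```

A list of row dictionaries like this can be passed to pandas.DataFrame(rows) to obtain a frame with one row per document version.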

set_file_count

set_file_count(new_file_count)

Set the file count in the session properties JSON.

Parameters:

Name Type Description Default
new_file_count int

The count of files

required

Returns:

Type Description
bool

True if the operation succeeded, otherwise False.

set_processing_status

set_processing_status(new_status)

Set the processing status in the session properties JSON.

Parameters:

Name Type Description Default
new_status str

The status to set.

required

Returns:

Type Description
bool

True if the operation succeeded, otherwise False.

turtle_from_classifiers

turtle_from_classifiers()

Convert the information standard classifiers dictionary into Turtle, a textual syntax for RDF triples that can be imported into other systems.

Returns:

Type Description
str

The classifiers serialized as a Turtle-formatted string.
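
To give a feel for what such an export might look like, the sketch below serializes a small classifier dictionary to Turtle by hand. The dictionary shape, the example.org namespace, the choice of SKOS terms, and the helper names are all assumptions; the vocabulary and structure that turtle_from_classifiers actually emits are not described in the source.

```python
from typing import Any, Dict

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <https://example.org/standard/> .\n"  # hypothetical namespace
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def classifiers_to_turtle(classifiers: Dict[str, Dict[str, Any]]) -> str:
    """Emit one skos:ConceptScheme per classifier and one skos:Concept per option.

    Assumption: classifier keys and option codes contain only letters, digits
    and underscores, so they are safe to use in prefixed names.
    """
    lines = [PREFIXES]
    for key, classifier in classifiers.items():
        lines.append(f"ex:{key} a skos:ConceptScheme ;")
        lines.append(f"    skos:prefLabel {turtle_literal(classifier['label'])} .")
        lines.append("")
        for option in classifier["options"]:
            lines.append(f"ex:{key}_{option['code']} a skos:Concept ;")
            lines.append(f"    skos:inScheme ex:{key} ;")
            lines.append(f"    skos:notation {turtle_literal(option['code'])} ;")
            lines.append(f"    skos:prefLabel {turtle_literal(option['label'])} .")
            lines.append("")
    return "\n".join(lines)


# Hypothetical classifier with two picklist options.
classifiers = {
    "discipline": {
        "label": "Discipline",
        "options": [
            {"code": "A", "label": "Architecture"},
            {"code": "S", "label": "Structural"},
        ],
    }
}
turtle = classifiers_to_turtle(classifiers)
print(turtle)
```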