Microsoft Azure

AzureBlobDocumentVersion

Bases: DocumentVersion

row_count property

row_count

Total number of row artefacts across all sheets.

status property

status

Derived from artefact presence on this document version.

  • COMPLETED: a results artefact exists on the document, any sheet, or any child item (chunk/image/row). Any completed result anywhere in the tree is enough to mark the document completed.
  • PROCESSING: at least one chunk/image/row artefact is bound on any sheet, but no results artefact has been observed.
  • DRAFT: otherwise.
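
The precedence of the three rules above can be sketched as a small pure function. This is only an illustration: `Status`, `derive_status`, the two boolean inputs and the lowercase string values are hypothetical stand-ins for whatever artefact checks and status type the library actually uses.

```python
from enum import Enum


class Status(str, Enum):
    # Hypothetical enum; the library's own status type and values may differ.
    DRAFT = "draft"
    PROCESSING = "processing"
    COMPLETED = "completed"


def derive_status(has_results: bool, has_child_artefacts: bool) -> Status:
    """Apply the documented precedence: COMPLETED, then PROCESSING, then DRAFT.

    has_results: a results artefact exists anywhere in the tree
        (document, any sheet, or any chunk/image/row item).
    has_child_artefacts: at least one chunk/image/row artefact is bound
        on any sheet.
    """
    if has_results:
        return Status.COMPLETED
    if has_child_artefacts:
        return Status.PROCESSING
    return Status.DRAFT
```

Because the results check comes first, a document with both child artefacts and a single completed result is reported as completed.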

AzureBlobDocumentVersionSheet

Bases: DocumentVersionSheet

AzureBlobMetadataProperty

Bases: MetadataProperty

get

get()

Reads the metadata property definition from storage, overriding lazy load.

set

set(data=None)

Merges the updated metadata property into the session standard and writes to Azure Blob Storage.

AzureBlobMetadataSpecification

Bases: MetadataSpecification

get

get()

Reads the session standard and workflow definition from storage, overriding lazy load.

set

set(data=None)

Writes the updated metadata properties and workflow definition to Azure Blob Storage.

AzureBlobPropertyValue

Bases: PropertyValue

get

get()

Reads the property value from storage, overriding lazy load.

set

set(data=None)

Merges the updated metadata property into the session standard and writes to Azure Blob Storage.

AzureBlobResults

Bases: Results

get

get()

Reads the results from storage, overriding lazy load.

AzureBlobSession

Bases: Session

get_processing_status

get_processing_status()

Check if all files in the session have completed processing.

Returns:

  • str: completed if all files have status 'completed'; error if any file has status 'failed'; processing if no file has failed and any file has status 'processing'; otherwise draft.
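
The aggregation rule described for the return value can be sketched as follows. This is a minimal illustration, not the library's implementation: `aggregate_status` is a hypothetical helper that takes the per-file statuses as plain strings, and returning draft for a session with no files is an assumption.

```python
from typing import Iterable


def aggregate_status(file_statuses: Iterable[str]) -> str:
    """Collapse per-file statuses into one session-level status."""
    statuses = list(file_statuses)
    # Assumption: an empty session falls through to "draft".
    if statuses and all(s == "completed" for s in statuses):
        return "completed"
    # Any failed file marks the whole session as an error.
    if any(s == "failed" for s in statuses):
        return "error"
    # No failures, but at least one file is still being processed.
    if any(s == "processing" for s in statuses):
        return "processing"
    return "draft"
```

For example, `aggregate_status(["completed", "processing"])` yields processing, while adding a single 'failed' entry to that list yields error.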

refresh_status

refresh_status()

Reconcile the cached session summary from a single LIST call plus pointer-JSON reads.

Cheap path: one LIST to enumerate blobs and classify each by artefact-suffix presence (status, row_count, per-file status and row_count all fall out here).

Follow-up cost: for SharePoint/Autodesk source files whose display name isn't already cached in session_properties.json, download the pointer JSON to read name. Subsequent refreshes skip this for unchanged files, so it amortises to zero.

Writes status, file_count, row_count, sources, document_summaries and status_last_updated_at back to session_properties.json.

results_to_df

results_to_df()

Convert all document version results into a pandas DataFrame by:

  1. Inserting results into the correct DataFrame columns according to their field names
  2. Merging user edits and AI edits into their correct cell positions

Returns:

  • DataFrame: Each row in the dataframe is a document version.

set_file_count

set_file_count(new_file_count)

Set the file count in the session properties JSON.

Also updates the in-memory cache.

Parameters:

  • new_file_count (int, required): The count of files.

Returns:

  • bool: True if the operation succeeded, otherwise False.

set_processing_status

set_processing_status(new_status)

Set the processing status in the session properties JSON.

Also updates the in-memory cache and status timestamp so that subsequent reads of session.status reflect the new value without needing a full refresh_status().

Parameters:

  • new_status (str, required): The status to set.

Returns:

  • bool: True if the operation succeeded, otherwise False.

AzureBlobSheetItem

Bases: SheetItem

set

set(data=None)

Writes the updated blob to Azure Blob Storage.

AzureBlobWorkflow

Bases: Workflow

get

get()

Reads the workflow definition from storage, overriding lazy load.

set

set(data=None)

Writes the updated workflow definition to Azure Blob Storage.

HasBlobClient

Bases: Protocol

Protocol for classes with blob storage attributes.
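
As a rough illustration of how a structural protocol like this is typically used, the sketch below declares a protocol with two attributes and a class that satisfies it without inheriting from it. The attribute names `blob_client` and `blob_path`, the `HasBlobClientSketch` and `FakeItem` classes, and the `describe` helper are all hypothetical; the real protocol's members are not listed on this page.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class HasBlobClientSketch(Protocol):
    # Hypothetical attributes; the real protocol may declare different ones.
    blob_client: Any
    blob_path: str


class FakeItem:
    """Satisfies the protocol structurally, with no inheritance."""

    def __init__(self) -> None:
        self.blob_client = object()  # placeholder for a real storage client
        self.blob_path = "sessions/example/item.json"


def describe(item: HasBlobClientSketch) -> str:
    # Any object exposing the declared attributes is accepted here.
    return f"blob at {item.blob_path}"
```

Static type checkers accept `describe(FakeItem())` because `FakeItem` exposes the declared attributes; `runtime_checkable` additionally allows an `isinstance` check that verifies only attribute presence, not attribute types.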

json_serializer

json_serializer(obj)

Custom JSON serializer for objects not serializable by default.
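
A serializer of this kind is normally passed to `json.dumps` through its `default` parameter, which is called for any object the encoder cannot handle. The sketch below shows that pattern under stated assumptions: `json_serializer_sketch` is a hypothetical stand-in, and the set of types it handles (dates, datetimes and `Decimal`) is a guess, since this page does not say which types the real function covers.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def json_serializer_sketch(obj):
    """Fallback for json.dumps; handled types here are assumptions."""
    if isinstance(obj, (datetime, date)):
        # ISO 8601 strings round-trip cleanly through most JSON consumers.
        return obj.isoformat()
    if isinstance(obj, Decimal):
        # Keep the exact decimal representation as a string.
        return str(obj)
    # Follow the json module's convention for unsupported types.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"updated_at": datetime(2024, 1, 2, 3, 4, 5), "score": Decimal("0.75")}
encoded = json.dumps(payload, default=json_serializer_sketch)
```

Raising `TypeError` for unknown types keeps the behaviour consistent with the standard encoder instead of silently writing a lossy value.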