Microsoft Azure

AzureBlobDocumentVersion

Bases: DocumentVersion

row_count property

row_count

Total number of row artefacts across all sheets.

status property

status

Derived from artefact presence on this document version.

  • COMPLETED: a results artefact exists on the document, any sheet, or any child item (chunk/image/row). Any completed result anywhere in the tree is enough to mark the document completed.
  • PROCESSING: at least one chunk/image/row artefact is bound on any sheet, but no results artefact has been observed.
  • DRAFT: otherwise.
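
The precedence described above can be read as a short decision function. The following is only a sketch of that rule, not the library's implementation: the `Status` enum, the `derive_status` name and its two boolean inputs are hypothetical stand-ins for whatever artefact checks the class actually performs.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical enum; the reference only names the three states.
    DRAFT = "draft"
    PROCESSING = "processing"
    COMPLETED = "completed"


def derive_status(has_results_anywhere: bool, has_bound_items: bool) -> Status:
    """Apply the documented precedence: results win, then bound items, else draft."""
    if has_results_anywhere:
        # A results artefact on the document, a sheet, or any child item.
        return Status.COMPLETED
    if has_bound_items:
        # At least one chunk/image/row artefact, but no results yet.
        return Status.PROCESSING
    return Status.DRAFT
```

Note that a results artefact takes priority even when no child items are bound, which matches the "anywhere in the tree" wording.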

AzureBlobDocumentVersionSheet

Bases: DocumentVersionSheet

AzureBlobMetadataProperty

Bases: MetadataProperty

get

get()

Reads the metadata property definition from storage, overriding lazy load.

set

set(data=None)

Merges the updated metadata property into the session standard and writes to Azure Blob Storage.

AzureBlobMetadataSpecification

Bases: MetadataSpecification

get

get()

Reads the session standard and workflow definition from storage, overriding lazy load.

set

set(data=None)

Writes the updated metadata properties and workflow definition to Azure Blob Storage.

AzureBlobPropertyValue

Bases: PropertyValue

get

get()

Reads the property value from storage, overriding lazy load.

set

set(data=None)

Merges the updated metadata property into the session standard and writes to Azure Blob Storage.

AzureBlobResults

Bases: Results

get

get()

Reads the results from storage, overriding lazy load.

AzureBlobSession

Bases: Session

get_processing_status

get_processing_status()

Report whether all files in the session have completed processing, as a single aggregate status.

Returns:

str: completed if every file has status 'completed'; error if any file has status 'failed'; processing if there are no errors and any file has status 'processing'; otherwise draft.
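
The aggregation rule in the return description can be sketched as below. This is an illustration rather than the method's actual code: `aggregate_status` is a hypothetical helper that takes plain status strings, and treating an empty session as draft is an assumption, since the reference does not say what happens when there are no files.

```python
def aggregate_status(file_statuses):
    """Collapse per-file statuses into one session status (sketch)."""
    statuses = list(file_statuses)
    # Assumption: an empty session is reported as draft, not completed.
    if statuses and all(s == "completed" for s in statuses):
        return "completed"
    if any(s == "failed" for s in statuses):
        return "error"
    if any(s == "processing" for s in statuses):
        return "processing"
    return "draft"
```

For example, `aggregate_status(["completed", "processing"])` gives `"processing"`, while a single failed file turns the whole session into `"error"`.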

initialize

initialize(bind_sources=True)

Build the document-version graph for this session.

Parameters:

bind_sources (bool, default True): When True, fully bind each source (download per-file settings and resolve connector download URLs) as needed for analysis. When False, take the lean results-read path (_initialize_for_results): skip the settings download and the connector bind, and derive document identity from the cached manifest instead. Use False only to read results.

refresh_status

refresh_status()

Reconcile the cached session summary from a single LIST operation plus pointer-JSON reads.

Cheap path: one LIST to enumerate blobs and classify each by artefact-suffix presence (status, row_count, per-file status and row_count all fall out here).

Follow-up cost: for SharePoint/Autodesk source files whose display name isn't already cached in session_properties.json, download the pointer JSON to read name. Subsequent refreshes skip this for unchanged files, so it amortises to zero.

Writes status, file_count, row_count, sources, document_summaries and status_last_updated_at back to session_properties.json.
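
To make the "cheap path" concrete, here is a minimal sketch of classifying a blob listing by suffix. Everything about the naming is assumed for illustration: the `<file_id>/...` layout and the `.row.json` and `.results.json` suffixes are hypothetical, because this reference does not state the real artefact suffixes, and `summarise` is not part of the library.

```python
from collections import defaultdict

# Hypothetical suffixes; the real artefact naming is not given in this reference.
ROW_SUFFIX = ".row.json"
RESULTS_SUFFIX = ".results.json"


def summarise(blob_names):
    """Derive per-file status and row_count from blob names alone (sketch)."""
    per_file = defaultdict(lambda: {"row_count": 0, "has_results": False})
    for name in blob_names:
        # Assumed layout: the first path segment identifies the source file.
        file_id, _, _rest = name.partition("/")
        entry = per_file[file_id]
        if name.endswith(ROW_SUFFIX):
            entry["row_count"] += 1
        elif name.endswith(RESULTS_SUFFIX):
            entry["has_results"] = True

    summaries = {}
    for file_id, entry in per_file.items():
        if entry["has_results"]:
            status = "completed"
        elif entry["row_count"] > 0:
            status = "processing"
        else:
            status = "draft"
        summaries[file_id] = {"status": status, "row_count": entry["row_count"]}
    return summaries
```

Because only names are inspected, no blob content is downloaded on this path; the pointer-JSON reads described above would be a separate follow-up step.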

results_to_df

results_to_df()

Convert all document version results to a pandas DataFrame by:

  1. Inserting results into the correct DataFrame columns according to their field names
  2. Merging user edits and AI edits into their correct cell positions

Returns:

DataFrame: Each row in the DataFrame is a document version.
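
The two steps can be sketched with plain dictionaries, one per document version. This is not the method's implementation: the `results`, `ai_edits` and `user_edits` keys are hypothetical, and the precedence used here (user edits over AI edits over raw results) is an assumption that the reference does not confirm.

```python
def build_rows(documents, columns):
    """Build one row per document version, keyed by field name (sketch)."""
    rows = []
    for doc in documents:
        # Start with an empty cell for every known column.
        row = {col: None for col in columns}
        # Assumed precedence: later layers overwrite earlier ones.
        for layer in ("results", "ai_edits", "user_edits"):
            for field, value in doc.get(layer, {}).items():
                if field in row:
                    row[field] = value
        rows.append(row)
    return rows
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows, columns=columns)` to obtain a frame with one row per document version.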

set_file_count

set_file_count(new_file_count)

Set the file count in the session properties JSON.

Also updates the in-memory cache.

Parameters:

new_file_count (int, required): The new file count.

Returns:

bool: True if the operation succeeded, otherwise False.

set_processing_status

set_processing_status(new_status)

Set the processing status in the session properties JSON.

Also updates the in-memory cache and status timestamp so that subsequent reads of session.status reflect the new value without needing a full refresh_status().

Parameters:

new_status (str, required): The status to set.

Returns:

bool: True if the operation succeeded, otherwise False.
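
The write-through behaviour described for this method (persist first, then update the in-memory cache and timestamp) might look roughly like the sketch below. It is illustrative only: `SessionPropertiesSketch` is a hypothetical class, and a plain dict stands in for Azure Blob Storage so the example runs without any Azure dependency.

```python
import json
from datetime import datetime, timezone


class SessionPropertiesSketch:
    """Hypothetical stand-in for a session with cached properties."""

    def __init__(self, store):
        self._store = store  # dict standing in for blob storage
        self.status = "draft"
        self.status_last_updated_at = None

    def set_processing_status(self, new_status):
        try:
            raw = self._store.get("session_properties.json", "{}")
            props = json.loads(raw)
            now = datetime.now(timezone.utc).isoformat()
            props["status"] = new_status
            props["status_last_updated_at"] = now
            self._store["session_properties.json"] = json.dumps(props)
        except (TypeError, ValueError):
            # Leave the cache untouched if the stored JSON cannot be updated.
            return False
        # Update the cache only after the write succeeded.
        self.status = new_status
        self.status_last_updated_at = now
        return True
```

Updating the cache after the write means a later read of `status` sees the new value without a full `refresh_status()`, and a failed write does not leave the cache ahead of storage.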

AzureBlobSheetItem

Bases: SheetItem

set

set(data=None)

Writes the updated blob to Azure Blob Storage.

AzureBlobWorkflow

Bases: Workflow

get

get()

Reads the workflow definition from storage, overriding lazy load.

set

set(data=None)

Writes the updated workflow definition to Azure Blob Storage.

HasBlobClient

Bases: Protocol

Protocol for classes with blob storage attributes.
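
For readers unfamiliar with structural typing, a protocol of this kind might be declared as in the sketch below. The attribute names `blob_client` and `container_name` are hypothetical, since this reference does not list the protocol's members, and `HasBlobClientSketch` is deliberately named differently from the real class.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class HasBlobClientSketch(Protocol):
    blob_client: Any      # hypothetical attribute name
    container_name: str   # hypothetical attribute name


class FakeItem:
    """Satisfies the protocol structurally, without inheriting from it."""

    def __init__(self):
        self.blob_client = object()
        self.container_name = "sessions"


def describe(item: HasBlobClientSketch) -> str:
    # Any object exposing the two attributes is accepted by a type checker.
    return f"container={item.container_name}"
```

Mixins and helper functions can then be typed against the protocol instead of a concrete base class.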

json_serializer

json_serializer(obj)

Custom JSON serializer for objects that are not serializable by default.
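
A function with this signature is typically passed as the `default` argument of `json.dumps`. The sketch below shows that pattern; which types the real `json_serializer` handles is not stated here, so the date, `Decimal` and set cases in the hypothetical `example_json_serializer` are assumptions.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def example_json_serializer(obj):
    """Fallback for values the json module cannot encode (sketch)."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    # Mirror the json module's behaviour for anything still unsupported.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"updated_at": datetime(2024, 1, 2, 3, 4, 5), "tags": {"b", "a"}}
encoded = json.dumps(payload, default=example_json_serializer)
```

`json.dumps` calls the `default` function only for objects it cannot encode itself, so ordinary strings, numbers, lists and dicts never reach it.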