Microsoft Azure
AzureBlobDocumentVersion
Bases: DocumentVersion
row_count
property
row_count
Total number of row artefacts across all sheets.
status
property
status
Derived from artefact presence on this document version.
- COMPLETED: a results artefact exists on the document, any sheet, or any child item (chunk/image/row). Any completed result anywhere in the tree is enough to mark the document completed.
- PROCESSING: at least one chunk/image/row artefact is bound on any sheet, but no results artefact has been observed.
- DRAFT: otherwise.
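The precedence of the three rules above can be sketched as a small function. This is a minimal illustration, not the library's code: the `Status` enum, the `derive_status` name, and the two boolean inputs are hypothetical stand-ins for whatever artefact checks the class performs internally.

```python
from enum import Enum


class Status(str, Enum):
    """Hypothetical status values mirroring the names used in the docs."""
    DRAFT = "DRAFT"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"


def derive_status(has_results_anywhere: bool, has_bound_child_artefact: bool) -> Status:
    """Apply the rules in priority order: COMPLETED, then PROCESSING, then DRAFT.

    has_results_anywhere: a results artefact exists on the document, a sheet,
        or any child item (chunk/image/row).
    has_bound_child_artefact: at least one chunk/image/row artefact is bound
        on any sheet.
    """
    if has_results_anywhere:
        return Status.COMPLETED
    if has_bound_child_artefact:
        return Status.PROCESSING
    return Status.DRAFT
```

Note that a results artefact wins even when child artefacts are also present, which matches the "any completed result anywhere in the tree" rule.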
AzureBlobDocumentVersionSheet
Bases: DocumentVersionSheet
AzureBlobMetadataProperty
Bases: MetadataProperty
get
get()
Reads the metadata property definition from storage, overriding lazy load.
set
set(data=None)
Merges the updated metadata property into the session standard and writes to Azure Blob Storage.
AzureBlobMetadataSpecification
Bases: MetadataSpecification
get
get()
Reads the session standard and workflow definition from storage, overriding lazy load.
set
set(data=None)
Writes the updated metadata properties and workflow definition to Azure Blob Storage.
AzureBlobPropertyValue
Bases: PropertyValue
get
get()
Reads the property value from storage, overriding lazy load.
set
set(data=None)
Merges the updated metadata property into the session standard and writes to Azure Blob Storage.
AzureBlobResults
Bases: Results
get
get()
Reads the results from storage, overriding lazy load.
AzureBlobSession
Bases: Session
get_processing_status
get_processing_status()
Check if all files in the session have completed processing.
Returns:
| Type | Description |
|---|---|
| str | |
refresh_status
refresh_status()
Reconcile the cached session summary from one LIST call plus pointer-JSON reads.
Cheap path: one LIST to enumerate blobs and classify each by artefact-suffix presence (status, row_count, per-file status and row_count all fall out here).
Follow-up cost: for SharePoint/Autodesk source files whose display name isn't already cached in session_properties.json, download the pointer JSON to read the name. Subsequent refreshes skip this step for unchanged files, so the cost amortises to zero.
Writes status, file_count, row_count, sources, document_summaries and status_last_updated_at back to session_properties.json.
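The cheap path can be pictured as a single pass over the listed blob names. The sketch below is an assumption-heavy illustration: the suffixes, the `<file_id>/<artefact path>` layout, and the `summarise_listing` name are all hypothetical, since this reference does not document the actual artefact naming scheme.

```python
from collections import defaultdict

# Hypothetical artefact suffixes; the real naming scheme is not given here.
RESULTS_SUFFIX = ".results.json"
ROW_SUFFIX = ".row.json"
CHILD_SUFFIXES = (".chunk.json", ".image.json", ROW_SUFFIX)


def summarise_listing(blob_names):
    """Classify blob names from one listing into per-file status and row_count."""
    files = defaultdict(
        lambda: {"has_results": False, "has_child": False, "row_count": 0}
    )
    for name in blob_names:
        # Assumed layout: the first path segment identifies the source file.
        file_id = name.split("/", 1)[0]
        entry = files[file_id]
        if name.endswith(RESULTS_SUFFIX):
            entry["has_results"] = True
        elif name.endswith(CHILD_SUFFIXES):
            entry["has_child"] = True
            if name.endswith(ROW_SUFFIX):
                entry["row_count"] += 1

    summaries = {}
    for file_id, entry in files.items():
        if entry["has_results"]:
            status = "COMPLETED"
        elif entry["has_child"]:
            status = "PROCESSING"
        else:
            status = "DRAFT"
        summaries[file_id] = {"status": status, "row_count": entry["row_count"]}
    return summaries
```

Because classification needs only blob names, no blob content is downloaded on this path; the pointer-JSON reads described above are the only follow-up cost.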
results_to_df
results_to_df()
Convert all document version results to a pandas DataFrame:
- Inserting results into the correct DataFrame columns according to their field names
- Merging user edits and AI edits into their correct cell positions
Returns:
| Type | Description |
|---|---|
| DataFrame | Each row in the dataframe is a document version. |
set_file_count
set_file_count(new_file_count)
Set the file count in the session properties JSON.
Also updates the in-memory cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_file_count | int | The count of files | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the operation succeeded, otherwise False. |
set_processing_status
set_processing_status(new_status)
Set the processing status in the session properties JSON.
Also updates the in-memory cache and status timestamp so that
subsequent reads of session.status reflect the new value without
needing a full refresh_status().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_status | str | The status to set. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if the operation succeeded, otherwise False. |
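The write-through behaviour described for set_processing_status can be sketched as follows. This is a hypothetical stand-in, not the AzureBlobSession implementation: the `SessionPropertiesCache` class and the `write_json` callable are invented for illustration, and updating the cache only after a successful write is an assumption.

```python
from datetime import datetime, timezone


class SessionPropertiesCache:
    """Hypothetical holder for the cached contents of session_properties.json."""

    def __init__(self, write_json):
        # write_json: hypothetical callable(dict) -> None that persists the
        # properties and raises an exception on failure.
        self._write_json = write_json
        self.properties = {}

    def set_processing_status(self, new_status: str) -> bool:
        """Persist the new status, then refresh the in-memory cache."""
        updated = dict(self.properties)
        updated["status"] = new_status
        updated["status_last_updated_at"] = datetime.now(timezone.utc).isoformat()
        try:
            self._write_json(updated)
        except Exception:
            # Assumption: a failed write leaves the cache untouched.
            return False
        self.properties = updated
        return True
```

With this shape, a later read of the cached status sees the new value immediately, without another listing of the container.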
AzureBlobSheetItem
Bases: SheetItem
set
set(data=None)
Writes the updated blob to Azure Blob Storage.
AzureBlobWorkflow
Bases: Workflow
get
get()
Reads the workflow definition from storage, overriding lazy load.
set
set(data=None)
Writes the updated workflow definition to Azure Blob Storage.
HasBlobClient
Bases: Protocol
Protocol for classes with blob storage attributes.
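For readers unfamiliar with structural typing, a protocol of this kind might look like the sketch below. The attribute names `blob_client` and `blob_name` are hypothetical; the attributes HasBlobClient actually declares are not listed in this reference.

```python
from typing import Any, Protocol


class HasBlobClientSketch(Protocol):
    """Illustrative protocol; the attribute names are assumptions."""

    blob_client: Any  # hypothetical: a storage client object
    blob_name: str    # hypothetical: the blob path within the container


class FakeItem:
    """Satisfies the protocol structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.blob_client = object()
        self.blob_name = "session/item.json"


def describe(target: HasBlobClientSketch) -> str:
    """Accept any object exposing the expected blob storage attributes."""
    return f"blob at {target.blob_name}"
```

A static type checker accepts `describe(FakeItem())` because `FakeItem` has the required attributes, which lets mixins and helpers be typed without a shared base class.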
json_serializer
json_serializer(obj)
Custom JSON serializer for objects not serializable by default.
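Serializers like this are usually passed to `json.dumps` through its `default` argument. The sketch below shows the common pattern; which types the library's json_serializer actually handles is not stated here, so the datetime handling is an assumption.

```python
import json
from datetime import date, datetime


def json_serializer_sketch(obj):
    """Fallback for json.dumps: convert dates to ISO 8601 strings (assumed behaviour)."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # Anything else stays an error, as json.dumps expects from a default hook.
    raise TypeError(f"Type {type(obj).__name__} is not JSON serializable")


payload = {"status_last_updated_at": datetime(2024, 1, 2, 3, 4, 5)}
encoded = json.dumps(payload, default=json_serializer_sketch)
# encoded == '{"status_last_updated_at": "2024-01-02T03:04:05"}'
```

`json.dumps` calls the hook only for objects it cannot encode itself, so built-in types pass through unchanged.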