Classifiers
Classifiers categorise an input. Some classifiers use generative AI models to do this, while others use quantitative techniques like vector similarity search.
azure_openai_json_classification
azure_openai_json_classification(classifier, description, filename, content=None, metadata=None, image_url=None, max_characters=8000, system_prompt=None, api_key=None, azure_endpoint=None, api_version=None, model=None, max_retries=5)
Classify an artefact (e.g. a file or page) into a category from a provided list of classifiers. Uses an LLM provided by the Azure OpenAI service.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
classifier | List[Dict[str, Any]] | A list of classifiers. | required |
filename | str | Title of the artefact. | required |
description | str | Description of the artefact. | required |
content | str | Main body content of the artefact. | None |
metadata | Dict | Metadata properties about the artefact. | None |
image_url | str | A URL to an image of the artefact. Must be a signed or public URL that the OpenAI service can access. | None |
max_characters | int | Character limit beyond which content is truncated. | 8000 |
system_prompt | str | Override the default prompt with custom instructions. | None |
api_key | str | API key for the Azure resource. If not provided, defaults to an environment variable. | None |
azure_endpoint | str | Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, defaults to an environment variable. | None |
api_version | str | API version for the Azure resource. | None |
model | str | Model deployment name within the Azure resource. If not provided, defaults to an environment variable. | None |
max_retries | int | Number of retry attempts to the Azure OpenAI API service. | 5 |
Returns:
Type | Description |
---|---|
Dict[str, Any] | JSON-formatted dictionary containing: |
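A minimal sketch of assembling arguments for azure_openai_json_classification, based only on the signature documented above. The keys inside each classifier dictionary ("name", "description"), the environment variable names, and the deployment name are assumptions made for illustration; this reference does not specify them. The helper from_env is hypothetical and only mirrors the documented "explicit value, else environment variable" fallback. The final call is left as a comment because it needs the library import and live Azure credentials.

```python
import os
from typing import Any, Dict, List, Optional

# Hypothetical classifier entries: the keys used here ("name", "description")
# are assumptions, since the schema of each dictionary is not documented above.
CLASSIFIERS: List[Dict[str, Any]] = [
    {"name": "invoice", "description": "A bill requesting payment."},
    {"name": "contract", "description": "A signed legal agreement."},
]


def from_env(value: Optional[str], env_var: str) -> Optional[str]:
    """Return the explicit value if given, otherwise read an environment variable."""
    return value if value is not None else os.environ.get(env_var)


def build_kwargs(
    filename: str,
    description: str,
    content: Optional[str] = None,
    api_key: Optional[str] = None,
    azure_endpoint: Optional[str] = None,
    model: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments that match the documented signature."""
    return {
        "classifier": CLASSIFIERS,
        "description": description,
        "filename": filename,
        "content": content,
        "max_characters": 8000,  # documented default
        "max_retries": 5,        # documented default
        # The environment variable names below are assumptions, not confirmed
        # by this reference.
        "api_key": from_env(api_key, "AZURE_OPENAI_API_KEY"),
        "azure_endpoint": from_env(azure_endpoint, "AZURE_OPENAI_ENDPOINT"),
        "model": from_env(model, "AZURE_OPENAI_MODEL"),
    }


kwargs = build_kwargs(
    filename="invoice_0042.pdf",
    description="Scanned supplier invoice",
    content="Invoice number 0042. Total due: 120.00",
    model="my-deployment",  # hypothetical deployment name
)

# With the library imported and credentials configured, the call would be:
# result = azure_openai_json_classification(**kwargs)
```

Passing arguments by keyword avoids depending on the positional order of description and filename, which differs between the signature and the parameter table.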
vector_similarity_search
vector_similarity_search(query, storage_location='./storage', parquet_name='vectors')
Perform a cosine similarity search between an input query and a set of embeddings stored in a local Parquet file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The input string for which the similarity search needs to be performed. | required |
storage_location | str | Storage location of the Parquet embeddings files. | './storage' |
parquet_name | str | The name of the Parquet embeddings files. | 'vectors' |
Returns:
Type | Description |
---|---|
Dict[Any, Any] | A dictionary object of the most similar match. |
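To illustrate the technique that vector_similarity_search is described as using, here is a small standard-library sketch of a cosine similarity search. It is not the library's implementation: the in-memory rows stand in for records loaded from the Parquet file, the query is assumed to be already embedded as a vector, and the "label", "embedding" and "score" keys are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def most_similar(query_vector: Sequence[float], rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the row whose embedding is closest to the query, with its score."""
    if not rows:
        raise ValueError("rows must not be empty")
    best = max(rows, key=lambda row: cosine_similarity(query_vector, row["embedding"]))
    return {
        "label": best["label"],
        "score": cosine_similarity(query_vector, best["embedding"]),
    }


# Hypothetical stored embeddings; real ones would come from the Parquet file.
rows = [
    {"label": "invoice", "embedding": [1.0, 0.0]},
    {"label": "contract", "embedding": [0.0, 1.0]},
]

match = most_similar([0.9, 0.1], rows)
print(match["label"])
```

Cosine similarity compares direction only, so two embeddings that differ just in magnitude score as identical.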