Searchers
Searchers answer questions about the input data they are given. A search can be as simple as named entity extraction (places, people, things) or as complex as a free-text query such as summarising key recommendations.
azure_ai_language_search
azure_ai_language_search(df, column_name, confidence_threshold=0.98, batch_size=10000)
Extracts entities from text data using the Azure AI Language client and keeps only those whose confidence score meets the specified threshold. The function processes the DataFrame in batches and returns the original DataFrame with additional columns for the recognized entities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A DataFrame containing a column with text data to be analyzed. | required |
column_name | str | The name of the DataFrame column containing text fields to search for entities. | required |
confidence_threshold | float | A threshold value between 0 and 1 to filter entities based on their confidence score. Only entities with a confidence score greater than or equal to this value are retained. | 0.98 |
batch_size | int | The size of each batch to process in parallel. | 10000 |
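The batching and confidence filtering described by these parameters can be pictured in plain Python. This is a minimal sketch, not the library's implementation: it makes no Azure calls, the entity records are hypothetical dicts standing in for the SDK's results, and the field names `text`, `category` and `confidence_score`, along with the helpers `batches` and `entities_by_category`, are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Sequence

# Hypothetical helpers illustrating the documented parameters; they do not
# call Azure and are not part of azure_ai_language_search itself.


def batches(texts: Sequence[str], batch_size: int = 10000) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def entities_by_category(entities: List[dict],
                         confidence_threshold: float = 0.98) -> Dict[str, List[str]]:
    """Group entity texts by category, keeping scores >= confidence_threshold."""
    grouped: Dict[str, List[str]] = {}
    for entity in entities:  # each entity is an assumed dict, not an SDK object
        if entity["confidence_score"] >= confidence_threshold:
            grouped.setdefault(entity["category"], []).append(entity["text"])
    return grouped


# Made-up entity records for a single document.
sample = [
    {"text": "Contoso", "category": "Organization", "confidence_score": 0.99},
    {"text": "Paris", "category": "Location", "confidence_score": 0.98},
    {"text": "Bob", "category": "Person", "confidence_score": 0.50},
]
print(entities_by_category(sample))  # {'Organization': ['Contoso'], 'Location': ['Paris']}
print([list(b) for b in batches(["a", "b", "c", "d", "e"], batch_size=2)])  # [['a', 'b'], ['c', 'd'], ['e']]
```

An entity scoring exactly 0.98 is kept at the default threshold, because the documented comparison is "greater than or equal to".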
Returns:
Type | Description |
---|---|
DataFrame | The original DataFrame with additional columns for recognized entities, where each entity category (e.g., 'Organization', 'Location') is added as a new column. Where an entity category matches an existing DataFrame column name, the new column is suffixed with '_entities'. |
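The column-naming rule in the return value can be expressed as a small pure function. This sketch uses lists of dicts instead of a DataFrame so it runs without pandas; the helper names `entity_column_names` and `add_entity_columns`, and the choice of an empty list for rows with no entity in a category, are assumptions rather than documented details.

```python
from typing import Dict, Iterable, List


def entity_column_names(existing_columns: Iterable[str], categories: Iterable[str],
                        suffix: str = "_entities") -> Dict[str, str]:
    """Map each entity category to its output column name.

    A category that clashes with an existing column gets the suffix;
    any other category is used as the column name unchanged.
    """
    existing = set(existing_columns)
    return {cat: (cat + suffix if cat in existing else cat) for cat in categories}


def add_entity_columns(rows: List[dict],
                       entities_per_row: List[Dict[str, List[str]]]) -> List[dict]:
    """Return copies of rows with one extra column per entity category (hypothetical)."""
    categories = sorted({cat for ents in entities_per_row for cat in ents})
    existing = rows[0].keys() if rows else []
    names = entity_column_names(existing, categories)
    result = []
    for row, ents in zip(rows, entities_per_row):
        new_row = dict(row)  # leave the input rows untouched
        for cat in categories:
            new_row[names[cat]] = ents.get(cat, [])  # assumed empty-list default
        result.append(new_row)
    return result


rows = [
    {"text": "Contoso opened an office in Paris.", "Location": "EU"},
    {"text": "Nothing to find here.", "Location": "US"},
]
found = [{"Organization": ["Contoso"], "Location": ["Paris"]}, {}]
for row in add_entity_columns(rows, found):
    print(row)
```

Here the existing `Location` column is preserved and the extracted locations land in `Location_entities`, while `Organization` needs no suffix.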
azure_openai_search
azure_openai_search(search_term, content, metadata=None, image_url=None, max_characters=8000, system_prompt=None, api_key=None, azure_endpoint=None, api_version=None, model=None, max_retries=5)
Searches a document to find any free-text content that matches a search query.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_term | str | The search query. | required |
content | str | The main body of the prompt. | required |
metadata | dict | Any additional information describing the content. | None |
image_url | str | A signed or public URL where an image can be found. | None |
max_characters | int | Maximum number of text characters (excluding images) from | 8000 |
system_prompt | str | Overrides the default system prompt. | None |
api_key | str | Access key for the Azure OpenAI resource. | None |
azure_endpoint | str | Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, defaults to the value of an environment variable. | None |
api_version | str | API version for the Azure resource. | None |
model | str | Model deployment name within the Azure resource. If not provided, defaults to the value of an environment variable. | None |
max_retries | int | Maximum number of unsuccessful call attempts to the OpenAI service before returning an error. | 5 |
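`max_retries` caps the number of failed attempts before an error is returned. The documentation does not say how retries are spaced, so the sketch below assumes exponential backoff; `call_with_retries`, `base_delay` and the injectable `sleep` argument are hypothetical names, and this is not the library's own retry code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Invoke call(), retrying on any exception up to max_retries attempts in total.

    Waits base_delay * 2**attempt seconds between attempts (an assumed policy)
    and re-raises the last exception once every attempt has failed.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


attempts = {"n": 0}


def flaky() -> str:
    """Stand-in for a service call that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []
print(call_with_retries(flaky, max_retries=5, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]
```

Injecting `sleep` keeps the example fast and makes the backoff schedule observable; a real client would simply use `time.sleep`.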
Returns:
Type | Description |
---|---|
str | Plain-text answer to the query, or an empty string if no answer is found in the document. |
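Because an empty string signals that nothing relevant was found, callers can treat it as a missing answer. The hypothetical helper `collect_answers` below runs a search function over several documents and keeps only non-empty answers. It takes the search function as an argument so it can be exercised with a stub, and it also discards whitespace-only answers, which is an extra assumption.

```python
from typing import Callable, Dict, Mapping


def collect_answers(search: Callable[[str, str], str], search_term: str,
                    documents: Mapping[str, str]) -> Dict[str, str]:
    """Run search(search_term, text) on each document, keeping non-empty answers."""
    answers: Dict[str, str] = {}
    for doc_id, text in documents.items():
        answer = search(search_term, text).strip()
        if answer:  # "" means no answer was found in this document
            answers[doc_id] = answer
    return answers


def fake_search(search_term: str, content: str) -> str:
    """Stub standing in for azure_openai_search; no network call is made."""
    return "Cut energy costs by 10%." if "energy" in content else ""


docs = {
    "report-a": "The review recommends cutting energy costs by 10%.",
    "report-b": "Minutes of the quarterly social committee.",
}
print(collect_answers(fake_search, "What are the key recommendations?", docs))
# {'report-a': 'Cut energy costs by 10%.'}
```

With the real function, `azure_openai_search` itself could be passed as `search`, since its first two positional parameters are `search_term` and `content`.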