Set Up Medoids for Anomaly Detection (Two Approaches)
This workflow establishes cluster centers (medoids) and corresponding anomaly thresholds using two distinct methods: a distance matrix approach and a multimodal embedding model approach. Later anomaly detection on the crops dataset uses these medoids as reference points and the thresholds as boundaries.
About This Workflow
This workflow is designed as the second part of a three-part series focused on anomaly detection for a crops dataset. Its primary goal is to configure 'medoids' (cluster centers) and associated threshold scores. These medoids serve as reference points for each crop type, and the thresholds define the boundaries for anomaly detection. The workflow implements two parallel strategies:
- Distance Matrix Approach: This method calculates a distance matrix between points within a crop cluster and identifies a medoid based on which point is most similar to all others in that cluster. It then determines a threshold score by finding the point furthest from this medoid.
- Multimodal Embedding Model Approach: This method leverages textual descriptions of crops to find a medoid. It embeds these descriptions using a multimodal model and then queries the vector database to find the closest matching point (medoid) for each crop. A threshold is then calculated similarly to the distance matrix approach.
The workflow orchestrates calls to Qdrant for data retrieval and updates, and uses Python code for sparse matrix calculations. It also utilizes the Voyage AI API for embedding textual descriptions.
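The two strategies can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not the workflow's own code: the workflow uses SciPy for the matrix step, while this version uses only the Python standard library so it runs anywhere. The sparse pair format (`offsets_row`, `offsets_col`, `scores`), the choice to sum scores per row, the function names, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def medoid_from_matrix(offsets_row, offsets_col, scores):
    """Distance matrix approach: return the row index whose summed
    similarity to its sampled neighbours is highest."""
    totals = defaultdict(float)
    for row, _col, score in zip(offsets_row, offsets_col, scores):
        # Assumption: scores are similarities, so a larger total means
        # the point is more similar to the rest of the cluster.
        totals[row] += score
    return max(totals, key=totals.get)


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def medoid_from_text(text_embedding, point_vectors):
    """Embedding approach: return the id of the stored point that is
    closest to the embedded crop description."""
    return max(
        point_vectors,
        key=lambda point_id: cosine(text_embedding, point_vectors[point_id]),
    )


# Toy cluster of three points with pairwise similarity scores (hypothetical).
ids = ["a", "b", "c"]
offsets_row = [0, 0, 1, 1, 2, 2]
offsets_col = [1, 2, 0, 2, 0, 1]
scores = [0.9, 0.2, 0.9, 0.8, 0.2, 0.8]
matrix_medoid = ids[medoid_from_matrix(offsets_row, offsets_col, scores)]

# Toy stored vectors and a toy text embedding (hypothetical).
point_vectors = {"p1": [1.0, 0.0], "p2": [0.6, 0.8], "p3": [0.0, 1.0]}
text_medoid = medoid_from_text([0.5, 0.9], point_vectors)
```

In the workflow itself, the closest-point lookup for the text anchor is done by a Qdrant query rather than in local code; the local `max` here only stands in for that search.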
Key Features
- Dual Medoid Determination: Implements two distinct methods for identifying medoids (cluster centers) for anomaly detection.
- Distance Matrix Calculation: Utilizes Qdrant's search capabilities to generate a distance matrix and find the most central point within a cluster.
- Multimodal Embedding Integration: Leverages Voyage AI to embed textual crop descriptions, enabling medoid identification based on semantic similarity.
- Threshold Calculation: Dynamically calculates anomaly threshold scores based on the identified medoids and the furthest points from them.
- Qdrant Integration: Interacts with Qdrant for data fetching, filtering, and updating (setting payload values for medoids and thresholds).
- Python for Sparse Matrix Operations: Employs Python's SciPy library to efficiently process the sparse distance matrix and determine the medoid index.
- Configurable Variables: Uses 'Set' nodes to define and manage crucial variables like Qdrant connection details and search limits.
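The threshold calculation can be sketched in a few lines. This is a dependency-free illustration, assuming cosine similarity as the score; in the workflow the ranking is done by a Qdrant search, not locally. The sign flip at the end, which expresses the threshold as similarity to the medoid itself, is an assumption of this sketch, as are the function names and the toy vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def threshold_score(center, cluster_vectors, furthest_from_center):
    """Return a threshold taken from the last of the points that are
    furthest from the medoid vector `center`."""
    # Negating the medoid vector turns "furthest from the medoid" into
    # an ordinary "most similar to this vector" search.
    opposite_of_center = [-x for x in center]
    ranked = sorted(
        (cosine(opposite_of_center, v) for v in cluster_vectors),
        reverse=True,
    )
    # Keep only the requested number of furthest points and read the
    # score of the last one returned.
    last_score = ranked[:furthest_from_center][-1]
    # Assumption: flip the sign so the result is the similarity to the
    # medoid rather than to its opposite.
    return -last_score


center = [1.0, 0.0]
cluster = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
score = threshold_score(center, cluster, furthest_from_center=2)
```

With a larger `furthest_from_center`, the last returned point sits closer to the medoid, so the threshold becomes stricter; with a value of 1 it is set by the single most distant point.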
How To Use
- Trigger Workflow: Click 'Test workflow' on the 'When clicking ‘Test workflow’' node.
- Initialize Variables: The 'Qdrant cluster variables' and 'Medoids Variables' / 'Text Medoids Variables' nodes set up essential configuration parameters like Qdrant URL, collection name, and search limits.
- Get Crop Counts: The 'Crop Counts' node queries Qdrant to get the number of points and facet counts for each crop_name in the specified collection. This information is processed by 'Info About Crop Clusters' to extract crop names and maximum cluster sizes.
- Distance Matrix Approach (Upper Branch):
  - For each crop name obtained:
    - 'Cluster Distance Matrix' queries Qdrant to get a distance matrix for points within that crop's cluster, using a specified sample size and vector (voyage).
    - 'Scipy Sparse Matrix' (Python code) processes this matrix to find the index of the point most similar to all others (the medoid).
    - The medoid_id is extracted.
    - 'Set medoid id' updates the payload of the identified medoid point in Qdrant, marking it with is_medoid: true.
    - 'Get Medoid Vector' retrieves the vector and payload of this medoid.
    - 'Prepare for Searching Threshold' transforms the medoid vector into an oppositeOfCenterVector and stores the cropName and centerId.
    - 'Searching Score' queries Qdrant for points that are furthest from the medoid (using oppositeOfCenterVector) within the same crop, up to the furthestFromCenter limit.
    - 'Threshold Score' determines the thresholdScore from the score of the last point returned by the previous search.
    - 'Set medoid threshold score' updates the payload of the medoid point, marking it with is_medoid_cluster_threshold.
- Multimodal Embedding Approach (Lower Branch):
  - 'Textual (visual) crop descriptions' provides a list of crop names and their descriptions.
  - 'Split Out1' separates these text anchors.
  - For each text anchor:
    - 'Embed text' uses the Voyage AI API to generate an embedding for the cropDescription.
    - 'Merge' combines the original crop name with the embedded vector.
    - 'Get Medoid by Text' queries Qdrant using the generated embedding to find the closest point (medoid) for that cropName.
    - 'Set text medoid id' marks this identified point as is_text_anchor_medoid: true in Qdrant.
    - 'Prepare for Searching Threshold1' extracts the medoid vector and centerId from the retrieved point.
    - 'Searching Text Medoid Score' queries Qdrant for points furthest from this text-based medoid.
    - 'Threshold Score1' calculates the thresholdScore.
    - 'Set medoid threshold score' (note: this node has the same name as one in the upper branch but serves the text-based medoid) marks the text medoid point with its is_medoid_cluster_threshold.
- Finalization: After both branches complete, the medoids and their associated threshold scores are set in Qdrant, ready for subsequent anomaly detection steps.
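The Qdrant calls made in the steps above can be sketched as request builders. This only constructs paths and JSON bodies; nothing is sent. The endpoint paths and body fields reflect one reading of the Qdrant REST API and should be checked against the Qdrant documentation for your version. The collection name `crops`, the crop name `tomato`, the point id `42`, and the numeric values are hypothetical placeholders.

```python
def crop_filter(crop_name):
    """Filter that restricts a request to points of one crop."""
    return {"must": [{"key": "crop_name", "match": {"value": crop_name}}]}


def distance_matrix_request(collection, crop_name, sample, limit, using="voyage"):
    """Path and body for a pairwise distance matrix over one crop cluster.

    Assumption: `sample` is the number of points sampled and `limit`
    the number of neighbours considered per sampled point.
    """
    path = f"/collections/{collection}/points/search/matrix/offsets"
    body = {
        "sample": sample,
        "limit": limit,
        "using": using,  # named vector, "voyage" in this workflow
        "filter": crop_filter(crop_name),
    }
    return path, body


def set_payload_request(collection, point_id, payload):
    """Path and body for setting payload fields on a single point."""
    path = f"/collections/{collection}/points/payload"
    return path, {"payload": payload, "points": [point_id]}


# Hypothetical values for illustration only.
matrix_path, matrix_body = distance_matrix_request(
    "crops", "tomato", sample=100, limit=100
)
mark_path, mark_body = set_payload_request("crops", 42, {"is_medoid": True})
threshold_path, threshold_body = set_payload_request(
    "crops", 42, {"is_medoid_cluster_threshold": 0.6}
)
```

Keeping the builders free of network code makes them easy to inspect before wiring them into an HTTP client or an n8n HTTP Request node with the Qdrant URL and credentials from the variables nodes.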
Workflow JSON
{
"id": "0d381f6c-b950-44d1-8703-204bd49f2d81",
"name": "Set Up Medoids for Anomaly Detection (Two Approaches)",
"nodes": 27,
"category": "Data Science",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Free n8n Workflows Official
System Admin
The official repository for verified enterprise-grade workflows.