Embedding Engine
The Embedding Engine in Rakis is responsible for managing the embedding process for text inputs. It handles worker management and embedding-queue processing, and supports different embedding models. The Embedding Engine is used for two main purposes:
- Embedding Inference Results: When an inference is completed, the output text needs to be embedded into a numerical representation for consensus verification and comparison.
- Consensus Verification: During the consensus process, the output texts from various peers need to be embedded and compared to verify the consistency and accuracy of the results.
Worker Management
The Embedding Engine utilizes a worker-based architecture to handle the embedding process. Each worker is dedicated to a specific embedding model and can handle embedding requests concurrently.
Step 1: Initializing Workers
When the Rakis Domain is initialized, the Embedding Engine is configured with the desired number of workers for each supported embedding model. The scaleEmbeddingWorkers function is used to create and initialize the workers:
await embeddingEngine.scaleEmbeddingWorkers(
  "nomic-ai/nomic-embed-text-v1.5", // Embedding model name
  2 // Number of workers
);
Step 2: Adding and Removing Workers
The addEmbeddingWorker and deleteEmbeddingWorker functions are used to dynamically add or remove workers as needed. These functions can be called at runtime to adjust the number of workers based on the embedding workload.
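As a sketch of how this might look at runtime (the call shapes below are assumptions based on the function names; for instance, worker deletion is assumed to take a worker id):

// Hedged sketch: these signatures are assumptions, not the confirmed API.
// Spin up an extra worker for the nomic model when the queue backs up:
embeddingEngine.addEmbeddingWorker("nomic-ai/nomic-embed-text-v1.5");

// Tear a specific worker down again once load subsides:
embeddingEngine.deleteEmbeddingWorker("nomic-worker-2"); // hypothetical worker id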
Worker Lifecycle
Each embedding worker manages its own lifecycle, including loading the embedding model, handling embedding requests, and reporting its status. The worker communicates with the Embedding Engine through message passing, allowing for concurrent and asynchronous processing.
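A minimal sketch of this message-passing pattern, using the standard Web Worker API (the message shapes and the loadModel/embed helpers are illustrative assumptions, not Rakis's actual protocol):

// Runs inside a Web Worker. Message names and helpers are assumptions.
declare function loadModel(modelName: string): Promise<void>; // hypothetical
declare function embed(text: string): Promise<number[]>; // hypothetical

self.onmessage = async (event: MessageEvent) => {
  const { type, payload } = event.data;
  if (type === "load-model") {
    // Load the embedding model, then report readiness to the engine.
    await loadModel(payload.modelName);
    self.postMessage({ type: "status", status: "ready" });
  } else if (type === "embed") {
    // Embed the text and send the vector back with its request id.
    const embedding = await embed(payload.text);
    self.postMessage({ type: "result", requestId: payload.requestId, embedding });
  }
};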
Embedding Queue Processing
The Embedding Engine maintains a queue of embedding requests, which can be either inference results or consensus verification requests. The queue is prioritized based on the following criteria:
- Consensus Verification Requests with Contributions: Requests for consensus verification where the local node has contributed an inference are given the highest priority.
- Inference Result Requests: Requests for embedding inference results are prioritized next.
- Other Consensus Verification Requests: Requests for consensus verification without local contributions are given the lowest priority.
Within each priority level, requests are sorted by their expiration time, so the soonest-expiring requests are processed first.
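The following comparator is a minimal sketch of this ordering; the request shape and its field names are assumptions for illustration, not Rakis's actual types:

// Hypothetical request shape; field names are assumptions.
type QueuedRequest = {
  purpose: "consensus-verification" | "inference-result";
  hasLocalContribution: boolean; // did this node contribute an inference?
  expiresAt: number; // epoch milliseconds
};

function priorityOf(r: QueuedRequest): number {
  if (r.purpose === "consensus-verification" && r.hasLocalContribution) return 0;
  if (r.purpose === "inference-result") return 1;
  return 2; // consensus verification without a local contribution
}

function compareRequests(a: QueuedRequest, b: QueuedRequest): number {
  // Lower priority number first; within a level, soonest expiry first.
  return priorityOf(a) - priorityOf(b) || a.expiresAt - b.expiresAt;
}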
Queue Processing
When an embedding worker becomes available, the Embedding Engine assigns the next embedding request from the queue to the worker. The worker then processes the request and returns the embedding results. The Embedding Engine handles the results accordingly, either by storing the inference embedding or passing the consensus verification embeddings to the Inference DB for further processing.
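A hedged sketch of that dispatch step, reusing the QueuedRequest shape and compareRequests comparator from the sketch above (the dispatch message itself is an assumption):

// Assign the highest-priority pending request to an idle worker.
function dispatchNext(queue: QueuedRequest[], idleWorkers: Worker[]): void {
  if (queue.length === 0 || idleWorkers.length === 0) return;
  queue.sort(compareRequests); // highest-priority, soonest-expiring first
  const request = queue.shift()!;
  const worker = idleWorkers.pop()!;
  worker.postMessage({ type: "embed", payload: request }); // hypothetical message
}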
Embedding Model Support
The Embedding Engine supports multiple embedding models, allowing for flexibility and the ability to choose the most suitable model for a given use case. The availableEmbeddingModels constant in the types.ts file lists the supported models.
Adding support for a new embedding model involves updating the availableEmbeddingModels constant and potentially providing custom logic for loading and configuring the model in the worker.
Example: Adding a New Embedding Model
Let's say you want to add support for the "new-awesome-embedder" model. Here are the steps you'd follow:
- Update the availableEmbeddingModels constant in types.ts:
export const availableEmbeddingModels = [
  "nomic-ai/nomic-embed-text-v1.5",
  "new-awesome-embedder", // Add the new model here
] as const;
- In the embedding-worker.ts file, update the loadEmbeddingWorker function to handle the new model:
async function loadEmbeddingWorker(modelName: EmbeddingModelName, workerId: string) {
  try {
    // ... existing code ...
    if (modelName === "new-awesome-embedder") {
      // Custom logic to load and configure the "new-awesome-embedder" model
      workerInstance!.pipeline = await loadNewAwesomeEmbedder();
    }
    // ... existing code ...
  } catch (err) {
    // ... error handling ...
  }
}
- Implement the loadNewAwesomeEmbedder function to load and configure the new model based on its specific requirements, as sketched below.
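A hypothetical implementation, assuming the model can be loaded through a Transformers.js-style feature-extraction pipeline (the package and pipeline task below are assumptions; adapt them to however the model is actually distributed):

// Hypothetical loader for the example model.
import { pipeline } from "@huggingface/transformers";

async function loadNewAwesomeEmbedder() {
  // Feature-extraction pipelines turn input text into embedding vectors.
  return await pipeline("feature-extraction", "new-awesome-embedder");
}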
By following these steps, you can easily extend the Embedding Engine to support additional embedding models as needed.
Performance and Scaling
The Embedding Engine is designed to handle a high volume of embedding requests efficiently. It achieves this through worker scaling and queue prioritization.
Worker Scaling
The number of workers for each embedding model can be dynamically adjusted based on the workload. This allows the Embedding Engine to scale up or down as needed, optimizing resource utilization and ensuring timely processing of embedding requests.
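For illustration, a simple policy might rescale based on queue depth; scaleEmbeddingWorkers is the documented entry point, while the threshold and ceiling below are arbitrary assumptions:

// Hypothetical autoscaling policy: roughly one worker per ten queued
// requests, capped at four workers.
async function rescaleNomicWorkers(queueDepth: number): Promise<void> {
  const target = Math.min(4, Math.max(1, Math.ceil(queueDepth / 10)));
  await embeddingEngine.scaleEmbeddingWorkers(
    "nomic-ai/nomic-embed-text-v1.5",
    target
  );
}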
Queue Prioritization
By prioritizing the embedding queue based on the request type and expiration time, the Embedding Engine ensures that critical requests are processed first, minimizing the risk of missed deadlines or consensus failures.
Logging and Monitoring
The Embedding Engine logs important events and metrics, such as worker status changes, embedding progress, and errors. These logs can be used for monitoring and troubleshooting purposes, providing insights into the embedding process and identifying potential bottlenecks or issues.
Additionally, the Embedding Engine emits events that can be subscribed to by other components, enabling real-time monitoring and integration with external monitoring systems.
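A monitoring hook might look like the following; the event names and the .on() subscription method are assumptions for illustration:

// Hypothetical subscriptions: event names and the .on() API are assumptions.
embeddingEngine.on("workerStatusChanged", (e) => {
  console.log(`[embedding] worker ${e.workerId} is now ${e.status}`);
});
embeddingEngine.on("embeddingError", (e) => {
  console.error(`[embedding] request ${e.requestId} failed`, e.error);
});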