FAISS GPU Vector Search

Vector Search

How FAISS Works

GPU-accelerated semantic similarity search with real-time embeddings

Process Flow

Text to Vectors

An embedding model transforms text into a 1536-dimensional vector. Meaning is encoded across the dimensions together, so texts with similar meaning map to nearby vectors.

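# Text becomes a point in vector space.
# `model` stands for whichever embedding client is configured;
# encode() is assumed to return a list of floats.
embedding = model.encode("memory content")
# Result: [0.023, -0.184, 0.291, ...] (1536 values)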

GPU Index

Vectors are indexed on the GPU for parallel search. CUDA cores compute many query-to-vector distances at the same time.

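Flat: exact, brute-force search. Every stored vector is compared with the query.

IVF: inverted file. Vectors are grouped into clusters and only the nearest clusters are scanned.

HNSW: hierarchical graph. Search walks a layered neighbor graph (FAISS provides HNSW as a CPU index).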
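The difference between a Flat and an IVF index can be sketched in a few lines. The following is a minimal pure-Python illustration, not FAISS code: the function names, the toy 2-D data, and the fixed centroids are all assumptions made for the example (FAISS learns its centroids with k-means during training).

```python
import random

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Flat: compare the query against every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2_sq(vectors[i], query))
    return ranked[:k]

def build_ivf(vectors, centroids):
    """IVF: put each vector id into the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2_sq(centroids[c], v))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2_sq(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2_sq(vectors[i], query))
    return candidates[:k]

# Toy data: two well-separated 2-D clusters (hypothetical values).
random.seed(0)
vectors = [[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(50)]
vectors += [[random.gauss(5, 0.1), random.gauss(5, 0.1)] for _ in range(50)]
centroids = [[0.0, 0.0], [5.0, 5.0]]  # fixed for the sketch
lists = build_ivf(vectors, centroids)

query = [4.9, 5.1]
exact = flat_search(vectors, query, 3)                              # scans 100 vectors
approx = ivf_search(vectors, centroids, lists, query, 3, nprobe=1)  # scans 50 vectors
```

With clusters this cleanly separated, the IVF result matches the exact one while scanning half the data. On real embeddings the match is not guaranteed, and raising `nprobe` trades speed for recall.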

Semantic Search

The query is embedded, compared with the indexed vectors, and its nearest neighbors by distance are returned. Similar meaning = close vectors = relevant results.

L2: Euclidean distance

IP: Inner product

Cosine: Angular similarity
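The three metrics are closely related, which a small worked example makes concrete. This is a plain-Python sketch with made-up vectors; the helper names are illustrative. With FAISS, cosine similarity is typically obtained by L2-normalizing the vectors and searching an inner-product index.

```python
import math

def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product: larger means more similar; sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity is the inner product of the normalized vectors."""
    return inner_product(normalize(a), normalize(b))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # orthogonal to a

print(l2(a, b))             # 1.0
print(inner_product(a, b))  # 2.0
print(cosine(a, b))         # 1.0
print(cosine(a, c))         # 0.0
```

Here `a` and `b` point the same way, so cosine treats them as identical even though L2 and inner product both register the difference in length.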

MCP API Reference

semantic_search Primary

Search memory with semantic similarity

semantic_search(
  query: string,    // Search text
  top_k: number     // Number of results to return (default: 5)
)

Returns: Array<{content, score, metadata}>
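To illustrate the shape of the call and its return value, here is a minimal sketch of what a `semantic_search` handler could look like. It is not the service's implementation: `toy_embed`, `VOCAB`, and the in-memory `memories` list are hypothetical stand-ins for a real embedding model and a FAISS index.

```python
import heapq
import math

VOCAB = ["gpu", "index", "search", "memory", "coffee", "tea"]  # hypothetical toy vocabulary

def toy_embed(text):
    """Stand-in for a real embedding model: unit-length word counts over VOCAB."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def semantic_search(memories, query, top_k=5):
    """Return the top_k memories as {content, score, metadata}, best score first."""
    q = toy_embed(query)
    scored = (
        {
            "content": m["content"],
            "score": sum(a * b for a, b in zip(q, m["vector"])),  # cosine on unit vectors
            "metadata": m["metadata"],
        }
        for m in memories
    )
    return heapq.nlargest(top_k, scored, key=lambda r: r["score"])

memories = [
    {"content": text, "vector": toy_embed(text), "metadata": {"id": i}}
    for i, text in enumerate(["gpu index search", "coffee and tea", "memory search"])
]

results = semantic_search(memories, "gpu search", top_k=2)
for r in results:
    print(r["content"], round(r["score"], 3))
```

Scores in this sketch are cosine similarities, so higher is better. A service built on an L2 index would return distances instead, where lower is better.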

add_memory Write

Add new memory with auto-embedding

add_memory(
  content: string,  // Memory text
  metadata: object  // Optional tags
)
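The write path can be sketched in the same spirit. The class below is a hypothetical in-memory store, assuming content is embedded at insert time and appended without rebuilding anything, which is how a Flat index behaves when vectors are added. `MemoryStore` and `char_embed` are illustrative names, not part of the documented API.

```python
import math

class MemoryStore:
    """Hypothetical store; in practice a FAISS index would hold the vectors."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> list of floats
        self.vectors = []   # vectors[i] is the embedding of records[i]
        self.records = []

    def add_memory(self, content, metadata=None):
        """Embed the content, append it, and return its row id."""
        self.vectors.append(self.embed(content))
        self.records.append({"content": content, "metadata": metadata or {}})
        return len(self.records) - 1

def char_embed(text):
    """Toy stand-in for an embedding model: normalized vowel counts."""
    vec = [float(text.lower().count(ch)) for ch in "aeiou"]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

store = MemoryStore(char_embed)
first = store.add_memory("GPU index rebuilt", {"tag": "ops"})
second = store.add_memory("Meeting notes")  # metadata is optional
```

Keeping vectors and records in parallel lists means the row id returned by a search maps directly back to the stored content and tags.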

get_status Info

Get tether health and statistics

get_status()
// Returns memory count, GPU, uptime
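A status payload covering those three items might be assembled as below. The field names and the `gpu_name` argument are assumptions made for illustration; the actual response schema is not specified here.

```python
import time

START = time.monotonic()  # recorded once, when the service starts

def get_status(memory_count, gpu_name=None):
    """Build a status payload. Field names are assumed, not the documented schema."""
    return {
        "memory_count": memory_count,
        "gpu": gpu_name,  # None when no GPU is in use
        "uptime_seconds": round(time.monotonic() - START, 3),
    }

status = get_status(memory_count=42, gpu_name="example-gpu")
print(status)
```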

Search Time: ~20ms (GPU-accelerated)
Vectors: 1B+ (scales to billions)
Dimensions: 1536 (OpenAI embeddings)
GPU: CUDA (any CUDA-capable NVIDIA GPU)
Index: Flat (exact search)
Updates: Real-time (incremental add)
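The dimension and volume figures above can be turned into a rough memory estimate, which helps when estimating vector volume for a deployment. The sketch below assumes uncompressed float32 storage (4 bytes per value) and ignores index overhead; the function name is illustrative.

```python
def flat_index_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for uncompressed float32 vectors, excluding index overhead."""
    return num_vectors * dims * bytes_per_value

per_vector = flat_index_bytes(1, 1536)               # 6144 bytes
one_million = flat_index_bytes(1_000_000, 1536)      # about 6.1 GB
one_billion = flat_index_bytes(1_000_000_000, 1536)  # about 6.1 TB

print(per_vector, one_million / 1e9, one_billion / 1e12)
```

At a billion vectors the uncompressed data is far larger than the memory of a single GPU, so that scale generally implies sharding across devices or a compressed index rather than one Flat index.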

Deployment

FAISS Deployment

GPU-accelerated vector search. Rapid deployment.

Discovery

Scope

Quick call

Tell us what you're building. We'll tell you what you need.

Define your use case
Estimate vector volume
Set integration points

Configure

Customize

Rapid setup

We configure the system for your specific requirements.

Configure FAISS index
Set up embedding pipeline
Optimize for your hardware

Deploy

Live

Fast deployment

Lightning-fast semantic search. Production-ready, running, yours.

Deploy to your infrastructure
Verify search performance
You're live