Introduction
Welcome back, future vector search wizard! In the previous chapters, we laid the groundwork by understanding what vector search is all about and setting up our environment with the powerful USearch library. Now, it’s time to get our hands dirty and perform our very first vector search!
This chapter is designed to be your launchpad into practical vector search. We’ll walk through the essential steps: initializing a USearch index, populating it with some sample data (vectors), and then querying it to find similar items. By the end, you’ll have a clear understanding of the fundamental operations and confidence in building your own basic vector search applications.
Remember, our goal isn’t just to copy-paste code. It’s about truly understanding what each piece does, why it’s there, and how it contributes to the magic of finding similar data in vast datasets. Let’s dive in!
Core Concepts: Building Blocks of Vector Search
Before we write any code, let’s quickly reinforce a few core ideas that USearch brings to life.
What are Vectors (Again)?
Think of a vector as a list of numbers that represents a piece of information – it could be a word, a sentence, an image, or even a product description. Each number in the vector is a “dimension,” and the entire vector captures the semantic meaning or characteristics of the original data. For example, a vector for “apple” (the fruit) might be numerically closer to “banana” than to “Apple” (the company).
The USearch Index: Your Vector Organizer
At the heart of USearch is its Index structure. This isn’t just a simple list; it’s a highly optimized data structure designed for Approximate Nearest Neighbor (ANN) search.
- What is ANN? Imagine you have millions of vectors. Finding the absolute closest vector to a query vector could take ages. ANN algorithms, like those used by USearch, find vectors that are very close to the query, often sacrificing a tiny bit of precision for massive speed improvements. This is crucial for real-time applications.
- How does it work? USearch builds a graph-like structure (specifically, an HNSW graph) where each vector is a node. Edges connect similar vectors. When you search, USearch efficiently traverses this graph to quickly locate neighbors.
- Why is it important? It allows you to search through billions of vectors in milliseconds, which is impossible with traditional exact search methods.
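To appreciate what ANN buys you, it helps to see what exact (brute-force) search looks like: computing the distance from the query to every stored vector and sorting. This pure-NumPy sketch (with made-up random data) costs O(n) per query, which is fine for a few thousand vectors but hopeless at billion scale:

```python
import numpy as np

# A small made-up dataset: 1,000 vectors of 64 dimensions each.
rng = np.random.default_rng(seed=42)
dataset = rng.random((1000, 64), dtype=np.float32)
query = rng.random(64, dtype=np.float32)

# Brute-force exact search: squared Euclidean distance to EVERY vector.
# The cost grows linearly with the dataset -- this is what ANN avoids.
distances = np.sum((dataset - query) ** 2, axis=1)
top_3 = np.argsort(distances)[:3]  # indices of the 3 nearest vectors

print("Exact top-3 indices:", top_3)
print("Their distances:", distances[top_3])
```

ANN indexes like USearch's HNSW graph trade this exhaustive scan for a graph traversal that touches only a tiny fraction of the vectors.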
Measuring Similarity: Cosine Distance
How do we know if two vectors are “similar”? We use a similarity metric. USearch supports several, but one of the most common and intuitive for many AI applications is Cosine Distance (or its inverse, Cosine Similarity).
- What is Cosine Distance? Imagine two vectors originating from the same point in space. Cosine distance measures the angle between them.
- If the angle is small (vectors point in roughly the same direction), they are highly similar, and the cosine distance is small (close to 0).
- If the angle is large (vectors point in opposite directions), they are dissimilar, and the cosine distance is large (close to 2 for normalized vectors).
- Why Cosine? It’s great for capturing directional similarity, meaning it’s sensitive to the orientation of vectors rather than their magnitude. This is often preferred when the “meaning” is encoded in the direction.
Let’s visualize this with a super simple analogy. Imagine you’re describing fruits. If you say “sweet, round, red,” that’s a vector. “Sweet, elongated, yellow” is another. Cosine similarity would tell you how close “apple” and “banana” are based on these descriptions, even if one fruit is much larger (magnitude) than the other.
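The angle intuition above fits in a few lines of NumPy. This helper (a sketch for intuition, not part of USearch) computes cosine distance exactly as described, as 1 minus the cosine of the angle between the vectors:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cos(angle between a and b); 0 = same direction, 2 = opposite."""
    similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(similarity)

# Same direction (one is just a scaled copy of the other): distance ~ 0.
print(cosine_distance(np.array([1.0, 2.0]), np.array([2.0, 4.0])))

# Perpendicular vectors: distance = 1.
print(cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))

# Opposite directions: distance = 2 -- note magnitude never mattered.
print(cosine_distance(np.array([1.0, 0.0]), np.array([-5.0, 0.0])))
```

The first example is the key property: `[1, 2]` and `[2, 4]` have different magnitudes but identical direction, so their cosine distance is essentially zero.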
Step-by-Step Implementation: Your First USearch Query
Now, let’s put these concepts into practice. We’ll use Python for our examples, since it’s the most common interface to USearch for data science work.
1. Confirm Your Setup
First, make sure USearch is installed. If you haven’t yet, open your terminal or command prompt and run:
pip install usearch numpy
This installs USearch itself along with numpy, which is excellent for creating and handling the numerical arrays we’ll use as vectors. If you need reproducible builds, you can pin a specific release with `pip install usearch==<version>`.
2. Create Your Python Script
Create a new Python file, let’s call it first_search.py.
# first_search.py
# First, we import the necessary libraries.
# 'usearch' is our vector search engine; its Index class lives in usearch.index.
# 'numpy' helps us create and manage numerical arrays (our vectors).
from usearch.index import Index
import numpy as np
print("USearch and NumPy imported successfully!")
Explanation:
- `from usearch.index import Index`: This brings the `Index` class from the USearch library into our script; it’s the entry point for everything we’ll do.
- `import numpy as np`: We import `numpy` under its common alias `np`. NumPy arrays are a natural fit for representing vectors.
3. Initialize Your USearch Index
Next, we’ll create an instance of the USearch index. This is where your vectors will live.
# first_search.py
from usearch.index import Index
import numpy as np
print("USearch and NumPy imported successfully!")
# Define the dimensionality of our vectors.
# All vectors added to this index MUST have this many dimensions.
vector_dimensions = 3
# Initialize the USearch index.
# 'ndim' specifies the number of dimensions for our vectors.
# 'metric' defines how similarity is calculated (e.g., 'cosine' for cosine distance).
# 'connectivity' is an HNSW parameter controlling graph density (higher = more accurate, slower build/search).
# For a first example, 16 is a good balance.
index = Index(ndim=vector_dimensions, metric='cosine', connectivity=16)
print(f"USearch index initialized with {vector_dimensions} dimensions and 'cosine' metric.")
Explanation:
- `vector_dimensions = 3`: We’re starting with very simple 3-dimensional vectors. In real-world scenarios, these could be hundreds or thousands of dimensions.
- `Index(...)`: This creates our index.
  - `ndim=vector_dimensions`: Tells the index exactly what size of vectors to expect.
  - `metric='cosine'`: Specifies cosine distance as the similarity measure. Other options include `l2sq` (squared Euclidean distance) and `ip` (inner product).
  - `connectivity=16`: An HNSW (Hierarchical Navigable Small World) graph parameter. It influences the trade-off between search speed, accuracy, and memory usage. A higher value generally means a more connected graph, potentially better recall, but slower build times and a slightly larger memory footprint. For exploration, 16 is a common starting point.
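To make the metric options concrete, here is what each of the three distances mentioned above is based on, sketched in plain NumPy. (USearch computes these internally in optimized C++; the exact sign and scaling conventions it uses are implementation details, so treat these formulas as intuition, not as the library’s source.)

```python
import numpy as np

a = np.array([0.1, 0.2, 0.3], dtype=np.float32)
b = np.array([0.15, 0.25, 0.35], dtype=np.float32)

# 'l2sq': squared Euclidean distance -- sensitive to magnitude AND direction.
l2sq = float(np.sum((a - b) ** 2))

# 'ip': based on the inner product -- bigger dot products mean more similar,
# commonly turned into a distance as 1 - dot(a, b).
ip = 1.0 - float(np.dot(a, b))

# 'cosine': 1 - cos(angle) -- ignores magnitude, only direction matters.
cos = 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"l2sq:   {l2sq:.4f}")
print(f"ip:     {ip:.4f}")
print(f"cosine: {cos:.4f}")
```

Notice how small the cosine distance is here: `a` and `b` point in almost the same direction, even though they differ in every coordinate.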
4. Generate and Add Sample Vectors
Now, let’s create some dummy vectors and add them to our index. Each vector needs a unique integer key (or label).
# first_search.py
from usearch.index import Index
import numpy as np
print("USearch and NumPy imported successfully!")
vector_dimensions = 3
index = Index(ndim=vector_dimensions, metric='cosine', connectivity=16)
print(f"USearch index initialized with {vector_dimensions} dimensions and 'cosine' metric.")
# Create some sample vectors using NumPy.
# We'll create 5 vectors, each with 3 dimensions.
# Think of these as our "items" in the database.
vectors_to_add = np.array([
    [0.1, 0.2, 0.3],     # Vector for item with key 1
    [0.15, 0.25, 0.35],  # Vector for item with key 2 (similar to 1)
    [0.8, 0.9, 0.7],     # Vector for item with key 3 (dissimilar to 1 and 2)
    [0.75, 0.85, 0.65],  # Vector for item with key 4 (similar to 3)
    [0.4, 0.5, 0.45]     # Vector for item with key 5 (somewhere in between)
], dtype=np.float32)     # USearch prefers float32 for performance
# Assign unique 64-bit integer keys to our vectors.
# These keys are what you'd use to retrieve the original data later.
vector_keys = np.array([1, 2, 3, 4, 5], dtype=np.longlong)
# Add the vectors to the index.
# 'keys' are the unique identifiers.
# 'vectors' are the actual numerical representations.
index.add(keys=vector_keys, vectors=vectors_to_add)
print(f"Added {index.size} vectors to the index.")
Explanation:
- `vectors_to_add = np.array(...)`: We create a NumPy array where each row is a vector. `dtype=np.float32` is important for performance and compatibility with USearch’s underlying C++ implementation.
- `vector_keys = np.array(...)`: We create an array of unique integer IDs, the “names” or “identifiers” for our vectors. If these vectors represented products, `1` might be `product_id_1`, `2` could be `product_id_2`, and so on. USearch stores keys as 64-bit integers.
- `index.add(keys=vector_keys, vectors=vectors_to_add)`: This is the core operation to insert data. USearch efficiently builds its HNSW graph as you add vectors.
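Note that the index stores only keys and vectors, never your original records. A common pattern is to keep a side lookup from key back to the source data, so search results can be turned into something human-readable. A minimal sketch with a plain Python dict (the item names here are invented purely for illustration):

```python
# The index holds only {key -> vector}; keep the original records yourself.
# These item names are made up for the example.
catalog = {
    1: "red t-shirt",
    2: "crimson t-shirt",
    3: "leather boots",
    4: "hiking boots",
    5: "canvas sneakers",
}

# Suppose a search returned these keys (e.g. taken from the result object):
found_keys = [1, 2, 5]

# Map the numeric keys back to the original items.
found_items = [catalog[int(k)] for k in found_keys]
print(found_items)  # ['red t-shirt', 'crimson t-shirt', 'canvas sneakers']
```

In a real application the dict would typically be a database table or key-value store keyed by the same integer IDs.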
5. Perform a Similarity Search
Now for the exciting part: finding similar vectors! We’ll define a query_vector and ask the index to find its nearest neighbors.
# first_search.py
from usearch.index import Index
import numpy as np
print("USearch and NumPy imported successfully!")
vector_dimensions = 3
index = Index(ndim=vector_dimensions, metric='cosine', connectivity=16)
print(f"USearch index initialized with {vector_dimensions} dimensions and 'cosine' metric.")
vectors_to_add = np.array([
    [0.1, 0.2, 0.3],
    [0.15, 0.25, 0.35],
    [0.8, 0.9, 0.7],
    [0.75, 0.85, 0.65],
    [0.4, 0.5, 0.45]
], dtype=np.float32)
vector_keys = np.array([1, 2, 3, 4, 5], dtype=np.longlong)
index.add(keys=vector_keys, vectors=vectors_to_add)
print(f"Added {index.size} vectors to the index.")
# Define a query vector.
# We want to find vectors similar to this one.
query_vector = np.array([0.12, 0.22, 0.32], dtype=np.float32)  # Very similar to vectors 1 and 2
# Perform the search!
# 'query_vector': The vector we want to find neighbors for.
# 'count': How many nearest neighbors we want to retrieve.
# 'exact': Set to False for ANN search (default and recommended for speed).
# The result exposes 'keys' and 'distances' as NumPy arrays.
results = index.search(query_vector, count=3, exact=False)
# Unpack the results
found_keys = results.keys
found_distances = results.distances
print(f"\nSearching for top 3 neighbors of query vector: {query_vector}")
print("--- Search Results ---")
for i in range(len(found_keys)):
    print(f"  Result {i+1}: Key={found_keys[i]}, Distance={found_distances[i]:.4f}")
print("\nUnderstanding the distance:")
print(" For 'cosine' metric, a distance closer to 0 means higher similarity.")
print(" A distance closer to 2 means lower similarity (vectors pointing in opposite directions).")
Explanation:
- `query_vector = np.array(...)`: This is the vector representing the item you’re looking for. Notice it’s very close to our first two added vectors.
- `index.search(query_vector, count=3, exact=False)`: This is the magic call!
  - `query_vector`: The vector we’re comparing against.
  - `count=3`: We’re asking for the top 3 most similar vectors.
  - `exact=False`: This tells USearch to use its fast ANN algorithm. Setting it to `True` performs an exact (brute-force) search, which guarantees the absolute nearest neighbors but is much slower on large datasets. For most use cases, `False` is preferred.
- `results.keys` and `results.distances`: The `search` method returns an object with two key attributes: `keys`, a NumPy array of the integer keys of the found vectors, and `distances`, a NumPy array of the distances between the `query_vector` and each found vector.
- Interpreting distances: For the `cosine` metric, a value closer to 0 means the vectors are nearly identical in direction (highly similar), while a value closer to 2 means they point in opposite directions (very dissimilar).
Run your script! Save first_search.py and run it from your terminal:
python first_search.py
You should see output similar to this (the exact order or distance might vary slightly due to floating-point precision or HNSW graph traversal, but the closest items should be consistent):
USearch and NumPy imported successfully!
USearch index initialized with 3 dimensions and 'cosine' metric.
Added 5 vectors to the index.
Searching for top 3 neighbors of query vector: [0.12 0.22 0.32]
--- Search Results ---
  Result 1: Key=1, Distance=0.0005
  Result 2: Key=2, Distance=0.0008
  Result 3: Key=5, Distance=0.0506
Understanding the distance:
For 'cosine' metric, a distance closer to 0 means higher similarity.
A distance closer to 2 means lower similarity (vectors pointing in opposite directions).
Notice how keys 1 and 2 are found with very small distances, indicating high similarity to our query_vector, as expected! Key 5 is the next closest.
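You can sanity-check this ranking without USearch at all: with only five vectors, a brute-force cosine computation in NumPy reproduces the same order. This is a handy debugging trick on small datasets:

```python
import numpy as np

# The same five vectors and query used in the script above.
vectors = np.array([
    [0.1, 0.2, 0.3],
    [0.15, 0.25, 0.35],
    [0.8, 0.9, 0.7],
    [0.75, 0.85, 0.65],
    [0.4, 0.5, 0.45],
], dtype=np.float32)
keys = np.array([1, 2, 3, 4, 5])
query = np.array([0.12, 0.22, 0.32], dtype=np.float32)

# Cosine distance of the query against every stored vector at once.
dots = vectors @ query
norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
cosine_distances = 1.0 - dots / norms

# Rank all five by distance; the top 3 should match the index's answer.
order = np.argsort(cosine_distances)
print("Keys by similarity:", keys[order])
print("Top-3 distances:", cosine_distances[order][:3])
```

The brute-force ranking puts keys 1, 2, and 5 first, agreeing with the ANN result above.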
Mini-Challenge: Explore Vector Space
Now it’s your turn to experiment!
Challenge:
- Add two more vectors to the `vectors_to_add` array and their corresponding keys to `vector_keys`. Make one of them very similar to `vectors_to_add[2]` (key 3) and another completely random.
- Change the `query_vector` to one that you expect to be similar to your newly added vectors, or to the existing `vectors_to_add[2]` (key 3).
- Run the script again and observe the results.
Hint:
- When adding new vectors, ensure they have the same `vector_dimensions` (3 in this case).
- Remember to update both the `vectors_to_add` and `vector_keys` arrays.
- The `index.add()` call can be made multiple times, or you can add all vectors at once. For simplicity, modify the existing `vectors_to_add` and `vector_keys` before the `index.add()` call.
What to Observe/Learn:
- How do the returned keys and distances change when you query with a vector from a different “cluster” in your data?
- Can you predict which vectors will be returned based on their numerical values relative to your `query_vector`?
- What happens if you increase `count` in the `index.search()` call?
Take your time, try different values, and really feel out how the distances reflect the similarity you intuitively expect.
Common Pitfalls & Troubleshooting
Even with simple examples, it’s easy to stumble. Here are a couple of common issues:
Dimension Mismatch:
- Pitfall: Trying to add a vector with 4 dimensions to an index initialized with `ndim=3`.
- Error: USearch will raise an error along the lines of `Inconsistent dimension: expected 3, got 4`.
- Fix: Always ensure that the `query_vector` and all of `vectors_to_add` exactly match the `ndim` you specified when initializing the index.
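A cheap way to catch this early is to validate shapes before they ever reach the index. A minimal sketch in plain NumPy (`check_dims` and `expected_ndim` are hypothetical helpers, standing in for whatever `ndim` you chose):

```python
import numpy as np

expected_ndim = 3  # must match the ndim the index was created with

def check_dims(vectors: np.ndarray, expected_ndim: int) -> None:
    """Fail early with a clear message, instead of deep inside the library."""
    if vectors.shape[-1] != expected_ndim:
        raise ValueError(
            f"Expected {expected_ndim}-dimensional vectors, "
            f"got {vectors.shape[-1]}"
        )

good = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
bad = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)

check_dims(good, expected_ndim)  # passes silently
try:
    check_dims(bad, expected_ndim)
except ValueError as err:
    print(f"Caught: {err}")
```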
Incorrect Data Type:
- Pitfall: Using default Python lists or NumPy arrays with `dtype=np.float64` (double precision) when the index expects `np.float32`. While USearch can often handle `float64`, `float32` is generally preferred for performance and memory efficiency in vector search.
- Symptom: Not always an explicit error, but it can mean extra conversions, slower performance, or unexpected behavior.
- Fix: Explicitly set `dtype=np.float32` for all your vector NumPy arrays: `np.array([...], dtype=np.float32)`. Keep your `vector_keys` as 64-bit integers (e.g., `np.longlong`).
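Converting an existing array is a one-liner with NumPy’s `astype`. A small sketch:

```python
import numpy as np

# Arrays built from Python lists default to float64...
vectors = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(vectors.dtype)  # float64

# ...so convert explicitly before handing them to the index.
vectors32 = vectors.astype(np.float32)
print(vectors32.dtype)  # float32

# Keys should likewise be 64-bit integers.
keys = np.array([1, 2]).astype(np.longlong)
print(keys.dtype)  # a 64-bit integer dtype
```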
Forgetting `pip install`:
- Pitfall: Trying to run the script without USearch or NumPy installed.
- Error: `ModuleNotFoundError: No module named 'usearch'` or `No module named 'numpy'`.
- Fix: Run `pip install usearch numpy` in your environment.
Summary
Fantastic work! You’ve just performed your first vector search with USearch. Let’s quickly recap what we’ve covered:
- USearch Index: The core data structure for efficient Approximate Nearest Neighbor (ANN) search.
- Initialization: How to create an index with a specific `ndim`, `metric` (like `cosine`), and `connectivity`.
- Adding Vectors: Populating the index with numerical vectors and their unique integer keys.
- Searching: Querying the index with a `query_vector` to find the most similar items based on your chosen metric.
- Result Interpretation: Understanding that smaller distances (for cosine) mean higher similarity.
You’ve taken a significant step in understanding how vector search works under the hood. In the next chapter, we’ll delve deeper into more advanced features of USearch, including how to persist your index to disk and load it back, and explore more complex data types.