Code Summary: What are embeddings?
This page summarizes the code used to generate Voyage AI embeddings and perform a similarity search.
Prerequisites
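- Python, with the voyageai, numpy, and python-dotenv packages installed
- Voyage AI API key, available as the VOYAGE_API_KEY environment variable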
Usage
Generate Vector Embeddings:
The following initializes a Voyage AI client, embeds a single document string using the voyage-4 model, and prints the dimensionality of the resulting embedding vector.
import voyageai
vo = voyageai.Client()
document = "The Romans perfected the use of arches, vaults, and concrete in their monumental buildings. The Colosseum and Pantheon showcase their engineering brilliance and aesthetic vision."
# embed() takes a list of texts; [0] selects the vector for the single document
embedding = vo.embed([document], model="voyage-4").embeddings[0]
print(len(embedding))
Sample Data:
The following defines a list of 15 documents, each with a title, description, and category, spanning four categories: History, Health, Technology, and Art.
data = [
    {
        "title": "Ancient Roman Architecture",
        "description": "The Romans perfected the use of arches, vaults, and concrete in their monumental buildings. The Colosseum and Pantheon showcase their engineering brilliance and aesthetic vision.",
        "category": "History",
    },
    {
        "title": "Mediterranean Diet Benefits",
        "description": "A dietary pattern rich in olive oil, fish, vegetables, and whole grains. Studies show it reduces heart disease risk and promotes longevity through anti-inflammatory compounds.",
        "category": "Health",
    },
    {
        "title": "Machine Learning Fundamentals",
        "description": "Algorithms that enable computers to learn from data without explicit programming. Neural networks, decision trees, and gradient descent are core concepts in this field.",
        "category": "Technology",
    },
    {
        "title": "Greek Classical Architecture",
        "description": "Ancient Greek structures featured columns, symmetry, and mathematical precision. The Parthenon exemplifies their dedication to proportion and harmony in building design.",
        "category": "History",
    },
    {
        "title": "Artificial Intelligence in Healthcare",
        "description": "AI systems analyze medical images, predict patient outcomes, and assist in diagnosis. Deep learning models can detect diseases earlier than human experts in some cases.",
        "category": "Technology",
    },
    {
        "title": "Nutritional Science and Longevity",
        "description": "Research links specific eating patterns with extended lifespan. Caloric restriction, antioxidant-rich foods, and healthy fats play crucial roles in cellular health and aging.",
        "category": "Health",
    },
    {
        "title": "Renaissance Art Techniques",
        "description": "Artists developed linear perspective, chiaroscuro, and realistic human anatomy rendering. Masters like Leonardo and Michelangelo revolutionized visual representation.",
        "category": "Art",
    },
    {
        "title": "Deep Learning Neural Networks",
        "description": "Multi-layered networks that process information hierarchically, mimicking brain structure. Convolutional and recurrent architectures excel at image and sequence tasks respectively.",
        "category": "Technology",
    },
    {
        "title": "Plant-Based Nutrition",
        "description": "Diets centered on vegetables, fruits, legumes, and nuts provide fiber, vitamins, and phytonutrients. Research suggests reduced cancer and diabetes risk compared to meat-heavy diets.",
        "category": "Health",
    },
    {
        "title": "Gothic Cathedral Construction",
        "description": "Medieval builders created soaring structures with pointed arches, flying buttresses, and stained glass. Notre-Dame and Chartres demonstrate vertical emphasis and light manipulation.",
        "category": "History",
    },
    {
        "title": "Computer Vision Applications",
        "description": "Systems that interpret visual information from cameras and sensors. Object detection, facial recognition, and autonomous vehicle navigation rely on these technologies.",
        "category": "Technology",
    },
    {
        "title": "Impressionist Painting Movement",
        "description": "Artists like Monet and Renoir captured light and movement through loose brushwork and pure color. They painted outdoors to depict changing atmospheric conditions authentically.",
        "category": "Art",
    },
    {
        "title": "Gut Microbiome and Health",
        "description": "Trillions of bacteria in the digestive system influence immunity, metabolism, and mental health. Fermented foods and diverse fiber sources promote beneficial microbial communities.",
        "category": "Health",
    },
    {
        "title": "Natural Language Processing",
        "description": "Computational techniques for understanding and generating human language. Transformers and attention mechanisms enable translation, summarization, and conversational AI systems.",
        "category": "Technology",
    },
    {
        "title": "Baroque Artistic Drama",
        "description": "Characterized by intense emotion, dramatic lighting, and dynamic movement. Caravaggio and Bernini created theatrical compositions that engaged viewers emotionally.",
        "category": "Art",
    },
]
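As a quick check on the shape of this data, the following minimal sketch counts documents per category and pulls out one field from each item. The list named sample is a shortened, hypothetical stand-in (titles and categories only) so that the snippet runs on its own; the same expressions should apply to the full data list above.

```python
from collections import Counter

# Shortened stand-in for the full list above (hypothetical subset, no descriptions)
sample = [
    {"title": "Ancient Roman Architecture", "category": "History"},
    {"title": "Mediterranean Diet Benefits", "category": "Health"},
    {"title": "Machine Learning Fundamentals", "category": "Technology"},
    {"title": "Greek Classical Architecture", "category": "History"},
    {"title": "Renaissance Art Techniques", "category": "Art"},
]

# Count how many documents fall into each category
category_counts = Counter(item["category"] for item in sample)

# Extract a single field from every item, as the search example does with "description"
titles = [item["title"] for item in sample]

print(category_counts)
print(titles)
```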
Calculate the Similarity of Two Vector Embeddings Using dotProduct:
The following initializes a Voyage AI client, embeds a list of documents and a query string separately using voyage-4 (with appropriate input_type), computes dot product similarity scores between the query and each document, and prints the top 5 most similar documents ranked by score.
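import numpy as np
import voyageai
from dotenv import load_dotenv
import examples.data as data
# Load VOYAGE_API_KEY from a .env file, if one is present
load_dotenv()
# Initialize the client (reads VOYAGE_API_KEY from the environment)
vo = voyageai.Client()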
# Sample documents
documents = [item["description"] for item in data.data]
query = "ancient construction methods"
# Generate embeddings for documents
doc_embeddings = vo.embed(
    texts=documents,
    model="voyage-4",
    input_type="document"
).embeddings
# Generate embedding for query
query_embedding = vo.embed(
    texts=[query],
    model="voyage-4",
    input_type="query"
).embeddings[0]
# Calculate similarity scores using dot product
similarities = np.dot(doc_embeddings, query_embedding)
# Sort by similarity (np.argsort with negative sign sorts high to low)
ranked_indices = np.argsort(-similarities)
for doc_index in ranked_indices[:5]:  # Top 5 results
    print(f"Document: {documents[doc_index][:100]}...")
    print(f"Similarity score: {similarities[doc_index]:.4f}\n")
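Voyage AI's documentation describes its embeddings as normalized to unit length, in which case the dot product gives the same scores as cosine similarity. The sketch below illustrates that relationship and the ranking step without calling the API. The three-dimensional vectors and the normalize helper are made up for illustration and are not real voyage-4 output.

```python
import numpy as np

# Toy vectors standing in for real embeddings (hypothetical values)
doc_vectors = np.array([
    [3.0, 4.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])
query_vector = np.array([1.0, 1.0, 0.0])


def normalize(v):
    """Scale a vector, or each row of a matrix, to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


# Unit-length copies, mimicking normalized embeddings
doc_unit = normalize(doc_vectors)
query_unit = normalize(query_vector)

# Dot product of the unit vectors
dot_scores = doc_unit @ query_unit

# Cosine similarity computed directly from the raw vectors
cosine_scores = (doc_vectors @ query_vector) / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)

# Rank documents from most to least similar, as in the example above
ranked = np.argsort(-dot_scores)

print(np.allclose(dot_scores, cosine_scores))
print(ranked)
```

Because the two score arrays agree for unit-length vectors, the cheaper dot product is enough for ranking when the embeddings are already normalized.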