Substrate comes with built-in vector storage, which you can use to store and query generated embeddings. It's performant, colocated with the rest of your workload, and much more cost-effective than alternative vector database providers, like Pinecone, Supabase, or Weaviate.
This guide embeds a set of common phrases used to enhance image generation prompts, then queries the vector store to recommend phrases for a given prompt.
To create a new vector store, use FindOrCreateVectorStore. Give your store a collection_name, and specify the embedding model.
Learn more: embedding models
- jina-v2 is a popular model for embedding text.
- clip is a popular model for embedding text and images.
To embed data, use EmbedText. Provide the data to embed, the collection_name, and the embedding model. Below, we create an array of embedding nodes. Substrate automatically runs these nodes in parallel because they have no upstream dependencies.
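As a local analogy for why independent nodes can run in parallel, here is a minimal sketch using only the Python standard library. It does not call Substrate: embed_stub is a hypothetical stand-in for one embedding node, the collection name is made up for illustration, and the phrases are the three that appear in the example output of this guide.

```python
from concurrent.futures import ThreadPoolExecutor

COLLECTION_NAME = "prompt_enhancements"  # hypothetical collection name
MODEL = "jina-v2"

# The three phrases that appear in this guide's example output.
phrases = ["cell shaded cartoon", "cinematic", "wide shot"]


def embed_stub(text: str) -> dict:
    """Hypothetical stand-in for one embedding node; it only echoes its arguments."""
    return {"text": text, "collection_name": COLLECTION_NAME, "model": MODEL}


# No task reads another task's output, so all of them can be scheduled at once.
with ThreadPoolExecutor() as pool:
    # pool.map returns results in the same order as the inputs.
    embedded = list(pool.map(embed_stub, phrases))

for item in embedded:
    print(item["text"], "->", item["collection_name"])
```

The point of the sketch is the shape of the work: one task per phrase, each with the same collection_name and model, and no task waiting on another.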
To query a vector store, use QueryVectorStore. Provide the query string, collection_name, and embedding model.
Learn more: QueryVectorStore parameters
- Set include_metadata to True to include metadata in the response. The metadata includes the embedded text in the doc field.
- Set top_k to 3 to retrieve only the top 3 most similar results.
- To query images using a multimodal embedding model like clip, provide query_image_uris.
- Multiple queries can be run in a batch: simply pass multiple query strings or images.
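To make top_k and batched queries concrete, here is a toy in-memory sketch. It is not Substrate's implementation. The store contents, the query function, and the scoring rule are all assumptions: distance is taken to be the negated inner product, which would be consistent with the negative distance values and the "inner" metric in this guide's example output, but the source does not confirm it.

```python
def inner(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query(store, query_vectors, top_k=3):
    """Return one result list per query vector, each trimmed to top_k hits.

    Assumption: distance = -(inner product), so smaller means more similar.
    """
    results = []
    for q in query_vectors:
        scored = [
            {"id": doc_id, "distance": -inner(q, vec)}
            for doc_id, vec in store.items()
        ]
        scored.sort(key=lambda hit: hit["distance"])  # most similar first
        results.append(scored[:top_k])
    return results


# Hypothetical 2-dimensional "embeddings" keyed by document id.
store = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, -0.2],
}

# Two queries in one batch produce two result lists.
batch = query(store, [[1.0, 0.0], [0.0, 1.0]], top_k=2)
for hits in batch:
    print([hit["id"] for hit in hits])
```

Each query contributes its own list, so a batch of two queries returns a list containing two lists of at most top_k hits each.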
The output of QueryVectorStore has query results in the results field, which is a list of lists. In this example, it contains a single list of results. If we instead provided two query_strings, it would contain two lists of results, one for each query string.
Output
{
  "results": [
    [
      {
        "id": "079ee5765c8c4df98b50bdb7b5cbdd29",
        "distance": -0.723642945289612,
        "vector": null,
        "metadata": {
          "doc": "cell shaded cartoon",
          "doc_id": "079ee5765c8c4df98b50bdb7b5cbdd29"
        }
      },
      {
        "id": "98ec8bb1da1243d88721645fc0a8899b",
        "distance": -0.717301785945892,
        "vector": null,
        "metadata": {
          "doc": "cinematic",
          "doc_id": "98ec8bb1da1243d88721645fc0a8899b"
        }
      },
      {
        "id": "158f2fc695e648878d245fdf93fa2917",
        "distance": -0.715586066246033,
        "vector": null,
        "metadata": {
          "doc": "wide shot",
          "doc_id": "158f2fc695e648878d245fdf93fa2917"
        }
      }
    ]
  ],
  "collection_name": null,
  "model": "jina-v2",
  "metric": "inner"
}
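One way to turn this response into prompt suggestions is sketched below, using only the Python standard library. The JSON is a trimmed copy of the output above. The helper name phrases_per_query and the sample prompt are hypothetical and not part of Substrate.

```python
import json

# Trimmed copy of the QueryVectorStore output shown above.
response = json.loads("""
{
  "results": [[
    {"id": "079ee5765c8c4df98b50bdb7b5cbdd29", "distance": -0.723642945289612,
     "vector": null, "metadata": {"doc": "cell shaded cartoon"}},
    {"id": "98ec8bb1da1243d88721645fc0a8899b", "distance": -0.717301785945892,
     "vector": null, "metadata": {"doc": "cinematic"}},
    {"id": "158f2fc695e648878d245fdf93fa2917", "distance": -0.715586066246033,
     "vector": null, "metadata": {"doc": "wide shot"}}
  ]],
  "collection_name": null,
  "model": "jina-v2",
  "metric": "inner"
}
""")


def phrases_per_query(response: dict) -> list:
    """Collect the embedded text (metadata.doc) for each query's result list."""
    return [
        [hit["metadata"]["doc"] for hit in hits]
        for hits in response["results"]
    ]


suggestions = phrases_per_query(response)

prompt = "a fox in a forest"  # hypothetical prompt, for illustration only
enhanced = f"{prompt}, {', '.join(suggestions[0])}"
print(enhanced)
```

Because results is a list of lists, suggestions[0] holds the phrases for the first (here, the only) query. The doc field is only present when include_metadata is set to True.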