Vector Stores | Substrate

Store and query vectors using Substrate's built-in vector storage

Substrate comes with built-in vector storage, which you can use to store and query generated embeddings. It's performant, colocated with the rest of your workload, and much more cost-effective than alternative vector database providers, like Pinecone, Supabase, or Weaviate.

This guide embeds a set of phrases commonly used to enhance image generation prompts, then queries the vector store to recommend phrases that could enhance a given prompt.

To create a new vector store, use FindOrCreateVectorStore. Give your store a collection_name, and specify the embedding model.

Learn more: embedding models

jina-v2 is a popular model for embedding text.
clip is a popular model for embedding text and images.

Python


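from substrate import Substrate, FindOrCreateVectorStore

substrate = Substrate(api_key="YOUR_API_KEY")  # placeholder key

# Find the store if it already exists, otherwise create it.
create = FindOrCreateVectorStore(
    collection_name="image_prompt_enhancements",
    model="jina-v2",
)
create_res = substrate.run(create)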

To embed data, use EmbedText. Provide the data to embed, the collection_name, and the embedding model. Below, we create an array of embedding nodes – Substrate automatically runs these nodes in parallel because they have no upstream dependencies.

Python


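from substrate import EmbedText

enhancements = [
    "highly detailed",
    # ...
    "sharp focus",
]
# Build one EmbedText node per phrase; the nodes do not depend on each other.
nodes = []
for e in enhancements:
    embed = EmbedText(
        text=e,
        collection_name="image_prompt_enhancements",
        model="jina-v2",
    )
    nodes.append(embed)
# Pass every node to a single run call so they can execute in parallel.
embed_res = substrate.run(*nodes)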

To query a vector store, use QueryVectorStore. Provide the query string, collection_name, and embedding model.

Learn more: QueryVectorStore parameters

Set include_metadata to True to include metadata in the response. The metadata includes the embedded text in the doc field.
Set top_k to 3 to retrieve only the top 3 most similar results.
To query images using a multimodal embedding model like clip, provide query_image_uris.
Multiple queries can be run in a batch – simply pass multiple query strings or images.

Python


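from substrate import QueryVectorStore

query = QueryVectorStore(
    query_strings=["a towering shell the size of a city skyscraper"],
    collection_name="image_prompt_enhancements",
    model="jina-v2",
    include_metadata=True,  # include the embedded text in the doc field
    top_k=3,  # return only the 3 most similar results
)
query_res = substrate.run(query)
query_out = query_res.get(query)  # output of the query node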

The output of QueryVectorStore has query results in the results field, which is a list of lists. In this example, it contains a single list of results. If we instead provided two query_strings, it would contain two lists of results, one for each query string.

Output


{
  "results": [
    [
      {
        "id": "079ee5765c8c4df98b50bdb7b5cbdd29",
        "distance": -0.723642945289612,
        "vector": null,
        "metadata": {
          "doc": "cell shaded cartoon",
          "doc_id": "079ee5765c8c4df98b50bdb7b5cbdd29"
        }
      },
      {
        "id": "98ec8bb1da1243d88721645fc0a8899b",
        "distance": -0.717301785945892,
        "vector": null,
        "metadata": {
          "doc": "cinematic",
          "doc_id": "98ec8bb1da1243d88721645fc0a8899b"
        }
      },
      {
        "id": "158f2fc695e648878d245fdf93fa2917",
        "distance": -0.715586066246033,
        "vector": null,
        "metadata": {
          "doc": "wide shot",
          "doc_id": "158f2fc695e648878d245fdf93fa2917"
        }
      }
    ]
  ],
  "collection_name": null,
  "model": "jina-v2",
  "metric": "inner"
}
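
To close the loop on the guide's goal of recommending phrases for a prompt, here is a minimal sketch of how the output above could be turned into an enhanced prompt. It assumes the output is available as a plain dict shaped like the JSON shown (if your SDK returns an object instead, convert it first). The helpers phrases_per_query and enhance are hypothetical names for illustration and are not part of the Substrate SDK.

```python
import json

# Trimmed copy of the output shown above (ids and vectors omitted).
output = json.loads("""
{
  "results": [
    [
      {"distance": -0.723642945289612, "metadata": {"doc": "cell shaded cartoon"}},
      {"distance": -0.717301785945892, "metadata": {"doc": "cinematic"}},
      {"distance": -0.715586066246033, "metadata": {"doc": "wide shot"}}
    ]
  ]
}
""")


def phrases_per_query(query_strings, output):
    """Pair each query string with the doc text of its results.

    Hypothetical helper. Assumes `output` is a dict shaped like the JSON
    above and that the query was run with include_metadata=True.
    """
    paired = {}
    # results holds one inner list per query string, in the same order.
    for query, hits in zip(query_strings, output["results"]):
        paired[query] = [hit["metadata"]["doc"] for hit in hits]
    return paired


def enhance(prompt, phrases):
    """Append the recommended phrases to a prompt, comma-separated."""
    return ", ".join([prompt, *phrases])


queries = ["a towering shell the size of a city skyscraper"]
recommended = phrases_per_query(queries, output)
enhanced = enhance(queries[0], recommended[queries[0]])
print(enhanced)
# a towering shell the size of a city skyscraper, cell shaded cartoon, cinematic, wide shot
```

Because results is a list of lists, the same helper would handle a batch: passing two query strings would give two entries in the returned dict, one per inner list.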
