RAG (Retrieval-Augmented Generation) is a popular technique for enhancing the output of language models. RAG consists of two parts:
- "R" (Retrieval) – search for relevant context. Substrate comes with a built-in vector store that's performant and cost-effective.
- "AG" (Augmented Generation) – enrich a prompt with search results to "augment" the output.
In this guide, we'll show you how to search Hacker News comments for a topic, extract sentiment, and generate a research summary using Substrate.
This concise RAG implementation runs dozens of LLM calls in parallel and streams the markdown result.
First, we search Hacker News comments using the Algolia HN Search API.
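As a minimal sketch, the Algolia HN Search API can be queried over plain HTTPS: the `query` parameter holds the search terms and `tags=comment` restricts results to comments. The helper below only builds the request URL; the endpoint and parameter names come from Algolia's public HN API.

```python
from urllib.parse import urlencode

# Algolia's public Hacker News Search API endpoint.
HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"

def build_comment_search_url(topic: str, hits_per_page: int = 30) -> str:
    """Build a search URL for HN comments matching a topic."""
    params = urlencode({
        "query": topic,
        "tags": "comment",          # restrict results to comments
        "hitsPerPage": hits_per_page,
    })
    return f"{HN_SEARCH_URL}?{params}"

url = build_comment_search_url("rust")
# Fetching this URL (e.g. with urllib.request) returns JSON with a "hits"
# array; each hit includes fields like "comment_text" and "author".
```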
Next, we use ComputeJSON to extract a summary, sentiment, and other metadata from each comment. In TypeScript, we use zod and zod-to-json-schema to create the JSON schema. In Python, we use Pydantic.
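The extraction step is driven by a JSON schema. As an illustration, here is a plain-dict schema of the shape a ComputeJSON node would consume; the specific field names (`summary`, `sentiment`, `topics`) are our assumptions for this example, and a Pydantic model with the same fields would produce an equivalent schema via `model_json_schema()`.

```python
# Illustrative JSON schema for per-comment extraction. The field names are
# assumptions for this example, not Substrate requirements.
comment_schema = {
    "type": "object",
    "properties": {
        "summary": {
            "type": "string",
            "description": "One-sentence summary of the comment",
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative"],
        },
        "topics": {
            "type": "array",
            "items": {"type": "string"},
        },
    },
    "required": ["summary", "sentiment"],
}
```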
Finally, we use ComputeText to generate a markdown summary of all the extracted JSON, and stream the results of the markdown node.
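To make the final step concrete, here is a hedged sketch of how the extracted JSON could be assembled into the prompt for the summary node. In an actual Substrate program the extraction results flow in as node references rather than literal strings; this stands in for that wiring.

```python
import json

def build_summary_prompt(extracted: list) -> str:
    """Combine per-comment JSON extractions into one summary prompt.

    Stand-in for feeding ComputeJSON outputs into a ComputeText node.
    """
    blobs = "\n".join(json.dumps(item) for item in extracted)
    return (
        "Write a markdown research summary of community sentiment "
        "based on these analyzed comments:\n" + blobs
    )

prompt = build_summary_prompt([
    {"summary": "Praises the type system", "sentiment": "positive"},
    {"summary": "Complains about compile times", "sentiment": "negative"},
])
```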
The code we wrote was simple, but implicitly it was creating the graph below – and we didn't have to think about the graph at all. With Substrate, simply relating tasks to each other gives us automatic parallelization of dozens of LLM calls for free, with zero roundtrips.
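As an analogy for the scheduling Substrate infers from the graph, here is the same fan-out/fan-in pattern written by hand with `asyncio`: one extraction task per comment runs concurrently, and the summary task waits on all of them. The function bodies are stand-ins, not Substrate API calls.

```python
import asyncio

async def extract(comment: str) -> dict:
    # Stand-in for a per-comment ComputeJSON node; in Substrate each of
    # these would be an LLM call, scheduled in parallel automatically.
    await asyncio.sleep(0)
    return {"summary": comment[:20], "sentiment": "neutral"}

async def summarize(extractions: list) -> str:
    # Stand-in for the final ComputeText node, which depends on every
    # extraction having completed.
    return f"Summarized {len(extractions)} comments"

async def main() -> str:
    comments = [f"comment {i}" for i in range(30)]
    # Fan-out: all extractions run concurrently.
    extractions = await asyncio.gather(*(extract(c) for c in comments))
    # Fan-in: the summary depends on all extraction results.
    return await summarize(list(extractions))

result = asyncio.run(main())
# result == "Summarized 30 comments"
```

With Substrate, this dependency graph is derived from how nodes reference each other, so the fan-out/fan-in orchestration above never has to be written explicitly.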
To try this example yourself and fork it, check out our runnable example on val.town.