Skip to main content

文档QA

LangChain提供了一系列专门针对非结构化文本数据处理的链条: StuffDocumentsChainMapReduceDocumentsChain, 和 RefineDocumentsChain。这些链条是开发与这些数据交互的更复杂链条的基本构建模块。它们旨在接受文档和问题作为输入,然后利用语言模型根据提供的文档制定答案。

  • StuffDocumentsChain: 这是三者中最简单的链条。它只是将所有输入文档注入到提示中作为上下文,并返回问题的答案。它适用于在少量文档上进行的QA任务。
  • MapReduceDocumentsChain: 这个链条合并了一个预处理步骤,以选择每个文档的相关部分,直到标记的总数少于模型允许的最大标记数。然后,它使用转换后的文档作为上下文来回答问题。它适用于在更大的文档上进行的QA任务,并可以并行运行预处理步骤,从而减少运行时间。
  • RefineDocumentsChain: 这个链条逐个遍历输入文档,每次迭代都使用上一个答案版本和下一个文档作为上下文更新中间答案。它适用于在大量文档上进行的QA任务。

用法, StuffDocumentsChainMapReduceDocumentsChain

import { OpenAI } from "langchain/llms/openai";
import { loadQAStuffChain, loadQAMapReduceChain } from "langchain/chains";
import { Document } from "langchain/document";

export const run = async () => {
// This first example uses the `StuffDocumentsChain`.
const llmA = new OpenAI({});
const chainA = loadQAStuffChain(llmA);
const docs = [
new Document({ pageContent: "Harrison went to Harvard." }),
new Document({ pageContent: "Ankush went to Princeton." }),
];
const resA = await chainA.call({
input_documents: docs,
question: "Where did Harrison go to college?",
});
console.log({ resA });
// { resA: { text: ' Harrison went to Harvard.' } }

// This second example uses the `MapReduceChain`.
// Optionally limit the number of concurrent requests to the language model.
const llmB = new OpenAI({ maxConcurrency: 10 });
const chainB = loadQAMapReduceChain(llmB);
const resB = await chainB.call({
input_documents: docs,
question: "Where did Harrison go to college?",
});
console.log({ resB });
// { resB: { text: ' Harrison went to Harvard.' } }
};

用法, RefineDocumentsChain

import { loadQARefineChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

export async function run() {
// Create the models and chain
const embeddings = new OpenAIEmbeddings();
const model = new OpenAI({ temperature: 0 });
const chain = loadQARefineChain(model);

// Load the documents and create the vector store
const loader = new TextLoader("./state_of_the_union.txt");
const docs = await loader.loadAndSplit();
const store = await MemoryVectorStore.fromDocuments(docs, embeddings);

// Select the relevant documents
const question = "What did the president say about Justice Breyer";
const relevantDocs = await store.similaritySearch(question);

// Call the chain
const res = await chain.call({
input_documents: relevantDocs,
question,
});

console.log(res);
/*
{
output_text: '\n' +
'\n' +
"The president said that Justice Stephen Breyer has dedicated his life to serve this country and thanked him for his service. He also mentioned that Judge Ketanji Brown Jackson will continue Justice Breyer's legacy of excellence, and that the constitutional right affirmed in Roe v. Wade—standing precedent for half a century—is under attack as never before. He emphasized the importance of protecting access to health care, preserving a woman's right to choose, and advancing maternal health care in America. He also expressed his support for the LGBTQ+ community, and his commitment to protecting their rights, including offering a Unity Agenda for the Nation to beat the opioid epidemic, increase funding for prevention, treatment, harm reduction, and recovery, and strengthen the Violence Against Women Act."
}
*/
}