
LLMs: Additional Functionality

We offer a number of additional features for LLMs. In most of the examples below we use the OpenAI LLM; however, all of these features work with every LLM.

Additional Methods

LangChain provides a number of additional methods for interacting with LLMs:

import { OpenAI } from "langchain/llms/openai";

export const run = async () => {
  const modelA = new OpenAI();
  // `call` is a simple string-in, string-out method for interacting with the model.
  const resA = await modelA.call(
    "What would be a good company name for a company that makes colorful socks?"
  );
  console.log({ resA });
  // { resA: '\n\nSocktastic Colors' }

  // `generate` allows you to generate multiple completions for multiple prompts (in a single request for some models).
  const resB = await modelA.generate([
    "What would be a good company name for a company that makes colorful socks?",
    "What would be a good company name for a company that makes colorful sweaters?",
  ]);

  // `resB` is an `LLMResult` object with a `generations` field and an `llmOutput` field.
  // `generations` is a `Generation[][]`, each `Generation` having a `text` field.
  // Each input to the LLM could have multiple generations (depending on the `n` parameter), hence the list of lists.
  console.log(JSON.stringify(resB, null, 2));
  /*
  {
    "generations": [
      [{
        "text": "\n\nVibrant Socks Co.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }],
      [{
        "text": "\n\nRainbow Knitworks.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }]
    ],
    "llmOutput": {
      "tokenUsage": {
        "completionTokens": 17,
        "promptTokens": 29,
        "totalTokens": 46
      }
    }
  }
  */

  // We can specify additional parameters supported by the specific model provider, like `temperature`:
  const modelB = new OpenAI({ temperature: 0.9 });
  const resC = await modelB.call(
    "What would be a good company name for a company that makes colorful socks?"
  );
  console.log({ resC });
  // { resC: '\n\nKaleidoSox' }

  // We can get the number of tokens for a given input for a specific model.
  const numTokens = await modelB.getNumTokens("How many tokens are in this input?");
  console.log({ numTokens });
  // { numTokens: 8 }
};

Streaming Responses

Some LLMs provide a streaming response. This means you can start processing the response before it has been fully returned, which is useful if you want to display the response to the user as it is being generated, or if you want to process the response as it is being generated. LangChain currently provides streaming for the OpenAI LLM:

import { OpenAI } from "langchain/llms/openai";

// To enable streaming, we pass in `streaming: true` to the LLM constructor.
// Additionally, we pass in a handler for the `handleLLMNewToken` event.
const chat = new OpenAI({
  maxTokens: 25,
  streaming: true,
});

const response = await chat.call("Tell me a joke.", undefined, [
  {
    handleLLMNewToken(token: string) {
      console.log({ token });
    },
  },
]);
console.log(response);
/*
{ token: '\n' }
{ token: '\n' }
{ token: 'Q' }
{ token: ':' }
{ token: ' Why' }
{ token: ' did' }
{ token: ' the' }
{ token: ' chicken' }
{ token: ' cross' }
{ token: ' the' }
{ token: ' playground' }
{ token: '?' }
{ token: '\n' }
{ token: 'A' }
{ token: ':' }
{ token: ' To' }
{ token: ' get' }
{ token: ' to' }
{ token: ' the' }
{ token: ' other' }
{ token: ' slide' }
{ token: '.' }


Q: Why did the chicken cross the playground?
A: To get to the other slide.
*/

Caching

LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

  1. It can save you money by reducing the number of API calls you make to the LLM provider, if you often request the same completion multiple times.
  2. It can speed up your application by reducing the number of API calls you make to the LLM provider.

In-memory Cache

The default cache is stored in memory. This means that if you restart your application, the cache is cleared.

You can enable it by passing `cache: true` when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ cache: true });
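
As a minimal sketch of the effect (assuming your OpenAI API key is configured), the snippet below sends the same prompt twice; the second call is answered from the in-memory cache instead of making another API request:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ cache: true });

// The first call goes to the OpenAI API and the result is stored in the in-memory cache.
const first = await model.call("Tell me a joke.");

// The second call uses the exact same prompt, so it is served from the cache:
// it returns faster and does not make another API request.
const second = await model.call("Tell me a joke.");

console.log(first === second); // true, since the cached completion is reused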

Caching with Momento

LangChain also provides a Momento-based cache. Momento is a distributed, serverless cache that requires zero setup or infrastructure maintenance. To use it, you'll need to install the @gomomento/sdk package:

npm install @gomomento/sdk

Next, you'll need to sign up and create an API key. Once you've done that, pass a cache option when you instantiate the LLM like this:

import { OpenAI } from "langchain/llms/openai";
import { MomentoCache } from "langchain/cache/momento";
import {
  CacheClient,
  Configurations,
  CredentialProvider,
} from "@gomomento/sdk";

// See https://github.com/momentohq/client-sdk-javascript for connection options
const client = new CacheClient({
  configuration: Configurations.Laptop.v1(),
  credentialProvider: CredentialProvider.fromEnvironmentVariable({
    environmentVariableName: "MOMENTO_AUTH_TOKEN",
  }),
  defaultTtlSeconds: 60 * 60 * 24,
});
const cache = await MomentoCache.fromProps({
  client,
  cacheName: "langchain",
});

const model = new OpenAI({ cache });

Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers. To use it, you'll need to install the redis package:

npm install redis

Then, pass a cache option when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";
import { RedisCache } from "langchain/cache/redis";
import { createClient } from "redis";

// See https://github.com/redis/node-redis for connection options
const client = createClient();
const cache = new RedisCache(client);

const model = new OpenAI({ cache });

Adding a Timeout

By default, LangChain waits indefinitely for a response from the model provider. If you want to add a timeout, pass a timeout option, in milliseconds, when you call the model. For example, for OpenAI:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ temperature: 1 });

const resA = await model.call(
  "What would be a good company name for a company that makes colorful socks?",
  { timeout: 1000 } // 1s timeout
);

console.log({ resA });
// { resA: '\n\nSocktastic Colors' }

Cancelling Requests

You can cancel a request by passing a signal option when you call the model. For example, for OpenAI:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ temperature: 1 });
const controller = new AbortController();

// Call `controller.abort()` somewhere to cancel the request.

const res = await model.call(
  "What would be a good company name for a company that makes colorful socks?",
  { signal: controller.signal }
);

console.log(res);
/*
'\n\nSocktastic Colors'
*/

Note that this only cancels the outgoing request if the underlying provider exposes that option. LangChain cancels the underlying request if possible; otherwise it cancels the processing of the response.
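
As a minimal sketch of how you might trigger the cancellation (the exact error thrown when a request is aborted depends on the provider SDK and LangChain version), you can wire the controller to a timer or a user action and catch the rejection:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ temperature: 1 });
const controller = new AbortController();

// Abort the request if it is still pending after 100ms
// (this could just as well be wired to a "Cancel" button).
setTimeout(() => controller.abort(), 100);

try {
  const res = await model.call(
    "What would be a good company name for a company that makes colorful socks?",
    { signal: controller.signal }
  );
  console.log(res);
} catch (e) {
  // Once the signal is aborted the call rejects; the exact error type and
  // message depend on the underlying provider SDK.
  console.error(e);
}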

Dealing with Rate Limits

Some LLM providers have rate limits, and if you exceed them you'll get an error. To help you deal with this, LangChain provides a maxConcurrency option when instantiating an LLM. This option lets you specify the maximum number of concurrent requests to send to the LLM provider. If you exceed this number, LangChain automatically queues your requests and sends them as previous requests complete.

For example, if you set maxConcurrency: 5, LangChain only sends 5 requests to the LLM provider at a time. If you send 10 requests, the first 5 are sent immediately and the next 5 are queued. Once one of the first 5 requests completes, the next request in the queue is sent.

To use this feature, simply pass `maxConcurrency: <number>` when you instantiate the LLM. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ maxConcurrency: 5 });
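
As a minimal sketch of the queuing behavior described above, the snippet below fires ten calls at once; with maxConcurrency: 5, at most five requests are in flight at any moment and the rest wait until earlier ones complete (the prompts are just placeholders):

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ maxConcurrency: 5 });

// Ten prompts fired concurrently: at most five requests are in flight at a
// time, the other five wait in LangChain's internal queue.
const prompts = Array.from(
  { length: 10 },
  (_, i) => `Give me one fun fact about the number ${i + 1}.`
);
const results = await Promise.all(prompts.map((prompt) => model.call(prompt)));

console.log(results.length); // 10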

Handling API Errors

If the model provider returns an error from their API, by default LangChain retries up to 6 times with exponential backoff. This gives you error recovery without any additional effort on your part. If you want to change this behavior, pass a maxRetries option when you instantiate the model. For example:

import { OpenAI } from "langchain/llms/openai";

const model = new OpenAI({ maxRetries: 10 });

Subscribing to Events

Especially when using an agent, there can be a lot of back-and-forth happening behind the scenes as the LLM processes a prompt. For agents, the response object contains an intermediateSteps object that you can print to see an overview of the steps it took. If that isn't enough and you want to see every exchange with the LLM, you can pass callbacks to the LLM for custom logging (or anything else you want to do) as the model goes through those steps.

For more information on the available events, see the [Callbacks](/docs/production/callbacks/) section of the docs.

import { LLMResult } from "langchain/schema";
import { OpenAI } from "langchain/llms/openai";

// We can pass in a list of CallbackHandlers to the LLM constructor to get callbacks for various events.
const model = new OpenAI({
  callbacks: [
    {
      handleLLMStart: async (llm: { name: string }, prompts: string[]) => {
        console.log(JSON.stringify(llm, null, 2));
        console.log(JSON.stringify(prompts, null, 2));
      },
      handleLLMEnd: async (output: LLMResult) => {
        console.log(JSON.stringify(output, null, 2));
      },
      handleLLMError: async (err: Error) => {
        console.error(err);
      },
    },
  ],
});

await model.call(
  "What would be a good company name for a company that makes colorful socks?"
);
// {
//   "name": "openai"
// }
// [
//   "What would be a good company name for a company that makes colorful socks?"
// ]
// {
//   "generations": [
//     [
//       {
//         "text": "\n\nSocktastic Splashes.",
//         "generationInfo": {
//           "finishReason": "stop",
//           "logprobs": null
//         }
//       }
//     ]
//   ],
//   "llmOutput": {
//     "tokenUsage": {
//       "completionTokens": 9,
//       "promptTokens": 14,
//       "totalTokens": 23
//     }
//   }
// }