OpenSearch

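This section walks you through setting up the OpenSearchVectorStore to store document embeddings and perform similarity searches.

OpenSearch is an open-source search and analytics engine originally forked from Elasticsearch and distributed under the Apache License 2.0. It supports AI application development by simplifying the integration and management of AI-generated assets. OpenSearch offers vector, lexical, and hybrid search, and uses vector database functionality to provide low-latency queries and similarity searches, as detailed on the vector database page.

The OpenSearch k-NN functionality lets users query vector embeddings from large datasets. An embedding is a numerical representation of a data object such as text, an image, audio, or a document. Embeddings can be stored in an index and queried using various similarity functions.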
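To make the idea of querying embeddings concrete, here is a minimal sketch of a brute-force nearest-neighbour search over a handful of in-memory vectors. It is illustrative only: the class name `BruteForceKnn`, the document ids, and the two-dimensional vectors are all made up for this example, and OpenSearch itself does not scan every vector this way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a brute-force nearest-neighbour search over in-memory embeddings. */
final class BruteForceKnn {

    /** Cosine similarity between two non-zero vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the k stored embeddings most similar to the query, best match first. */
    static List<String> topK(Map<String, double[]> index, double[] query, int k) {
        List<String> ids = new ArrayList<>(index.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(index.get(id), query)).reversed());
        return ids.subList(0, Math.min(k, ids.size()));
    }

    public static void main(String[] args) {
        // Hypothetical embeddings; real ones come from an embedding model and have many more dimensions.
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("doc-a", new double[] {1.0, 0.0});
        index.put("doc-b", new double[] {0.0, 1.0});
        index.put("doc-c", new double[] {0.9, 0.1});

        System.out.println(topK(index, new double[] {1.0, 0.0}, 2)); // prints [doc-a, doc-c]
    }
}
```

The vector store performs the equivalent of `topK` on the server side, so application code only supplies the query text and the number of results it wants.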

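Prerequisites

A running OpenSearch instance. The following options are available:
- Self-managed OpenSearch
- Amazon OpenSearch Service
If required, an API key for the EmbeddingModel that generates the embeddings stored by the OpenSearchVectorStore.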

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the OpenSearch vector store. To enable it, add the following dependency to your project's Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-opensearch</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-vector-store-opensearch'
}

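The same dependency is used for both self-managed OpenSearch and Amazon OpenSearch Service. Refer to the Dependency Management section to add the Spring AI BOM to your build file.

You also need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

You can now auto-wire the OpenSearchVectorStore as a vector store in your application: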

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to OpenSearch
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());

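Configuration Properties

To connect to OpenSearch and use the OpenSearchVectorStore, you need to provide access details for your instance. A simple configuration can be supplied through Spring Boot's application.yml file: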

spring:
  ai:
    vectorstore:
      opensearch:
        uris: <opensearch instance URIs>
        username: <opensearch username>
        password: <opensearch password>
        index-name: spring-ai-document-index
        initialize-schema: true
        similarity-function: cosinesimil
        read-timeout: <time to wait for response>
        connect-timeout: <time to wait until connection established>
        path-prefix: <custom path prefix>
        ssl-bundle: <name of SSL bundle>
        aws:  # Only for Amazon OpenSearch Service
          host: <aws opensearch host>
          service-name: <aws service name>
          access-key: <aws access key>
          secret-key: <aws secret key>
          region: <aws region>

Properties starting with spring.ai.vectorstore.opensearch.* are used to configure the OpenSearchVectorStore:

Property	Description	Default Value
`spring.ai.vectorstore.opensearch.uris`	URIs of the OpenSearch cluster endpoints	-
`spring.ai.vectorstore.opensearch.username`	Username for accessing the OpenSearch cluster	-
`spring.ai.vectorstore.opensearch.password`	Password for the specified username	-
`spring.ai.vectorstore.opensearch.index-name`	Name of the index used to store vectors	`spring-ai-document-index`
`spring.ai.vectorstore.opensearch.initialize-schema`	Whether to initialize the required schema	`false`
`spring.ai.vectorstore.opensearch.similarity-function`	Similarity function to use	`cosinesimil` (cosine similarity)
`spring.ai.vectorstore.opensearch.read-timeout`	Time to wait for a response from the opposite endpoint (0 means wait indefinitely)	-
`spring.ai.vectorstore.opensearch.connect-timeout`	Time to wait until the connection is established (0 means wait indefinitely)	-
`spring.ai.vectorstore.opensearch.path-prefix`	Path prefix for the OpenSearch API endpoints (useful when OpenSearch sits behind a reverse proxy at a non-root path)	-
`spring.ai.vectorstore.opensearch.ssl-bundle`	Name of the SSL bundle to use for SSL connections	-
`spring.ai.vectorstore.opensearch.aws.host`	Hostname of the OpenSearch instance	-
`spring.ai.vectorstore.opensearch.aws.service-name`	AWS service name	-
`spring.ai.vectorstore.opensearch.aws.access-key`	AWS access key	-
`spring.ai.vectorstore.opensearch.aws.secret-key`	AWS secret key	-
`spring.ai.vectorstore.opensearch.aws.region`	AWS region	-

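You can control whether the AWS-specific OpenSearch auto-configuration is enabled by setting the spring.ai.vectorstore.opensearch.aws.enabled property.

- When this property is set to false, the non-AWS OpenSearch configuration is activated even if AWS SDK classes are present on the classpath. This lets you use self-managed or third-party OpenSearch clusters in environments where the AWS SDK is used for other services.
- If AWS SDK classes are not present, the non-AWS configuration is always used.
- If AWS SDK classes are present and the property is unset or set to true, the AWS-specific configuration is used by default.
This fallback logic gives users explicit control over the type of OpenSearch integration and prevents AWS-specific logic from being activated accidentally.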
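The three rules above amount to a small decision function. The sketch below restates them in plain Java; the class and method names are hypothetical and this is not the condition code Spring AI actually uses, only a model of the documented behavior.

```java
/** Illustrative only: models the documented choice between AWS and non-AWS auto-configuration. */
final class AwsConfigSelection {

    /**
     * @param awsSdkOnClasspath whether AWS SDK classes are present on the classpath
     * @param awsEnabled        value of spring.ai.vectorstore.opensearch.aws.enabled, or null when unset
     * @return true if the AWS-specific configuration would be used, false for the non-AWS one
     */
    static boolean usesAwsConfiguration(boolean awsSdkOnClasspath, Boolean awsEnabled) {
        if (!awsSdkOnClasspath) {
            return false; // without the AWS SDK the non-AWS configuration is always used
        }
        // With the AWS SDK present, AWS is the default unless explicitly disabled.
        return awsEnabled == null || awsEnabled;
    }

    public static void main(String[] args) {
        System.out.println(usesAwsConfiguration(true, null));   // prints true
        System.out.println(usesAwsConfiguration(true, false));  // prints false
        System.out.println(usesAwsConfiguration(false, true));  // prints false
    }
}
```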

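The following similarity functions are available:

- cosinesimil: The default, suitable for most use cases. Measures the cosine similarity between vectors.
- l1: Manhattan distance between vectors.
- l2: Euclidean distance between vectors.
- linf: Chebyshev distance between vectors.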
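For reference, the four measures listed above can be written out as follows. This is a sketch of the textbook definitions only; the class name and sample vectors are made up, and OpenSearch derives its own relevance score from these values, which this example does not attempt to reproduce.

```java
/** Illustrative only: textbook definitions of the four measures named above. */
final class SimilarityFunctions {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Manhattan (L1) distance: sum of absolute component differences. */
    static double l1(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Euclidean (L2) distance: square root of the sum of squared differences. */
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Chebyshev (L-infinity) distance: largest absolute component difference. */
    static double linf(double[] a, double[] b) {
        double max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3};
        double[] b = {4, 6, 3};
        System.out.println(l1(a, b));   // prints 7.0
        System.out.println(l2(a, b));   // prints 5.0
        System.out.println(linf(a, b)); // prints 4.0
        System.out.println(cosineSimilarity(a, b));
    }
}
```

Note that cosine similarity grows as vectors become more alike, whereas the three distances shrink, so the two kinds of value are not directly comparable.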

Manual Configuration

Instead of relying on the Spring Boot auto-configuration, you can configure the OpenSearch vector store manually. To do so, add spring-ai-opensearch-store to your project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-opensearch-store</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-opensearch-store'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Create an OpenSearch client bean:

@Bean
public OpenSearchClient openSearchClient() {
    RestClient restClient = RestClient.builder(
        HttpHost.create("http://localhost:9200"))
        .build();

    return new OpenSearchClient(new RestClientTransport(
        restClient, new JacksonJsonpMapper()));
}

Then create the OpenSearchVectorStore bean using the builder pattern:

@Bean
public VectorStore vectorStore(OpenSearchClient openSearchClient, EmbeddingModel embeddingModel) {
    return OpenSearchVectorStore.builder(openSearchClient, embeddingModel)
        .index("custom-index")                // Optional: defaults to "spring-ai-document-index"
        .similarityFunction("l2")             // Optional: defaults to "cosinesimil"
        .initializeSchema(true)               // Optional: defaults to false
        .batchingStrategy(new TokenCountBatchingStrategy()) // Optional: defaults to TokenCountBatchingStrategy
        .build();
}

// This can be any EmbeddingModel implementation
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}

Metadata Filtering

You can also use the generic, portable metadata filters with OpenSearch.

For example, you can use the text expression language:

vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'").build());

or programmatically using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .similarityThreshold(SIMILARITY_THRESHOLD)
    .filterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build()).build());

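These (portable) filter expressions are automatically converted into proprietary OpenSearch query string queries.
For example, this portable filter expression: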

author in ['john', 'jill'] && 'article_type' == 'blog'

is converted into the proprietary OpenSearch filter format:

(metadata.author:john OR jill) AND metadata.article_type:blog
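The conversion shown above can be pictured as a walk over a small expression tree. The following sketch reproduces the example output with three hypothetical node types (`Eq`, `In`, `And`); it is not Spring AI's converter, covers only these operators, and does no escaping of special characters.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: renders a tiny filter tree as an OpenSearch-style query string. */
final class FilterToQueryStringSketch {

    /** A node in the (hypothetical) filter expression tree. */
    interface Expr {
        String toQuery();
    }

    /** key == value */
    static final class Eq implements Expr {
        private final String key;
        private final String value;

        Eq(String key, String value) {
            this.key = key;
            this.value = value;
        }

        public String toQuery() {
            return "metadata." + key + ":" + value;
        }
    }

    /** key in [v1, v2, ...] */
    static final class In implements Expr {
        private final String key;
        private final List<String> values;

        In(String key, String... values) {
            this.key = key;
            this.values = Arrays.asList(values);
        }

        public String toQuery() {
            return "(metadata." + key + ":" + String.join(" OR ", values) + ")";
        }
    }

    /** left && right */
    static final class And implements Expr {
        private final Expr left;
        private final Expr right;

        And(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }

        public String toQuery() {
            return left.toQuery() + " AND " + right.toQuery();
        }
    }

    public static void main(String[] args) {
        Expr filter = new And(new In("author", "john", "jill"), new Eq("article_type", "blog"));
        // prints (metadata.author:john OR jill) AND metadata.article_type:blog
        System.out.println(filter.toQuery());
    }
}
```

Every metadata key is prefixed with `metadata.` in the example output, which suggests that document metadata lives under a `metadata` field in the index.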

Accessing the Native Client

The OpenSearch vector store implementation provides access to the underlying native OpenSearch client (OpenSearchClient) through the getNativeClient() method:

OpenSearchVectorStore vectorStore = context.getBean(OpenSearchVectorStore.class);
Optional<OpenSearchClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    OpenSearchClient client = nativeClient.get();
    // Use the native client for OpenSearch-specific operations
}

The native client gives you access to OpenSearch-specific features and operations that might not be exposed through the VectorStore interface.

Author: Ddd4j  Created: 2025-09-08 22:25
Last edited by: Ddd4j  Updated: 2026-04-23 14:31
