= Transformers (ONNX) Embeddings

The TransformersEmbeddingClient is an EmbeddingClient implementation that locally computes https://www.sbert.net/examples/applications/computing-embeddings/README.html#sentence-embeddings-with-transformers[sentence embeddings] using a selected https://www.sbert.net/[sentence transformer].

It uses https://www.sbert.net/docs/pretrained_models.html[pre-trained] transformer models, serialized into the https://onnx.ai/[Open Neural Network Exchange (ONNX)] format.

The https://djl.ai/[Deep Java Library] and the Microsoft https://onnxruntime.ai/docs/get-started/with-java.html[ONNX Java Runtime] libraries are applied to run the ONNX models and compute the embeddings in Java.

== Serialize the Tokenizer and the Transformer Model

To run things in Java, we need to serialize the Tokenizer and the Transformer Model into ONNX format.

=== Serialize with optimum-cli

One, quick, way to achieve this, is to use the https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli[optimum-cli] command line tool.

The following snippet prepares a python virtual environment, installs the required packages and serializes (e.g. exports) the specified model using optimum-cli :

[source,bash]

python3 -m venv venv
source ./venv/bin/activate
(venv) pip install –upgrade pip
(venv) pip install optimum onnx onnxruntime

(venv) optimum-cli export onnx –generative sentence-transformers/all-MiniLM-L6-v2 onnx-output-folder

The snippet exports the https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2[sentence-transformers/all-MiniLM-L6-v2] transformer into the onnx-output-folder folder. Later includes the tokenizer.json and model.onnx files used by the embedding client.

In place of the all-MiniLM-L6-v2 you can pick any huggingface transformer identifier or provide direct file path.

== Using the ONNX Transformers models

Add the spring-ai-transformers project to your maven dependencies:

[source,xml]

org.springframework.ai spring-ai-transformers

TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

then create a new TransformersEmbeddingClient instance and use the setTokenizerResource(tokenizerJsonUri) and setModelResource(modelOnnxUri) methods to set the URIs of the exported tokenizer.json and model.onnx files. (classpath:, file: or https: URI schemas are supported).

If the model is not explicitly set, TransformersEmbeddingClient defaults to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2[sentence-transformers/all-MiniLM-L6-v2]:

[cols=”2*”]
|===
| Dimensions | 384
| Avg. performance | 58.80
| Speed | 14200 sentences/sec
| Size | 80MB
|===

The following snippet illustrates how to use the TransformersEmbeddingClient manually:

[source,java]

TransformersEmbeddingClient embeddingClient = new TransformersEmbeddingClient();

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json
embeddingClient.setTokenizerResource(“classpath:/onnx/all-MiniLM-L6-v2/tokenizer.json”);

// (optional) defaults to classpath:/onnx/all-MiniLM-L6-v2/model.onnx
embeddingClient.setModelResource(“classpath:/onnx/all-MiniLM-L6-v2/model.onnx”);

// (optional) defaults to ${java.io.tmpdir}/spring-ai-onnx-model
// Only the http/https resources are cached by default.
embeddingClient.setResourceCacheDirectory(“/tmp/onnx-zoo”);

// (optional) Set the tokenizer padding if you see an errors like:
// “ai.onnxruntime.OrtException: Supplied array is ragged, …”
embeddingClient.setTokenizerOptions(Map.of(“padding”, “true”));

embeddingClient.afterPropertiesSet();

List<List> embeddings = embeddingClient.embed(List.of(“Hello world”, “World is big”));


NOTE: that when created manually, you must call the afterPropertiesSet() after setting the properties and before using the client.

The first embed() call downloads the large ONNX model and caches it on the local file system.
Therefore, the first call might take longer than usual.
Use the #setResourceCacheDirectory(<path>) method to set the local folder where the ONNX models as stored.
The default cache folder is ${java.io.tmpdir}/spring-ai-onnx-model.

It is more convenient (and preferred) to create the TransformersEmbeddingClient as a Bean.
Then you don’t have to call the afterPropertiesSet() manually.

[source,java]

@Bean
public EmbeddingClient embeddingClient() {
return new TransformersEmbeddingClient();

}

== Transformers Embedding Spring Boot Starter

You can bootstrap and autowire the TransformersEmbeddingClient with the following Spring Boot starter:

[source,xml]

org.springframework.ai spring-ai-transformers-spring-boot-starter

TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

To configure it, use the spring.ai.embedding.transformer.* properties.

For example, add this to your application.properties file to configure the client with the https://huggingface.co/intfloat/e5-small-v2[intfloat/e5-small-v2] text embedding model:


spring.ai.embedding.transformer.onnx.modelUri=https://huggingface.co/intfloat/e5-small-v2/resolve/main/model.onnx

spring.ai.embedding.transformer.tokenizer.uri=https://huggingface.co/intfloat/e5-small-v2/raw/main/tokenizer.json

The complete list of supported properties are:

[cols=”3*”]
|===
| Property | Description | Default

| spring.ai.embedding.transformer.enabled | Enable the Transformer Embedding client. | true
| spring.ai.embedding.transformer.tokenizer.uri | URI of a pre-trained HuggingFaceTokenizer created by the ONNX engine (e.g. tokenizer.json). | onnx/all-MiniLM-L6-v2/tokenizer.json
| spring.ai.embedding.transformer.tokenizer.options | HuggingFaceTokenizer options such as ‘addSpecialTokens‘, ‘modelMaxLength‘, ‘truncation‘, ‘padding‘, ‘maxLength‘, ‘stride‘, ‘padToMultipleOf‘. Leave empty to fallback to the defaults. | empty
| spring.ai.embedding.transformer.cache.enabled | Enable remote Resource caching. | true
| spring.ai.embedding.transformer.cache.directory | Directory path to cache remote resources, such as the ONNX models | ${java.io.tmpdir}/spring-ai-onnx-model
| spring.ai.embedding.transformer.onnx.modelUri | Existing, pre-trained ONNX model. | onnx/all-MiniLM-L6-v2/model.onnx
| spring.ai.embedding.transformer.onnx.gpuDeviceId | The GPU device ID to execute on. Only applicable if >= 0. Ignored otherwise. | -1
| spring.ai.embedding.transformer.metadataMode | Specifies what parts of the Documents content and metadata will be used for computing the embeddings. | NONE
|===

NOTE: If you see an error like Caused by: ai.onnxruntime.OrtException: Supplied array is ragged,.., you need to also enable the tokenizer padding in application.properties as follows:


spring.ai.embedding.transformer.tokenizer.options.padding=true

作者:Jeebiz  创建时间:2024-04-05 23:31
最后编辑:Jeebiz  更新时间:2024-07-06 19:00
---- ----