LLaVA
新的 LLaVA 模型
LLaVA(大型语言和视觉助手)模型集已更新至1.6支持以下版本的版本:
- 更高的图像分辨率:支持高达4倍的像素,让模型掌握更多的细节。
- 改进的文本识别和推理能力:在额外的文档、图表和图解数据集上进行训练。
- 更宽松的许可证:通过 Apache 2.0 许可证或 LLaMA 2 社区许可证分发。
这些模型有三种参数尺寸。7B、13B 和新的 34B 模型:
- ollama run llava:7b
- ollama run llava:13b
- ollama run llava:34b
用法
命令行界面
要使用带有 的视觉模型 ollama run,请参考.jpg 或.png 使用文件路径的文件:
% ollama run llava "describe this image: ./art.jpg"
The image shows a colorful poster featuring an illustration of a cartoon character with spiky hair. Above the character's head is a crown, suggesting royalty or high status. On the right side of the poster, there is text in French that reads "Jean-Michel Basquiat,"
Python
import ollama
res = ollama.chat(
model="llava",
messages=[
{
'role': 'user',
'content': 'Describe this image:',
'images': ['./art.jpg']
}
]
)
print(res['message']['content'])
JavaScript
import ollama from 'ollama'
const res = await ollama.chat({
model: 'llava',
messages: [{
role: 'user',
content: 'Describe this image:'
images: ['./art.jpg']
}]
})
console.log(res.message.content)
注意:在 Ollama Python和JavaScript库以及REST API中,可以在参数中提供 base64 编码的文件images。有关向视觉模型提供图像的更多示例,请参阅完整的API 文档。
示例
物体检测
% ollama run llava "tell me what do you see in this picture? ./pic.jpg"
In the image, a man wearing blue and white is holding video game controllers in his hands while smiling. His arms are raised above his head as he plays with the virtual reality gaming devices. The man appears to be enjoying himself and immersed in his gaming experience.
文本识别
% ollama run llava "what does the text say? ./wordart.png"
This is an image with a capitalized, bolded text of "ollama." The background is black and the letters are yellow, which create a contrasting appearance. Although the words "ollama" make up the majority of the image, it could also be seen as part of a sign or advertisement due to its brightness and prominent font style.
作者:Jeebiz 创建时间:2024-08-04 01:21
最后编辑:Jeebiz 更新时间:2024-11-21 01:00
最后编辑:Jeebiz 更新时间:2024-11-21 01:00