聊聊Spring AI的Multimodality

本文主要研究一下Spring AI的Multimodality

示例

chatModel示例

var imageResource = new ClassPathResource("/multimodal.test.png");

var userMessage = new UserMessage(
	"Explain what do you see in this picture?", // content
	new Media(MimeTypeUtils.IMAGE_PNG, this.imageResource)); // media

ChatResponse response = chatModel.call(new Prompt(this.userMessage));

chatClient示例

String response = ChatClient.create(chatModel).prompt()
		.user(u -> u.text("Explain what do you see on this picture?")
				    .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
		.call()
		.content();

目前是如下几种模型支持多模态

  • Anthropic Claude 3
  • AWS Bedrock Converse
  • Azure Open AI (e.g. GPT-4o models)
  • Mistral AI (e.g. Mistral Pixtral models)
  • Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
  • OpenAI (e.g. GPT-4 and GPT-4o models)
  • Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)

源码

UserMessage

org/springframework/ai/chat/messages/UserMessage.java

public class UserMessage extends AbstractMessage implements MediaContent {

	protected final List<Media> media;

	public UserMessage(String textContent) {
		this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
	}

	public UserMessage(Resource resource) {
		super(MessageType.USER, resource, Map.of());
		this.media = new ArrayList<>();
	}

	public UserMessage(String textContent, List<Media> media) {
		this(MessageType.USER, textContent, media, Map.of());
	}

	public UserMessage(String textContent, Media... media) {
		this(textContent, Arrays.asList(media));
	}

	public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
		this(MessageType.USER, textContent, mediaList, metadata);
	}

	public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
			Map<String, Object> metadata) {
		super(messageType, textContent, metadata);
		Assert.notNull(media, "media data must not be null");
		this.media = new ArrayList<>(media);
	}

	@Override
	public String toString() {
		return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
				+ this.messageType + '}';
	}

	@Override
	public List<Media> getMedia() {
		return this.media;
	}

	@Override
	public String getText() {
		return this.textContent;
	}

}

UserMessage实现了MediaContent的getMedia方法

Media

org/springframework/ai/model/Media.java

public class Media {

	private static final String NAME_PREFIX = "media-";

	/**
	 * An Id of the media object, usually defined when the model returns a reference to
	 * media it has been passed.
	 */
	@Nullable
	private String id;

	private final MimeType mimeType;

	private final Object data;

	/**
	 * The name of the media object that can be referenced by the AI model.
	 * <p>
	 * Important security note: This field is vulnerable to prompt injections, as the
	 * model might inadvertently interpret it as instructions. It is recommended to
	 * specify neutral names.
	 *
	 * <p>
	 * The name must only contain:
	 * <ul>
	 * <li>Alphanumeric characters
	 * <li>Whitespace characters (no more than one in a row)
	 * <li>Hyphens
	 * <li>Parentheses
	 * <li>Square brackets
	 * </ul>
	 */
	private String name;

	//......
}	

Media定义了id、mimeType、data、name属性

Format

	public static class Format {

		// -----------------
		// Document formats
		// -----------------
		/**
		 * Public constant mime type for {@code application/pdf}.
		 */
		public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");

		/**
		 * Public constant mime type for {@code text/csv}.
		 */
		public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");

		/**
		 * Public constant mime type for {@code application/msword}.
		 */
		public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
		 */
		public static final MimeType DOC_DOCX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");

		/**
		 * Public constant mime type for {@code application/vnd.ms-excel}.
		 */
		public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
		 */
		public static final MimeType DOC_XLSX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

		/**
		 * Public constant mime type for {@code text/html}.
		 */
		public static final MimeType DOC_HTML = MimeType.valueOf("text/html");

		/**
		 * Public constant mime type for {@code text/plain}.
		 */
		public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");

		/**
		 * Public constant mime type for {@code text/markdown}.
		 */
		public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

		// -----------------
		// Video Formats
		// -----------------
		/**
		 * Public constant mime type for {@code video/x-matros}.
		 */
		public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");

		/**
		 * Public constant mime type for {@code video/quicktime}.
		 */
		public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");

		/**
		 * Public constant mime type for {@code video/mp4}.
		 */
		public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");

		/**
		 * Public constant mime type for {@code video/webm}.
		 */
		public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");

		/**
		 * Public constant mime type for {@code video/x-flv}.
		 */
		public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/x-ms-wmv}.
		 */
		public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");

		/**
		 * Public constant mime type for {@code video/3gpp}.
		 */
		public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

		// -----------------
		// Image Formats
		// -----------------
		/**
		 * Public constant mime type for {@code image/png}.
		 */
		public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");

		/**
		 * Public constant mime type for {@code image/jpeg}.
		 */
		public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");

		/**
		 * Public constant mime type for {@code image/gif}.
		 */
		public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");

		/**
		 * Public constant mime type for {@code image/webp}.
		 */
		public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

	}

Format定义了常用的几种MimeType

小结

Spring AI设计了各种message类型用于支持多模态,其中UserMessage有个media属性,类型List<Media>,支持传入图像、音频、视频,MimeType用于指定是哪种类型。

doc

  • multimodality

相关文章

如何搭建一个自己的电影网站,免费分享教程

大家好,好久不见了吧。没错我是好久好久好久没冒泡的武玥了"不是学长哦"再次见面给大家一个福利。最近看到有好多博主,好友都在推荐什么影视网站搭建方法相信大家有的应该没有搞懂吧,这里我仔细教一下大家。纯属...

真工程师:20块钱做了张「名片」,可以跑Linux和Python

机器之心报道参与:思源、杜伟、泽南对于一个工程师来说,如何在一张名片上宣告自己的实力?在上面制造一台完整的计算机说不定是个好主意。最近,美国一名嵌入式系统工程师 George Hilliard 的名片...

注意:三大系统所有Flash版本都有漏洞

2015-07-09 05:06:00 作者:徐鹏据外媒VentureBeat报道,Adobe发布的公告显示,Windows、Mac和Linux三大系统平台的全部Flash版本都存在一个安全漏洞,计划...

Firefox 36 beta发布了

2015-01-22 05:26:00 作者:马荣【中关村在线软件资讯】1月22日消息:Mozilla最近将Firefox 36 Beta放上了FTP服务器,这意味着已经可以下载试用了。Firefox...

网站后端开发源代码

成人网站在推动 Web 发展方面发挥的作用是不可否认的。从克服浏览器视频功能的限制到使用 WebSockets 推送广告(以防止广告拦截器拦截广告),您必须不断想出巧妙的方法,才能让自己处于 Web...

如何快速低成本地搭建有一个属于自己的网站

相信很多小伙伴刚接触互联网访问各种绚丽和功能强大的网站时都会好奇这些网站是如何搭建的,自己也想拥有一个这样的网站空间,其实个人网站的搭建是很简单的下面我给大家讲解下。如果搭建一个网站,首先大家可以先了...