A Look at Multimodality in Spring AI

yumo666, 05-22, Technical Articles

This article takes a look at multimodality support in Spring AI.

Examples

chatModel example

// Load the image from the classpath
var imageResource = new ClassPathResource("/multimodal.test.png");

// A user message that carries both text and an image
var userMessage = new UserMessage(
	"Explain what do you see in this picture?", // content
	new Media(MimeTypeUtils.IMAGE_PNG, imageResource)); // media

ChatResponse response = chatModel.call(new Prompt(userMessage));

chatClient example

String response = ChatClient.create(chatModel).prompt()
		.user(u -> u.text("Explain what do you see on this picture?")
				    .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
		.call()
		.content();

The following models currently support multimodality:

  • Anthropic Claude 3
  • AWS Bedrock Converse
  • Azure Open AI (e.g. GPT-4o models)
  • Mistral AI (e.g. Mistral Pixtral models)
  • Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
  • OpenAI (e.g. GPT-4 and GPT-4o models)
  • Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)

Source code

UserMessage

org/springframework/ai/chat/messages/UserMessage.java

public class UserMessage extends AbstractMessage implements MediaContent {

	protected final List<Media> media;

	public UserMessage(String textContent) {
		this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
	}

	public UserMessage(Resource resource) {
		super(MessageType.USER, resource, Map.of());
		this.media = new ArrayList<>();
	}

	public UserMessage(String textContent, List<Media> media) {
		this(MessageType.USER, textContent, media, Map.of());
	}

	public UserMessage(String textContent, Media... media) {
		this(textContent, Arrays.asList(media));
	}

	public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
		this(MessageType.USER, textContent, mediaList, metadata);
	}

	public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
			Map<String, Object> metadata) {
		super(messageType, textContent, metadata);
		Assert.notNull(media, "media data must not be null");
		this.media = new ArrayList<>(media);
	}

	@Override
	public String toString() {
		return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
				+ this.messageType + '}';
	}

	@Override
	public List<Media> getMedia() {
		return this.media;
	}

	@Override
	public String getText() {
		return this.textContent;
	}

}

UserMessage implements the getMedia method of MediaContent and keeps its own copy of the media list that is passed in.
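The constructor chain can be illustrated with a small plain-Java sketch. The classes SketchMedia and SketchUserMessage below are hypothetical stand-ins written for this illustration, not the Spring AI classes. They only mirror two details visible in the source above: the varargs constructor delegates to the collection-based one, and the collection is copied with new ArrayList<>(media), so later changes to the caller's list do not affect the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for Media (not the Spring AI class).
final class SketchMedia {
    final String mimeType;
    final Object data;

    SketchMedia(String mimeType, Object data) {
        this.mimeType = mimeType;
        this.data = data;
    }
}

// Hypothetical, simplified stand-in for UserMessage (not the Spring AI class).
final class SketchUserMessage {
    private final String text;
    private final List<SketchMedia> media;

    // Text-only message: starts with an empty media list.
    SketchUserMessage(String text) {
        this(text, new ArrayList<>());
    }

    // Varargs convenience constructor, delegating to the collection-based one.
    SketchUserMessage(String text, SketchMedia... media) {
        this(text, Arrays.asList(media));
    }

    // Copies the collection, so later changes to the caller's list are not seen here.
    SketchUserMessage(String text, Collection<SketchMedia> media) {
        if (media == null) {
            throw new IllegalArgumentException("media data must not be null");
        }
        this.text = text;
        this.media = new ArrayList<>(media);
    }

    String getText() {
        return this.text;
    }

    List<SketchMedia> getMedia() {
        return this.media;
    }
}
```

With this sketch, clearing the list that was handed to the constructor leaves the media held by the message unchanged.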

Media

org/springframework/ai/model/Media.java

public class Media {

	private static final String NAME_PREFIX = "media-";

	/**
	 * An Id of the media object, usually defined when the model returns a reference to
	 * media it has been passed.
	 */
	@Nullable
	private String id;

	private final MimeType mimeType;

	private final Object data;

	/**
	 * The name of the media object that can be referenced by the AI model.
	 * <p>
	 * Important security note: This field is vulnerable to prompt injections, as the
	 * model might inadvertently interpret it as instructions. It is recommended to
	 * specify neutral names.
	 *
	 * <p>
	 * The name must only contain:
	 * <ul>
	 * <li>Alphanumeric characters
	 * <li>Whitespace characters (no more than one in a row)
	 * <li>Hyphens
	 * <li>Parentheses
	 * <li>Square brackets
	 * </ul>
	 */
	private String name;

	//......
}	

Media defines the id, mimeType, data, and name properties.
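The naming rules in the Javadoc of the name field can be turned into a small check. The helper below is a hypothetical sketch, not part of Spring AI, and it assumes that "alphanumeric" means ASCII letters and digits. It accepts a name only if every character is in the allowed set and no two whitespace characters appear in a row.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spring AI): checks a candidate media name
// against the character rules listed in the Javadoc of Media.name.
final class MediaNameCheck {

    // Allowed: ASCII alphanumerics (an assumption), whitespace, hyphens,
    // parentheses, and square brackets.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9\\s\\-()\\[\\]]+");

    // Two or more whitespace characters in a row are not allowed.
    private static final Pattern REPEATED_WHITESPACE = Pattern.compile("\\s{2,}");

    private MediaNameCheck() {
    }

    static boolean isValidName(String name) {
        return name != null
                && ALLOWED.matcher(name).matches()
                && !REPEATED_WHITESPACE.matcher(name).find();
    }
}
```

A neutral name such as "media-report (v2) [final]" passes this check, while a name containing an underscore, punctuation such as "!", or a double space is rejected.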

Format

	public static class Format {

		// -----------------
		// Document formats
		// -----------------
		/**
		 * Public constant mime type for {@code application/pdf}.
		 */
		public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");

		/**
		 * Public constant mime type for {@code text/csv}.
		 */
		public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");

		/**
		 * Public constant mime type for {@code application/msword}.
		 */
		public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
		 */
		public static final MimeType DOC_DOCX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");

		/**
		 * Public constant mime type for {@code application/vnd.ms-excel}.
		 */
		public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");

		/**
		 * Public constant mime type for
		 * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
		 */
		public static final MimeType DOC_XLSX = MimeType
			.valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

		/**
		 * Public constant mime type for {@code text/html}.
		 */
		public static final MimeType DOC_HTML = MimeType.valueOf("text/html");

		/**
		 * Public constant mime type for {@code text/plain}.
		 */
		public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");

		/**
		 * Public constant mime type for {@code text/markdown}.
		 */
		public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

		// -----------------
		// Video Formats
		// -----------------
		/**
		 * Public constant mime type for {@code video/x-matros}.
		 */
		public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");

		/**
		 * Public constant mime type for {@code video/quicktime}.
		 */
		public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");

		/**
		 * Public constant mime type for {@code video/mp4}.
		 */
		public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");

		/**
		 * Public constant mime type for {@code video/webm}.
		 */
		public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");

		/**
		 * Public constant mime type for {@code video/x-flv}.
		 */
		public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/mpeg}.
		 */
		public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");

		/**
		 * Public constant mime type for {@code video/x-ms-wmv}.
		 */
		public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");

		/**
		 * Public constant mime type for {@code video/3gpp}.
		 */
		public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

		// -----------------
		// Image Formats
		// -----------------
		/**
		 * Public constant mime type for {@code image/png}.
		 */
		public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");

		/**
		 * Public constant mime type for {@code image/jpeg}.
		 */
		public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");

		/**
		 * Public constant mime type for {@code image/gif}.
		 */
		public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");

		/**
		 * Public constant mime type for {@code image/webp}.
		 */
		public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

	}

Format defines constants for several commonly used MimeTypes, grouped into document, video, and image formats.
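When building a Media object from a file, the MIME type often has to be derived from the file name. The sketch below is a hypothetical helper, not a Spring AI API. It maps a few file extensions to the same MIME type strings that appear in the Format constants above; the choice of extensions is an assumption made for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of Spring AI): resolves a MIME type string
// from a file extension, using values that match some of the Format constants.
final class MimeLookup {

    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "csv", "text/csv",
            "md", "text/markdown",
            "mp4", "video/mp4",
            "mpeg", "video/mpeg",
            "mpg", "video/mpeg", // same MIME type as "mpeg", like VIDEO_MPG and VIDEO_MPEG
            "png", "image/png",
            "jpeg", "image/jpeg",
            "gif", "image/gif",
            "webp", "image/webp");

    private MimeLookup() {
    }

    // Returns the MIME type for the file's extension, or empty if it is unknown.
    static Optional<String> mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }
}
```

In real code the resulting string could be passed to MimeType.valueOf, as the Format constants do, before constructing the Media object.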

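Summary

Spring AI designs its message types to support multimodality. UserMessage has a media property of type List<Media> that can carry images, audio, and video, and the MimeType of each Media indicates which kind of content it holds.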

doc

  • multimodality
