<?xml version="1.0" encoding='utf-8'?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
<card id="card1" title="Vision-language model - Page 1 - Wikipedia">
<p>
<a accesskey="3" href="page.php?w=vision-language_model&amp;p=2">3.Next</a>
</p>
<p>A <b>vision-language model</b> (<b>VLM</b>) is a type of artificial intelligence system that can jointly interpret and generate information from both images and text, extending the capabilities of <a href="page.php?w=large_language_model">large language model</a>s (LLMs), which are limited to text.  It is an example of <a href="page.php?w=multimodal_learning">multimodal learning</a>.</p>

<p>Many widely used commercial applications now rely on this ability.  <a href="page.php?w=OpenAI">OpenAI</a> introduced <a href="page.php?w=computer_vision">computer vision</a></p>

<do type="prev" label="Search">
        <go href="search.wml"/>
</do>

</card>
</wml>
