Google has launched a new family of PaliGemma vision-language models, offering scalable performance, long captioning, and support for specialized tasks.
PaliGemma 2 was announced December 5, nearly seven months after the initial version launched as the first vision-language model in the Gemma family. Building on Gemma 2, PaliGemma 2 models can see, understand, and interact with visual input, according to Google.
PaliGemma 2 makes it easier for developers to add more-sophisticated vision-language features to apps, Google said. It also enables more-sophisticated captioning abilities, including identifying emotions and actions in images. Scalable performance in PaliGemma 2 means performance can be optimized for any task via multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px). Long captioning in PaliGemma 2 generates detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene, Google said.