AquaVLM: A Domain-Specific Vision–Language Model for Structured Understanding of Oceanarium Scenes
Abstract: Vision-Language Models (VLMs) have advanced cross-modal understanding and generation, yet their domain adaptability remains limited. To address the lack of high-quality captions for fish ...
Abstract: Generating Scalable Vector Graphics (SVG) from natural language descriptions poses significant challenges due to the need for precise semantic understanding, structural consistency, and ...