Abstract: Vision-Language Models (VLMs) have advanced cross-modal understanding and generation, yet their domain adaptability remains limited. To address the lack of high-quality captions for fish ...
Abstract: Generating Scalable Vector Graphics (SVG) from natural language descriptions poses significant challenges due to the need for precise semantic understanding, structural consistency, and ...