Show, Think, and Tell: Thought-Augmented Fine-Tuning of Large Language Models for Video Captioning

IEEE Conference Publication, IEEE Xplore