Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples | IEEE Conference Publication | IEEE Xplore