Keyphrases
Language Model: 100%
Transformer: 100%
Video Encoder: 100%
Large Language Models: 100%
Transformer Block: 57%
Encoder: 28%
3D Visuals: 28%
Large-scale Pre-trained Model: 28%
Visual Tokens: 28%
Llama: 14%
Performance Enhancement: 14%
Visual Expertise: 14%
2D to 3D: 14%
Empirically Supported: 14%
Text Data: 14%
Recognition Task: 14%
Visual Encoding: 14%
Action Recognition: 14%
Temporal Modeling: 14%
Visual Recognition: 14%
Relevant Region: 14%
Vision-language: 14%
Conventional Practice: 14%
Information Filtering: 14%
Modelling Task: 14%
Image-text Retrieval: 14%
Visual Question Answering: 14%
Computer Vision Tasks: 14%
Feature Activation: 14%
Motion Forecasting: 14%
Semantic Tasks: 14%
Multimodal Vision: 14%
Point Cloud Classification: 14%
Multimodal Task: 14%
Computer Science
Language Modeling: 100%
Large Language Model: 100%
Transformer LLM: 50%
Temporal Modeling: 16%
Action Recognition: 16%
Enhance Performance: 16%
Point Cloud: 16%
Underlying Mechanism: 16%
Visual Question Answering: 16%
Computer Vision Task: 16%
Visual Encoding: 16%
Coder: 16%
Information Retrieval: 16%
Psychology
Information Filtering: 100%
Computer Vision: 100%