Accelerating Exploration with Unlabeled Prior Data
Paper • arXiv:2311.05067
Note A subset of the parameters is learnable during inference, trained with a self-supervised (SSL) target. Great idea.
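A toy sketch of the general idea in the note above (test-time adaptation of a parameter subset with a self-supervised objective). It is not taken from the paper. `TinyAutoencoder`, `ssl_loss`, and `adapt_at_test_time` are hypothetical names, and the input-reconstruction target is an assumed stand-in for "SSL target". Only `encoder_scale` is updated at inference. `decoder_weight` stays frozen.

```python
from dataclasses import dataclass


@dataclass
class TinyAutoencoder:
    # Frozen at inference (assumed to come from earlier training).
    decoder_weight: float
    # The only parameter adapted at test time (the "subset").
    encoder_scale: float

    def reconstruct(self, x: float) -> float:
        return self.decoder_weight * (self.encoder_scale * x)


def ssl_loss(model: TinyAutoencoder, batch: list[float]) -> float:
    # Self-supervised target: reconstruct the unlabeled input itself.
    return sum((model.reconstruct(x) - x) ** 2 for x in batch) / len(batch)


def adapt_at_test_time(model: TinyAutoencoder, batch: list[float],
                       lr: float = 0.1, steps: int = 50) -> TinyAutoencoder:
    """Plain gradient descent on encoder_scale only; decoder_weight is never touched."""
    for _ in range(steps):
        # d/d(scale) of mean((d*s*x - x)^2) = mean(2 * (d*s*x - x) * d * x)
        grad = sum(
            2 * (model.reconstruct(x) - x) * model.decoder_weight * x for x in batch
        ) / len(batch)
        model.encoder_scale -= lr * grad
    return model


# Usage: adapt on an unlabeled test batch, no labels involved.
model = TinyAutoencoder(decoder_weight=2.0, encoder_scale=1.0)
test_batch = [0.5, 1.0, -1.5, 2.0]
before = ssl_loss(model, test_batch)  # 1.875
adapt_at_test_time(model, test_batch)
after = ssl_loss(model, test_batch)
```

With these numbers the adapted scale converges toward 1 / decoder_weight = 0.5, driving the reconstruction loss toward zero while the frozen weight is unchanged.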
Note Still post-training.
Note Basically wrong: a Markov decision process requires decision making that is invariant to history given the current state, so considering a temporally dependent goal that is not encoded in the current state itself is a fallacy.
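For reference, the Markov property this note appeals to: the next-state distribution depends only on the current state and action, so a policy conditioned on the current state alone is enough. A goal that changes over time but is absent from $s_t$ makes decisions depend on more than $s_t$, which is the inconsistency the note objects to.

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t),
\qquad a_t \sim \pi(\cdot \mid s_t)
```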