In-Context Learning Demystified
Published:
TL;DR: The next-token prediction of a transformer block that takes a context and a query as input is equivalent to the output of the same block given only the query, with its weights updated by the context.
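A small numerical sketch can make this claim concrete. It rests on assumptions not stated in the summary above: the block is taken to be a single-head attention layer followed by a two-layer ReLU MLP, skip connections and normalization are omitted, and the "update" is a rank-1 change to the first MLP matrix chosen so that the identity holds. All names (`attend`, `mlp`, `implicit_update`) and sizes are hypothetical, picked for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(query, tokens, Wq, Wk, Wv):
    """Single-head attention output for `query` attending over the rows of `tokens`."""
    q = Wq @ query
    keys = tokens @ Wk.T
    values = tokens @ Wv.T
    weights = softmax(keys @ q / np.sqrt(q.size))
    return weights @ values

def mlp(a, W, W_out):
    """Two-layer ReLU MLP; W is the first-layer matrix that absorbs the context."""
    return W_out @ np.maximum(W @ a, 0.0)

def implicit_update(W, a_ctx, a_query):
    """Rank-1 update of W (an assumed form) such that W_new @ a_query == W @ a_ctx."""
    delta = a_ctx - a_query
    return W + np.outer(W @ delta, a_query) / (a_query @ a_query)

rng = np.random.default_rng(0)
d, h, n_ctx = 8, 16, 5                      # illustrative sizes
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W = rng.normal(size=(h, d))
W_out = rng.normal(size=(d, h))
context = rng.normal(size=(n_ctx, d))
query = rng.normal(size=d)

# Attention output with the context present, and with the query alone.
a_ctx = attend(query, np.vstack([context, query]), Wq, Wk, Wv)
a_query = attend(query, query[None, :], Wq, Wk, Wv)

# Left side: original weights, context + query as input.
out_with_context = mlp(a_ctx, W, W_out)
# Right side: context-updated weights, query only as input.
W_updated = implicit_update(W, a_ctx, a_query)
out_updated_weights = mlp(a_query, W_updated, W_out)

print(np.allclose(out_with_context, out_updated_weights))
```

In this toy setting the two outputs agree because the updated matrix maps the query-only attention output to the same pre-activation that the original matrix assigns to the context-aware one. The context therefore enters only through a rank-1 change to `W`; whether this is the exact construction intended above is an assumption of the sketch.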