jerpint on June 21, 2023 | on: GPT4 is 8 x 220B params = 1.7T params

I really wonder whether the image processing is simply more tokens appended to the sequence. That would make the most sense from an architecture perspective, though training must be a whole other ballgame of alchemy.
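To make the "more tokens appended to the sequence" idea concrete, here is a minimal sketch in plain Python. It is not how GPT-4 or any particular model is known to work: the patch size, embedding width, random weights, and the helper names `patchify`, `linear`, and `build_sequence` are all hypothetical, and a real system would presumably use a trained vision encoder rather than a single random projection.

```python
import random

D_MODEL = 8  # hypothetical embedding width
PATCH = 2    # hypothetical patch side length, in pixels


def patchify(image, patch=PATCH):
    """Split an H x W grayscale image (list of rows) into flattened patches.

    Assumes H and W are both divisible by `patch`.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch)
                            for dx in range(patch)])
    return patches


def linear(vec, weights):
    """Project a length-n vector through an n x d weight matrix."""
    return [sum(v * row[j] for v, row in zip(vec, weights))
            for j in range(len(weights[0]))]


def build_sequence(text_ids, image, embed_table, patch_proj):
    """Embed text tokens, embed image patches, and append the latter."""
    text_emb = [embed_table[t] for t in text_ids]
    image_emb = [linear(p, patch_proj) for p in patchify(image)]
    # The image patches become extra positions in the same sequence.
    return text_emb + image_emb


rng = random.Random(0)
# Random stand-ins for weights that would be learned in practice.
embed_table = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(100)]
patch_proj = [[rng.gauss(0, 1) for _ in range(D_MODEL)]
              for _ in range(PATCH * PATCH)]

image = [[rng.random() for _ in range(4)] for _ in range(4)]  # toy 4x4 image
sequence = build_sequence([5, 17, 42], image, embed_table, patch_proj)
print(len(sequence), len(sequence[0]))  # 3 text tokens + 4 patches, width 8
```

The point of the sketch is only that, once both modalities are mapped to vectors of the same width, the rest of the model can treat the result as one ordinary sequence.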

benob on June 21, 2023

Probably. Check the Kosmos-1 paper from Microsoft, which appeared shortly before GPT-4 was released: https://arxiv.org/abs/2302.14045
