
I really wonder whether the image input is simply turned into more tokens appended to the sequence. That would make the most sense from an architecture perspective, though training must be a whole other ballgame of alchemy.



Probably. Check the Kosmos-1 paper from Microsoft, which appeared a few days before GPT-4 was released: https://arxiv.org/abs/2302.14045
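
To make the "image as extra tokens" idea concrete, here is a toy sketch in plain Python. It is not how GPT-4 or Kosmos-1 actually does it; the names (`patchify`, `linear`, `build_sequence`), the random weights, and the tiny sizes are all made up for illustration, and it assumes the image height and width are divisible by the patch size. The point is only that each image patch gets projected to the same width as a text embedding, so the two lists can be concatenated into one sequence.

```python
import random


def patchify(image, patch):
    """Split an H x W grid (list of lists) into flattened patches, row-major.

    Assumes H and W are both divisible by `patch`.
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(vec, weights):
    """Project a length-n vector with an n x d weight matrix."""
    d = len(weights[0])
    return [sum(v * weights[i][j] for i, v in enumerate(vec)) for j in range(d)]


def build_sequence(text_ids, image, embed_table, proj, patch):
    """Return text embeddings followed by projected image patches."""
    text_embeds = [embed_table[t] for t in text_ids]
    image_embeds = [linear(p, proj) for p in patchify(image, patch)]
    # Both lists hold vectors of the same width, so they form one sequence.
    return text_embeds + image_embeds


# Hypothetical toy sizes: embedding width 8, vocabulary 16, 2x2 patches.
rng = random.Random(0)
D, VOCAB, PATCH = 8, 16, 2
embed_table = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(VOCAB)]
proj = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(PATCH * PATCH)]
image = [[rng.random() for _ in range(4)] for _ in range(4)]

seq = build_sequence([3, 1, 4], image, embed_table, proj, PATCH)
print(len(seq), len(seq[0]))
```

With three text tokens and a 4x4 image cut into 2x2 patches, the sequence has 3 + 4 entries, each of width `D`. A real model would typically use a learned image encoder rather than a single random projection, and would also need some way to tell the model where the image starts and ends.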



