Hacker News
jerpint on June 21, 2023 | on:
GPT4 is 8 x 220B params = 1.7T params
I really wonder whether the image processing is simply more tokens appended to the sequence. That would make the most sense from an architecture perspective, though training must be a whole other ballgame of alchemy.
benob on June 21, 2023
Probably. Check out the Kosmos-1 paper from Microsoft, which appeared a few days before GPT-4 was released:
https://arxiv.org/abs/2302.14045
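The "images as extra tokens" idea can be sketched roughly as follows. This is an illustrative toy in the spirit of Kosmos-1-style multimodal models, not GPT-4's actual architecture (which is unpublished): image patches are flattened, linearly projected into the same embedding space as text tokens, and concatenated into one sequence for the transformer to attend over. All names and dimensions here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not real GPT-4 or Kosmos-1 dimensions).
d_model = 64     # shared embedding width for text and image tokens
patch_size = 16  # each image patch becomes one "token"
vocab_size = 1000

# Text side: ordinary token-embedding lookup table.
text_embedding = rng.normal(size=(vocab_size, d_model))

def embed_text(token_ids):
    """Map token ids to embeddings, shape (len(token_ids), d_model)."""
    return text_embedding[token_ids]

# Image side: split the image into patches, flatten each patch,
# and project it linearly into the same d_model space.
patch_proj = rng.normal(size=(patch_size * patch_size * 3, d_model))

def embed_image(image):
    """image: (H, W, 3) array with H and W divisible by patch_size."""
    h, w, c = image.shape
    patches = (image
               .reshape(h // patch_size, patch_size,
                        w // patch_size, patch_size, c)
               .transpose(0, 2, 1, 3, 4)
               .reshape(-1, patch_size * patch_size * c))
    return patches @ patch_proj  # (num_patches, d_model)

# A multimodal "sequence" is just the two embedding streams concatenated;
# the transformer then attends over image and text tokens uniformly.
text_ids = np.array([5, 42, 7])
image = rng.normal(size=(32, 32, 3))  # 32x32 image -> 2x2 = 4 patches
sequence = np.concatenate([embed_image(image), embed_text(text_ids)], axis=0)

print(sequence.shape)  # (4 image tokens + 3 text tokens, d_model)
```

The appeal of this design is that the transformer itself is unchanged: only the input embedding layer knows about modalities, which matches the architectural simplicity the comment above speculates about.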