
From that paper it seems the sampling method (SAP) is also slower, so the fact that it beats larger models seems expected.

It's not at all expected. T5 models are not generative by default, and they were not thought to be capable of generation, let alone in-context learning. Remember, these models were released before any of the current LLMs, and in-context learning/prompting only became popular as a technique with GPT-3.

While the technique requires multiple samples to coax generations out of this particular model, other LLM training schemes have since incorporated both unidirectional and bidirectional objectives. That exploration hasn't been fully resolved, though: by standard practice, most models are still trained only on the causal objective. There's still a lot of exploration to be done on pre-training objectives.
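
For intuition, here's a rough Python sketch (using Hugging Face transformers) of what coaxing generation out of a span-corruption model like T5 can look like: append a sentinel mask token, sample several infills, keep one, and repeat. This is a simplified illustration of the general idea, not the paper's exact SAP algorithm; the checkpoint, sampling parameters, and the naive "keep the first non-empty sample" heuristic are all my own placeholders.

    # Sketch only: iterative infill sampling from T5, loosely in the
    # spirit of SAP. Not the paper's actual algorithm.
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    prompt = "Translate English to French: The house is wonderful."
    generated = ""

    for _ in range(5):  # a few iterative infill steps
        # Ask the model to fill a masked span at the end of the text so far.
        text = prompt + generated + " <extra_id_0>"
        inputs = tokenizer(text, return_tensors="pt")
        # Draw several candidate infills; needing multiple samples per
        # step is part of why this approach is slower.
        outputs = model.generate(
            **inputs,
            do_sample=True,
            top_p=0.9,
            num_return_sequences=4,
            max_new_tokens=8,
        )
        candidates = [
            tokenizer.decode(o, skip_special_tokens=True) for o in outputs
        ]
        # Naively keep the first non-empty sample and append it.
        infill = next((c.strip() for c in candidates if c.strip()), "")
        if not infill:
            break
        generated += " " + infill

    print(generated.strip())

A real implementation would score or filter the candidate samples rather than taking the first one, which is where most of the extra compute goes.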
