
You're right. Encoder-only models like BERT aren't auto-regressive; they're trained with the MLM (masked language modeling) objective. Decoder-only models (GPT) and encoder-decoder models (T5) generate auto-regressively and are trained with the CLM (causal language modeling) objective, and sometimes with PrefixLM.
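The practical difference between these objectives mostly comes down to the attention mask. Here's a minimal numpy sketch (function names are mine, not from any library): MLM uses a fully bidirectional mask, CLM uses a causal (lower-triangular) mask, and PrefixLM is a hybrid where the prefix attends bidirectionally and the rest is causal.

```python
import numpy as np

def bidirectional_mask(n):
    # MLM (BERT-style): every token attends to every token.
    return np.ones((n, n), dtype=bool)

def causal_mask(n):
    # CLM (GPT-style): token i attends only to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_lm_mask(n, prefix_len):
    # PrefixLM: full attention within (and onto) the prefix,
    # causal attention everywhere else.
    mask = np.tril(np.ones((n, n), dtype=bool))
    mask[:, :prefix_len] = True
    return mask
```

So for `prefix_lm_mask(4, 2)`, tokens 0 and 1 see each other in both directions, while tokens 2 and 3 can see the whole prefix but only their own past.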
