Right, but again, I think the emphasis on avoiding 3rd party libraries isn't really relevant to machine learning. The "from scratch" here means avoiding 3rd party implementations of the transformer model: building up from the math on paper and then letting the automatic differentiation (AD) / computation framework do its thing.
One does not exclude the other, though. "Avoiding 3rd party implementations of the transformer model" is a subset of "avoiding 3rd party libraries". As this thread shows, "from scratch" is vague enough for different people to interpret it in different ways. Despite being in the minority here, I do not think my interpretation is any less valid, especially since some people have already written such "from scratch" implementations (i.e. in C or C++ with no 3rd party dependencies).