Again, they are the exact same concept. Whether vectors represent tiles in a video game, an object in CSS, the matrix algebra you took in school, or the semantics of words used by LLMs, it's the same meaning of the word "transform" in every case. It's not specific to language models at all - which was the thesis of your whole argument.
What is meant by "sliding window" or "skip-gram" is bigram mapping (or another n-gram).
This is ML 101.
It's the same training methodology and data structure used in my next-token-prediction lib, and is widely used for training LLMs. Ask your local AI to explain the basics, or see examples like: https://www.kaggle.com/code/hamishdickson/training-and-plott...
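To illustrate what I mean by a sliding-window bigram mapping, here's a toy sketch (illustrative names only, not the actual next-token-prediction library code):

```js
// Toy sliding-window bigram mapping for next-token prediction.
// Illustrative names only - not the actual next-token-prediction library code.
function buildBigramModel(text) {
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const counts = {};

  // Slide a window of size 2 over the token stream and count what follows what.
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    counts[current] = counts[current] || {};
    counts[current][next] = (counts[current][next] || 0) + 1;
  }
  return counts;
}

// Predict the most frequent continuation of a token.
function predictNext(model, token) {
  const candidates = model[token.toLowerCase()];
  if (!candidates) return null;
  return Object.entries(candidates).sort((a, b) => b[1] - a[1])[0][0];
}

const model = buildBigramModel('the cat sat on the mat and the cat slept');
console.log(predictNext(model, 'the')); // "cat" - it follows "the" twice, "mat" once
```

Widen the window and you get trigrams and beyond - same structure, longer keys.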
> ChatGPT doesn't use parts-of-speech
Yes it does. Not only is there a huge business in data tagging (both POS and NER) adjacent to AI, but OpenAI famously used African workers on very low wages to tag a bunch of data. ChatGPT uses text-embedding-ada; you'll have to put 2 and 2 together, as they don't open-source that part.
Mistral says:
"The preprocessing stage of Text-Embedding-ADA-002 involves applying POS tags to the input text using a separate POS tagger like Spacy or Stanford NLP. These POS tags can be useful for segmenting sentences into individual words or tokens."
>It's not specific to language models at all - which was the thesis of your whole argument.
I didn't say that it's unique to LMs. My argument is that saying "my LM is a transformer" is misleading because "transformer" in the context of LMs means a very specific architecture. You're deliberately misusing terms, probably to draw attention to your project.
>OpenAI specifically famously used African workers on very low wages to tag a bunch of data
Did they tag Polish parts of speech too? Or Ancient Greek? ChatGPT constructs grammatically correct Ancient Greek. I thought they tagged "harmful/non-harmful", not parts of speech?
>ChatGPT uses text-embedding-ada
[Citation needed]
NanoGPT, for example, learns embeddings together with the rest of the network so, as I said, manual tagging is not required.
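To make that concrete, here's a toy sketch of the idea - an embedding table is just another trainable weight matrix, nudged by gradient descent along with the output layer, with no POS labels anywhere. (All names and numbers are made up for illustration; NanoGPT itself is PyTorch, this is not its code.)

```js
// Toy sketch: embeddings learned jointly with the rest of a (tiny) model.
// One SGD step on a (token id, next token id) pair - no manual tags involved.
// Made-up illustration, not NanoGPT.
const V = 4;    // vocabulary size
const D = 3;    // embedding dimension
const LR = 0.1; // learning rate

const randRow = (n) => Array.from({ length: n }, () => (Math.random() - 0.5) * 0.1);
const E = Array.from({ length: V }, () => randRow(D)); // embedding table (trainable)
const W = Array.from({ length: V }, () => randRow(D)); // output projection (trainable)

const softmax = (xs) => {
  const exps = xs.map((x) => Math.exp(x));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};

// One training step: predict `target` given `input`, update W *and* E.
function step(input, target) {
  const e = [...E[input]]; // embedding lookup (copied so the updates below don't interfere)
  const logits = W.map((row) => row.reduce((s, w, d) => s + w * e[d], 0));
  const probs = softmax(logits);

  // Cross-entropy gradient w.r.t. the logits is (probs - onehot(target)).
  const dLogits = probs.map((p, i) => p - (i === target ? 1 : 0));

  // dL/dE[input] = W^T * dLogits   and   dL/dW[i] = dLogits[i] * e
  for (let d = 0; d < D; d++) {
    let grad = 0;
    for (let i = 0; i < V; i++) grad += W[i][d] * dLogits[i];
    E[input][d] -= LR * grad;
  }
  for (let i = 0; i < V; i++) {
    for (let d = 0; d < D; d++) W[i][d] -= LR * dLogits[i] * e[d];
  }
}

step(0, 2); // E[0] just moved - the embedding was learned from data, not tagged by hand
```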
Anyway, looking forward to hearing news about your image generation project. Any news?
Nobody denied the term is used in language models; I only pointed out that they use that term because of what it already meant in the context of vector operations (long before OpenAI).
The Wikipedia article on deep learning transformers:
All transformers have the same primary components:
- Tokenizers, which convert text into tokens.
- Embedding layer, which converts tokens and positions of the tokens into vector representations.
- Transformer layers, which carry out repeated transformations on the vector representations, extracting more and more linguistic information. These consist of alternating attention and feedforward layers. There are two major types of transformer layers: encoder layers and decoder layers, with further variants.
- Un-embedding layer, which converts the final vector representations back to a probability distribution over the tokens.
Where does it say bigrams can't be used for next-token prediction? Or that you can't tag data? Note "...which converts tokens and positions of the tokens..."
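For what it's worth, the data flow those bullet points describe looks like this in toy form (made-up names, random untrained weights, and a placeholder where the attention/feedforward layers would be - purely to show the stages, not any real model):

```js
// The pipeline above in toy form:
// tokenize -> embed (token + position) -> transformer layers -> un-embed to probabilities.
// Made-up illustration; the "layers" step is a placeholder.
const VOCAB = ['a', 'b', 'c', ' '];
const DIM = 4;
const rand = () => Array.from({ length: DIM }, () => Math.random() - 0.5);

// Tokenizer: characters -> token ids.
const tokenize = (text) => [...text].map((ch) => VOCAB.indexOf(ch));

// Embedding layer: token id + position -> vector.
const tokenEmb = VOCAB.map(rand);
const posEmb = Array.from({ length: 16 }, rand);
const embed = (ids) =>
  ids.map((id, pos) => tokenEmb[id].map((v, d) => v + posEmb[pos][d]));

// Transformer layers (alternating attention and feedforward) would go here;
// this placeholder just passes the vectors through unchanged.
const transformerLayers = (vectors) => vectors;

// Un-embedding: final vector -> probability distribution over the vocabulary.
const softmax = (xs) => {
  const exps = xs.map((x) => Math.exp(x));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
};
const unembed = (vec) =>
  softmax(tokenEmb.map((row) => row.reduce((s, w, d) => s + w * vec[d], 0)));

const hidden = transformerLayers(embed(tokenize('ab c')));
console.log(unembed(hidden[hidden.length - 1])); // next-token probabilities
```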
> You're deliberately misusing terms, probably to draw attention to your project.
Haha, well, since I have like 30 followers and the npm package is free/MIT, whatever scheme you think I'm up to isn't working. Anyway, a text autocomplete library is not exactly viral material. Jokes aside, no, I am trying to use accurate terms that make sense for the project.
Could just make it anonymous - `export default () => {}` - and call the file `model.js`. What would you call it?
> Did they tag Polish parts of speech too? Or Ancient Greek?
Yes, all the foreign words with special characters were tokenized and trained on. An LLM doesn't "know" any language. If it never trained on any Polish word sequences, it would not be able to output very good Polish sequences any more than it could output good JavaScript. It's not that it has to train on Polish to translate Polish per se, but it does have to have coverage of the language at the token level to be able to perform such vector transformations - which is probably most easily accomplished by training on Polish-specific data.
> The model was initialised from AUEB NLP Group's Greek BERT and subsequently trained on monolingual data from the First1KGreek Project, Perseus Digital Library, PROIEL Treebank and Gorman's Treebank
First1KGreek Project
> The goal of this project is to collect at least one edition of every Greek work composed between Homer and 250CE
> The new model, text-embedding-ada-002, replaces five separate models for text search, text similarity, and code search, and outperforms our previous most capable model, Davinci, at most tasks, while being priced 99.8% lower.
> Anyway, looking forward to hearing news about your image generation project. Any news?
Not yet! Feel free to follow on GitHub or even help out if you're really interested in it. Would be cool to have pixel prediction as snappy as text autocomplete.
> it's not required to. I've run lots of models
Then you must know about skip-gram and how embeddings are trained: https://medium.com/@corymaklin/word2vec-skip-gram-904775613b...
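The "sliding window" there is just pair generation: for each center token, emit (center, context) pairs from a window around it, and those pairs are what the embedding training consumes. A toy sketch (illustrative only, not word2vec or any library's actual code):

```js
// Toy skip-gram pair generation: slide a window over the tokens and pair
// each center word with its neighbours. These (center, context) pairs are
// the training examples the embedding model learns from.
// Illustrative only - not word2vec or any library's actual code.
function skipGramPairs(tokens, windowSize = 2) {
  const pairs = [];
  for (let i = 0; i < tokens.length; i++) {
    const start = Math.max(0, i - windowSize);
    const end = Math.min(tokens.length - 1, i + windowSize);
    for (let j = start; j <= end; j++) {
      if (j !== i) pairs.push([tokens[i], tokens[j]]);
    }
  }
  return pairs;
}

console.log(skipGramPairs('the quick brown fox'.split(' '), 1));
// [["the","quick"],["quick","the"],["quick","brown"],
//  ["brown","quick"],["brown","fox"],["fox","brown"]]
```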
> I use Claude to make new languages
Cool story, but it has nothing to do with the topic.