The model learns by taking a piece of text from the training data (say, the opening sentence of a Wikipedia article) and trying to predict the next token in the sequence. It then compares its prediction with the actual text in the training corpus and adjusts its parameters to correct any errors. It's not magic, it's math: The outco