It’s totally normal to find it confusing. These models work differently from any AI-like models that have existed before, and in fact researchers themselves still don’t have a completely satisfying explanation for why the task of “predict the next word” works so well. The best researchers are working on that exact problem right now! So you are in good company.
Another way of thinking about the task of “predict the next word” is that it’s kind of like me dumping a bunch of books about WW2 on your desk and saying: “You have unlimited time to study these books, and I’m going to expect you to memorize them and repeat them back to me word for word. The way I’m going to test you is I’m going to give you sentences from the books and you have to complete them correctly. You have to fill in the blanks. However, I’m only going to give you a notepad that’s 10 pages long. So you better come up with a good system for remembering what’s in the books without copying them exactly. You better take really, really smart notes. Good luck.”
That’s basically what we’re asking the AI to do. The job is to memorize every piece of information we give it (predict the next word, exactly). But the memory isn’t big enough. So the amazing part of the AI is that it comes up with a system for organizing information internally that is really, really good. And that’s the magic part we don’t understand. How is it coming up with such a good system? Of course at a high level we know how we set it up, and we can kind of “see” what might be happening. But why it works so well is still a mystery. From what we can tell, the systems it comes up with are pretty similar to the way people organize information. It comes up with high-level concepts and organizes things along those lines, remembering exact details as best it can, but sometimes just guessing based on context.
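If it helps to make that concrete, here’s a toy sketch in Python (my own illustration, nowhere near how a real model works): the “notes” are just a table of which word tends to follow which, and the “test” is filling in the blank from those notes.

```python
from collections import Counter, defaultdict

# Toy illustration of "predict the next word" (my own sketch, not how a
# real model works). We "study the books" by tallying which word follows
# which, then "take the test" by filling in a blank from those tallies.

text = (
    "the allies landed in normandy in june the allies pushed east "
    "the axis retreated the allies landed in italy"
).split()

# Study step: for each word, count what came right after it.
following = defaultdict(Counter)
for word, next_word in zip(text, text[1:]):
    following[word][next_word] += 1

def predict_next(word):
    """Guess the word most often seen after `word` in the text."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("allies"))   # -> 'landed' (seen twice, vs 'pushed' once)
print(predict_next("landed"))   # -> 'in'
```

A real model’s “notes” aren’t word-pair counts like this, they’re billions of learned numbers, but the test it’s graded on is exactly this fill-in-the-blank game.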
So in a way it DOES have a ton of accumulated knowledge, and in fact we’ve given it almost every single piece of written text that’s ever existed (including the entire internet). At this point it’s all about finding new ways to set it up so it learns even better. We’ve literally run out of books to dump on its desk.
And this is also why it’s kind of bad at math and logic. You don’t often see math and logic broken down in books as words. Or if you do, it’s quite confusing to follow compared to symbols. It’s confusing for the AI too. It’s like if you had to do 10-digit multiplication just by saying out loud “Okay, well, six times seven is forty-two, carry the four…”. Jesus, doing one math problem would be an entire essay! And it’s hard to “take notes” on how to do this correctly; you kind of just have to do it and keep track of everything. Which is hard if you’re an AI and don’t have a way to write anything down except by “talking”.
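Just to show how quickly “math in words” turns into an essay, here’s a quick toy sketch in Python (my own illustration, and simplified: it narrates the place-value steps and skips the digit-by-digit carries a real by-hand calculation would also have to track):

```python
# Toy illustration (my own, not how any real model works): spell out a
# long multiplication as sentences, where the only "scratch space" is the
# running narration itself, and count how many words one problem takes.

def multiply_out_loud(a: int, b: int) -> str:
    steps = []
    total = 0
    # Schoolbook style: multiply `a` by each digit of `b`, scaled by its
    # place value, and keep a running total.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * (10 ** place)
        total += partial
        steps.append(
            f"{a} times {digit} in the {10 ** place}s place gives {partial}, "
            f"running total {total}"
        )
    steps.append(f"so {a} times {b} is {total}")
    return ". ".join(steps)

essay = multiply_out_loud(8675309421, 3141592653)
print(essay)
print(f"\nThat one problem took {len(essay.split())} words.")
```

Run it and a single 10-digit multiplication comes out to well over a hundred words of narration, with nothing written down anywhere except the sentence itself.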
Another note: AI right now is also strongly limited by the fact that it has actually never seen, touched, smelled, or heard about any of these concepts. It just knows what it has read. It’s like the old parable of the blind men and the elephant. If you let a blind man touch an elephant long enough, and read books in braille about elephants, he can probably convince you he knows what an elephant looks like, how big it is, even what color it is… etc. But does he really? More of a philosophical question, you might say. But you can see how some of the AI’s errors might stem from this. You ask the AI: can an elephant fit through the eye of a needle? “Well, I guess I’ve read about needles, and where those appear they’re generally also around concepts like clothes, which humans wear, and when I read about elephants humans ride them, so I guess they’re bigger than humans at least…”. You can see how it would be difficult to live your life like this, having to use all these analogies to get anywhere when it comes to things that we think of as “common sense”. Researchers are working on this bit, giving the models “vision”.