Forum Replies Created
frankdrebin
Participant
I should also note, I'm oversimplifying many of these concepts. The AIs do not have an internal monologue; I'm just using that as a way of demonstrating things here. They actually represent information in the form of huge matrices. The matrices don't do anything unless you prompt them with a question (by multiplying them together). So when people talk about "the AI is going to take over," it's really quite silly. Oh, are the matrices going to start multiplying themselves? It's like saying we need to limit how big the numbers your calculator can multiply are, because we're afraid that if they get too big it might start multiplying numbers all by itself (see how that makes zero sense?).
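To make that concrete, here's a toy sketch in Python/numpy. The sizes and the tanh "layer" are made-up stand-ins (nothing to do with any real model); the point is just that the stored matrices are inert numbers until you hand them an input and do the multiplication yourself:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))   # pretend these are a model's learned matrices

# Sitting on disk or in memory, `weights` does absolutely nothing on its own.
# Something only "happens" when we supply an input (a prompt) and do the math:
prompt_vector = rng.normal(size=8)
output = np.tanh(weights @ prompt_vector)   # one tiny made-up "layer" of computation
print(output)
```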
frankdrebin
Participant
I forgot to include something about image creation. You can think of it in similar terms via the analogy of "here is a ton of images, I want you to recreate them exactly, but by the way, you can't store the image itself". The technical underpinnings of the best image creation models are quite different to something like ChatGPT, but the gist of how we train them is similar. There are ways to merge the text/image divide, and make them work together, and this is a very active area of research.
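If it helps, here's a toy sketch of that "recreate it without storing it" idea in Python/PyTorch, using a simple bottleneck network as a stand-in. To be clear, the best image generators are not actually trained this way (diffusion models work quite differently); this is just the compress-and-recreate intuition in code:

```python
import torch
import torch.nn as nn

# Squeeze each image through a bottleneck far too small to hold the image
# itself, then grade the model on how well it rebuilds the original pixels.
autoencoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 32),   # the tiny "notepad": can't store the picture verbatim
    nn.ReLU(),
    nn.Linear(32, 28 * 28),   # try to recreate the original anyway
)

images = torch.rand(16, 1, 28, 28)             # a pretend batch of 28x28 images
reconstruction = autoencoder(images)
loss = nn.functional.mse_loss(reconstruction, images.flatten(1))
loss.backward()   # adjust the weights so the recreations get a little better
```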
frankdrebin
Participant
It's totally normal to find it confusing; these models work differently from any other AI-like models that have ever existed before, and in fact researchers themselves still do not have a completely satisfying explanation for why the task of "predict the next word" works so well. The best researchers are working on that exact problem right now! So you are in good company.
Another way of thinking of the task of "predict the next word" is that it's kind of like me dumping a bunch of books about WW2 on your desk. And I say, "You have unlimited time to study these books, and I'm going to expect you to memorize them and repeat them back to me word for word. The way that I'm going to test you is this: I'm going to give you sentences from the books and you have to complete them correctly. You have to fill in the blanks. However, I'm only going to give you a notepad that's 10 pages long. So you better come up with a good system for remembering what's in the books without copying them exactly. You better take really, really smart notes. Good luck."
That's basically what we're asking the AI to do. The job is to memorize every piece of information we give it (predict the next word, exactly). But the memory isn't big enough. So the amazing part of the AI is that it comes up with a system for organizing information internally that is really, really good. And that's the magic part we don't understand. How is it coming up with such a good system? Of course, at a high level we know how we set it up and can kind of "see" what might be happening. But why it works so well is still a mystery. From what we can tell, the systems it comes up with are pretty similar to the way people organize information. It comes up with high-level concepts and organizes things along those lines, remembering exact details as best it can, but sometimes just guessing based on context.
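If you're curious what that looks like in practice, here's a toy sketch in Python/PyTorch. All the sizes are made up and the "model" is a silly two-layer stand-in rather than a real transformer, but the training step really is "given each word, guess the next one", and the fixed set of weights plays the role of the 10-page notepad:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64           # tiny made-up numbers for illustration
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),       # stand-in for the real transformer layers
)

tokens = torch.randint(0, vocab_size, (1, 32))   # a pretend sentence of 32 word IDs
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict word t+1 from word t

logits = model(inputs)                            # a guess for every next word
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()   # nudge the "notes" (the weights) so next time the guesses improve
```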
So in a way it DOES have a ton of accumulated knowledge, and in fact we've given it almost every single piece of written text that's ever existed (including the entire internet). At this point it's all about finding new ways to set it up so it learns even better. We've literally run out of books to dump on its desk.
And this is also why it's kind of bad at math and logic. You don't often see math and logic broken down in books as words. Or if you do, it's quite confusing to follow compared to symbols. It's confusing for the AI too. It's like if you had to do 10-digit multiplication just by saying out loud, "Okay, well, six times seven is forty-two, carry the four…". Jesus, doing one math problem would be an entire essay! And it's hard to "take notes" on how to do this correctly; you kind of just have to do it and keep track of everything. Which is hard if you're an AI and don't have a way to write anything down except by "talking".
Another note: AI right now is also strongly limited by the fact that it has actually never seen, touched, smelled, or heard about any of these concepts. It just knows what it read. It's like the old parable of the blind man and the elephant. If you let the man touch the elephant long enough, and read books in braille about elephants, he can probably convince you he knows what an elephant looks like, how big it is, even what color it is, etc. But does he really? More of a philosophical question, you might say. But you can see how some of the AI errors might stem from this. You ask the AI: Can an elephant fit through the eye of a needle? "Well, I guess I've read about needles, and where those appear they're generally also around concepts like clothes, which humans wear, and when I read about elephants humans ride them, so I guess they're bigger than humans at least…". You can see how it would be difficult to live your life like this, having to use all these analogies to get anywhere when it comes to things that we think of as "common sense". Researchers are working on this bit, giving the models "vision".
frankdrebin
Participant
A few things here (from someone who works in the industry and with ChatGPT and ChatGPT-like models daily).
First, GPT 3.5 (the free version) is very very different from GPT 4. GPT 4 solves all kinds of problems that 3.5 can only guess at. If you have $20 to spare, sign up for the Plus membership and give it a shot. I think you might be surprised at the difference in quality/skill of the model. If you post an anonymized version of the problem here I’m happy to check for you.
Second, I know it seems unintuitive, but logic/math problems are actually the worst for this type of model. I don't think a super satisfying non-technical explanation exists for why, but it basically comes down to the fact that the model is learning about logic entirely through language. It has no built-in "logic circuit" of its own. The way these models are trained is "given part of a sentence, predict the next word". In some cases that next word might be hard to guess just by looking at the words in the sentence; maybe it's a new sentence the model has never seen before. So the model does, to some extent, learn the meaning of words and how the concepts are related, because it has to if it wants to guess the next word correctly. You could imagine an example as something like "If I have a feather on a plate above a bed, and turn the plate upside down and right side up again, the feather is on the…". The correct answer, "bed", is really only obvious if you already know what plates are, what feathers are, what gravity is, etc.
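If you want to see "predict the next word" with your own eyes, here's a small Python sketch using the open-source Hugging Face transformers library with GPT-2 as a stand-in (ChatGPT's own weights aren't public, so treat this as an illustration of the idea, not of ChatGPT itself):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("If I have a feather on a plate above a bed, and turn the plate "
          "upside down and right side up again, the feather is on the")
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # scores for every possible next token

next_word_probs = logits[0, -1].softmax(dim=-1)
top = next_word_probs.topk(5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob:.3f}")
```

Run it and you get a ranked list of guesses for the next word, which, under the hood, is all these models ever really do.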
Another thing that can help this type of model is asking it to "reason about the problem step by step". This is because the model has no "internal monologue". What you see on the screen is its entire "thought process". So if you ask it to "think out loud", sometimes it will do better. Another step further might be to ask GPT to write a program to figure out the answer! Sometimes it will be able to write a correct program, even though it couldn't figure out the answer itself.
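For what it's worth, here's roughly what those two tricks look like if you're calling the model from code rather than the chat window. This is only a sketch using the openai Python package in its older pre-1.0 style; the API key is a placeholder, and the question is just one I picked for illustration:

```python
import openai

openai.api_key = "YOUR_API_KEY"

question = ("A bat and a ball cost $1.10 together, and the bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

# Trick 1: ask it to think out loud before answering.
step_by_step = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": question + " Reason about the problem step by step before giving the answer."}],
)
print(step_by_step["choices"][0]["message"]["content"])

# Trick 2: ask it to write a program that computes the answer instead.
write_code = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": "Write a short Python program that computes the answer to this: " + question}],
)
print(write_code["choices"][0]["message"]["content"])
```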
