Sure, I could see a lot of medical professions and other "knowledge bank" type jobs being replaced. I've always thought optometrists could largely be replaced with "measure my prescription" booths controlled by a computer. But anything requiring any creative juice whatsoever will likely not be replaced.
*Yes, it's not true AGI but AI that replaces 95% of all jobs
A lot can happen in 50-70 years.
We now have 7 billion people, with more of the world coming online to do R&D. China, specifically, has made it a goal to lead the world in AI by 2025.
India’s economy should grow over the next 2 decades and they will also become a world leader.
With so many resources, the world should easily advance more in the next 50 than it did in the last 100 years.
We've lived through enormous computing advances, but it's been fairly obvious for some time that hardware improvements are slowing.
I'm sure there will be amazing advancements in the next 50 years, but I expect a lot of the progress to be in fields that are either currently unknown, or seem unimportant today. Those new fields will see better return on investment.
I disagree with that part. Things slowed dramatically because we stopped putting extreme amounts of effort in, largely because there wasn't enough market demand for aviation or space flight beyond 1970s technology... for a while.
Now we have countless companies and several countries all competing and building off each other's work in the AI space. So long as people don't get bored or the global economy doesn't collapse, progress should keep chugging along. Another key difference is that it's basically free to get into. Anyone out there can download data sets, existing algorithms, and get to work on tweaking things. There's a strong foundation for any motivated person to build off of.
Overcoming physical limitations is one thing. An intelligent being creating something as intelligent as, or more intelligent than, itself? And obviously I don't mean reproducing. Creating an entirely new class of thing with intelligence equal to its creator's is very different from using the forces of nature to give you an edge over gravity.
We will figure out how the brain works, make better transistors, develop better algorithms, etc
An AI “space race” between the US and China, for instance, will push the field forward over the next decade.
https://www.nasa.gov/audience/formedia/speeches/fg_kitty_hawk_12.17.03.html
Conflating them only demonstrates how far we have to go.
As for impossible talk, we have biological examples all around us of what needs to be built. We just need to imitate. Much like computer vision, algorithms sucked at it until they didn't (and all it took was someone scaling up an old design idea plus a lot of data). On the scale of gigantic ambitious goals it's pretty special in that regard. Curing cancer or death or mars colonies may indeed be impossible, by contrast.
I will agree that I trust no one's ability to predict anything. They are all just making almost-entirely-uneducated guesses using a few variables out of some vast number of unknowns.
Why stick with this concept - consciousness - which is not well defined, instead of using a much more practical concept: embodiment? Embodiment is the missing link towards human-level AI. Agents embodied in a world, like AlphaGo, already surpass humans (on the Go board, or in Dota 2); we just need to take that ability to the real world. The source of meaning is in the game, not the brain. What we need is a better simulator of the world, or a neural technique for imagination in RL, which is in the works [1].
> We do know how to quantify intelligence (..).
But how exactly do we do that?
And since then, not much has changed. Commercial supersonic flight never took off, and nowadays planes still use turbofans (invented during WWII). Engineering fields commonly make many breakthroughs in a really short time, and then settle down for a long period. We can't predict how far AI progress will go. In the 50s, having flying cars by 2000 didn't sound unrealistic given how much flight advanced during the first half of the century. Yet, I don't think anyone nowadays believes we'll have them by 2100.
Also, between Da Vinci's Codex on the Flight of Birds (c. 1505) and the Wright brothers' flight, there were four centuries. And regarding AGI, we might be closer to Da Vinci than to the Wrights.
70 years from the first flight to the Concorde and the Saturn V. But in the 50 years since, improvements in aerospace have been incremental.
In 75 years we went from ENIAC to TFLOPS in a laptop. But it looks like that breakneck pace is slowing down sharply. We've been doing AI nearly as long, and have gone from, say, Eliza to GPT-3. A huge advance, but not AGI.
A lot can happen in 50 years, but we've already had our first 70ish years with AI without an AGI breakthrough.
By the definition of AGI in the link, maybe a hundred million data scientists can hone a million models, one per "economically viable" task, and start chipping away at the 95%-of-the-economy target, but so far I'd wager AI has put many more people to work than out of it.
It just goes to show that technological advancement can happen rather unpredictably.
OP specifically called out AGI as not requiring touch or taste, only text to beat the Turing test.
> Programmers, also unlikely...Sure, you can have AIs generate more code for you, but then you'll just have the programmers working one abstraction layer up from that.
At what point do you stop calling them programmers and start calling them system architects? If I'm a programmer and my whole job can be replaced, isn't that replacing _some_ programmers? I think it's fair to argue that some programming jobs would be straight up gone. Maybe most of them.
> Sure, I could see a lot of medical professions and other "knowledge bank" type jobs being replaced. I've always thought optometrists could largely be replaced with "measure my prescription" booths controlled by a computer. But anything requiring any creative juice whatsoever will likely not be replaced.
We're not talking about what today's AI can do -- today's AI sure as hell can replace knowledge banks and some medical tasks like radiology and optometry, and yeah, it can't quite make blockbuster movies. But generative AI has come a long way and there are reasons to be optimistic again. Alex cites GPT-3 and iGPT as evidence of this trajectory.
He also says "imagine 2 orders of magnitude bigger" -- models with 1.75e13 params. What emergent generative powers might we discover? Synthesizing a blockbuster movie no longer seems entirely out of reach, even if we have to move another magnitude bigger and make several more algorithmic breakthroughs.
Erm, only text in the Turing test? That's a pretty non-general form of artificial general intelligence.
But a text-based AGI is not replacing plumbers and electricians, which means it's only general in limited areas, like generating human-level text. It would be impressive and no doubt have plenty of uses, but it's not a threat to paper-clip the world or put everyone out of a job.
For the entertainment of the creator. Maybe you are right that AGI will be entertained by watching humans turn into vegetables as they aimlessly watch a screen for hours at a time, but I suspect not.
Maybe let's hope that a hypothetical AGI finds us "cute".
AGI might have reason to keep us alive, sure, but why create media for us? Given our current trajectory, AGI will be very energy hungry. How will it justify using that energy for the sake of human entertainment?
I understand the hypothetical dangers of an AGI with the "wrong" reward function where "wrong" includes "like humans in terms of species-tribalism and intelligence-smugness", but I don't actually see a media generator AI necessarily having human-esque identity that you're suggesting.
That said, I would take the flip side of this bet to be settled at the end of the 50 years, for at least 90% chance of AIs able to create hollywood-esque movies within the next 50 years, though possibly not for AI plumbers in that time frame. For that matter, I would put at least 10% on a movie with an AI-generated script having global box office numbers topping $1B by the end of this decade.
https://en.wikipedia.org/wiki/Autorefractor
However optometrists do a lot more than just write prescriptions for corrective lenses.
It would be nice if this was at least an option for people who might not have vision insurance or for whatever reason don't want to proceed through the traditional system
and
> Yes, it's not true AGI but AI that replaces 95% of all jobs
To be fair, being able to create an enjoyable full length hollywood-esque movie or becoming an author that can write a (creative and entertaining) book is something well under 5% of humans are capable of doing.
Perhaps you're setting the bar for AGI too high there? Does it really need to excel and exceed the capabilities of the very best of human attempts at movie making and book writing to be considered "AGI", when the vast majority of humanity can not do that?
(Also, I suspect 95% of jobs probably actively discourage "creative juice" being used. And nobody really wants to found their startup with someone who's "just an ideas guy!", if all he's ever contributing is "creative juice".)
I think there's something very insightful in your post though, and that is the observation that programmers will just work one abstraction layer up. In general, it has been demonstrated that a combination of expert + AI is far more effective than either one on their own. I can see AI becoming an indispensable tool in the tool belt of an expert, and since we want the best possible outcomes, we're not going to throw the expert out of that equation any time soon. What we may see is the need for fewer experts to get the job done, as the automation capabilities of AI allow them to become more efficient. Just like the power loom, it certainly reduced the number of humans needed, but even today, you still need some people to service the machines and to program their patterns.
I'd like to take that bet. I'd even speed up the timeline a bit, especially when it comes to the book.
Let's say, if there's no completely AI-generated (so no human editing) book at #1 on the New York Times best-seller list for at least two weeks before 2040, I'll be very surprised.
https://www.lesswrong.com/posts/hQysqfSEzciRazx8k/forecasting-thread-ai-timelines
I doubt that "I'll even settle for a book that can rival human authors." states a sharp-edged criterion that separates before from after. Crossing the gap between rivaling Barbara Cartland and rivaling Tolstoy might take a century of software development.
You cannot make a bet on a statement of probability. It's unfalsifiable (unless, within the next 50 years, someone finds a way to take a random sample of the multiverse).
I'm really pessimistic about the next 30 years, but really optimistic about the next 5 after that.
For example, I would put forth that for a given problem, there is a lower limit to how simple you can make divided pieces of that problem. You can't compute the trajectory of a thrown baseball using only a single transistor. Granted most problems can be divided into incredibly simple steps. The question we face is "is AGI reductive enough for human beings to create it?" Is the minimum work-unit for planning and constructing an AGI small enough to be understood by a human?
That's of course putting aside the problem of scale. The neocortex alone has 160 trillion synapses, each of which exhibits behavior far more complex than a single transistor. You could argue that for many commercially-viable tasks we've found much better ways than nature, and that's true, but AGI is a different game entirely. Our current AI methodologies may be as unrelated to AGI as a spear is to a nuclear missile despite them both performing the same basic function.
I’d say the fact that we start life as a single cell with - at maximum - only a couple of gigabytes of information packed in our DNA and yet turn out to have profound intelligence is strong evidence that we might be able to design a system that, through some combination of obscenely efficient, simple-ish learning algorithms, could bootstrap its own intelligence in a similar fashion to how humans do.
This isn’t necessarily a call for research into how human embryos turn into thinking, feeling people, but more a loose upper bound on the initial complexity of a model that could become AGI.
I wouldn't be so sure about that. You can catch a ball by keeping a constant angle. That seems like something a one-parameter model or PID may be able to do. Dragonflies catch evading prey using just 16 neurons: https://www.pnas.org/content/110/2/696.full You may need a lot of neurons to learn something, but the found solutions may be quite small... Something to think about.
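To make that concrete, here is a toy sketch (my own, not taken from the dragonfly paper): a pursuer that only tries to keep its bearing to the target from drifting - the classic constant-bearing rule - with a single gain as its only tunable parameter.

    # Toy constant-bearing pursuit: turn just enough to cancel bearing drift.
    # All numbers are made up; the point is how little machinery is needed.
    import math

    def chase(nav_gain=4.0, dt=0.01, steps=3000):
        tx, ty, tvx, tvy = 0.0, 10.0, 3.0, 0.0   # target: straight line, speed 3
        cx, cy, speed = 0.0, 0.0, 4.0            # pursuer: fixed speed 4
        heading = math.atan2(ty - cy, tx - cx)   # start out facing the target
        prev_bearing = heading
        closest = math.hypot(tx - cx, ty - cy)
        for _ in range(steps):
            bearing = math.atan2(ty - cy, tx - cx)
            heading += nav_gain * (bearing - prev_bearing)  # cancel bearing drift
            prev_bearing = bearing
            cx += speed * math.cos(heading) * dt
            cy += speed * math.sin(heading) * dt
            tx, ty = tx + tvx * dt, ty + tvy * dt
            closest = min(closest, math.hypot(tx - cx, ty - cy))
        return closest

    print(chase())  # closest approach; near zero means the one-gain rule intercepts

No model of gravity or of the target's dynamics anywhere, which is the point: the learned solution can be tiny even if learning it took a lot of neurons.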
What exactly do you think a human brain is?
The idea of the neural network in its current form came from someone in 1943 going through actual neuroscience papers and translating them into a mathematical model by abstracting away the noise. While brilliant, McCulloch and Pitts' work was done in pre-historic times if you consider the amount of information we have learned about the brain and cognition since.
Saying "We don't know much about human brain" is just being lazy. We know too much about the brain! Brain research has collected so much empirical data and a good amount of good theories. For an example of a good theory at various levels of abstraction see Moser, O'Keefe and Nadel work on place cells and grid cells, Kandel on learning, Edmund Rolls on vision, Olshausen & Field on sparse coding, Kanerva' SDM, Plate' convolutions, Widdows on geometry and meaning, Wickelgren and J. R. Anderson on associative memory, Fukushima' neocognitron, Hofstadter on analogical reasoning, Quillian on semantic memory, Pribram holonomic theory, Valiant' neuroids, Pearl' causality. More of these "bridges" and a meta-bridge is needed if you're serious about AI.
5 + 5 = _
Predict what's next.
The most accurate way to do this is to understand how arithmetic actually works.
Sure, if you've only seen a handful of examples (or this exact example) you may have memorized 10 comes next or you may know a number comes next, but not which number.
If you've seen enormous amounts of this you may deduce the underlying rules in order to more accurately predict what comes next.
There's evidence of this happening with GPT-3.
The scaling hypothesis may be stronger than people suspect.
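For a concrete sense of what "predict what's next" means here, a rough sketch (my own; it uses GPT-2 via Hugging Face only because it's freely downloadable, and being orders of magnitude smaller than GPT-3 it will usually get these wrong, which is exactly what the scaling question is about):

    # Few-shot arithmetic as plain next-token prediction (assumes the
    # `transformers` package; GPT-2 stands in for the much larger GPT-3).
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "12 + 7 = 19\n33 + 21 = 54\n48 + 26 = "
    out = generator(prompt, max_new_tokens=4, do_sample=False)
    print(out[0]["generated_text"])  # does the continuation say 74, or just something number-shaped?

The model is never told it is doing arithmetic; whether the right answer falls out of pure sequence prediction at sufficient scale is the whole question.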
I'm not an AI researcher so I only have a lay-person's opinion from reading about this kind of thing via things like that blog post (I also don't have OpenAI access to play with the API myself), but if that is an accurate statement it seems like pretty good evidence of figuring out the underlying mathematical rules.
I think this could be a path towards scaling being more effective than people think.
I sense though that you're just arguing from a position of motivated reasoning based on a pre-existing conclusion rather than actually trying to look at new things that may contradict what you already believe to be true. Repeated one-sentence responses arguing that none of this is evidence just aren't worth the time to respond to.
In the end we'll see what ends up happening.
I linked to the blog because that's where I first read about this and it's more immediately accessible.
[Edit: Paper Link, https://arxiv.org/abs/2005.14165]
In fact, after reading these sections of the actual paper, it's hard to believe that you could have read it yourself and taken away the idea that it was memorization. Pages 22-23 in particular. Part of the reason I tend to link to good blog posts instead of academic papers is that when you link to papers, nobody reads them. Often people linking them haven't read them either (I've only read small parts).
Again there's a simpler explanation: I read the paper and I drew different conclusions than you or your primary source. I give a very long explanation of why I drew those conclusions, above.
To clarify, I discussed the exact same subject a couple of weeks ago, I think, after reading the GPT-3 paper. I made my original comment in this thread without re-reading the paper, so I misremembered what was in it. Then I re-read the relevant section (3.9.1. Arithmetic) to refresh my memory and made the very long comment above. That's in the interest of full disclosure and so you don't have any reason to assume I didn't read the paper, which you shouldn't anyway.
Pages: 21-23, 63
> How many "x + y" questions can be formulated where x and y are both single-digit numbers? The answer is 10^2, or 100.
Less snarkily: there are (10^4)^2 = 100 million combinations of 4-digit addition problems, and GPT-3 reaches 25.5% accuracy on them (vs. 0.4% for the 13B-parameter model). For 3-digit problems, it's even better: 1 million combinations and 80.4% accuracy. Clearly, there is more happening than simple memorization -- the training set does not contain 800k 3-digit addition problems. Thus, it's fair to say that the model has at least a partial grasp of how to perform arithmetic operations (but probably not fair to say that it has synthesized the entire system of arithmetic).
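Just to make the back-of-envelope explicit (same figures as above, nothing new):

    # Problem-space sizes vs. the GPT-3 few-shot accuracies quoted above.
    for digits, accuracy in [(3, 0.804), (4, 0.255)]:
        pairs = (10 ** digits) ** 2          # ordered (x, y) with x, y < 10**digits
        print(digits, pairs, int(pairs * accuracy))
    # 3-digit: 1,000,000 pairs, ~804,000 answered correctly
    # 4-digit: 100,000,000 pairs, ~25,500,000 answered correctly

Memorization would need counts of that order to appear verbatim in the training data, which is what the spot checks below try to rule out.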
Also, the paper does say that they scrubbed exact examples from the training set to avoid memorization, a fact you left out:
> (pg. 23): ”To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms "<NUM1> + <NUM2> =" and "<NUM1> plus <NUM2>". Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized. In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.”
This is super interesting and something I hadn't read before. That is very cool, and definitely suggests it's figuring out how the computation actually works (!).
The nature of the failure (errors where it 'forgot' to carry the one) suggest that it's doing something like basic arithmetic and making mistakes.
This is evidence in the direction of having a model of how to do basic arithmetic and evidence against memorization.
I'm not pretending that both outcomes mean it knows arithmetic. For example, if the outputs were random or if they only matched exact examples it had seen then it would look like memorization, but that isn't what's seen.
In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.
So, what is "often"? 100% of the time? 60% of the time? 30% of the time? Such a vague statement is no evidence of anything, much less the very strong claim made in the paper.
Now, the two- and three-digit addition and subtraction tasks (operations on numbers between 0 and 99 and 0 and 999, respectively) are both small enough for the large, 175B-parameter model to have memorised them exactly. Even if there was a single parameter for each three-digit problem, of which there are a million, you could fit the entire set 175 thousand times in the 175-billion-parameter model (assuming they mean "a billion" as "one thousand million", not "one million million", which they don't clarify, but to be on the safe side let's assume the smaller). There is plenty of room.
These four tasks are also the tasks that are most likely to be present in their entirety in a corpus of natural language, as the one GPT-3 was trained on, for example as records of common monetary transactions (especially the two-digit ones). That is, yes, the training set can comfortably contain 800k 3-digit addition problems. Why not? It contained 410 billion tokens from the Common Crawl dataset alone, plus a few extras.
In short, the almost perfect accuracy on this task is not impressive. The 25% ish accuracy on the four-digit addition task is even less impressive. I don't know what the baseline is here, but 25% accuracy on anything is not something to write home about.
You ask me to provide evidence of my own to support the memorisation claim. The claim is not memorisation. The claim is that GPT-3 has learned arithmetic (not stated exactly like that in the paper). This claim flies in the face of the commonly understood operation of language models, which are systems that compute the probability of a token following a sequence of tokens - and nothing else. It's very hard to see how such a system should be able to perform arithmetic operations, while it's very easy to see how it can instead memorise their results. If the authors of the GPT-3 paper wish to claim that GPT-3 can perform arithmetic, instead of the much simpler explanation, they have to provide very strong evidence to back that up and refute the simpler explanation.
And the "spot checks" that they performed are nowhere near such strong evidence: I can fail to find anything I search for, if I search with the wrong terms and the authors don't give much information about how they did their "spot checks". I mean, did they use a regular expression? Which one? ("<NUM1> + <NUM2> =" is not a regular expression! But then - what is it?) Did they take into account whitespace? Punctuation? Something else? What search terms they used? They dont' say. Can we tell why they failed to find what they were looking for? No.
Besides, why only "spot check" three-digit arithmetic? It would make a lot more sense to spot-check two-digit problems first, because these are the most likely to be found more often in the dataset and consequently be memorised. Indeed, the fact that they don't report "spot checks" for two-digit arithmetic suggests that they did perform those spot checks and found a lot more overlap than for the three-digit arithmetic, but chose not to report it. And if their model was memorising two-digit arithmetic, and that explains its performance on that type of task, it's safe to assume that it was memorising the three-digit arithmetic task also and that their "spot checks" were simply not well enough put together to find the three-digit arithmetic examples.
Note that section 4 goes on at some length about the possibility that the test set for all tasks (not just arithmetic) was contaminated (i.e. that it contained training examples from existing benchmarks, published on the internet). I haven't read that one carefully, but test set contamination is another possibility. And, to be frank, any possibility is more likely than the possibility that a language model has learned arithmetic - which is tantamount to magick.
The claim that “GPT-3’s performance on arithmetic tasks is solely due to memorization / data leakage—it has no generalization ability on this type of task”, is easily attackable by...well, doing arithmetic and applying common sense.
There are 2(10^5)^2 = 20 billion possible 5 digit problems (both addition and subtraction). The accuracy on those tasks is about 10%, so roughly 2 billion* 5-digit addition and subtraction problems would need to be represented in the training data (Common Crawl + books as you said). Each problem is at minimum 5 tokens (e.g. 99999 + 11111 = 111110). So is ~2.5% of the training corpus 5-digit addition and subtraction problems that eluded their filtration process? (assuming it’s ~400B tokens like you said). Seems exceedingly unlikely, so much so that memorization ceases to be the simplest explanation.
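The same back-of-envelope in code, using only the figures above:

    # How much of the ~400B-token corpus would have to be 5-digit problems
    # for memorisation alone to explain ~10% accuracy.
    problems = 2 * (10 ** 5) ** 2      # 5-digit addition + subtraction pairs: 20 billion
    accuracy = 0.10                    # rough few-shot accuracy on these tasks
    tokens_per_problem = 5             # e.g. "99999 + 11111 = 111110"
    corpus_tokens = 400e9              # ~400B training tokens

    needed = problems * accuracy * tokens_per_problem
    print(needed / corpus_tokens)      # ~0.025, i.e. ~2.5% of the corpus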
That said, yes it is surprising that a language model can generalize in this way—that’s the point of the paper. How exactly this happens seems like a valuable thread to pull. Your critiques may help, but writing the results off as impossible magic does not.
About the 5-digit problems - I didn't make this clear, but I don't think those were memorised. I think the two- and three-digit problems (all three operations) were memorised, because those are the most likely to be represented in their entirety, or close to it, in GPT-3's training corpus, given that they are operations that range from common to very common in daily life.
I doubt that the four- and five-digit addition problems (and the single-digit, multi-op problem) were represented often enough in GPT-3's training corpus for them to be memorised. I think the low accuracy in these problems (less than 10% in the few-shot setting and near zero in the zero- and one-shot) is low enough that it doesn't require an explanation other than a mix of luck and overfitting that is common enough in machine learning algorithms that it's no surprise. e.g. we evaluate classifiers using diverse metrics, not just accuracy, because this is so common.
It is this observation, that GPT-3 did well on problems that are likely to be well represented in its training corpus and badly on ones that aren't, that convinces me that no more complicated explanation is needed than memorisation.
Something else. Like I say above, we evaluate classifiers not only by accuracy (the rate of correct answers), because accuracy can be misleading; e.g. on an imbalanced dataset a classifier can score very high accuracy while never getting a single positive example right. The GPT-3 authors only tested the ability of their model to give answers to problems stated as "x + y = ". They didn't test, e.g., what happens if they prompt it with "10 + 20 = 40, 38 + 25 = ". Testing for aberrant answers following from such confusing prompts has often shown that language models that appear to be answering questions correctly because of a deep understanding of language are in truth overfitting to surface statistical regularities. See for example [1,2] and many other references in [3].
Indeed, I could be wrong about rote memorisation and GPT-3 can still not be learning to perform arithmetic computations, given the tendency of language models to learn spurious correlations. There is an article about a mathemagician on the front page today, that shows how she found roots of huge numbers by finding shortcuts around expensive calculations. For instance, all sums between numbers ending in 5 end in 0, etc. I wouldn't find it magickal if a language model was finding such heuristics and that this is the "something else" that is said to be going on. However that would not be "generalisation" and it would not be learning to perform arithmetic.
In the end, I don't understand how a model can be said to know how to add two-digit numbers perfectly but not five-digit numbers. If it's performing an incomplete computation in the latter case, then what kind of incomplete computation is it performing? If it "gets it wrong after three digits" then why does it get three digits right? What's the big difference between three- and four-digit numbers that causes performance to fall off a cliff - other than the chance of finding such numbers in a natural language corpus?
As to magick- I'm writing off not the results, but the hand-waving presented in place of an explanation as magick. GPT-3 is a technological artifact designed to do one job, now reported to be doing another. This requires a thorough explanation but instead we got magickal thinking: the authors wish that their models could learn arithmetic, so they took its behaviour as proof that it learned arithmetic.
___________________
[1] Probing Neural Network Comprehension of Natural Language Arguments
https://www.aclweb.org/anthology/P19-1459/
[2] Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
https://www.aclweb.org/anthology/P19-1334/
[3] https://www.technologyreview.com/2020/07/31/1005876/natural-language-processing-evaluation-ai-opinion/ (try F9 if you 're over limit)
My point was that intelligent humans can and often do make mistakes in logic and computation (arithmetic) in ways that machines typically do not. One reason may be colliding or incomplete representations of certain concepts, and (relatedly) the fact that we are relying on language. I think of neural networks as fuzzy representation composers, so it seems they also fail for similar reasons. Basically, it (GPT) does have some layered representation of the concept of numbers and how they are used in different contexts which gives it some faculty at carrying out common operations, but it doesn’t “add up” to a reliable system of logic (that would allow it to extend addition to say 100-digit numbers, the way even a sharp and/or patient 2nd grader could do, generalizing from the simpler cases).
I think accuracy is sometimes the correct measure, and in this instance it seems fine—at baseline, we should expect ~0% accuracy since it is generating output from essentially the space of all possible text (texts <= 2048 tokens). I agree that it would be interesting to probe the model with better tests, and understanding when/why it fails on certain arithmetic problems or types of reasoning.
I liked what you wrote about finding heuristics, though I disagree with your conclusion that heuristic finding does not qualify as learning—it is just somewhere along the spectrum between a randomized model and an ALU (neither of which can be said to have learned anything) in terms of its ability to perform arithmetic.
Of course, we already have better models for solving proofs and such, so I generally think the way toward more complete AI models is to return to the system design view of AI (meta-learning, integration of different models, etc) rather than trying to evolve one colossal model to rule them all. That is, a meta-model that recognizes what sort of problem it is facing, then selecting a model/program to solve or generate possible solutions to that problem, while revealing or explaining as much of this process as possible to the user.
In any case, I have definitely have more to read on the subject and am mostly musing at this point. Thanks for the references and the conversation.
I guess I can concede that the memorisation explanation is not the only possible one, there's always the possibility of learned heuristics. I still expect very strong evidence before I'm convinced that GPT-3 can learn arithmetic in the general sense and I don't trust the explanation that it's only learning partially- but let's agree to disagree on that. Thank you for the conversation, too.
In fact, it is absolutely the case that if a system needs to see "enormous" amounts of examples before it can "deduce the underlying rules" (of arithmetic), then GPT-3 can't do that, because there simply aren't enough distinct examples of such operations (between one-, two- and three-digit numbers).
Indeed, GPT-3 completes "x + y =" and "x - y =" prompts with 80% accuracy or more when x and y are one- to three-digit numbers. It scores roughly 25% or less when they are four- or five-digit numbers, and its accuracy is similarly abysmal on multiplication (and results on division are not reported at all).
This is very much what we would expect to see from a model that has memorised some common examples of one- to three-digit addition and subtraction and has not seen enough examples of other operations to learn any consistent representation of those operations.
While it could be memorization, their testing seems to imply something else is going on.
Particularly because they excluded the exact examples they tested from the training data (to try and test whether it was memorization). Since GPT-3 could solve some arithmetic even without those exact test cases in the training data, it seems more likely to me that it's not simply memorization. It might be that it has some incomplete idea of the underlying rules, and therefore makes mistakes.
A child that is learning addition will have a harder time with larger numbers and make more mistakes too.
It'll be interesting to see if continuing to scale up the model improves things or not.
> "In fact, it is absolutely the case that if a system needs to see "enormous" amounts of examples before it can "deduce the underlying rules" (of arithmetic) then GPT-3 can't do that, because there simply aren't enough examples of such operations (between one- and two-digit numbers)."
I don't know if this is true. It depends on 'enormous' and it depends on how many examples are required to deduce how math works from trying to predict what's next. I don't think anyone actually knows the answer to this yet?
GPT-3 is not a child. GPT-3 is a language model, and language models are systems that predict the next token that follows from a sequence of tokens. A system like that can give correct answers to arithmetic problems it has seen already and often, without having to learn what a child would learn when learning arithmetic.
A system like that can also give incorrect answers to arithmetic problems it has not seen already, or hasn't seen often enough and that will be for reasons very different than the reasons that a child will give incorrect answers to the same problem.
In general, we don't have to know anything about how children learn arithmetic to know how GPT-3 answers arithmetic problems; it suffices to know how language models work.
I don't think additional conversation will be productive.
The point is that success on small numbers and more mistakes on larger numbers are what I would predict both if the results were memorized and if it had deduced some incomplete model of how to do basic arithmetic.
Not having the examples in the training data is a point in favor of understanding the underlying rules and a point against memorization.
> "GPT-3 is not a child. GPT-3 is a language model"
Yes - thanks for the obvious condescension.
To be honest, I assumed you were making an anthropomorphising comment, suggesting that GPT-3 learns like a child would learn. Sorry if that wasn't what you meant. I'm used to seeing comments like that (on HN in particular) so I guess I jumped to conclusions.
Anyway, I guess I took your comment about not reading linked papers personally, and you took my comment about GPT-3 not being a child personally. And the other poster pounced on my comment as if they had something to win and I reacted to it angrily. This is not a very good conversation and I haven't done my due diligence to keep it civil. I'm sorry we couldn't have a more constructive conversation this time. Maybe next time.
I do suspect that you may be too quick to draw the memorization conclusion because you find the other one too unlikely. I think you're setting the evidence barrier too high/dismissing evidence in favor of the conclusion you already believe to be true while not holding the memorization hypothesis to the same bar (rather than recognizing that the evidence suggests maybe something interesting is going on).
Either way, when things scale up we should be able to see what ends up happening.
My motivation here is that GPT-3 is a language model and language models are designed to do one thing, and only that thing (calculate probabilities of next-tokens). If we observe some unexpected behaviour of a language model, the logical first step is to try and explain it on the basis of what we know a language model to be able to do. However, the authors of the GPT-3 paper didn't do that and immediately jumped to wishful thinking, about their model doing something it wasn't designed to do, as if by magick, on the sole basis that it was a larger model, trained on more data, than others. But, the more data and more parameters make GPT-3 a quantitatively different, not qualitatively different model and if it's now behaving in ways that language models are not designed to behave, this requires a very thorough explanation and a very strong justification. I didn't see anything like that in the paper.
I've given some links above, in my reply to jointpdf from today, to papers where people have tested language models more thoroughly and found that despite similar claims (e.g. for BERT etc) a careful examination of a model's behaviour and training corpus reveals that it's doing what a language model is designed to do and nothing else.
So, yes, I don't think the two explanations are equally likely: memorisation and magickal arithmetic. There is a very strong prior in favour of the former and very little evidence in support of the latter.
(I work on Elicit, the tool used in the thread.)
I believe most people underestimate the chances of AGI arriving because they overestimate humans. The famous post "Humans who are not concentrating are not general intelligences" captures most of the point.
Machine learning (ML) and Deep Learning (DL) in particular benefit from fast computer chips. The most impressive gains in AI in the 2010s (Computer Vision & Natural Language Processing) were made thanks to DL. There's a famous post by AI researcher Rich Sutton, which summarises this fairly neatly [1].
Now, this connection between DL and fast chips would tie the progress of AI pretty tightly to the progress of Moore's Law [2]. There's some compelling evidence that Moore's Law is at least slowing down [3]. On the other hand, there are industry experts like Jim Keller who pretty strongly disagree with these assessments [4] and even TSMC seems to be bullish on being able to keep up with Moore's Law [5].
Some estimates of GPT-3's training cost put it in the range of ~5-10 million dollars [6]. It's hard to say how big an impact GPT-3 will have on the economy. It's probably safe to assume, though, that OpenAI is already working on GPT-4. The jump in parameter sizes from GPT-2 to 3 was roughly 100x (1.5 billion vs. 175 billion) and I'm assuming the price of training the model increased in roughly the same proportion (I might be wrong here, and if anyone can point me to evidence on this, it would be much appreciated). With these assumptions, and provided that GPT-4 won't be affected by diminishing returns from adding more parameters (big if), the price for it would be somewhere between 500 million and a billion dollars. That's still not an insane amount of money to put into R&D, but you'd probably want it to at least be somewhat economically viable to justify putting a hundred billion dollars into GPT-5.
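Making that extrapolation explicit (all of the numbers are the rough estimates above, and the assumption that cost scales linearly with parameter count may well be wrong):

    # Naive cost extrapolation: apply the GPT-2 -> GPT-3 parameter jump to GPT-4.
    gpt2_params, gpt3_params = 1.5e9, 175e9
    gpt3_cost_low, gpt3_cost_high = 5e6, 10e6       # rough published estimates, USD

    scale = gpt3_params / gpt2_params               # ~117x
    print(scale, gpt3_cost_low * scale, gpt3_cost_high * scale)
    # ~117, i.e. roughly $0.6B to $1.2B if GPT-4 repeats the same jump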
All this is to say, that I find making predictions of the progress of AI really hard due to the large amount of uncertainty related to the field and the underlying technologies (mainly the hardware).
[1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[2]: https://arxiv.org/pdf/2007.05558.pdf
[3]: https://p4.org/assets/P4WS_2019/Speaker_Slides/9_2.05pm_John_Hennessey.pdf
[4]: https://www.youtube.com/watch?v=oIG9ztQw2Gc
[5]: https://www.nextplatform.com/2019/09/13/tsmc-thinks-it-can-uphold-moores-law-for-decades/
[6]: https://venturebeat.com/2020/06/11/openai-launches-an-api-to-commercialize-its-research/
One doesn't have to be "economically valuable" to be considered intelligent. Think of philosophy majors, for instance.
Now, imagine an AI that could replicate the intelligence of a 6-year-old; it wouldn't be "economically valuable" at first, but it would keep learning, year after year, until it exceeded humans.
Wouldn't that be a prime example of an AGI? Or would it only be accepted as such when it matched or surpassed humans "at almost all (95%+) economically valuable work"? What if it decided to pursue a degree in philosophy?
When it happens, we will be way past the first AGI, and entering the singularity.
There is no way to predict when scientific discoveries will happen before they happen. This is a fool's errand.