There are many ways one could explain AI as a phenomenon over the past few years. This is my attempt at explaining the assumptions and logic that underlie all the investment and the hype, and where I think we are heading. I've tried to keep it relatively jargon-free. Let me know what you think!
1. Since about 2012, the artificial neural network branch of AI has been in the ascendant, reaching its current peak with the successes of LLMs powered by the transformer architecture. Many believe that artificial neural networks can achieve human-level intelligence (to avoid never-ending debates, let me just state that I don't care whether they 'really reason' or not, or whether what they have is 'true intelligence'. If an artificial neural network can do everything a human cognitively can, then I would consider it to exhibit human-level intelligence).
2. The size of a neural network can be measured by the number of parameters (tunable values) it has. The values of these parameters are determined by 'training' the network on vast amounts of data, which in turn takes a lot of computing power (or 'compute' for short). All else being equal, a neural network with more parameters, trained on more data, using more compute, has greater capabilities.
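To make the 'number of parameters' idea concrete, here is a toy sketch (my own illustration, not any particular model) that counts the tunable values in a small fully connected network. The layer sizes are made up; real LLMs apply the same bookkeeping to billions of parameters.

```python
import numpy as np

# Toy fully connected network: 512 -> 1024 -> 1024 -> 512 (sizes are arbitrary).
layer_sizes = [512, 1024, 1024, 512]

rng = np.random.default_rng(0)
total_params = 0
for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = rng.normal(size=(fan_in, fan_out))  # tunable values
    biases = np.zeros(fan_out)                    # also tunable
    total_params += weights.size + biases.size

print(f"Total tunable parameters: {total_params:,}")
# Training = adjusting all of these numbers, usually by gradient descent,
# so that the network's outputs better match the training data.
```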
3. To achieve high levels of intelligence with manageable amounts of parameters/data/compute, we imbue the neural network architecture with a few inductive biases (e.g., multi-headed self-attention with permutation invariance, positional encoding, and global interactions for transformers; spatial locality, translation invariance, and hierarchical feature composition for CNNs).
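As a rough illustration of one of those inductive biases, here is a minimal single-head self-attention computation in plain NumPy (the dimensions and random weights are invented for the example; real transformers use many heads, learned weights, and many stacked layers). Every token attends to every other token, which is the 'global interactions' part; without positional encodings the operation has no built-in notion of word order.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                    # 4 tokens, 8-dim embeddings (toy sizes)

X = rng.normal(size=(seq_len, d_model))    # token embeddings
W_q = rng.normal(size=(d_model, d_model))  # learned weights in a real model
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product attention: every token looks at every token (global).
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V

print(weights.round(2))  # each row sums to 1: how much a token attends to the others
```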
4. When done right, these inductive biases help achieve higher levels of capability for given levels of data/compute/parameters, but they also bring some intrinsic limitations to the architecture. Figuring out the right set of inductive biases has been driven by intuition and has progressed painstakingly through trial and error.
5. Every once in a while, we hit a jackpot of emergent capabilities with a particular approach and architecture, as we did with Large Language Models built using transformers. Predicting the next word (token, to be precise) given a sequence of words led to an LLM capable of writing poetry, writing code, and solving many kinds of math problems!
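The training objective itself is simple to sketch. Below is a toy version of next-token prediction (the vocabulary, token IDs, and stand-in model outputs are all invented for illustration): shift the sequence by one position and penalize the model whenever it assigns low probability to the token that actually comes next.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary
tokens = [0, 1, 2, 3, 0, 4]                  # "the cat sat on the mat"

rng = np.random.default_rng(0)
logits = rng.normal(size=(len(tokens) - 1, len(vocab)))  # stand-in for model outputs

inputs = tokens[:-1]    # what the model sees at each position
targets = tokens[1:]    # what it must predict: the next token

# Cross-entropy loss: -log of the probability assigned to each correct next token.
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
loss = -np.mean(np.log(probs[np.arange(len(targets)), targets]))
print(f"next-token prediction loss: {loss:.3f}")
# Training nudges the parameters so this loss falls across trillions of tokens;
# the surprising part is which other capabilities emerge along the way.
```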
6. Because there isn't a theoretically sound way of determining the limits of these emergent capabilities, we keep making the models as large as we can until they reach a plateau of unacceptably small returns relative to the inputs. Meanwhile, some skeptics keep pointing to the intrinsic limitations of the architecture, and argue that this investment is unlikely to lead to human-level intelligence.
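One way to picture those diminishing returns is the rough power-law shape reported in scaling-law studies: loss falls roughly as a power of model size. The constants below are made up purely to show the shape of the curve, not fitted to any real model.

```python
# Hypothetical scaling curve: loss(N) = a * N^(-alpha) + c  (constants invented).
a, alpha, c = 400.0, 0.34, 1.7

for n_params in [1e9, 1e10, 1e11, 1e12, 1e13]:
    loss = a * n_params ** (-alpha) + c
    print(f"{n_params:>8.0e} params -> loss {loss:.3f}")

# Each 10x jump in parameters (and the data/compute to match) buys a smaller
# absolute improvement than the last one; the "plateau of unacceptably small
# returns" is where that improvement stops justifying the cost.
```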
7. As it stands, we seem to have hit this plateau of unacceptably small returns with "pre-training" the LLMs. While OpenAI's GPT-4.5, likely the largest LLM ever trained, performs somewhat better than its predecessor GPT-4o, it is much more expensive to use.
8. Even before the release of GPT-4.5, however, the focus had firmly shifted to using Reinforcement Learning to improve the reasoning capabilities of the base models. Some believe that GPT-4.5, being the larger model with better capabilities, will make a better base model for future reasoning models. That is likely the case, but it brings us to the next obvious question: when will we hit the plateau of unacceptably small returns with this approach? My gut feeling says we'll get there within the next couple of years. But once again, we won't really know until we actually try!
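For intuition on what 'using Reinforcement Learning on a base model' means here, the sketch below shows the core loop in miniature: sample answers, score them with a verifiable reward, and push up the probability of answers that scored well. The 'policy' is just a softmax over four canned answers to one arithmetic question, which I made up for illustration; real reasoning-model training applies the same idea to an entire LLM with far richer prompts and reward signals.

```python
import numpy as np

rng = np.random.default_rng(0)

question, correct = "17 + 25", "42"
candidates = ["32", "42", "52", "717"]   # canned answers for the toy policy
logits = np.zeros(len(candidates))       # the "model parameters" we will tune
lr = 0.5

for step in range(200):
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = rng.choice(len(candidates), p=probs)           # sample an answer
    reward = 1.0 if candidates[idx] == correct else 0.0  # verifiable reward

    # REINFORCE-style update: raise the probability of the sampled answer
    # when it earns reward; answers that earn no reward are left alone.
    grad = -probs
    grad[idx] += 1.0
    logits += lr * reward * grad

probs = np.exp(logits) / np.exp(logits).sum()
print({c: round(float(p), 3) for c, p in zip(candidates, probs)})
# After training, almost all of the probability mass sits on the correct answer.
```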
9. I hope research continues along other axes (e.g., neurosymbolic AI, memory-augmented transformers) so that we don't have all our eggs in one big LLM basket.
10. Finally, even if progress stalls or slows in a year or two, there's plenty of work to be done in developing and deploying applications that use the existing capabilities of today's leading LLMs. There are still exciting times ahead!