2023: A Brief Overview of the AI Landscape
This document provides a succinct summary of the most significant trends and developments that characterized the AI sector in 2023.
Scaling Up Context Length
The race to increase parameter counts appears to have plateaued, with GPT-4 likely not reaching the rumored trillion parameters. As building ever-larger models has become impractical, the new competitive edge has shifted toward extending context length.
One of the limitations of Transformer-based language models is their restricted context length due to quadratic computational costs in time and memory. However, there is a rising demand for extended context windows in applications like PDF processing and narrative development. (source)
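The quadratic cost mentioned above is easy to see in a toy sketch: naive self-attention materializes a score matrix with one entry per pair of tokens, so doubling the context quadruples memory. This NumPy snippet (illustrative only, not how production models implement attention) makes that concrete:

```python
import numpy as np

def attention_score_entries(seq_len, d_model=64):
    """Naive self-attention builds a seq_len x seq_len score matrix,
    so memory (and time) grow quadratically with context length."""
    q = np.random.randn(seq_len, d_model)
    k = np.random.randn(seq_len, d_model)
    scores = q @ k.T                  # shape: (seq_len, seq_len)
    return scores.size                # entries to store: seq_len ** 2

# Doubling the context quadruples the score matrix.
print(attention_score_entries(2048))  # 4194304
print(attention_score_entries(4096))  # 16777216
```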
Historically, most models released before 2023 capped out at a context length of 2,048 tokens.
GPT-4 claims a context length of 32K, while Claude appears to support 100K tokens. LLaMa 2 originally began with a 4K context length, but this was rapidly expanded by the open-source community.
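One trick the open-source community used to stretch LLaMA's context is position interpolation: rescale the position indices of a longer sequence so they fall inside the positional range the model saw during training. A minimal sketch (the function name and values are illustrative, not any library's API):

```python
def interpolate_positions(positions, trained_len, target_len):
    """Map positions from a target_len-token window back into the
    [0, trained_len) range the model was pretrained on."""
    scale = trained_len / target_len
    return [p * scale for p in positions]

# An 8K sequence squeezed into a model trained on 4K positions:
print(interpolate_positions([0, 4095, 8191], trained_len=4096, target_len=8192))
# [0.0, 2047.5, 4095.5]
```

In practice this rescaling is applied inside the rotary position embedding, followed by a short fine-tune, but the core idea is just this remapping.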
Unfortunately, models are still unable to utilize this extensive context length effectively.
The Importance of Smaller Models
A trend toward smaller models has emerged, primarily due to the astronomical costs associated with infrastructure and resources. Companies are striving to create more compact models that can perform equivalent tasks to their larger counterparts, at least for specific applications.
In the coming year, the focus will likely remain on both productivity and the economic viability of these models.
AI Agent Design
Another noteworthy trend is the development of AI agents capable of deciding which APIs to utilize and how to integrate outcomes into their predictions.
Ensembles Make a Comeback
GPT-4, trained on a mix of text and images, demonstrated remarkable performance, for example on the bar exam.
OpenAI claims that GPT-4 scores better than 90% of human test-takers on the bar exam and better than 99% of participants in the Biology Olympiad (source).
The outstanding success of GPT-4 illustrates the transformative impact of Reinforcement Learning from Human Feedback (RLHF) on large language models (LLMs). The introduction of ChatGPT opened up new possibilities and helped the public grasp the capabilities of LLMs.
Today, RLHF is integral to nearly all leading LLMs and is especially critical in chat applications, although it relies heavily on human annotators to ensure optimal performance.
GPT-4 marked a significant turning point for multiple reasons:
- The lack of transparency concerning its training data and architecture.
- It has become the benchmark for evaluating a wide range of tasks.
- GPT-4 was utilized to assess other models.
- It also served as a prompt generator or a teaching model.
But is RLHF the sole factor behind GPT-4's success?
It appears that GPT-4 might not be a single model but rather a combination of eight smaller models of roughly 220 billion parameters each, effectively functioning as a mixture of experts.
Predictions for 2024 suggest we may witness increased use of expert mixtures, potentially leveraging smaller specialized models.
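The mechanics of a mixture of experts can be sketched in a few lines: a small router scores the experts for each input, only the top-k experts actually run, and their outputs are combined with softmax weights. This NumPy toy (all sizes and names invented for illustration) shows the routing idea, not any production architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MoE layer: 8 experts, but only k=2 run per input.
n_experts, d, k = 8, 16, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    logits = x @ router                        # one score per expert
    top = np.argsort(logits)[-k:]              # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only the selected experts are evaluated; the rest are skipped.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(d))
print(out.shape)  # (16,)
```

The appeal is that total parameter count grows with the number of experts while per-token compute stays roughly constant.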
Multimodal Developments
The push to develop multimodal models has accelerated, evident in the production-level advancements made by GPT-4 with vision capabilities (GPT-4V).
GPT-4V allows users to prompt GPT-4 to analyze image inputs, representing a significant step in AI research and development. (source)
OpenAI is not the only one exploring vision-language models; other notable examples include DeepMind's Flamingo (closed-source), Google's PaLM-E, and BLIP-2 (open-source).
A significant development is the LLaMA-Adapter V2, which aims to convert LLaMA into a multimodal model.
Open-Source Revolution
LLaMA has emerged as the pivotal model after GPT-4. Unlike Bard, LLaMA (and its successor LLaMA 2) is open-source, which has led to rapid community adoption and numerous derivatives.
The open-source community has proven to be a formidable player, establishing itself as a third competitor against OpenAI and Bard. Following LLaMA's release, various groups have experimented with extending its context length and enhancing its capabilities for a range of tasks.
Research indicates that smaller models can rival larger ones if trained appropriately, which favors the open-source movement: despite having fewer resources than the large companies, it benefits from greater agility and fewer bureaucratic hurdles.
Raising Awareness of LLM Limitations
As excitement settles, discussions about the limitations of LLMs have intensified. For instance, the much-touted emergent properties may not actually exist, contributing to a decline in the parameter race.
Is there merit to the notion that emergent capabilities in AI are illusory? (source)
Moreover, after earlier assertions that Vision Transformers would replace convolutional networks, it's become clear that convolutional networks can still compete effectively with ViTs. Overall, it seems the transformer architecture may be showing signs of its limitations after years of dominance.
Medical AI Advances
AlphaFold-2 revolutionized the study of protein structures, and progress has continued in 2023. New models are now capable of predicting protein structures with comparable accuracy to AlphaFold-2 but at greater speeds. Additionally, diffusion models have been employed to synthesize proteins from scratch.
Other models have demonstrated the ability to forecast changes in gene expression due to gene stimulation or suppression, as well as assess whether mutations are pathogenic (e.g., AlphaMissense).
Innovative models like Google's Med-PaLM and PMC-LLaMA have shown superior capabilities in answering medical questions, with clinicians sometimes preferring Med-PaLM's responses. Google has since extended the model into a multimodal framework.
The medical field is not the only area seeing advancements; research continues across various disciplines.
The Safety Discussion
Generative models have made impressive strides, producing results that increasingly resemble original content. Consequently, interest has grown in identifying AI-generated texts and images:
As these systems proliferate, the risks of their misuse escalate, including social engineering, election manipulation, and the proliferation of misinformation. (source)
To address this, watermarking schemes have been proposed for both AI-generated images and text.
Additionally, Google DeepMind has initiated SynthID to watermark images created by their models.
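The idea behind text watermarking can be illustrated with a toy "green list" scheme, inspired by published proposals: the previous token pseudorandomly splits the vocabulary, the generator favors "green" tokens, and a detector counts the green fraction. The vocabulary, threshold, and function names below are invented for illustration, not any deployed system:

```python
import hashlib
import random

VOCAB = [f"tok{i}" for i in range(1000)]

def green_list(prev_token, fraction=0.5):
    """Deterministically derive a 'green' half of the vocabulary
    from the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * fraction)))

def detect(tokens, threshold=0.75):
    """Flag text whose green-token fraction is far above the ~0.5 baseline."""
    hits = sum(t in green_list(p) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1) >= threshold

# A watermarked generator that always picks green tokens:
rng = random.Random(42)
text = ["tok0"]
for _ in range(50):
    text.append(rng.choice(sorted(green_list(text[-1]))))

print(detect(text))  # True
```

Unwatermarked text lands near the 0.5 baseline and is not flagged; the statistical signal survives even though each individual token looks natural.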
Another noteworthy initiative is the Foundation Model Transparency Index, developed by Stanford researchers to evaluate various models' transparency. The findings indicate that significant improvements are still needed, even among open-source models:
This clearly reflects how these companies stack up against one another, and we hope it encourages them to enhance their transparency. (source)
The regulatory landscape remains uncertain: the EU AI Act is a first step that has left many dissatisfied, but it is expected to have a significant impact, much as the GDPR did.
Emerging Trends
- Interest in synthetic data is rising, though results remain mixed.
- Code-writing models are improving, alongside the emergence of open-source alternatives.
- Prompting techniques are becoming more sophisticated, with ongoing research into automated approaches.
- Text-to-video models are achieving higher accuracy and resolution.
- Music generation is emerging as a new frontier.
- Weather prediction models are becoming more precise with extended forecasting capabilities.
- New diffusion models are being developed for protein design.
- The demand for GPUs is surging, with NVIDIA leading the market, projected to reach a $1 trillion valuation. Computer clusters are proliferating globally.
- ChatGPT's rise is significantly impacting businesses, with Chegg among the early casualties. Recent announcements from OpenAI have disrupted companies focused on PDF analysis, while ChatGPT and Copilot are diverting traffic away from Stack Overflow.
- Ongoing copyright lawsuits reflect the lack of clear guidelines in this area.
Conclusions
AI has made significant strides this year, particularly in various applications. The launch of ChatGPT has sparked widespread public interest in the concept of large language models.
From a data scientist's perspective, the future holds exciting possibilities.
If you found this article engaging:
You can explore my other writings and connect with me on LinkedIn, where I'm open to collaborations and projects. Check out this repository for weekly updates on ML & AI news.
Here’s a link to my GitHub repository, where I’m compiling code and resources related to machine learning, artificial intelligence, and more.
GitHub - SalvatoreRa/tutorial: Tutorials on machine learning, artificial intelligence, data science with math explanation and reusable code (in python…) (github.com)
You might also be interested in one of my recent articles:
- How transparent are large language models? Stanford proposes an index to measure LLM transparency, and the results are not encouraging (pub.towardsai.net)
- Have convolutional networks become obsolete? Vision transformers seem to have replaced convolutional networks, but are they really better? (levelup.gitconnected.com)
- The Computer Vision's Battleground: Choose Your Champion. Which is the best computer vision model? Which one is best for a particular task? (pub.towardsai.net)
- Tabula Rasa: How to save your network from the category drama. Neural networks do not like categories, but you have techniques to save your favorite model (levelup.gitconnected.com)