2023: A Brief Overview of the AI Landscape

This article provides a succinct summary of the most significant trends and developments that shaped the AI sector in 2023.

Scaling Up Context Length

The race to scale up parameter counts appears to have plateaued, with GPT-4 likely falling short of the anticipated trillion parameters. As ever-larger models have become impractical to build and serve, the competitive edge has shifted toward extending context length.

One limitation of Transformer-based language models is their restricted context length: self-attention incurs computational costs in time and memory that grow quadratically with sequence length. Meanwhile, demand is rising for extended context windows in applications like PDF processing and narrative development. (source)
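
To see where the quadratic cost comes from, here is a minimal sketch of scaled dot-product attention (illustrative only; array names and sizes are my own choices): the n × n score matrix is the term that grows quadratically with sequence length.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Naive attention: the (n, n) score matrix makes cost quadratic in n."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): O(n^2) time and memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d)

# Doubling the sequence length quadruples the intermediate score matrix:
n, d = 2048, 64
Q = K = V = np.random.randn(n, d)
print(scaled_dot_product_attention(Q, K, V).shape)   # (2048, 64), via a 2048 x 2048 matrix
```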

Historically, most models before 2023 were capped at a context length of 2,048 tokens.

GPT-4 offers a 32K-token context window, while Claude supports 100K tokens. LLaMA 2 started out with a 4K context length, which the open-source community rapidly extended.

Unfortunately, models still struggle to exploit such long contexts effectively: in practice they tend to use information at the beginning and end of a prompt well while overlooking what sits in the middle.

The Importance of Smaller Models

A trend toward smaller models has emerged, driven largely by the steep costs of infrastructure and compute. Companies are striving to build compact models that match their larger counterparts, at least on specific applications.
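
A common route to such compact models is knowledge distillation, where a small "student" network learns to mimic a larger "teacher". The sketch below shows the standard distillation loss (a generic recipe, not any particular company's; the temperature and mixing weight are illustrative defaults):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft KL term against the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits over 10 classes:
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```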

In the coming year, the focus will likely remain on both productivity and the economic viability of these models.

AI Agent Design

Another noteworthy trend is the development of AI agents: systems in which the model decides which external APIs or tools to call and how to fold the results back into its predictions.
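
Conceptually, such an agent wraps the model in a loop: propose a tool call, execute it, and feed the result back until the model answers. A minimal hand-rolled sketch (the tool registry, JSON protocol, and `call_llm` stub are assumptions for illustration, not any specific framework's API):

```python
import json

# Hypothetical tool registry: the agent chooses among these by name.
TOOLS = {
    "search": lambda query: f"(search results for {query!r})",
    "calculator": lambda expression: str(eval(expression)),  # demo only, unsafe
}

def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; it always picks the calculator."""
    if "Tool calculator returned" in prompt:
        return json.dumps({"answer": prompt.rsplit(": ", 1)[-1]})
    return json.dumps({"tool": "calculator", "args": {"expression": "21 * 2"}})

def run_agent(question: str, max_steps: int = 3) -> str:
    """Loop: ask the model, execute the tool it picks, feed the result back."""
    context = question
    for _ in range(max_steps):
        decision = json.loads(call_llm(context))
        if "answer" in decision:                      # model answers directly
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])
        context += f"\nTool {decision['tool']} returned: {result}"
    return context

print(run_agent("What is 21 * 2?"))  # -> 42
```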

Ensembles Make a Comeback

GPT-4, trained on a mix of text and images, demonstrated remarkable performance, for instance excelling on the bar exam.

OpenAI claims that GPT-4 outperforms 90% of individuals taking the bar to become lawyers and 99% of competitors in the Biology Olympiad (source).

The outstanding success of GPT-4 illustrates the transformative impact of Reinforcement Learning from Human Feedback (RLHF) on large language models (LLMs). The introduction of ChatGPT opened up new possibilities and helped the public grasp the capabilities of LLMs.

Today, RLHF is integral to nearly all leading LLMs and is especially critical in chat applications, although it relies heavily on human annotators to ensure optimal performance.
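
At the core of RLHF is a reward model trained on those human annotations: shown two candidate answers, it learns to score the one annotators preferred higher. A minimal sketch of the standard pairwise (Bradley-Terry) loss used for this step, with made-up tensor names:

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards, rejected_rewards):
    """Pairwise loss: push the preferred answer's score above the rejected one's.

    Both inputs have shape (batch,): scalar scores the reward model assigns
    to the human-preferred and dispreferred completions of the same prompt.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy check: a model that already ranks the pairs correctly gets a low loss.
chosen = torch.tensor([2.0, 1.5, 0.3])
rejected = torch.tensor([0.5, -1.0, 0.1])
print(reward_model_loss(chosen, rejected))
```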

GPT-4 marked a significant turning point for multiple reasons:

  • The lack of transparency concerning its training data, model size, and architecture.
  • It has become the benchmark for evaluating a wide range of tasks.
  • GPT-4 has been used as a judge to evaluate other models.
  • It has also served as a prompt generator and as a teacher model.

But is RLHF the sole factor behind GPT-4's success?

It appears that GPT-4 might not be a single model, but rather a combination of eight smaller models of roughly 220 billion parameters each, effectively functioning as a mixture of experts.

Predictions for 2024 suggest we may see wider use of mixture-of-experts architectures, potentially built from smaller, specialized models.
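
A mixture of experts replaces one large feed-forward block with several smaller "expert" blocks plus a router that sends each token to only a few of them, so most parameters sit idle for any given token. A toy top-2 routing layer is sketched below (dimensions and names are invented for illustration; GPT-4's actual design remains undisclosed):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer with top-2 token routing."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):                        # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)        # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):               # dense loops, for readability
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoE()
print(layer(torch.randn(10, 64)).shape)          # torch.Size([10, 64])
```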

Multimodal Developments

The push to develop multimodal models has accelerated, most visibly with GPT-4 with vision (GPT-4V) reaching production.

GPT-4V allows users to prompt GPT-4 to analyze image inputs, representing a significant step in AI research and development. (source)

OpenAI is not alone in exploring vision-language models; other notable examples include DeepMind's Flamingo (closed-source), Google's PaLM-E, and Salesforce's BLIP-2 (open-source).

A significant development is LLaMA-Adapter V2, which extends LLaMA into a multimodal model through lightweight adapter layers.

Open-Source Revolution

LLaMA has emerged as the pivotal model after GPT-4. Unlike Bard, LLaMA (and its successor LLaMA-2) is open-source, leading to rapid community adoption and numerous derivatives.

The open-source community has proven to be a formidable player, establishing itself as a third competitor alongside OpenAI and Google. Following LLaMA's release, various groups have experimented with extending its context length and adapting it to a range of tasks.

Research indicates that smaller models can rival larger ones when trained appropriately, above all on many more tokens. This favors the open-source movement, which, despite having far fewer resources than the large companies, benefits from greater agility and fewer bureaucratic hurdles.
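
"Trained appropriately" here largely means trained on far more data than older rules of thumb suggested: compute-optimal ("Chinchilla") scaling implies roughly 20 training tokens per parameter. A back-of-the-envelope sketch (the 20-tokens-per-parameter constant is an approximation, and recent open models deliberately train well past it):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rough compute-optimal token budget: ~20 tokens per parameter."""
    return n_params * tokens_per_param

# LLaMA-2 model sizes, for scale (LLaMA-2 was actually trained on ~2T tokens):
for n in (7e9, 13e9, 70e9):
    print(f"{n/1e9:.0f}B params -> ~{chinchilla_optimal_tokens(n)/1e12:.2f}T tokens")
```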

Raising Awareness of LLM Limitations

As the excitement settles, discussions about the limitations of LLMs have intensified. For instance, the much-touted emergent properties may be an artifact of how benchmarks are measured rather than real abilities, which has contributed to the decline of the parameter race.

Is there merit to the notion that emergent capabilities in AI are illusory? (source)

Moreover, after earlier assertions that Vision Transformers would replace convolutional networks, it has become clear that convolutional networks can still compete effectively with ViTs. Overall, the transformer architecture may be showing its limits after years of dominance.

Medical AI Advances

AlphaFold-2 revolutionized the study of protein structures, and progress continued in 2023: new models can now predict protein structures with accuracy comparable to AlphaFold-2 but at far greater speed, and diffusion models have been used to design proteins from scratch.

Other models have demonstrated the ability to forecast changes in gene expression due to gene stimulation or suppression, as well as assess whether mutations are pathogenic (e.g., AlphaMissense).

Innovative models like Google Med-PaLM and PMC-LLaMA have shown superior capabilities in answering medical questions, with clinicians sometimes preferring Med-PaLM's responses. Google has since extended the model into a multimodal version.

The medical field is not the only area seeing advancements; research continues across various disciplines.

The Safety Discussion

Generative models have made impressive strides, producing results that increasingly resemble human-created content. Consequently, interest has grown in methods for identifying AI-generated text and images:

As these systems proliferate, the risks of their misuse escalate, including social engineering, election manipulation, and the proliferation of misinformation. (source)

To address this, watermarking schemes have been proposed for both AI-generated images and text.
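
For text, one proposed family of schemes (e.g., the "green list" approach of Kirchenbauer et al.) uses the previous token to seed a pseudo-random split of the vocabulary and nudges generation toward the "green" half; detection then amounts to counting green tokens. A much-simplified sketch of the detection side (the hash-based split is illustrative; real schemes use a proper pseudo-random function and a z-test):

```python
import hashlib

def green_fraction(token_ids):
    """Fraction of tokens in the 'green list' seeded by each preceding token.

    Watermarked text should score well above the ~0.5 expected by chance.
    """
    green = 0
    for prev, cur in zip(token_ids, token_ids[1:]):
        seed = int(hashlib.sha256(str(prev).encode()).hexdigest(), 16)
        if (cur + seed) % 2 == 0:   # previous token splits the vocabulary in half
            green += 1
    return green / max(len(token_ids) - 1, 1)

print(green_fraction([101, 2054, 2003, 1996, 3437, 102]))
```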

Additionally, Google DeepMind has introduced SynthID to watermark images created by its models.

Another noteworthy initiative is the Foundation Model Transparency Index, developed by Stanford researchers to evaluate various models' transparency. The findings indicate that significant improvements are still needed, even among open-source models:

This clearly reflects how these companies stack up against one another, and we hope it encourages them to enhance their transparency. (source)

The regulatory landscape remains uncertain. The EU AI Act, a first step that has left many dissatisfied, is expected to have an impact comparable to the GDPR's.

Conclusions

AI has made significant strides this year, particularly in moving models into real applications. The launch of ChatGPT sparked widespread public interest in large language models.

From a data scientist's perspective, the future holds exciting possibilities.

If you found this article engaging:

You can explore my other writings and connect with me on LinkedIn, where I am open to collaborations and projects. Check out this repository for weekly updates on ML & AI news.

Here’s a link to my GitHub repository, where I’m compiling code and resources related to machine learning, artificial intelligence, and more.

GitHub - SalvatoreRa/tutorial: Tutorials on machine learning, artificial intelligence, data science with math explanation and reusable code (in Python). (github.com)

You might also be interested in one of my recent articles:

  • How transparent are large language models? Stanford proposes an index to measure LLM transparency, and the results are not encouraging. (pub.towardsai.net)
  • Have convolutional networks become obsolete? Vision transformers seem to have replaced convolutional networks, but are they really better? (levelup.gitconnected.com)
  • The Computer Vision's Battleground: Choose Your Champion. Which is the best computer vision model? Which one is best for a particular task? (pub.towardsai.net)
  • Tabula Rasa: How to save your network from the category drama. Neural networks do not like categories, but you have techniques to save your favorite model. (levelup.gitconnected.com)
