The rise of large language models & coming business applications 

One week ago, on November 30th, OpenAI released its latest large language model chatbot, called ChatGPT. As a general-purpose chatbot, the system lets users enter prompts to which it replies. While it is not always correct, ChatGPT can in many instances provide very detailed answers to a wide range of prompts, from factual queries about its training data to assisting with tasks such as writing emails and to-do lists. While concerns about reliability, bias and cost are not fully addressed at this point, recent progress shows the promise of large language models as general-purpose tools that can unlock new efficiencies in business applications.

ChatGPT builds on a recent line of machine learning models referred to as “large language models”. Examples include the GPT-3 (Generative Pre-trained Transformer 3) model from June 2020 and GPT-2 from February 2019, which was open sourced by OpenAI. These models are trained on very large text datasets and use a type of deep neural network called the transformer, introduced by Google in 2017, which is finding more and more applications. Similar efforts on large general-purpose language models are under way at several big industry players in machine learning, including the November 2018 model BERT (Bidirectional Encoder Representations from Transformers) and, more recently, LaMDA by Google, as well as OPT-175B (Open Pre-trained Transformer Language Models) by Meta, which has also been open sourced.
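
Because GPT-2 has been open sourced, anyone can try out a transformer-based language model with a few lines of code. The following is a minimal sketch, assuming the Hugging Face transformers library as one convenient (third-party) way of loading the published GPT-2 checkpoint; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: generating text with the open-sourced GPT-2 model via the
# (assumed) Hugging Face "transformers" library. Prompt and settings are illustrative.
from transformers import pipeline

# Load the publicly released GPT-2 checkpoint as a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# The transformer network continues the prompt one token at a time.
result = generator("Large language models are", max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```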

Within the short time span of just a few years, these models have transformed what was thought possible in terms of machine understanding of language and have reached breathtaking sizes – OPT-175B, for example, features 175 billion trained parameters, and many of the recent models require data-center scale resources of hundreds or thousands of GPUs to train. However, once trained and released, these models can be run and adapted with more reasonable compute resources by end users (e.g. 16 Nvidia V100 GPUs for OPT-175B).
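
As a rough illustration of what running a released model with reasonable compute can look like, the sketch below loads one of the smaller openly released OPT checkpoints (we assume the facebook/opt-1.3b checkpoint published on the Hugging Face hub) in half precision on a single GPU; the full OPT-175B variant would need correspondingly more hardware.

```python
# Hedged sketch: running a smaller sibling of OPT-175B on a single GPU.
# Assumes PyTorch, the Hugging Face "transformers" library and the openly
# released "facebook/opt-1.3b" checkpoint; memory needs grow with model size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

prompt = "Large language models could help businesses by"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```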

While ChatGPT is in some sense just the latest incarnation of the meteoric rise of these models over recent years, its adoption rate provides a first glimpse of the business impact they may soon have. Within one week, it was reported that more than a million users had signed up for early access to ChatGPT, and within just a few hours of the release Twitter was set alight with messages from astonished users discussing possible use cases.

Early Example Use Cases of ChatGPT

A good way to think of the capabilities these models are converging toward is, as Yann LeCun, one of the key researchers in this field, puts it, “Auto-Complete for everything”: a system that can produce draft suggestions based on natural language queries. While ChatGPT in particular produces text output, recent methods are not limited to text; in parallel, we are seeing similar progress in generative image synthesis, for example, which is a topic for a future article.
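
To make the “Auto-Complete for everything” idea concrete, here is a small, hypothetical sketch of wrapping a locally runnable open-source model behind a draft-suggestion helper. The function name, the prompt template and the use of GPT-2 as a stand-in for a larger, more capable model are our own assumptions; the point is the pattern of turning a natural language request into a draft, not the quality of this particular model’s output.

```python
# Hypothetical sketch of "auto-complete for everything": turn a natural-language
# request into a draft suggestion. GPT-2 is only a small stand-in model here.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def draft_suggestion(instruction: str, max_new_tokens: int = 60) -> str:
    """Return a draft continuation for a natural-language instruction."""
    prompt = f"Task: {instruction}\nDraft:\n"
    result = generator(prompt, max_new_tokens=max_new_tokens, num_return_sequences=1)
    return result[0]["generated_text"]

# Example: ask for a draft email, then edit the suggestion by hand.
print(draft_suggestion("Write a short email postponing tomorrow's meeting to Friday."))
```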

Challenges & Limitations: 

At Decision Labs, we are excited about the challenges and possibilities of incorporating and fine-tuning these general-purpose open-source large language models in solutions for our clients, and of integrating these new technologies with custom business datasets. If your business is interested in these possibilities and you would like to be at the forefront of developments in this space, get in touch! These rapidly maturing technologies open up exciting new ways to accelerate your business and to query and interact with large internal business datasets.