ChatGPT-3 - Let’s have a chat!


Chatbots have been around for decades, from ELIZA (1966), an early simulated therapist, to the thoroughly modern virtual assistant Siri. Over time they have become increasingly sophisticated tools that allow us to interact with machines in a natural and efficient manner.

More recently, the buzz around OpenAI’s latest chatbot, ChatGPT, may have caught your attention. Launched in November 2022, its impressive language processing abilities have taken the internet by storm. In this blog post we will investigate the recent explosion in popularity of GPT: what is a “GPT”, how can ChatGPT be used, and why should we be aware of its limitations?


“A robot progressively getting higher tech” - submitted to OpenAI’s DALL·E 2 image generation model.


What is GPT?

Generative Pre-Trained Transformer 3 (GPT-3) is the third in a series of large language models developed by the start-up OpenAI, and it forms the basis of ChatGPT. A large language model is a neural network that can process natural language, that is, language that has evolved naturally through human use. Large language models are trained to learn the relationships between words and retain this information in sets of numerical values. Through mathematical manipulation, these values can then be used to perform a variety of natural language processing tasks, ranging from language translation to question answering.
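
To make this concrete, here is a toy Python sketch of the idea that relationships between words can be retained as sets of numerical values. The vectors below are invented purely for illustration; real models learn vectors with thousands of dimensions from data.

```python
# Toy illustration of word relationships stored as numbers:
# related words end up with similar vectors. These 2-dimensional
# values are made-up assumptions for illustration only.
embeddings = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.7],
    "apple": [-0.6, 0.3],
}

def similarity(a, b):
    # Cosine similarity: close to 1.0 means "pointing the same way".
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

print(similarity(embeddings["king"], embeddings["queen"]))  # high (related words)
print(similarity(embeddings["king"], embeddings["apple"]))  # low (unrelated words)
```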

Generative

Generative models aim to create new examples of data that mimic the features and relations from their training dataset. In the case of GPT-3, this means generating text that is indistinguishable from text generated by humans!
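
As a hands-on illustration, a much smaller, openly available predecessor of GPT-3, GPT-2, can be run locally through the Hugging Face transformers library. This is only a sketch (it assumes transformers and a backend such as PyTorch are installed, and it is not how ChatGPT itself is served):

```python
# A minimal sketch of generative text completion using GPT-2
# via Hugging Face's `transformers` library.
# Requires: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt one statistically likely word at a time.
result = generator("Chatbots have been around for decades,", max_length=40)
print(result[0]["generated_text"])
```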

Pre-Trained

Training large language models typically requires huge datasets, so a workaround is needed for tasks where such large amounts of data are unavailable. Transfer learning is one such workaround: the process of taking a model that was initially trained to perform a certain task using large amounts of data, and fine-tuning it on a much smaller dataset to solve a similar but distinct task. This method works on the hypothesis that many features useful for the new task will have already been learned from the large dataset. We say that the original model was “pre-trained” before being fine-tuned.
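
A rough PyTorch sketch of the fine-tuning idea follows. The tiny “pretrained” network below is just a stand-in for a real large model; the layer sizes are arbitrary assumptions for illustration.

```python
# A minimal, illustrative sketch of transfer learning in PyTorch.
import torch.nn as nn
import torch.optim as optim

pretrained = nn.Sequential(            # pretend this was trained on a huge dataset
    nn.Linear(768, 768), nn.ReLU(),
    nn.Linear(768, 768), nn.ReLU(),
)

for param in pretrained.parameters():  # freeze the pre-trained weights
    param.requires_grad = False

head = nn.Linear(768, 2)               # small task-specific layer, trained from scratch
model = nn.Sequential(pretrained, head)

# Fine-tuning updates only the new head, reusing everything already learned.
optimizer = optim.Adam(head.parameters(), lr=1e-4)
```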

GPT-3 was (pre)trained on web pages, books, and Wikipedia articles, allowing it to generate text across a massive range of topics and writing styles. ChatGPT was then fine-tuned to produce dialogue-formatted responses to user prompts.

Transformer

Early natural language algorithms treated each word independently, with no way for surrounding words to influence its meaning. As context can entirely alter the meaning of a word, models were developed that could store and represent this information. More recent natural language processing models made use of recurrent neural networks (RNNs). To maintain context and dependency between words, an RNN updates its internal state as it reads, retaining information about the words that appeared earlier in the sentence.
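
For intuition, here is a toy recurrent network in PyTorch; the dimensions are arbitrary illustrative choices:

```python
# A toy sketch of the RNN idea: a hidden state carries context forward
# one word at a time.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

sentence = torch.randn(1, 5, 8)   # 1 sentence, 5 "words", 8-dim word vectors
hidden = torch.zeros(1, 1, 16)    # initial state: no context yet

outputs, hidden = rnn(sentence, hidden)
# `hidden` now summarises the whole sentence. Each word was processed in
# order, so information about early words must survive every later update.
print(outputs.shape, hidden.shape)  # torch.Size([1, 5, 16]) torch.Size([1, 1, 16])
```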

RNNs had some design flaws that limited their development into large language models. Because they process their input sequentially, they struggled to retain information about words appearing early in particularly long pieces of text: by the time they reached the end, they had forgotten the context of the beginning. This meant that summarising documents, something ChatGPT excels at, was not feasible, as only the last few sentences would be represented in the output. Additionally, sequential input restricted training speed, as it prevented RNNs from fully utilising the parallel processing power of GPUs. Training on ‘GPT-3 sized’ datasets was simply not possible!

Transformers are a type of neural network architecture introduced by the Google Brain team in 2017 in the paper “Attention Is All You Need” [1]. Transformer networks use the concept of attention to retain relations between words. Attention involves weighting the importance of each part of the input to highlight important information.

A graph demonstrating the increase in model parameter sizes since 2018. Parameter sizes are displayed in billions of parameters. The exponential increase in size is highlighted by the fitted line in red.

This mechanism is very similar to the human idea of attention, in that it allows the model to focus on the important parts of the input text. We wouldn’t need to consider every word in “Hogwarts is a place of magical study” to understand that the sentence is about Harry Potter; we could simply focus on (attend to) the words “Hogwarts” and “magical”. Because attention considers all the words at once rather than one at a time, these computations can be parallelised during training. In other words, transformers scale very well! We can pass far more data through a much larger model, which has rapidly increased the parameter counts of large language models – as seen above!
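
Below is a stripped-down PyTorch sketch of the scaled dot-product attention at the heart of the transformer [1]. It omits the learned projection matrices and multiple attention heads of the real architecture:

```python
import torch

def attention(Q, K, V):
    """Scaled dot-product attention: each output is a weighted mix of all inputs."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # how strongly each word relates to every other
    weights = torch.softmax(scores, dim=-1)        # attention weights sum to 1 for each word
    return weights @ V

words = torch.randn(6, 32)            # 6 "words", each represented by a 32-dim vector
out = attention(words, words, words)  # self-attention: every word attends to all the others
print(out.shape)                      # torch.Size([6, 32])
```

Crucially, nothing in this computation happens one word at a time, which is what lets transformers train in parallel on GPUs.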

Where can ChatGPT be utilised?

ChatGPT has a wide range of natural language processing applications. Alongside question answering and sentence completion, many have found that ChatGPT can increase their productivity in the workplace, for example by drafting emails, summarising documents, or even completing code.
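
For instance, a document summary can be requested programmatically through OpenAI’s API. The sketch below is illustrative only: it assumes the openai Python package (pre-1.0 interface), a valid API key, and a model name that may change as OpenAI updates its offerings.

```python
# A hedged sketch of requesting a summary from an OpenAI chat model.
# Assumes: pip install openai (pre-1.0 interface) and a valid API key.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder - never hard-code real keys

document = "...your document text here..."

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # model name is an assumption and may change
    messages=[
        {"role": "system", "content": "You summarise documents concisely."},
        {"role": "user", "content": f"Summarise in three bullet points:\n\n{document}"},
    ],
)

print(response["choices"][0]["message"]["content"])
```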

Focusing on the hiring process, résumé screening is a trivial task for ChatGPT: it can provide summaries of prospective candidates, increasing the efficiency of hiring managers. Job descriptions could also be generated and posted within seconds, requiring just a few key bullet points about candidate requirements. ChatGPT could effectively streamline the hiring process, allowing the best candidates to reach the later (human-reviewed) stages in record time! However, as we will now discuss, such reliance on ChatGPT does not come without its risks.

What are the limitations of ChatGPT?

Large language models all share some pitfalls common to much of artificial intelligence; some of these issues lie with the model and some with the user. A common concern among researchers is these models’ tendency to exhibit bias against certain protected characteristics. For example, a 2021 study found that GPT-3 would commonly associate Islam with violence, resulting in unfavourable responses to prompts regarding Muslims [2]. As these models are trained on huge datasets of unfiltered text from a variety of sources, biases and stereotypes present in the dataset will be present in the model’s output. While OpenAI has implemented its “Moderation API” to block unsafe content from being fed to ChatGPT, bias is still present in many of its responses. Textio CEO Kieran Snyder recently asked ChatGPT to write job posts, finding that while the API refused some problematic prompts, most passed through without any issues [3].

Another risk of becoming dependent on ChatGPT-like models is their tendency to hallucinate: they will often generate responses that are grammatically perfect and sound plausible, but are incorrect. There is no way for ChatGPT to verify its answers; every word it generates is simply chosen as the statistical “best fit”. This is very dangerous, as information could be generated that, without careful inspection, is taken as fact. Regardless, its question-answering abilities have already been well documented, causing quite a stir in academia. Some studies have called for a shift away from the online learning introduced by the pandemic, returning to invigilated in-person examinations due to concerns about GPT-assisted cheating [4].

Closing Remarks (written by ChatGPT)

I think that it is only right that we give ChatGPT the opportunity to write the closing remarks to this post. Take it away! 

In conclusion, ChatGPT is a powerful language model that has the potential to revolutionize a wide range of natural language processing applications. Its ability to generate human-like text and answer questions has been well-documented and has already caused quite a stir in academia. However, it is important to note that the model is not without its limitations, such as biases and the tendency to "hallucinate" responses that are not necessarily true. As with any technology, it is crucial to use ChatGPT responsibly and with caution. Overall, the advancements in large language models like ChatGPT are exciting and have the potential to greatly benefit society in a multitude of ways.

Author: Lewis Wood (AI Algorithm Developer)