ChatGPT is an AI tool that has shown an impressive ability to answer questions, engage in realistic dialogue, write programming code from scratch, and write compelling blog posts on a wide variety of topics.
What is ChatGPT and what previous work does it build off?
ChatGPT is a large language model that was trained by OpenAI. The underlying model is an upgrade to OpenAI’s Generative Pre-trained Transformer 3 (GPT-3), which uses a transformer architecture to produce human-like text in response to a text prompt provided by a user. The original GPT-3 model, announced in 2020, has 175 billion parameters and requires more than 800 GB of storage.
Prior to the release of ChatGPT, OpenAI had already trained and deployed additional GPT-3-like models for public use (e.g., InstructGPT models, served through the API as text-davinci-002 and text-davinci-003) that perform much better than the original GPT-3 model across a number of benchmarks and human evaluations. In OpenAI’s own evaluations, human labelers even preferred the outputs of a 1.3-billion-parameter InstructGPT model, which has less than one percent of the original 175 billion parameters, over those of the full GPT-3. As far as the public knows, ChatGPT is only an iterative update to these models.
Why is ChatGPT taking the world by storm, then, if the underlying model (GPT-3) and its successors (InstructGPT) have been available for some time, GPT-3 itself for more than two years?
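Some back-of-the-envelope arithmetic makes the storage figure easier to interpret. Assuming each of the 175 billion parameters is stored as a 32-bit (4-byte) floating-point number, the raw weights alone come to roughly 700 GB. Whatever pushes the total past 800 GB (file format, metadata, or other overhead) is not modeled here. The function and constant names below are our own, for illustration only.

```python
# Rough storage estimate for GPT-3's weights.
# Assumption: dense, uncompressed weights; no optimizer state or file overhead.
GPT3_PARAMETERS = 175_000_000_000

def weight_storage_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the raw weights in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9

# Compare full-precision and half-precision storage for the same parameter count.
for label, nbytes in [("32-bit float", 4), ("16-bit float", 2)]:
    print(f"{label}: {weight_storage_gb(GPT3_PARAMETERS, nbytes):.0f} GB")
```

Halving the bytes per parameter halves the estimate, which is one reason lower-precision formats matter when serving models of this size.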
Two components have been key to ChatGPT’s viral success.
OpenAI created a publicly available, user-friendly interface that allows anyone to “chat” with ChatGPT at no cost. This has made the tool more engaging and simpler to interact with than the alternatives. Outputs produced by the model and shared online have at different times been exciting [solving Advent of Code problems], hilarious [peanut butter in VCR], and even frightening [creating a virus].
ChatGPT is a version of GPT-3 (OpenAI describes it as fine-tuned from a model in its GPT-3.5 series) that is trained specifically to generate conversational text. OpenAI has not yet published specific details explaining the difference between it and previous models. However, we do know that a large part of the improvement in performance comes from OpenAI’s refined use of reinforcement learning from human feedback (RLHF). Pioneered in 2017 and first deployed at scale in InstructGPT, this method helps to “align” a model to better follow human instructions by fine-tuning it with actual human feedback.
The human feedback data is made up of:
A set of written prompts previously submitted through the OpenAI API.
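A collection of human-generated demonstrations of the desired model response to those prompts.
A set of human rankings comparing several model responses to the same prompt, which are used to train a reward model that guides the reinforcement learning step.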
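ChatGPT’s exact training recipe is unpublished, but the InstructGPT paper describes the core of RLHF’s reward-modeling step: labelers rank several model outputs for the same prompt, and a reward model r is trained so that the preferred output in each pair scores higher, by minimizing −log σ(r(x, y_w) − r(x, y_l)) averaged over every pair drawn from a ranking. The sketch below implements only that loss, assuming the reward model’s scalar scores have already been computed; the function names and the scores in the usage lines are hypothetical.

```python
import math
from itertools import combinations

def pairwise_reward_loss(reward_preferred: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_preferred - r_rejected)), computed stably as softplus(-diff)."""
    diff = reward_preferred - reward_rejected
    # softplus(-diff) = log(1 + exp(-diff)); branch so exp() never overflows
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

def ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Average pairwise loss over every (better, worse) pair in one human ranking."""
    # combinations() keeps list order, so in each pair (a, b) a was ranked above b
    pairs = list(combinations(rewards_best_to_worst, 2))
    return sum(pairwise_reward_loss(a, b) for a, b in pairs) / len(pairs)

# Hypothetical reward-model scores for three responses, listed in the human's ranked order.
print(round(pairwise_reward_loss(0.0, 0.0), 4))  # equal scores give log(2) ≈ 0.6931
print(ranking_loss([2.0, 1.0, 0.0]))  # scores agree with the ranking: low loss
print(ranking_loss([0.0, 1.0, 2.0]))  # scores contradict the ranking: high loss
```

Minimizing this loss pushes the reward model to score responses the way human labelers ranked them; the trained reward model then supplies the signal that the reinforcement learning step optimizes.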
Although the details of how ChatGPT was tuned are not available, here’s what ChatGPT had to say about it:
What are potential applications of ChatGPT?
One obvious application is (you guessed it) chatbots! You’ve likely had an experience with a chatbot on a company’s webpage. If your experience using one has been anything like ours, then you probably found it frustrating and tried to get it to forward you to a human as soon as possible. In comparison, ChatGPT can be a vast improvement over the chatbots that many organizations currently use to try to automate customer service.
Why stop at chatbots? ChatGPT can be used in any context where people need to write. ChatGPT’s outputs can already serve as a great starting point for productive tasks, including writing blog posts, marketing copy, email campaigns, and event promotions, or even helping with search engine optimization. This even includes creative writing, a context where we humans have traditionally had a huge advantage over our machine counterparts. Skeptical? Try it out!
For those of us (like the authors) who write code for a living, ChatGPT also proves to be quite an adept assistant. It can generate code in a number of different languages, even from ambiguous text prompts. Like GitHub Copilot, this could be a great way to supplement code writing and accelerate our work, or a useful way to template, outline, or explain some code.
This section could go on forever, because the possibilities truly seem quite endless at this point. It certainly isn’t perfect, and it isn’t about to put all the humans out of work, but the progress that is already achievable through this free and accessible tool is astounding.
Now that we know the possible benefits, what sort of questions or issues does ChatGPT raise?
The tech world has been buzzing with the release of ChatGPT, but it certainly won’t be the last or most capable language model to be developed. Technologists, researchers, and large organizations are focused on where we go from here, and are rightly asking the following questions:
Technical Considerations
What improvements in model capabilities are next?
How can OpenAI (or other researchers) make more accurate, less biased language models?
One factor is the quality of the text used to train the models. Whereas ChatGPT was trained on a wide range of text available online, models trained specifically on scientific literature, such as Meta’s Galactica, suggest that more specialized training data could produce more useful models for some domains. [Galactica]
How can organizations integrate ChatGPT (or other language models) into their business model to improve productivity or unlock new capabilities?
We touched on some potential use cases above and noted that they seem nearly endless. New technology alone doesn’t help organizations; appropriate implementation and change management do.
How can ChatGPT scale?
The growth in the number of users of this new technology has been astounding, but can it, or future models like it, continue to scale with demand?
If organizations start to see a benefit, the number of users and prompts will grow rapidly. Will OpenAI commercialize the tool, or parts of it?
While the capabilities of ChatGPT are certainly impressive, the model still fails in unexpected and dangerous ways, often enthusiastically offering “helpful” advice when prompted by a user to provide instructions for gaslighting friends, hacking into a company’s servers, evaluating applicants based on race, and other tasks which are clearly harmful. We want to state in no uncertain terms that users of ChatGPT, and other models like it, should not trust that the text output by the model is truthful, complete, or unlikely to cause harm without manual human verification of the output.
A few issues that users have already found include:
Hallucination
Numerous examples have shown that ChatGPT, and other large language models like it, often “hallucinate” facts.
For example, a biography of a famous computer scientist might include discussion of her previous accomplishments in mathematics, even though she never published any research in that field or participated in any relevant competitions.
In these cases, the model “hallucinates” by adding in text (the mathematics accomplishments) that is often seen in similar contexts, such as in the biographies of other computer scientists. This issue can cause models to produce subtly incorrect outputs in ways that a non-expert user could find hard to discern.
Bias
People have already found ways of prompting ChatGPT to produce biased text outputs, as in one case where it wrote Python code to identify good developers based on race and sex. Why does this happen?
Much of the text included in large language model training datasets subtly (or not so subtly) exhibits the biases of the people who wrote it, as well as the realities of the world in which the phenomena it describes are grounded. Training on such text can reproduce, and even amplify, existing and historical biases against or in favor of certain demographic groups in society.
As models like ChatGPT are deployed and create more text for public consumption on the internet, the training datasets for future models will likely be composed of larger and larger portions of AI-generated data, which could further deepen this problem and create a self-fulfilling, negative cycle of progressively more biased, less accurate models.
Misuse
Many concerns have been raised about the potential for ChatGPT to facilitate cheating in school, cheating in interviews, hot-wiring a car, generating blackmail, and hacking and bug exploitation in existing code.
Despite OpenAI’s efforts at blocking certain prompts and training the model to be harmless and truthful, nobody has yet been able to consistently prevent ChatGPT or models like it from offering “helpful” advice for harmful topics.
Difficulty of identifying provenance
It is increasingly difficult to tell the difference between human-generated and AI-generated text. Even ChatGPT itself cannot reliably determine if a user-supplied portion of text was generated by an AI model or by a human.
As models like ChatGPT are increasingly used, it could become very hard to tell whether any text we read was generated by an actual human being.
Privacy
The data used to train the model contains a great deal of text, some of which is personal or sensitive. When prompted, ChatGPT and other similar models will often offer up this information without any hesitation.
Disruption
The widespread usage of large language models has the potential to make workers more productive in the present. However, there could come a point in the not-too-distant future when these models are good enough to replace human workers for some tasks.
Corporations, governments, and regulatory bodies worldwide must be much nimbler and more engaged in the process of regulating the usage of these models than they currently are if these harms are to be avoided.
Transparency/interpretability
As models become more and more complex, our ability to understand how they will behave in every relevant circumstance diminishes.
If more rigorous model testing and evaluation is not demanded and developed, we could very well see more high-profile cases of these models generating harmful or incorrect text when used for critical applications, because the internal rules that they use to make their outputs are not well understood.
Great, so what now?
This field is no stranger to rapid technological advances, so it shouldn’t be a complete surprise that ChatGPT has exploded onto the scene. Despite its capabilities and possible benefits, organizations should exercise just as much caution when seeking to implement it or similar models as they do for other new and disruptive technologies.
For one, organizations can only absorb so much change. Even if ChatGPT is superior to current text generation tools, the prudent course would be to move slowly. Processes break, and anything an organization relies on should be more robust than this free research tool, which is only weeks old and whose capabilities are poorly understood.
Secondly, understanding and addressing the ethical concerns listed above constitutes an even greater challenge. We’ve seen horror stories of AI being implemented without very careful consideration of the ethical consequences. ChatGPT can certainly be a powerful tool, but carries a great risk to any organization that doesn’t appropriately develop practices and policies that can address its weaknesses.
There is, however, very little risk in experimenting with the tool in its current stage: Give it some prompts. Ask it some questions. Let it humor you, inform you, and even mislead you. The more familiar you become with the tool and its pitfalls, the more prepared you will be when ChatGPT and tools like it are commonplace.
And with the release of other, even more capable and potentially disruptive models in 2023 (e.g., GPT-4), there’s no time like today to get familiar with the technology of tomorrow.
After all, it’s only going to get wilder from here.