
GPT-3.5 VS GPT-4: Comparing AI Bots


OpenAI recently unveiled GPT-4, the new and improved version of the chatbot that has taken the world by storm. What are the differences between GPT-4 and its predecessor, GPT-3.5, though? Let's compare the two and take a look at how much smarter the new model really is.

Practical Differences

Before we get to the differences in performance, though, let's first go over some practical differences between the two. While anybody with an internet connection can go online and make use of GPT-3.5, only people who have opted into OpenAI's paid plan (called ChatGPT Plus) get to mess around with GPT-4.

The paid plan is $20 per month (before tax) and not only gives you access to GPT-4 but also gives you priority when the older version is experiencing heavy load. It also makes GPT-3.5's answers come faster, and we do mean a lot faster: once we paid up, it seemed like there was no stopping it!

However, even if you pay, you don't get unlimited access to GPT-4 like you do with GPT-3.5. Instead, you only get to send it a set number of prompts. Currently, that's 25 prompts per three hours, but as this limit is constantly changing, it could be that by the time you read this those restrictions are gone.
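If you're scripting your own requests, a rolling cap like this can be tracked client-side with a sliding-window counter. Here's a minimal sketch; the class and its defaults are our own illustration, using the (changeable) 25-prompts-per-three-hours figure mentioned above, not anything OpenAI publishes.

```python
from collections import deque
import time

class SlidingWindowCounter:
    """Track whether another prompt fits inside a rolling time window."""

    def __init__(self, max_prompts=25, window_seconds=3 * 60 * 60):
        self.max_prompts = max_prompts
        self.window_seconds = window_seconds
        self.timestamps = deque()  # send times of recent prompts

    def try_send(self, now=None):
        """Return True (and record the prompt) if we're still under the cap."""
        now = time.time() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_prompts:
            return False
        self.timestamps.append(now)
        return True
```

The deque keeps only the timestamps that still count against the cap, so memory stays bounded no matter how long the session runs.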

Also note that GPT-4 is a lot slower to respond than GPT-3.5. Though this will likely change in the coming months, right now it takes the bot a while to come up with answers, so don't expect to do anything too quickly with it.

GPT-4 Is Smarter

As you'd expect from an upgrade, GPT-4 is simply a lot smarter than GPT-3.5. By this we mean that it's more creative and better at understanding what you want. Its language model (think of it as the way it handles the words we use to communicate) is just light years ahead. You can see this when you feed it a prompt that has some nuance to it.

For this example, we decided on a relatively simple prompt, inspired by this Reddit thread, asking ChatGPT to “create an example of what a short essay about photosynthesis would look like if it were written by a 10-year-old. Incorporate mistakes the writer would make.” The results are wild. Here’s GPT-3.5’s take on it.

GPT-3.5 on photosynthesis as a child

It’s not too bad: the childish use of language is fairly convincing, but there are a few giveaways here and there. It would take a pretty smart 10-year-old to write that. Now check out GPT-4:

GPT-4 on photosynthesis as a child

This is an entirely different kettle of fish: there are spelling mistakes and some grammar issues, and overall the use of language is a lot more convincing. Though it’s not perfect, at first glance you would believe your nephew wrote this.

These improvements show up in every prompt you could possibly give GPT-4: its use of language is quite simply better, and it’s much better at picking up the nuances of human speech, including yours. While you still need to be careful about how you phrase prompts, you can expect much better output with less work.

Better Facts

Along with more creativity, GPT-4 is also a bit more trustworthy than GPT-3.5. When it first came out, ChatGPT became famous for its propensity to spout complete gibberish with confidence, like giving out incorrect statistics or messing up historical timelines.

When an AI simply makes stuff up that sounds plausible it’s called a hallucination. It’s like when you had a test in school that you didn’t study for so you just started writing random stuff down in the hope that at least some of it was true.

In all fairness, these problems have lessened as people have used GPT-3.5 and it has become smarter and more knowledgeable. Still, you’ll find that it parrots incorrect information, especially on niche subjects. It’s particularly prone to messing up advanced concepts from physics, mathematics, or computer programming.

GPT-4, though, is even further ahead. According to a technical report, GPT-4 does roughly 20 percent better than GPT-3.5 in this regard. Naturally, this doesn’t mean you should blindly trust everything GPT-4 says, as it will sometimes still make things up. If you’re going to use it for school, say, you may want to fact-check its answers to make sure.

Broadening the Context Window

Besides being smarter, GPT-4 also has a better “memory” than GPT-3.5 does. We put “memory” in quotes because it doesn’t really remember things; rather, it puts prompts into the context of what you asked it before. This is called the context window: a measure of how well a generative AI can carry information from an earlier prompt into a new one.

GPT-3.5 wasn’t very good at this. If you entered a prompt and then entered a new one based on the output you got, it would usually work, but only once or twice. More than a few steps removed from the original prompt, the bot would reset and you would have to restate your parameters in your prompts.

GPT-4’s context window is a lot larger, “remembering” more and for longer. The more complicated you make things, the worse it gets at recalling what you asked before, but overall after playing around for a bit we have to say it responded quite well to our prompts. We predict it will save people who use it intensively a lot of time.
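Under the hood, chat models of this kind are stateless: the client resends the conversation so far with every new prompt and trims the oldest turns once the history outgrows the model's context budget. Here's a rough sketch of that bookkeeping, assuming a simple word-count budget; the class and helper names are our own illustration, not OpenAI's API.

```python
class Conversation:
    """Keep a running message history, trimmed to a rough word budget."""

    def __init__(self, max_words=3000):
        self.max_words = max_words
        self.messages = []  # list of {"role": ..., "content": ...} dicts

    def add(self, role, content):
        """Record a new turn, then trim old turns to stay inside the budget."""
        self.messages.append({"role": role, "content": content})
        while self._word_count() > self.max_words and len(self.messages) > 1:
            self.messages.pop(0)  # drop the oldest turn first

    def _word_count(self):
        return sum(len(m["content"].split()) for m in self.messages)
```

Once the budget is exceeded, the earliest turns fall away, which is exactly why a bot "forgets" what you asked many prompts ago: that text simply isn't being sent anymore.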

What will likely also help is that GPT-4 can handle a lot more input, up to 3,000 words per prompt. This is perfect if you want to feed it an example text to work with and works well in conjunction with the broadened context window.
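If your source text runs longer than that limit, one workaround is to split it into pieces before feeding it in. A quick sketch, using the 3,000-word figure from above as the default; the function itself is our own illustration.

```python
def chunk_by_words(text, max_words=3000):
    """Split text into consecutive pieces of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Splitting on word boundaries is crude (it can cut mid-sentence), but it's enough to keep each prompt under the cap.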

Safety

Finally, because of all its improvements, GPT-4 is a lot “safer” than GPT-3.5. By safer we mean that there’s much less chance of getting so-called toxic responses: answers that contain nasty racist or sexist opinions.

Language models have a pretty poor track record in this regard: in 2016, for example, Microsoft’s Tay chatbot had to be taken offline in mere hours after being taught some pretty nasty stuff by the internet’s less salubrious elements. The plug was pulled after Tay started parroting white supremacist talking points and spouting nonsense about the 9/11 attacks.

GPT-3.5 was a big improvement in this regard in that it would not regale its users with extreme right-wing slogans, but it would still come up with a few claims you’d not bring up at your dinner table, at least not in polite company. This is likely because of the way the language model was trained over the course of months.

OpenAI seemingly has gotten even better at training, since with GPT-4 toxic results have been reduced even further, with only about one percent of responses being classified as toxic. This should make GPT-4 a more trustworthy resource overall.

When Should You Use GPT-3.5 vs GPT-4?

While GPT-4 is a lot better in every respect, it comes with two bottlenecks: it’s slower, and there’s a cap on how much you can use it. As a result, you’re probably best off saving it for the heavy-duty prompts, the ones that require a lot of input and demand nuanced answers.

For everything else, GPT-3.5 is likely the better option: answers come back quickly, and though you may have to do a bit more fact-checking, it’s reliable enough for most simple tasks. In the end, mixing and matching both versions will probably get you some amazing results for relatively little effort.
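If you're automating this mix-and-match through the API, the routing rule above can be sketched as a small helper: short, simple prompts go to the cheaper, faster model, while long or multi-part ones go to GPT-4. The thresholds and the heuristic are entirely our own; the model identifier strings match OpenAI's public names at the time of writing.

```python
def pick_model(prompt, heavy_word_threshold=200):
    """Crude routing heuristic: long or multi-question prompts go to GPT-4."""
    word_count = len(prompt.split())
    question_count = prompt.count("?")
    if word_count > heavy_word_threshold or question_count > 1:
        return "gpt-4"
    return "gpt-3.5-turbo"
```

In practice you'd tune the threshold to your own workload, but even a rule this simple keeps most everyday prompts off the rate-limited model.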