Last week I heard two people involved in LLM product development claim that “LLMs deteriorate over time”. Have you experienced similar issues? I had to dig further and see whether there were other reports of this.
The first mention of large language model deterioration dates from July 2023, when three computer scientists (two from Stanford and one from Berkeley) published the results of tests on GPT-4 and GPT-3.5, run first in March and then in June, and saw model performance drop.
Last month, another article on the topic was released, again suggesting that newer LLMs perform worse than older models.
Why might LLMs deteriorate over time?
In summary, the reasons for the supposed model deterioration might be found in:
- The training data
- The parameters defining the model
New models vs Old models
New models can be affected by both the training data and the model parameters. Old models, however, can degrade only if their parameters are modified.
Here's how Lauren Leffer puts it (source):
“Unlike in a traditional computer program, where each line of code serves a clear purpose, developers of generative AI models often cannot draw an exact one-to-one relationship between a single parameter and a single corresponding trait. This means that modifying the parameters can have unexpected impacts on the AI’s behavior.”
Indeed, one cannot bet on LLMs being deterministic. Their ability to communicate like humans and vary their wording so that they sound more natural is one of their best features.
Is there anything you can do about it?
As LLM performance depends a lot on the input, you cannot be sure how an app built on a particular large language model will perform over time unless you run tests.
The tests can include 20-25 questions that matter for your app or business case. Run those questions at regular intervals and record the answers. Then, by comparing answers from different runs, you will be able to see whether the model still does a good job, or whether you need to change your prompts or even switch to a new LLM. Check here for more info on LLM tests.
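To make the idea concrete, here is a minimal sketch of such a recurring test in Python. Everything in it is an assumption made for illustration: `ask_model` stands in for whatever function calls your LLM, the function names are made up, and the 0.8 similarity threshold is arbitrary. Since LLMs naturally vary their wording, plain text similarity is only a rough signal, so treat flagged questions as candidates for manual review rather than proof of deterioration.

```python
import difflib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List


def run_snapshot(questions: List[str], ask_model: Callable[[str], str]) -> Dict:
    """Ask every question once and return a timestamped record of the answers.

    `ask_model` is a hypothetical placeholder for your own LLM call.
    """
    return {
        "taken_at": datetime.now(timezone.utc).isoformat(),
        "answers": {question: ask_model(question) for question in questions},
    }


def save_snapshot(snapshot: Dict, path: Path) -> None:
    """Store a snapshot as JSON so it can be compared with later runs."""
    path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")


def load_snapshot(path: Path) -> Dict:
    """Load a snapshot previously written by save_snapshot."""
    return json.loads(path.read_text(encoding="utf-8"))


def compare_snapshots(old: Dict, new: Dict, threshold: float = 0.8) -> List[Dict]:
    """Return the questions whose new answer differs noticeably from the old one.

    Similarity is a character-level ratio between 0 and 1; the threshold is an
    arbitrary assumption and should be tuned for your own questions.
    """
    flagged = []
    for question, old_answer in old["answers"].items():
        new_answer = new["answers"].get(question, "")
        similarity = difflib.SequenceMatcher(None, old_answer, new_answer).ratio()
        if similarity < threshold:
            flagged.append(
                {
                    "question": question,
                    "similarity": round(similarity, 2),
                    "old": old_answer,
                    "new": new_answer,
                }
            )
    return flagged


if __name__ == "__main__":
    # Stand-in model used only for this demo; replace with a real LLM call.
    def fake_model(prompt: str) -> str:
        return f"Answer to: {prompt}"

    questions = ["What is our refund policy?", "Summarize the onboarding steps."]
    first_run = run_snapshot(questions, fake_model)
    second_run = run_snapshot(questions, fake_model)
    for item in compare_snapshots(first_run, second_run):
        print(item["question"], item["similarity"])
```

In practice you would call `save_snapshot` after each run (for example weekly) and compare the latest file against an earlier one with `load_snapshot` and `compare_snapshots`. For questions with a single correct answer, an exact check on the extracted answer is likely more reliable than text similarity.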