GAI is learning to be worse and worse

PART 1

But really, we are learning more and more about how bad Generative AI (GAI) really is:

No way intelligent people would end up here:

There is no way... If you had identical triplets, brought up together, going to the same everything in their lives, creating content from the same specification, there is no way they would come up with results this similar.

So how does this happen?

I asked my son, who is more of an AI geek than I am these days, and he told me:

Here's another theory... I have managed to tame the magical properties of GAI to produce repeatable results across multiple GAI programs. See... I can produce BS just as well as they can!!!

What is a reasonable alternative explanation?

The GAI produces things based on what it finds on the Internet, predominantly from Web sites, most of them commercial. The vast majority of Internet Web sites are generated by the same small number of Web site building products. Those products produce more or less identical outputs in form and substance, constraining most users to the pre-defined lists of things they provide. A commercial Web site selling an application has the same fields in the same order with the same look and feel, even if it chooses different stock images from its libraries depending on the user's selections. The GAI merely assembles the high-probability sequences, which are inherently identical because they are generated from the same software.

The GAI produces advertising copy in the same way, by guessing at the next claim about a product or service based on the previous claims. If most 'spam detection' mechanisms provide 'real-time detection' and have a '24x7 support team', then when the GAI sees 'spam detection' or anything like it, it will produce the high-probability list of benefits under the 'benefits' part of the form, putting in 'real-time detection' and '24x7 support team' regardless of what the prompt did or did not include.
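As a toy illustration of why that happens (my simplification, not how any particular model actually works), imagine picking the 'benefits' for a product simply by counting which claims appear most often across a pile of templated product pages:

```python
from collections import Counter

# Toy illustration only: a hypothetical corpus of 'benefits' lists scraped
# from templated product pages. Real models are far more complex, but the
# effect is similar: the most common claims dominate the output.
corpus = [
    ["real-time detection", "24x7 support team", "easy setup"],
    ["real-time detection", "24x7 support team", "cloud-based"],
    ["real-time detection", "AI-powered filtering", "24x7 support team"],
    ["easy setup", "real-time detection", "24x7 support team"],
]

def most_likely_benefits(corpus, n=3):
    """Return the n most frequent claims across all pages."""
    counts = Counter(claim for page in corpus for claim in page)
    return [claim for claim, _ in counts.most_common(n)]

# Whatever product the prompt names, the highest-frequency claims win.
print(most_likely_benefits(corpus))
# e.g. ['real-time detection', '24x7 support team', 'easy setup']
```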

Of course the more we use GAI to produce these things, the more we will see more of the same, because the GAI programs get the results of previous GAI programs as input. The end result is likely to be a feedback system that settles into stationary results. Among the reasons for this is that there is no built-in mechanism for measuring the success of the different things it puts out, like which ad copy actually sold more. It is a popularity contest, not a meritocracy.
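A minimal sketch of that feedback effect, assuming (for illustration only) that each 'generation' of copy is drawn in proportion to the previous generation's popularity, with no success signal at all:

```python
import random
from collections import Counter

# Minimal sketch (an illustration, not anyone's production system) of the
# feedback loop: each generation of ad copy is sampled in proportion to the
# previous generation's popularity, with no metric rewarding copy that
# actually sold more. Popularity compounds and variety drifts away.
random.seed(1)

claims = ["real-time detection", "24x7 support", "easy setup",
          "military-grade encryption", "award-winning design"]
population = claims * 6  # a small starting pool with an even mix

for generation in range(1, 101):
    counts = Counter(population)
    population = random.choices(list(counts), weights=list(counts.values()),
                                k=len(population))
    if generation % 25 == 0:
        print(generation, "distinct claims remaining:", len(set(population)))
```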

How can we fix it?

I think of this as an issue of brand safety. Anything I am going to put out about my offerings is going to be checked to make sure it is truthful. But how do I do this for folks who work for me and don't necessarily know what is going too far? They often take ad copy that is truthful, try to reword it to be more compelling, and end up with something that is not actually true.

Copy editors have done the same thing to my writings over the years. It is not malicious. They are trying to make it clearer, or more active than passive, or more favorable for the brand they work for. They try to take it out of first person, make it plural instead of singular, and so forth. Or they may try to take a group experience and personalize it to the brand. All of these things may be good for the brand, if they produce truthful results. And the thing is, the author knows (or should know) the underlying facts, so they can keep the truth intact in the form that presents the results.

I currently use a 2-step process for doing brand safety on my advertising. Step 1 is an automated check against the specifications of the offering. Here is an example of such a check of advertising content against offering information extracted automatically from the Cognos manual.

My summary is that the GAI got wrong the things my analytics got right, in fact, 0 out of 10! It did get right the part where it checks the syntax of the entered data against the specifications for that syntax. In other words, it pretty much sucks at the complicated things we would really like it to pick up, but it can do basic syntax checking and, impressively, it can check that the things required to be in the text are there and that the things not supposed to be there are absent. So it can be useful, but you have to be careful how you use it, and check the results against the realities.
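To make that concrete, the kind of check it handles reliably looks roughly like the sketch below, with made-up required and prohibited claims standing in for the real specification terms (an illustration, not my actual tooling):

```python
# Simplified sketch, with made-up product terms, of the kind of check that
# works reliably: verify that required claims appear in the ad copy and that
# prohibited claims do not. It is string matching, not understanding.
required = {"spam detection", "real-time detection"}     # claims the spec supports
prohibited = {"guaranteed 100% accuracy", "unhackable"}  # claims the spec does not support

def check_copy(ad_copy: str) -> dict:
    """Return which required claims are missing and which prohibited claims appear."""
    text = ad_copy.lower()
    return {
        "missing_required": sorted(c for c in required if c not in text),
        "found_prohibited": sorted(c for c in prohibited if c in text),
    }

copy = "Our spam detection offers real-time detection and guaranteed 100% accuracy."
print(check_copy(copy))
# {'missing_required': [], 'found_prohibited': ['guaranteed 100% accuracy']}
```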

GAI for an investor letter

GAI works reasonably well for a short investor introduction email. In this case I have redacted some details, but it fairly accurately reflects the investment opportunity (at the time of the offering), and the GAI wrote it in perhaps 20 seconds. Of course you have to check it for facts, but it is handy and less expensive and time-consuming than having a person do it.

We have done other similar things with mixed results. For example, looking at related markets, it finds some segments, but the numbers it produces for those segments often disagree with the references cited. And we have tested this using Gemini, Claude, and ChatGPT, including across different versions of each with identical content and prompts. One of the interesting results is that older versions do better at some things than newer ones. So picking the right version, working on the prompts carefully, and extensive testing help to improve results.
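For what it is worth, the testing harness can be as simple as the sketch below. Here call_model() is a hypothetical wrapper you would write around whichever vendor APIs and versions you actually use; the point is sending identical content and prompts everywhere and comparing what comes back:

```python
# Rough sketch of the cross-model testing approach. call_model() is a
# hypothetical wrapper to be implemented around each vendor's API; version
# names below are placeholders, not real model identifiers.
PROVIDERS = {
    "Gemini": ["older-version", "newer-version"],
    "Claude": ["older-version", "newer-version"],
    "ChatGPT": ["older-version", "newer-version"],
}

def call_model(provider: str, version: str, prompt: str) -> str:
    """Hypothetical: send the prompt to the given provider/version, return the text."""
    raise NotImplementedError("wire this to the vendor APIs you actually use")

def run_comparison(prompt: str) -> dict:
    """Run one identical prompt across every provider/version and collect the outputs."""
    results = {}
    for provider, versions in PROVIDERS.items():
        for version in versions:
            results[(provider, version)] = call_model(provider, version, prompt)
    return results

# Each output still has to be checked by a person against the cited references;
# the harness only makes the comparison systematic and repeatable.
```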

More information?

Join our twice-monthly free advisory call, usually at 0900 Pacific time on the 1st and 3rd Thursday of the month, tell us about your company and situation, and learn from others as they learn from you.

Advisory Session

In summary

GAI is decent for social proof but not so good at getting things right. A combination of GAI and human expertise makes for a more effective and efficient result.

Copyright(c) Fred Cohen, 2025 - All Rights Reserved