GAI is learning to be worse and worse
But really, we are learning more and more about how bad Generative AI (GAI) really is:
No way intelligent people would end up here:
All 3 made up the same things that were not in the prompt and not true.
There is no way... Even if you had identical triplets, brought up together, going to all the same things in their lives, and creating content from the same specification, there is no way they would come up with this close a set of results.
So how does this happen?
I asked my son, who is more of an AI geek than I am these days, and he told me:
Here's another theory... I have managed to tame the magical properties of GAI to produce repeatable results across multiple GAI programs. See... I can produce BS just as well as they can!!!
What is a reasonable alternative explanation?
The GAI produces things based on what it finds on the Internet, predominantly from Web sites, most of them commercial. The vast majority of Internet Web sites are generated by the same small number of Web site building products. Those products produce more or less identical outputs in form and substance, constraining most users to the pre-defined lists of things they provide. A commercial Web site selling an application has the same fields in the same order with the same look and feel, even if it chooses different stock images from its libraries depending on the user selection. The GAI merely assembles the high-probability sequences, which are inherently identical because they are generated from the same software.
The GAI produces advertising copy in the same way, by guessing at the next claim about a product or service based on the previous claims. If most 'spam detection' mechanisms provide 'real-time detection' and have a '24x7 support team', when the GAI sees 'spam detection' or anything like it, it will produce the high probability list of benefits under the 'benefits' part of the form, putting in 'real-time detection' and '24x7 support team' regardless of what the prompt may or may not have included.
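The mechanism described above can be sketched in a few lines. The frequency table and claim strings below are invented for illustration; the point is that the highest-probability benefits get emitted regardless of what the prompt said:

```python
from collections import Counter

# Hypothetical frequency table of benefit claims seen alongside "spam detection"
# in the training data (counts are made up for illustration)
benefit_counts = Counter({
    "real-time detection": 912,
    "24x7 support team": 845,
    "AI-powered filtering": 310,
    "on-premises option": 41,
})

def benefits_for(feature, table, k=2):
    """Emit the k highest-probability claims, regardless of the prompt."""
    return [claim for claim, _ in table.most_common(k)]

benefits_for("spam detection", benefit_counts)
# -> ['real-time detection', '24x7 support team']
```

Note that `feature` is never actually consulted: any prompt that triggers this part of the distribution gets the same top claims, which is exactly the behavior described above.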
Of course the more we use GAI to produce these things, the more we will see more of the same, because the GAI programs get the results of previous GAI programs as input. The end result is likely a feedback system that converges to stationary results. Among the reasons for this is that there is no built-in mechanism for success metrics on the different outputs, like which ad copy actually sold more. It is a popularity contest, not a meritocracy.
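A toy simulation (phrases and counts invented) illustrates why this feedback loop tends toward stationary results: frequency-weighted resampling fed back into the corpus can shift the mix, but it can never introduce anything new.

```python
import random
from collections import Counter

def generate(corpus, n, rng):
    """One 'GAI generation': sample n outputs proportional to current frequency."""
    phrases = list(corpus.elements())
    return [rng.choice(phrases) for _ in range(n)]

rng = random.Random(1)
corpus = Counter({"real-time detection": 3, "24x7 support team": 3,
                  "audit logging": 2, "on-prem deployment": 1})

for _ in range(30):
    corpus.update(generate(corpus, 20, rng))   # output becomes future input

# 30 generations later the corpus has grown from 9 items to 609, but it still
# contains only the original four phrases - a popularity contest, not a
# meritocracy, and with no success metric nothing ever displaces the leaders
```

With no external success signal, the composition just drifts toward a fixed mix of the phrases it started with.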
How can we fix it?
I think of this as an issue of brand safety. Anything I put out about my offerings is going to be checked to make sure it is truthful. But how do I do this for folks who work for me and don't necessarily know what goes too far? They often take ad copy that is truthful, try to reword it to be more compelling, and end up with something that is not actually true.
Copy editors have done the same thing to my writings over the years. It is not malicious. They are trying to make it clearer, or more active than passive, or more favorable for the brand they work for. They try to take it out of first person, make it plural instead of singular, and so forth. Or they may try to take a group experience and personalize it to the brand. All of these things may be good for the brand, if they produce truthful results. And the thing is, the author knows (or should know) the underlying facts so they can keep the truth in the form presenting the results.
I currently use a 2-step process for doing brand safety on my advertising. Step 1 is to do an automated check against the specifications of the offering. Here is an example of such a check of advertising content vs. offering information extracted automatically from the Cognos manual.
At this point, the ad copy is sent back to the person who produced the ad (with the GAI tool or manually), and it is up to them to fix it. Of course I could have Cognos do the fixes (by cutting out the original source content that produced these inconsistencies), but I generally prefer that a human identify correct facts to add to the known-facts corpus, so as to eliminate uncertainty in the process going forward.
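The first step can be sketched as a simple consistency check (the fact and claim strings here are placeholders, not the actual Cognos extraction): any claim not supported by the known-facts corpus gets flagged and routed back to a human.

```python
# Minimal sketch of the automated check: flag ad-copy claims that are not
# supported by the known-facts corpus (all names/values are illustrative).
known_facts = {
    "detects spam in email",
    "runs on Linux",
}
ad_claims = [
    "detects spam in email",
    "provides real-time detection",   # not in the facts corpus -> flagged
]

flagged = [c for c in ad_claims if c not in known_facts]
# flagged claims go back to the author, who either fixes the copy or adds a
# verified fact to the corpus, reducing uncertainty on future runs
```

In practice the claim extraction and matching are fuzzier than set membership, but the workflow is the same: machine flags, human decides.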
Conclusions
I have found the mix of GAI and human expertise to be more effective and efficient than either acting on their own for this sort of check. We still need human expertise and creativity. But the GAI is generally faster at getting to a draft that meets social proof criteria. And when it comes to checking facts, the inconsistency analysis using AI identifies most of the real brand safety problems in the GAI results. The way it is used in this case is advisory to help a human fix obvious problems before doing a human review, saving additional time and steps in getting to results.
It's in my nature
When I start to see things like the above, I decide to look deeper. And since we apply GAI and other analytics that used to be called AI for looking at startups (and other issues), I naturally looked at the effect of AI on due diligence and related matters. This month's article at all.net covers this in more depth, but I thought it would be a good idea to look a bit closer here as well. Here is a typical example of output we provide today to companies seeking funding or advisory support using our GWiz™ SaaS offering after they fill in their metrics and background:
| Year | Δ Revenue | % Revenue | Δ Direct | % Direct | Δ Indirect | % Indirect | Δ EBITDA | % EBITDA |
|---|---|---|---|---|---|---|---|---|
| 1 | $4,000,000 | N/A% | $2,000,000 | N/A% | -$381,400 | 66.29% | $2,381,400 | -110.48% |
| 2 | $12,000,000 | 400.00% | $6,000,000 | 400.00% | $1,250,000 | 266.67% | $4,750,000 | 480.00% |
| 3 | $64,000,000 | 500.00% | $12,000,000 | 250.00% | $2,000,000 | 200.00% | $50,000,000 | 933.33% |
| 4 | $80,000,000 | 200.00% | $20,000,000 | 200.00% | $2,000,000 | 150.00% | $58,000,000 | 203.57% |
| 5 | $160,000,000 | 200.00% | $40,000,000 | 200.00% | $0 | 100.00% | $120,000,000 | 205.26% |
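One reading that makes the % columns internally consistent is that each Δ column is the year-over-year increment and each % column is this year's cumulative total divided by last year's. A quick sketch checks the Revenue column under that assumption (the same check reproduces the Direct column as well):

```python
def yoy_percentages(deltas):
    """Cumulative totals and year-over-year % (this year's total / last year's)."""
    totals, pcts, run = [], [], 0
    for d in deltas:
        prev, run = run, run + d
        pcts.append(None if prev == 0 else round(run / prev * 100, 2))
        totals.append(run)
    return totals, pcts

revenue_deltas = [4_000_000, 12_000_000, 64_000_000, 80_000_000, 160_000_000]
totals, pcts = yoy_percentages(revenue_deltas)
# pcts[1:] -> [400.0, 500.0, 200.0, 200.0], matching the % Revenue column;
# the None for year 1 corresponds to the table's N/A%
```

This is exactly the kind of deterministic arithmetic check that an analytics pipeline gets right every time and, as discussed below, the GAI got wrong.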
Note that this was NOT done by GAI. We tried that, and we got ridiculous results for about 20% of the content, and it failed to pick up a lot of things we asked it to look for, despite providing very explicit instructions about what to look for. Here's what the best GAI result we could produce generates for inconsistencies, independent of the analysis just provided. It takes perhaps 45 seconds on average (again with REDACTIONS), and this time with My Comments:
Internal Inconsistencies Identified
Runway analysis is not really right - funded runway can be the same
2 years from a menu selection vs. 2-3 years from text provided is not inconsistent.
Actually, this is how it works - a big jump to get revenue, then some growth rate not as big as the jump
Actually, costs as a ratio of revenue tend to go down as companies scale... so bad analysis again
The GAI failed to take the CAGR into account - so wrong answer again
External Inconsistencies Identified
Nope - it just got this wrong...
Again, completely wrong GAI analysis. And the citations do not agree with the claims. And the recurring revenue is not for the initial purchase...
The GAI didn't look at the right part of the literature
The GAI misquoted the claim, and if you get the wrong words for the claim, you cannot be right in your analysis.
Per the instructions
It did this reasonably well - except for its 'questionable premise' crack.
Desirables
It got these right!!! Yay!
Summary: The analysis reveals significant inconsistencies across multiple areas of the REDACTED documentation. Internal financial projections contain mathematical errors and unrealistic growth assumptions. External validation shows questionable technology claims and inflated market size figures that contradict established industry data. Formatting compliance is mixed, with several critical failures in word limits and URL formatting. Based on the identified issues across financial projections, technology claims, market data, and formatting requirements, approximately 65% of the content contains inconsistencies or compliance failures.
My summary is that the GAI got wrong the stuff my analytics got right - in fact, 0 out of 10! It did get right the part where it checks the syntax of the entered data against the specifications for that syntax. In other words, it pretty much sucks at the complicated things we would really like it to pick up, but it can do basic syntax checking and, impressively, it can check that the list of things required to be in the text are there and the things not supposed to be there are not. So it can be useful, but you have to be careful how you use it, and check the results against the realities.
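The kind of check the GAI did handle well (required items present, disallowed items absent, limits respected) is also easy to do deterministically. A minimal sketch, with made-up field rules:

```python
def check_compliance(text, required, forbidden, max_words):
    """Basic syntax/content checks: required phrases present, forbidden phrases
    absent, and word limits respected."""
    issues = []
    for phrase in required:
        if phrase not in text:
            issues.append(f"missing required: {phrase}")
    for phrase in forbidden:
        if phrase in text:
            issues.append(f"contains forbidden: {phrase}")
    if len(text.split()) > max_words:
        issues.append("over word limit")
    return issues

check_compliance("fast spam detection with real-time alerts",
                 required=["spam detection"], forbidden=["guaranteed"],
                 max_words=10)
# -> []  (no issues)
```

Having the deterministic checks in plain code also gives a baseline to measure the GAI against, which is how we found the 0-out-of-10 result above.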
GAI for an investor letter
GAI works reasonably well for a short investor introduction email. In this case I have redacted some details, but it reasonably accurately reflects the investment opportunity (at the time of the offering), and the GAI writes it in perhaps 20 seconds. Of course you have to check it for facts, but it is handy and less expensive and time-consuming than having a person do it.
Dear Potential Investor,
XXXXXXXXX represents a compelling investment opportunity in the medical devices and dentistry sector. The company has developed a revolutionary laser cleaning and purification system that painlessly alters tooth enamel to prevent decay and provide permanent whitening. Three key features make this opportunity particularly attractive to investors:
XXXXXXXXXXX is currently seeking $150,000 in funding with a pre-money valuation of $8,000,000. The company has completed production devices and FDA approval, with distribution and contract manufacturing already underway. With only 5 months of current runway, this investment would provide the capital needed to scale operations and capture market share in the $40 billion total addressable market. The projected exit valuation of $470,000,000 in 2 years offers investors a potential ROI of 666.49% annually, with the investment providing 1.84% equity ownership.
Thank you for considering this investment opportunity. For more detailed information about XXXXXXXX's technology, financial projections, and growth strategy, please contact CEO YYYYYYYYYYYYYY directly at ZZZZZZZZZZZZZZZZ or PHONE-NUMBER-REDACTED.
Best regards,
YYYYYYYYYYY
Co-founder and CEO
XXXXXXXXX
We have done other similar things with mixed results. For example, looking at related markets, it finds some segments, but the numbers it produces for those segments often disagree with the references cited. And we have tested this using Gemini, Claude, and ChatGPT, including across different versions of each with identical content and prompts. One of the interesting results is that older versions do better at some things than newer ones. So picking the right version, working on the prompts carefully, and extensive testing help to improve results.
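The cross-model testing amounts to running identical prompts through each model and version and scoring the outputs against a fixed list of expected facts. A minimal sketch of such a harness (model names, the prompt, and `call_model` are all stand-ins; no vendor API is assumed):

```python
def score_output(output, expected_facts):
    """Fraction of expected facts that actually appear in the output."""
    return sum(1 for f in expected_facts if f in output) / len(expected_facts)

def compare_models(models, prompt, expected_facts, call_model):
    """Run the identical prompt through every model and score each result."""
    return {m: score_output(call_model(m, prompt), expected_facts) for m in models}

# Usage with canned responses standing in for real (hypothetical) model calls:
canned = {"model-v1": "real-time detection, 24x7 support team",
          "model-v2": "real-time detection"}
scores = compare_models(list(canned), "describe the product",
                        ["real-time detection", "24x7 support team"],
                        lambda m, p: canned[m])
# scores -> {'model-v1': 1.0, 'model-v2': 0.5}
```

Keeping the prompt and scoring fixed is what makes version-to-version comparisons meaningful, including the surprising cases where older versions score higher.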
More information?
Join our monthly free advisory call, usually at 0900 Pacific time on the 1st and 3rd Thursday of the month, tell us about your company and situation, and learn from others as they learn from you.
In summary
GAI is decent for social proof but not so good at getting the facts right. A combination of GAI and human expertise makes for a more effective and efficient result.
Copyright(c) Fred Cohen, 2025 - All Rights Reserved