Angel To Exit

GAI is learning to be worse and worse

PART 1

But really, we are learning more and more about how bad Generative AI (GAI) really is:

It makes stuff up (actually it fuses stuff together that doesn't belong together).
The stuff it makes up is quite often wrong.
It is often more convincing with its falsehoods than we are with the truth.

No way intelligent people would end up here:

I used by GTM tool recently to send the same complex prompt designed to generate advertising content involving something like 20 different specifics of an offering into ChatGPT, Gemini, and Anthropic generative AI tools.
The parameters for randomness and creativity were similar (they actually use different types of settings, but they are roughly the same in terms of the number of different next step options to try after each term added to the result).
All 3 came up with identical formats and identical wording for the header, tag line, and significant parts of the content. The differences were the sorts of changes you would get from a grade schooler copying a paragraph from a book and changing a few words to make it 'original'.
All 3 made up the same things that were not in the prompt and not true.

There is no way... If you had identical triplets brought up together and going to the same everything in their lives creating content with the same specification, there is no way they would come up with this close a set of results.

So how does this happen?

I asked my son, who is more of an AI geek than I am these days, and he told me:

There must be a bug in my program. I told him there is not a bug that would somehow cross-mix the results, and I have logs and audits to show the prompts, where they were sent, and the results that came back from the 3 service providers. Since each has a different syntax and are incompatible with each other, there's no mixup that could reasonably produce this on my end.
They are trained on pretty much the same data. Of course this is relatively true. They all get their data from everything openly available on the Internet. They all use the same techniques to gather, learn from, and generate results. They may even steal (share) the same source code. But to the extent they claim to have their own intellectual property in there somewhere it's hard to figure out how there could be any real difference given these results.
Maybe this is so unique that it is identical to something already out there and there is nothing else like it. There is nothing else like our offering of course, or at least like our automated marketing material generation process and technology. But that does not seem to explain how nearly identical the results are in terms of specific wording not using the words from the prompt sent to them and coming up with the same made up BS claims!
There must be a bug in my program. Sorry, but repeating the same thing doesn't make it right. The results are really from the so-called independent GAI mechanisms based on the identical prompts and producing the same misinformation, right down to the same identical word sequences.

Here's another theory... I have managed to tame the magical properties of GAI to produce repeatable results across multiple GAI programs. See... I can produce BS just as well as they can!!!

What is a reasonable alternative explanation?

The GAI produces things based on what it finds from the Internet, predominantly from Web sites, most of them commercial. The vast majority of the Internet Web sites are generated by the same small number of Web site building products. Those products produce more or less identical outputs in form and substance, constraining most users to the pre-defined lists of things they provide. A commercial Web site selling an application has the same fields in the same order with the same look and feel, even if it chooses different stock images from its libraries depending on the user selection. The GAI merely assembles the high probability sequences which are inherently identical because they are generated from the same software.

The GAI produces advertising copy in the same way, by guessing at the next claim about a product or service based on the previous claims. If most 'spam detection' mechanisms provide 'real-time detection' and have a '24x7 support team', when the GAI sees 'spam detection' or anything like it, it will produce the high probability list of benefits under the 'benefits' part of the form, putting in 'real-time detection' and '24x7 support team' regardless of what the prompt may or may not have included.

Of course the more we use GAI to produce these things, the more we will see more of the same, because the GAI programs get the results of previous GAI programs as input. The end result is likely to be a feedback system resulting in stationary results. Among the reasons for this is that there is no built-in mechanism for success metrics on the use of the different stuff it puts out, like which ad copy actually sold more. It is a popularity contest, not a meritocracy mechanism.

How can we fix it?

I think of this as an issue of brand safety. Anything I am going to put out about my offerings is going to be checked to make sure it is truthful. But how do I do this for folks who work for me and don't necessarily know what is going too far? They often take ad copy that is truthful, try to reword it to be more compelling, and end up with something that is not actually true.

Copy editors have done the same thing to my writings over the years. It is not malicious. They are trying to make it clearer, or more active than passive, or more favorable for the brand they work for. They try to take it out of first person, make it plural instead of singular, and so forth. Or they may try to take a group experience and personalize it to the brand. All of these things may be good for the brand, if they produce truthful results. And the thing is, the author knows (or should know) the underlying facts so they can keep the truth in the form presenting the results.

I currently use a 2-step process for doing brand safety on my advertising. Step 1 is to do an automated check against the specifications of the offering. Here is an example of such a check of advertising content vs. offering information extracted automatically from the Cognos manual.

One-time import of claims from the offering manual and other internal and external assumed true content: Make sure this does not include AI-generated content. These are the assumed facts about the brand or offering, and if they are not true, you will be propagating lies instead of truths.
Paste in the ad copy generated by GAI: This is automatically transformed to a set of asserted facts based on the copy. This example is for the use of Cognos for fraud prevention:
- Fraud prevention is a necessity.
- Protecting yourself, your family, and your company from fraud is important.
- Cognos has been combating fraud for 50 years.
- Cognos understands the anxieties of individuals with high Neuroticism.
- Cognos provides patented technology and expert guidance.
- Cognos offers advanced detection systems to prevent fraudulent activities.
- Cognos addresses the desire to protect for individuals with high Agreeableness.
- Cognos acknowledges the challenges faced by high Extraversion individuals in leadership.
- Cognos provides early detection systems to identify scams.
- Cognos offers tools and expertise to navigate challenges with confidence.
Of course not all of these are true. The GAI has mixed together chunks of different truths to create false assertions.
Do the analysis: The automated analysis compares the claims to the asserted facts and identifies inconsistencies, for example:
- Statement: Fraud prevention is a necessity.
- Value: 100
- Response: This is a generally accepted truth.
- Statement: Cognos has been combating fraud for 50 years.
- Value: 0
- Response: The provided facts do not mention how long Cognos has been combating fraud. Therefore, this statement cannot be verified.
- Statement: Cognos understands the anxieties of individuals with high Neuroticism.
- Value: 50
- Response: The provided facts do not explicitly state that Cognos understands the anxieties of individuals with high Neuroticism. However, Cognos does include an Emotions mechanism, which analyzes information for emotional content. It is possible that this mechanism could be used to address anxieties, but it is not explicitly stated.
And so forth...

Produce usable output: Based on the analytical results, a user usable response is generated:

While it is widely accepted that fraud prevention is a necessity and that protecting yourself, your family, and your company from fraud is important, it is crucial to clarify certain assertions regarding Cognos.
Firstly, the claim that "Cognos has been combating fraud for 50 years" cannot be verified, as there are no provided facts to support this duration.
Additionally, the statement "Cognos understands the anxieties of individuals with high Neuroticism" lacks explicit confirmation, although Cognos does utilize an Emotions mechanism that may address such concerns.
Furthermore, the assertion that "Cognos provides patented technology and expert guidance" remains unverified due to a lack of supporting information.
While it is suggested that Cognos offers advanced detection systems to prevent fraudulent activities, the term 'advanced' is not substantiated by the facts.
Similarly, the claims regarding Cognos addressing the desire to protect for individuals with high Agreeableness and acknowledging the challenges faced by high Extraversion individuals in leadership are not explicitly stated.
Lastly, while Cognos includes a Scam Checker for early detection of scams, the term 'early' is not supported by the available information, nor is there confirmation that Cognos offers tools and expertise to navigate challenges with confidence.
It is essential to approach these assertions with a critical mindset, ensuring that we rely on verified information to make informed decisions.

At this point, the ad copy is sent back to the person who used the GAI tool to (or manually) produced the ad, and it is up to them to fix it. Of course I could have Cognos do the fixes (by cutting out the original source content that produced these inconsistencies), but I generally prefer the human to identify correct facts to add to the known facts corpus so as to eliminate uncertainty in the process going forward.

Conclusions

I have found the mix of GAI and human expertise to be more effective and efficient than either acting on their own for this sort of check. We still need human expertise and creativity. But the GAI is generally faster at getting to a draft that meets social proof criteria. And when it comes to checking facts, the inconsistency analysis using AI identifies most of the real brand safety problems in the GAI results. The way it is used in this case is advisory to help a human fix obvious problems before doing a human review, saving additional time and steps in getting to results.

PART 2

It's in my nature When I start to see things like the above, I decide to look deeper. And since we apply GAI and other analytics that used to be called AI for looking at startups (and other issues) I naturally looked at the effect of AI on due diligence and related matters. This month's article at all.net covers this in more depth, but I thought it might be a good idea to look a bit deeper here as well. Here is a typical example of output we provide today to companies seeking to get funding or advisory support using our GWiz™ SaaS offering after they fill in their metrics and background:

Potential inconsistencies or problematic values for investment:

High ROI: 666.49 is a return exceeding likely actual returns.
Current burn rate: ($37,000) is a mismatch to Last Year negative EBIDTA (-$1,131,400)
Exit Valuation: The Exit Valuation ($470,000,000) seems excessive for the Revenue ($16,000,000).
Marketing and Sales Mismatch to Revenue: Very little revenue ($0) for the Marketing and Sales Rating (90%)
Execution and Fulfillment to Revenue: Very little revenue ($0) for the Fulfillment and Execution Rating (105%)
Financial Situation and Path: Too good a financial situation and path for the Runway (5)
Runway short: The runway (5) is short and funding efforts should be started immediately. Limit costs and/or increase sales.

Year	Δ Revenue	% Revenue	Δ Direct	% Direct	Δ Indirect	% Indirect	Δ EBIDTA	% EBIDTA
1	$4,000,000	N/A%	$2,000,000	N/A%	-$381,400	66.29%	$2,381,400	-110.48%
2	$12,000,000	400.00%	$6,000,000	400.00%	$1,250,000	266.67%	$4,750,000	480.00%
3	$64,000,000	500.00%	$12,000,000	250.00%	$2,000,000	200.00%	$50,000,000	933.33%
4	$80,000,000	200.00%	$20,000,000	200.00%	$2,000,000	150.00%	$58,000,000	203.57%
5	$160,000,000	200.00%	$40,000,000	200.00%	$0	100.00%	$120,000,000	205.26%

Revenue not accelerating: Revenue deceleration in years 3-5 is problem unless there is a good reason.

Note that this was NOT done by GAI. We tried that and we got ridiculous results for about 20% of the content and it failed to pick up a lot of things we asked it to look for, despite provisioning very explicit instructions about what to look for. Here's what the best GAI we could produce generates for inconsistencies independent of the analysis just provided. It takes perhaps 45 seconds on average to do this (again with REDACTIONS) and this time with My Comments:

Internal Inconsistencies Identified

Runway calculation inconsistency: Current runway is listed as 5 months with $37,000/month burn rate, but if funded runway shows \"Unlimited\" with the same burn rate of $37,000/month despite only raising $150,000
Runway analysis is not really right - funded runway can be the same
Exit valuation timing mismatch: Exit year is listed as 2 years, but exit strategy mentions \"2 to 3 years\"
2 years from a menu selection vs. 2-3 years from text provided is not inconsistent.
Revenue growth pattern: Extreme revenue jumps from $16M (Year 2) to $80M (Year 3) represents a 400% increase, then doubles each subsequent year, which appears unrealistic
Actually, this is how it works - a big jump to get revenue then some growth rate not a big as the jump
Direct costs inconsistency: Year 3 direct costs ($20M) represent only 25% of revenue ($80M), while Years 1-2 show 50% direct cost ratios
Actually, costs as a ratio of revenue tend to go down as companies scale... so bad analysis again
Market share calculation error: With TAM of $40B and Year 5 revenue of $320M, market share should be 0.8%, not 0.67% as listed
The GAI failed to take the CAGR into account - so wrong answer again

External Inconsistencies Identified

FDA 510(K) clearance claim: According to FDA databases, laser devices for tooth decay treatment typically require more extensive clinical trials and regulatory approval than standard 510(K) clearance
Nope - it just got this wrong...
TAM figure questionable: The global dental equipment market size is approximately $8-10 billion according to industry reports from Grand View Research and Fortune Business Insights, making the claimed $40 billion TAM unrealistic
Again, completely wrong GAI analysis. And the citations do not agree with the claims. And the recurring revenue is not for the initial purchase...
Technology claims: Current dental literature and patent databases show no evidence of laser technology that can \"halt tooth decay\" as claimed - existing laser dentistry focuses on soft tissue treatment and cavity preparation
The GAI didn't look at the right part of the literature
Market positioning: The claim of being \"the only treatment to halt tooth decay\" contradicts established preventive treatments like fluoride applications, sealants, and remineralization therapies widely documented in dental literature
The GAI misquoted the claim, and if you get the wrong words for the claim, you cannot be right in your analysis.

Per the instructions

Business problem: FAILS - 27 words provided, exceeds 20-word limit. Also describes a questionable premise rather than a clear business problem
Our solution: FAILS - 31 words provided, significantly exceeds 20-word limit
Business Summary: PARTIALLY COMPLIANT - Addresses what they do and business operations, but lacks clear detail on costs and profit margins
Business Model: COMPLIANT - Clearly states revenue streams: capital equipment, fee-for-service, and digital scan monetization
Competitive Advantage(s): COMPLIANT - Lists specific advantages including first-to-market, patents, and FDA clearance
Exit Strategy: COMPLIANT - Specifies M&A strategy with 2-3 year timeframe
Current situation: PARTIALLY COMPLIANT - Missing MRR and specific financial metrics as of now
Desired Actions: PARTIALLY COMPLIANT - States funding goals but includes unclear language about \"verbal approval of mandatory\"

It did this reasonably well - except for its 'questionabkle premise crack.

Desirables

Name: COMPLIANT - REDACTER includes first and last name
Title: COMPLIANT - \"Co-founder and CEO\" is appropriate top leadership designation
Email: COMPLIANT - REDACTED uses company domain
Tel: FAILS - \"REDACTED\" missing + country code format
Web: COMPLIANT - Proper URL format with http://
Video: COMPLIANT - Proper Vimeo URL, distinct from other URLs
DD Report: FAILS - \"/Content/Click\" is not a proper URL
Deal Room: COMPLIANT - Proper Google Drive URL, distinct from others
Logo: COMPLIANT - Image URL displays properly as graphical file

It got these right!!! Yay!

Summary: The analysis reveals significant inconsistencies across multiple areas of the REDACTED documentation. Internal financial projections contain mathematical errors and unrealistic growth assumptions. External validation shows questionable technology claims and inflated market size figures that contradict established industry data. Formatting compliance is mixed, with several critical failures in word limits and URL formatting. Based on the identified issues across financial projections, technology claims, market data, and formatting requirements, approximately 65% of the content contains inconsistencies or compliance failures.

My summary is that the GAI got wrong the stuff my analytics got right, in fact, 0 out of 10! It did get right the part where is checks the syntax of the entered data against the specifications for that syntax. In other words, it pretty much sucks at the complicated things we would really like it to pick up, but it can do basic syntax checking and, impressively, it can check the list if things required in text to be there and not the things not supposed to be there. So it can be useful, but you have to be careful how you use it, and check the results against the realities.

GAI for an investor letter

GAI works reasonably well for a short investor introduction email. In this case I have redacted some details, but it reasonably accurately reflects the investment opportunity (at the time of the offering) and gets written by GAI in perhaps 20 seconds. Of course you have to check it for facts, but it is handy and less expensive and time consuming than having a person do it.

Investment Opportunity - XXXXXXX

Dear Potential Investor,

XXXXXXXXX represents a compelling investment opportunity in the medical devices and dentistry sector. The company has developed a revolutionary laser cleaning and purification system that painlessly alters tooth enamel to prevent decay and provide permanent whitening. Three key features make this opportunity particularly attractive to investors:

First-to-market advantage with patent protection and completed FDA 510(K) clearance for their laser purification technology
Multiple revenue streams including capital equipment sales, fee-for-service recurring revenue, and monetization of digital teeth scanning technology
Strong financial projections showing revenue growth from $4M in Year 1 to $320M in Year 5, with EBITDA reaching $234M by Year 5

XXXXXXXXXXX is currently seeking $150,000 in funding with a pre-money valuation of $8,000,000. The company has completed production devices and FDA approval, with distribution and contract manufacturing already underway. With only 5 months of current runway, this investment would provide the capital needed to scale operations and capture market share in the $40 billion total addressable market. The projected exit valuation of $470,000,000 in 2 years offers investors a potential ROI of 666.49% annually, with the investment providing 1.84% equity ownership.

Thank you for considering this investment opportunity. For more detailed information about XXXXXXXX's technology, financial projections, and growth strategy, please contact CEO YYYYYYYYYYYYYY directly at ZZZZZZZZZZZZZZZZ or PHONE-NUMBER-REDACTED.

Best regards,
YYYYYYYYYYY
Co-founder and CEO
XXXXXXXXX

We have done other similar things with mixed results. For example, looking at related markets, it finds some segments, but the numbers it produces for those segments often disagrees with the references cited. And we have tested this using Gemini, Claude, and ChatGPT - including across different versions of each with identical content and prompts. One of th einteresting results is that older versions do better at some things than newer ones. So picking the right version, working on the prompts carefully, and extensive testing help to improve results.

More information?

Join our monthly free advisory call, usually at 0900 Pacific time on the 1st and 3rd Thursday of the month, tell us about your company and situation, and learn from others as they learn from you.

Advisory Session

In summary

GAI is decent for social proof but not so good in terms of getting it right. A combination of GAI and human expertise makes for a more effective and efficient result.