LLMs, huh, what are they good for?
Absolutely nothing? Or some things.
Over the last year or so, I have been expressing the view that
LLMs are unsuited for lots of things. And that remains true.
But I have also been working through their improvements and uses,
and I have found some things that LLMs are useful for.
Use cases
The technology of large language models is often mistaken for some
form of intelligence. But the term "AI" is not really useful here;
these systems are not a replacement for intelligence. The reality
behind the hype of AI has always been automation. Just as automating
factory processes can be a great improvement, especially for
repetitive tasks, so can the many forms of AI developed over the
years. Here are some examples:
- Parsing and rewriting natural language: One of the things
modern LLMs do well is to take inputs the way people write or speak
them (after processing with speech-to-text) and parse them into
simple statements. Their ability to summarize is quite good, and by
adjusting the randomness parameters in these systems, you can do this
with relatively little change in terminology (or lots, if you like to
mistake misinterpretation for 'creativity'). By keeping track of what
produces what, you can also retain references back to the reasons the
output was produced (cause and effect, through the somewhat nebulous
mechanism of transformers).
- Categorizing linguistic statements: Another thing LLMs are
good at is putting things into simple categories. The more complex and
numerous the categories, the worse they do; the more precisely the
categories are specified, the better they do. It is really up to you
to define the categories in a way that gets the LLM to produce useful
results.
- Comparing natural language statements to each other: One of
the things LLMs seem to do best is compare simple statements of fact
to each other. For example, if I say as an assumption that all cars
made by Joe have black or white exteriors, and I ask whether Jane's
statement "Joe makes rainbow colored cars" is true under that
assumption, an LLM will almost always give me the right answer. This
is very much the sort of parsing that has been done historically, but
LLMs seem to do much better at associating arbitrary linguistic
expressions than older AI language-parsing approaches (a small sketch
of this follows the list).
- Generating different forms of output from specifications:
LLMs are increasingly good at generating structured output containing
linguistic expressions. As a simple example, if you ask one to compare
5 statements to 3 facts and present the output as a table with 3
columns, specifying what those columns should contain, it will do this
quite well most of the time.
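
To make the categorization and comparison cases concrete, here is a
minimal sketch in Python. The call_llm function is a hypothetical
placeholder for whatever model interface you actually use; the point
is the precisely specified categories and the shape of the prompt, not
any particular vendor's API.

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your LLM and return its text reply."""
    raise NotImplementedError("wire this to the model of your choice")

CATEGORIES = ("TRUE", "FALSE", "UNDETERMINED")

def check_claim(claim: str, facts: list[str]) -> str:
    """Ask whether a claim holds under the assumed facts; constrain the answer."""
    fact_lines = "\n".join(f"- {f}" for f in facts)
    prompt = (
        f"Assume the following facts are true:\n{fact_lines}\n\n"
        f'Claim: "{claim}"\n'
        "Answer with exactly one word: TRUE, FALSE, or UNDETERMINED."
    )
    answer = call_llm(prompt).strip().upper()
    # Traditional-programming guard: anything outside the category set is
    # treated as undetermined rather than trusted.
    return answer if answer in CATEGORIES else "UNDETERMINED"

# The example from the text:
# check_claim("Joe makes rainbow colored cars",
#             ["All cars made by Joe have black or white exteriors"])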
Putting these things together
Just as with any other programming approach, you build up programs
by assembling parts. For example, you can provide a list of facts,
tell the LLM to parse human statements into a list of the claims made
in those statements, compare the claims to the facts, and produce a
table listing which claims are true, which are false, and which cannot
be determined from the facts, along with a factual rebuttal of the
claims that are false. It will do this pretty well, quickly,
inexpensively, and reliably.
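
A rough sketch of that kind of assembly, building on the hypothetical
call_llm and check_claim pieces above (the names and prompts are
illustrative assumptions, not any particular product's API):

def extract_claims(statement: str) -> list[str]:
    """LLM step: break a human statement into separate, simple claims."""
    prompt = ("List each separate factual claim made in the following "
              f"statement, one per line, with no extra commentary:\n{statement}")
    return [line.lstrip("- ").strip()
            for line in call_llm(prompt).splitlines() if line.strip()]

def rebut(claim: str, facts: list[str]) -> str:
    """LLM step: a one-sentence factual rebuttal of a false claim."""
    facts_text = "\n".join(f"- {f}" for f in facts)
    prompt = (f"Using only these facts:\n{facts_text}\n"
              f'Write a one-sentence rebuttal of the false claim: "{claim}"')
    return call_llm(prompt).strip()

def fact_check(statement: str, facts: list[str]) -> list[tuple[str, str, str]]:
    """Traditional glue: returns rows of (claim, verdict, rebuttal or empty)."""
    rows = []
    for claim in extract_claims(statement):
        verdict = check_claim(claim, facts)     # TRUE / FALSE / UNDETERMINED
        rebuttal = rebut(claim, facts) if verdict == "FALSE" else ""
        rows.append((claim, verdict, rebuttal))
    return rows

The LLM does the linguistic work; the surrounding traditional code
decides what gets compared to what and what ends up in the table.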
As in programming, where you can express things many different
ways and get similar results, how you phrase requests to LLMs leads to
different, even if similar, results. In traditional programming,
errors accumulate: 1/3 expressed in binary, for example, always
produces an inexact answer. Whether it is high or low in the last bit,
as you do more and more with it the errors accumulate unless
controlled, until the answers come out completely wrong, errors of
kind rather than small errors of amount.
Unlike traditional programs, the same input and program often do
not produce the same outputs with LLMs. Part of the problem of building
up more complex use cases is constraining the expansion of outcomes to
desired subsets. Left uncontrolled, that expansion of outputs leads to
unpredictable results that go far astray, because the overall
'program by prompt' method expands minor errors into huge ones in a
less predictable manner. And because the pivot points at which errors
of amount become errors of kind are unclear, some programmatic process
is most effective in constraining results: reforming them, rejecting a
step and trying again, and other similar methods.
Thus, the most successful attempts I have seen today use an LLM for
some steps and traditional programming between those steps. This mix,
and how to implement it, is fundamental to success in this space for
business purposes. A sketch of this step-and-check pattern follows.
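
Here is a minimal sketch of that pattern. Both llm_step and
looks_valid are hypothetical placeholders you would supply: the first
calls the model, the second is an ordinary deterministic test of its
output.

def constrained_step(llm_step, looks_valid, prompt: str, max_tries: int = 3) -> str:
    """Run one LLM step, keeping only outputs that pass a deterministic check."""
    for _ in range(max_tries):
        output = llm_step(prompt)
        if looks_valid(output):
            return output
        # Reform the request and try again rather than passing a bad result on.
        prompt += "\nYour previous answer was not in the required form. Try again."
    # Fail loudly so the error stays an error of amount, not an error of kind.
    raise ValueError(f"no acceptable output after {max_tries} attempts")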
Rules of thumb
Here are a few rules to thrive by in the LLM space:
- Determine requirements for trusting results: You really
need to think about a trust architecture here. What do you trust, for
what purpose, to what extent, over what time frame, and how do you
limit the harm by architecting that trust? This is one of the
interesting functions of how you put things together. In essence, the
mix of traditional and LLM programming uses the traditional to limit
harm and the LLM to enhance value.
- Verify form and format of results: The traditional methods
are very good at strictly constraining intermediate (and final)
results, but the LLMs should also be used in ways that produce
more predictable results that are more easily checked by those
traditional methods (see the sketch after this list).
- Keep the human in the loop (for now): At the end of the
day, if harm can come from the results, and the potential harm is
sufficient to warrant the cost, a human check of final (and sometimes
intermediate) results makes sense. How do you decide how much human to
put where? That is determined by your trust architecture.
- It's going to do weird things occasionally: Any time you
use a modern LLM, you will, at least occasionally, get a weird
result. Without the traditional programming checks or the human review
process, these results will produce some level of mayhem, particularly
when they hit the traditional systems, which are much more brittle
than the LLMs; the LLMs usually keep running regardless of how bad or
ridiculous the results become.
- You can trick it - and you can trick people too: A lot of
hubbub is made about jailbreaking LLMs, the security threats they
produce, and so-called hallucinations (which are no such thing). But
the same is really true of any system or person. In reality, for the
most part, in a properly architected system, a user trying to trick
the system will only trick themselves. In most sensible applications,
honest people are trying to do an honest job, using LLMs to make them
more efficient and effective. People make mistakes; so do LLMs, and so
do computers without LLMs. If you have a decent trust architecture,
you will manage these issues, limit harm to reasonable levels, and
have a system worthy of the trust placed in it.
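
To make the "verify form and format" and "human in the loop" rules
concrete, here is one more sketch. It assumes, purely for
illustration, that the LLM was asked to reply in JSON with three known
keys; anything that fails to parse or pass the checks goes to a human
review queue instead of onward into the more brittle traditional
systems.

import json

REQUIRED_KEYS = {"claim", "verdict", "rebuttal"}       # assumed reply schema
ALLOWED_VERDICTS = {"TRUE", "FALSE", "UNDETERMINED"}

def accept_or_escalate(raw_reply: str, review_queue: list) -> dict | None:
    """Return a validated record, or queue the raw reply for human review."""
    try:
        record = json.loads(raw_reply)
    except json.JSONDecodeError:
        review_queue.append(raw_reply)        # malformed: human in the loop
        return None
    if (not isinstance(record, dict)
            or set(record) != REQUIRED_KEYS
            or record.get("verdict") not in ALLOWED_VERDICTS):
        review_queue.append(raw_reply)        # weird result: escalate, don't trust
        return None
    return record                             # form and format verified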
Conclusions
LLMs are good for a lot of things, if properly architected in the
context of a trust architecture for the context of their use.
Or, as rewritten by ChatGPT: LLMs can be highly effective when
designed within a well-structured trust architecture tailored to their
intended use.
More information?
If you want more details, join us on our monthly free advisory
call, usually at 0900 Pacific time on the 1st Thursday of the month,
and we will be happy to answer any of your questions.
In summary
LLMs are good for lots of things, if you know how to use them well for those purposes.
Copyright(c)
Fred Cohen, 2025 - All Rights Reserved