A 10,000ft Overview of Large Language Models
How LLMs work in general, and my experience using them as a tool.

In the past week or two a number of articles and studies related to Large Language Models (LLMs) have caught my attention. I think it’s worth spending a bit of time going over LLMs in general, so I can later comment on some of the linked articles from a secure base, and touch on how their actual usage does and doesn’t line up with the popular narrative.
I’m not going to try and argue for or against LLMs, but instead I want to establish a basic understanding of how they work, and why some of the failure states I’ve run into may have come about.
Kantan Kanban status
But first a very quick aside about the Kanban program - things came up and I didn’t get it fully packaged and up on PyPI.org. I finished a minor refactor to help the packaging process, but have some checks to do before I’m 100% certain it’s good to go. I’ll have another short update next week.
LLM articles that prompted this post
I’ll come back to cover some of these in detail later on. I feel it’s probably more helpful to first lay the groundwork of how LLMs work as I understand it, along with my experience using them, so you can judge the articles and my responses in context.
Impact in Corporations
- Experiment suggests AI chatbot would save insurance agents a whopping 3 minutes a day
- AI hasn’t delivered the profits it was hyped for, says Deloitte
- AI agent hype cools as enterprises struggle to get into production
- Designing AI-resistant technical evaluations
LLMs in Society and Organizations
- The AI-Powered Web Is Eating Itself
- How AI Destroys Institutions
- AI-induced cultural stagnation is no longer speculation − it’s already happening
- How often do AI chatbots lead users down a harmful path?
Environmental Impact
- Reconciling the contrasting narratives on the environmental impact of large language models
- Unveiling Environmental Impacts of Large Language Model Serving: A Functional Unit View
- Energy costs of communicating with AI
My experience with Large Language Models (LLMs)
I didn’t use any of the LLMs for the first year or so after they were first released. I only started to use them to a degree after Ollama became relatively accessible on Gentoo/Linux - after Gemma2 was released.
I’ve only been using any of the web-based LLMs (ChatGPT, Gemini, Qwen, etc.) for somewhere around a year.
In both cases, I think my use has been fairly typical. I’ve used locally hosted ones for brainstorming, some editorial and tech-related concerns, things like that. The web-based ones I’ve only really used for editorial suggestions and programming problems (troubleshooting).
Generally my preference is for self-hosted, and that’s the case for LLMs as well, though the size of the models makes the trade-offs of a local model more noticeable than in other fields.
My Results
It’s been a long time since I tried brainstorming with an LLM, but I thought it was OK at it, with some caveats. The biggest drawbacks, as I remember them: it was very supportive/positive even when an idea wasn’t actually practical, and it didn’t act as a “sounding board” as much as I wanted - it was more likely to suggest other ideas/options (which would be great in some cases, but wasn’t what I was looking for).
I’ve used LLMs for editorial/proofreading checks more than anything else. Overall I think that for this use-case they work pretty well. When I first started this blog, I noticed that I was falling into some bad habits - run-on sentences, over-explaining/defensiveness, etc. While I could detect these issues, I had a hard time imagining how to fix them - this was where LLMs were very helpful: by providing decent alternatives they helped illuminate other approaches. As the blog has gone on my writing in general has improved, and while I still generally run things through for a check, my usage is much lower than it was initially.
In a blog post a couple of weeks ago I briefly mentioned my experience using an LLM for technical issues. That was me trying to understand why my drag and drop implementation, based on Qt’s QML documentation, didn’t work. It was a fairly frustrating experience, where I ended up looping through very similar, but seemingly different enough, solutions given by a couple of LLMs. Eventually I got it working as expected, but it wasn’t what I’d call smooth.
More recently, in preparing for packaging, I decided to do a sanity check and ask about “gotchas” in my preparation for PyInstaller and Setuptools. Because this happened after I started preparing for this post I’ll be more detailed, and while I reference one LLM, the exact same thing could happen with any of them.
I received a general confirmation that my approach matched best-practices, with a suggestion to use a “resources.rcc” file to make sure all of the QML files were imported smoothly.
I did a check, and found PythonGUIs.com also highly recommended the use of “resources.rcc”, so I decided it would be worth exploring. I created the file, and then found out that it needed to be converted for Python.
ChatGPT said to use pyrcc6 to perform the conversion. I didn’t have that on my system, but through some back and forth I did some more digging locally to see if I was missing it.
After a while I did some more general searching - which led me to discover that pyrcc6 does not exist. It seems that in 2022, the developer who would have made pyrcc6 decided that it wasn’t worth the effort and ended its development. So the core tool for this line of inquiry had not existed for a minimum of 4 years, yet it was a consistent suggestion across numerous prompts and responses.
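As an aside, the conversion the LLM was gesturing at does exist in the Qt for Python tooling, just under a different name: PySide6 ships a resource compiler called pyside6-rcc, which turns a .qrc listing into an importable Python module (PyQt6 dropped its equivalent, which is presumably where the phantom pyrcc6 comes from). A minimal sketch, assuming PySide6 and made-up file names:

```python
# Sketch only - assumes PySide6 and made-up file names (resources.qrc, main.qml).
# The .qrc listing is first compiled to a Python module from the command line:
#     pyside6-rcc resources.qrc -o resources_rc.py
import sys

from PySide6.QtCore import QUrl
from PySide6.QtGui import QGuiApplication
from PySide6.QtQml import QQmlApplicationEngine

import resources_rc  # noqa: F401 - importing registers the bundled QML files with Qt

app = QGuiApplication(sys.argv)
engine = QQmlApplicationEngine()
engine.load(QUrl("qrc:/main.qml"))  # load the QML out of the compiled resources
sys.exit(app.exec())
```

The key detail is that the generated module has to be imported before any qrc: URL is loaded.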
Overall, my experience with programming-related tasks has been mixed. I’d say it’s been OK with Python and more general questions; things where there is a lot of material out there. But in more niche cases like QML, where things aren’t nearly as well documented, it seems to really struggle; though the pyrcc6 incident shows that failures can hit more popular topics as well.
Why might things work out this way?
How do LLMs work? (A very ‘Cliffs Notes’ style explanation)
First a disclaimer: I’m not an expert! I’m an outside observer - someone interested in technology and comfortable with more advanced systems, but I wouldn’t call myself a developer, especially not of AI or related systems. For this I’m going for an “explain like I’m 5” approach, so I will be avoiding as many of the more technical details as I can (vectors, quantization, and reinforcement learning are all examples of what I’ll skip). This should be viewed as a 10,000ft overview. If you’re curious, please do research in more technical venues.
You might also have noticed that I’ve been careful to use the term “LLM”. I’m deliberately avoiding words that come with baggage - words that might be descriptive but could lead to unhelpful anthropomorphism of LLM models. “Learn” and “understand” in particular are often used, but from what I’m able to see they don’t accurately reflect what is happening, at least not as the average person would understand those words. I will use “training” and “hallucinations” because I’m not aware of more neutral alternatives, but I have reservations about “hallucinations” implying perception where it doesn’t exist. Likewise I’m going to stick with LLM over AI, as I think AI comes with a lot of associations from popular culture which aren’t particularly helpful.
LLM Training
Training starts with a huge amount of text - often described as a large portion of all text currently in existence - which is then highly curated and fed into the program. (That raises the question of whether it’s even possible to curate such a large set of data with any stringency - I would argue it isn’t, which raises its own concerns.) The program creates something like a spreadsheet, where each word’s relationships to the words that occur around it are given specific numeric values. Because vast amounts of text are put in, those numeric representations can become tremendously large, and represent an extremely large number of unique combinations and relationships.
I want to plant a flag here about LLM generated text for LLM training, and studies indicating that this is a poison pill - leading to worse output. That’s something I want to come back to later as I think it has broader implications.
Based on those quantified relationships, the model is prompted to fill in missing text or to finish a partial bit of text (a sentence or paragraph); the closer it gets to matching the original the better, and the numbers are adjusted as needed to achieve that. The people behind it will also hold back some text they didn’t give to the model to quantify, and check that it can accurately predict that as well, not just things it has already seen.
The people behind it might make additional tweaks, or add extra textual data for specific topics, etc. until it meets whatever goal they had set. Then, when it’s finalized, there are some steps to streamline the model to make it smaller and more efficient - how much of either I can’t say.
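To make the “spreadsheet of relationships” idea a bit more concrete, here’s a deliberately tiny Python sketch of the same basic move: count which words follow which, then use those counts to guess what comes next. The corpus and words are made up, and real LLMs use neural networks rather than a simple table, so please treat this as an analogy rather than an implementation.

```python
from collections import Counter, defaultdict

# A toy "training" corpus - real training sets are vastly larger and curated.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Build a table of how often each word is followed by each other word.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

# "Predict" the next word purely from those counts - no understanding involved.
def predict_next(word):
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("sat"))  # -> 'on'
print(predict_next("the"))  # -> 'cat' (a tie, broken by which word was seen first)
```

Scale that table up by many orders of magnitude, add the vector math I’m skipping, and you have a very rough mental model of the “quantified relationships” described above.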
LLM Prompting
From that, when you provide a prompt to an LLM, it basically repeats the steps that were done during the training:
- It takes your input text and parses it to get the numeric representation of all of the words and their relationships.
- It compares that numeric representation to the enormous amount of numeric representations it has on file
- It determines what the most likely set of text would be after the text you provided
Each model also has a certain amount of randomness - usually controlled by a setting called “temperature” - which is essentially a randomization engine that slightly tweaks the numbers for the response text, so that you are extremely unlikely to get a rote response (it should also not reproduce, word for word, text that was provided during training, though there seem to be instances where it has)
This randomization is possibly (in my mind almost certainly) a large factor in “hallucinations” - where the LLM will respond with demonstrably false, inaccurate, and/or potentially harmful answers. As long as adding randomness is necessary, it will be impossible to eliminate the risk that those adjustments compound with other factors to create a “hallucination”.
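To illustrate what that randomization looks like, here is a minimal sketch of temperature-style sampling. The candidate words and scores below are invented for the example; the point is just that a low temperature makes the most likely word dominate, while a high temperature gives the unlikely words a real chance of being picked.

```python
import math
import random

# Invented raw scores a model might assign to candidate next words.
scores = {"cat": 2.0, "dog": 1.5, "banana": 0.1}

def sample_next(scores, temperature=1.0):
    # Lower temperature sharpens the distribution (more predictable output);
    # higher temperature flattens it (more surprising picks).
    weights = [math.exp(value / temperature) for value in scores.values()]
    return random.choices(list(scores), weights=weights, k=1)[0]

print(sample_next(scores, temperature=0.2))  # almost always "cat"
print(sample_next(scores, temperature=2.0))  # "banana" turns up noticeably often
```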
It’s worth noting that there is some speculation here - due to the size and complexity of the data being used, no one completely understands how the full model works or can draw a chain of causation from start to finish. I don’t think that can be emphasized enough - there is a real lack of understanding of how the different parts come together in a full LLM model. There are certainly conceptual bases that we are pretty sure about in isolation, like drawing correlations between the elements of a body of text; but exactly how a given set of output from a model relates back to the training data is unclear.
That only covers text-based LLMs, and not other “Generative AI” like image or video models. The core technology is essentially the same; it’s just that instead of blocks of text (say a web page, or a book), images or snippets of digital video are used - and I don’t know what the token-size equivalent would be, whether it goes down to individual pixels or is more abstract, like a 16x16 pixel block. Given the size of digital images and video, I have to assume a fair amount of abstraction is happening.
In Conclusion
My experience with multiple models providing replies that didn’t resolve the given issue, or even reflect current reality, is a result of how the technology works.
While a definitive explanation for the pyrcc6 issue isn’t possible, there are some reasonable guesses given what we know about how these tools work.
- At some point there was a pyrcc6, or at least talk of one, and it had a good amount of discussion online (this seems likely, as I find it mentioned in comments from the past year or two)
- Similar tools (say pyrcc5, or rcc for the C++ codebase) were correlated enough, and appeared frequently enough, that they artificially made a non-existent tool count as if it were real and being discussed (perhaps the tokens for the words aligned in such a way that they had “cross-talk” or something)
- There are market pressures to have the LLM respond confidently and positively, leading to information that would otherwise carry lower confidence being portrayed in a more positive light
- And the process of collecting, curating, and then training on such large volumes of text leaves open questions about how things are selected, what biases are in place, what the curation thresholds are, and what training and refinement pressures are in place beyond objective accuracy. All of these issues affect every LLM, and are worth considering when using any of them as a tool in your work.
Takeaways
I’ve found LLMs to be the most useful when I approach them with this thought in mind: whatever the LLM provides in response is something like the Wisdom of the Crowd’s answer to what you put in. That can still be useful, especially in areas where a large number of knowledgeable people have shared their thoughts. But it can also be misleading in areas with less available knowledge or higher noise-to-signal ratios, or where system prompts/pressures can lead to incorrect responses.
Some things to consider when using an LLM:
- the confidence of an output is not a good indicator of correctness
- is your question something that is likely to be well-represented in existing knowledge, or is it a niche, poorly known/documented domain, or something that’s quickly evolving?
- would you find a more general “Wisdom of the Crowd” answer useful or not, and would you be able/willing to expand on the response yourself?
- be cautious about anthropomorphizing the LLM - this is hard, because the way the text is output is understandably geared towards user engagement, and skews positive, deferential, and flattering
If you keep these things in mind, I think you’ll tend to have better outcomes and be well equipped to handle the times an LLM response goes sideways.
Next Time
Next week I will cover one or two of the links included above, on how LLMs and organizations/society at large seem to be interacting at this point.
References
- https://uit.stanford.edu/service/techtraining/ai-demystified/llm
- https://en.wikipedia.org/wiki/Large_language_model
Photo Credit: Photo by oleks film on Unsplash