Busting Myths in AI

Dec 30, 2023

There are myths and misconceptions in every industry. The AI space ranks near the top, due to both its potential and the poorly understood nature of intelligence. Every single day, people overestimate or underestimate what these systems can do, or simply misconstrue their capabilities.

This article is a rapid-fire round of myth busting, for easy future reference.

You just need MOAR data

You often hear that ChatGPT was trained on the entire internet. This is not only false, it is off by so many orders of magnitude that it would make your head spin. The training data for GPT-3 (175B) was ~600 GB (300B tokens), while the internet was estimated at 64,000,000,000,000 GB (64 zettabytes). That’s a bit like comparing your neighborhood to the size of the Sun. More precisely, if all the data on the internet were represented by the entire surface of the Earth, then all of GPT-3’s training data would cover only about 4,800 square meters (a little over an acre), or roughly the area of an American football field.
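The Earth-surface comparison above is simple arithmetic, so it is worth checking. The minimal sketch below uses the figures quoted in the text plus one added assumption, Earth’s surface area of roughly 510.1 million km² (a standard rounded value), and scales the planet’s surface by the ratio of training data to internet data.

```python
# Scale Earth's surface by the ratio of GPT-3's training data to the internet's data.
GPT3_TRAINING_DATA_GB = 600             # ~600 GB (300B tokens), as quoted in the text
INTERNET_DATA_GB = 64_000_000_000_000   # 64 zettabytes expressed in GB, as quoted in the text
EARTH_SURFACE_KM2 = 510_100_000         # assumption: ~510.1 million km^2

fraction = GPT3_TRAINING_DATA_GB / INTERNET_DATA_GB
earth_surface_m2 = EARTH_SURFACE_KM2 * 1_000_000  # 1 km^2 = 1,000,000 m^2
equivalent_area_m2 = fraction * earth_surface_m2

print(f"Fraction of the internet: {fraction:.3g}")
print(f"Equivalent share of Earth's surface: {equivalent_area_m2:,.0f} m^2")
print(f"...or about {equivalent_area_m2 * 10_000:,.0f} cm^2")
```

The ratio comes out at roughly one part in a hundred billion, which works out to a few thousand square meters of a planet-sized map. Whatever analogy you prefer, the training set is a vanishingly small slice of the total.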

Why is that so?

It’s because most of the data out there is not in a useful format for training a language model. You can think of data as an untapped raw material: it has to be cleaned and refined before it can be used.
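To make “cleaned and refined” a little more concrete, here is a minimal sketch of two common refinement steps: exact deduplication after normalization, and a crude quality filter. The thresholds (`min_words`, `max_symbol_ratio`) and the sample documents are illustrative assumptions, not the filters used for any particular model; real pipelines add many more stages, such as language identification, near-duplicate detection and learned quality classifiers.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different copies hash identically.
    return re.sub(r"\s+", " ", text).strip().lower()


def looks_useful(text: str, min_words: int = 5, max_symbol_ratio: float = 0.3) -> bool:
    # Illustrative heuristics (assumed thresholds): drop very short fragments
    # and text dominated by non-alphanumeric characters such as leftover markup.
    if len(text.split()) < min_words:
        return False
    non_space = [c for c in text if not c.isspace()]
    symbols = sum(1 for c in non_space if not c.isalnum())
    return symbols / max(len(non_space), 1) <= max_symbol_ratio


def clean_corpus(docs: list[str]) -> list[str]:
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate after normalization
            continue
        seen.add(digest)
        if looks_useful(doc):
            kept.append(doc)
    return kept


raw = [
    "Bitcoin is a decentralized digital currency without a central bank.",
    "bitcoin is a   decentralized digital currency without a central bank.",
    "<div>{{{ }}} ### $$$ </div>",
    "Click here",
]
print(clean_corpus(raw))
```

Of the four hypothetical inputs, only the first survives: the second is a normalized duplicate, the third is mostly symbols, and the fourth is too short to be a useful training example.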

Remember that language models are trained on the relationships between words and sentences. The examples must be representative of what you want the model to produce. Less is more: prioritize quality first, and add quantity only where possible, never at the expense of quality.

You can just “train a model on your own data”

Well, not really. When people say this, what they usually mean is some form of retrieval augmentation, i.e. “RAG”.

This is not the same as fine-tuning or training a large language model! RAG stands for Retrieval Augmented Generation, which draws on external data sources at inference time to add context to responses. This is a very different process and technique from pre-training or fine-tuning a model. RAG should be treated as a supplemental tool, but it is not actually “training”: nothing about the underlying model changes at all. It is a way to prototype, or to create a “chat with your documents” mini-agent, but it’s not “your own model”, and it’s definitely not robust enough to hold long, meaningful or contextual conversations with.
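The key distinction is that RAG only changes what goes into the prompt. The minimal sketch below uses a toy bag-of-words retriever in place of the embedding model and vector database a real system would typically use (an assumption made for brevity); it ranks hypothetical documents against a question and pastes the best match into the prompt. No model weights appear anywhere in it.

```python
import math
import re
from collections import Counter

# Hypothetical document store; in a real RAG setup these would be chunks of your files.
DOCUMENTS = [
    "The Lightning Network enables fast, low-fee Bitcoin payments off-chain.",
    "Difficulty adjustment retargets Bitcoin mining roughly every 2016 blocks.",
    "A mempool holds unconfirmed transactions waiting to be mined.",
]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    # Retrieved passages are injected into the prompt at inference time.
    # The model that receives this prompt is unchanged: nothing here is "training".
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(build_prompt("Which transactions does the mempool hold?", DOCUMENTS))
```

Better retrieval changes which passages get pasted in, but the model that eventually answers is exactly the same one it was before.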

RAG eats into the context window, because all of the retrieved context has to be injected into the prompt behind the scenes, leaving less room for the conversation itself. This means that after a few responses the model will start to lose track of the conversation. You can mitigate this with a sliding context window, but that is not a great fix if you want a fluid, useful dialogue with the model.
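Here is a rough sketch of that trade-off, assuming a hypothetical fixed token budget and approximating tokens as whitespace-separated words (real tokenizers count differently). Retrieved context is charged against the budget first, and only the most recent conversation turns that still fit are kept: a simple sliding window.

```python
def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (real tokenizers differ).
    return len(text.split())


def fit_history(retrieved_context: str, history: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent turns that still fit after the retrieved context is charged."""
    budget = max_tokens - approx_tokens(retrieved_context)
    kept = []
    for turn in reversed(history):  # newest turn first
        cost = approx_tokens(turn)
        if cost > budget:
            break  # this turn and everything older falls out of the window
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))  # restore chronological order


history = ["a b c d", "e f g h", "i j k l", "m n o p"]
print(fit_history("one two three four five six", history, max_tokens=16))
print(fit_history("", history, max_tokens=16))
```

With six words of retrieved context and a 16-word budget, only the last two four-word turns survive; with no retrieved context, all four fit. That is the sense in which injected context crowds out the conversation.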

Ultimately, training your own model on your own data requires far more data than you would use in a RAG scenario. It is an entirely different exercise, one that remains expensive and out of reach for the average person.

We need “unbiased” language models

Bias is not something that can or even should be removed from language, discourse or personalities. Bias is another word for preference, or opinion, or “worldview.” All discourse, all data, all information has within it an implicit bias.

When it comes to language models, since they reflect some aggregate of the data they’re trained on, they will inevitably reflect that bias too. It’s inescapable. The usual workaround is to put guardrails on the model and inject pre- and post-framing into every response (as is done with ChatGPT these days), but that doesn’t remove the bias; it just creates a bad user experience.

Trying to eliminate bias is like trying to flatten everything. It’s a quixotic pursuit. The focus should instead be on being clear about what the bias is, and on building many alternatives. Since a bias is just a model of the world, we want many of these, not just one or a few.

When people are talking about “unbiased AI” they are either misinformed, naive, or in some cases, using that as a way to claim a moral high ground in order to impose rules around what language or styling is “acceptable.”

AI gets rid of the need for human work

The idea that an AI will one day replace humans — either by taking your job or by annihilating humanity — is a scary concept. It has inspired a plethora of films and books, so nervousness about the implications of AI is understandable.

But if the last few years have demonstrated anything, it’s that people make the worst decisions when they are scared and misinformed.

It’s important to note that models not only perform significantly better when human feedback and human-generated data are involved, as we’ve documented in our report, but are also only as useful as the person using them. Nothing has changed about the nature of tools: there is an actor and a tool. AI is merely a tool which, if used well, can yield superior results.

AGI is Around the Corner

Related to the above unfounded fear is AGI. It’s nebulous enough to be scary, and people who have never enquired into the nature of either intelligence or consciousness often believe that something sentient will somehow emerge from the circuits. As a result, the argument goes, we must either ban it or form a regulatory body to “manage it”, of course “for our safety”.

Personally, I don’t believe that AI is suddenly going to become sentient and rule or take over the world, though not everyone shares that view. AGI and the singularity are red herrings.

The real danger is of AI as a tool being wielded only by those who have questionable intentions, or a poor track record with other tools. Examples abound.

The existential risk is that such entities or groups embed powerful AI tools into every layer of society, thereby reducing human liberties and dulling the color of life.

It is this threat we want to counteract. The idea is to build AI-enabled tools that enhance human flourishing, and bring more color and nuance to life.

“We’re all going to have our own AI, on our local machines”

This may happen, one day. But not for a while. Perhaps even decades. Why?

First of all, it's related to what’s mentioned above about fidelity. The technology has a long way to go before it becomes more science than art. To properly train and tune smaller scale models for every person will require a whole host of frameworks, pipelines and tools that simply do not exist today. Furthermore, the compute necessary is just not available.
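A back-of-the-envelope calculation shows why compute is a bottleneck. The sketch below estimates the memory needed just to hold model weights at a few common precisions, plus a rough figure for full fine-tuning. It assumes about 16 bytes per parameter for mixed-precision training with the Adam optimizer (fp16 weights and gradients, an fp32 master copy and two fp32 optimizer moments) and ignores activations, so real requirements are higher. The parameter counts are illustrative. For comparison, a high-end consumer GPU such as the RTX 4090 has 24 GB of memory.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    # Memory for the weights alone, in GB (1 GB = 1e9 bytes).
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def full_finetune_memory_gb(num_params: float, bytes_per_param: float = 16) -> float:
    # Assumption: ~16 bytes/parameter for mixed-precision Adam training,
    # excluding activations and framework overhead.
    return num_params * bytes_per_param / 1e9


for params in (7e9, 70e9, 175e9):
    print(
        f"{params / 1e9:.0f}B params: "
        f"fp16 weights {weight_memory_gb(params, 'fp16'):.0f} GB, "
        f"int4 weights {weight_memory_gb(params, 'int4'):.1f} GB, "
        f"full fine-tune ~{full_finetune_memory_gb(params):.0f} GB"
    )
```

Even aggressively quantized to 4 bits, a 70B-parameter model needs around 35 GB just to hold its weights, and fully fine-tuning a 7B model needs on the order of 100 GB before activations are counted.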

Second, and this is a less understood and more insurmountable factor: as better large-scale cloud models are released, they will raise the minimum acceptable bar, and thus make these smaller DIY models less interesting and useful. This has more to do with the human condition than with the efficacy of self-hosted models.

Notice how the first time something happens, “it’s a miracle”, and then it becomes normalized. It happened with air travel, with the internet, and with ChatGPT. Everyone lost their minds for a minute, and now it’s just another app.

The relevance here is that no matter how much compute and locally hosted models improve, they will not exceed the capabilities of larger, cloud-hosted models. And these larger cloud-hosted models (whether ChatGPT or others) will set the bar for usage, quality, functionality, etc. Using your local model will be like going from an airliner back to a wooden sailboat, or from a car back to the horse and buggy.

Now, before you say “but there are small models outperforming the large ones already”, please read the next point.

Benchmarking and Evaluations

This one is not so much a myth, but a misconception.

When people see that model “x” has outcompeted model “y”, which has more or fewer parameters, they immediately assume x is the better model. This is not necessarily true.

Why? Because evaluations and benchmarks are not only subjective, they are also narrow: they can only evaluate models within the scope they cover. As a result, two things happen:

  1. People game the results to climb the leaderboards. Models are often tuned specifically around a set of evaluation metrics and benchmarks, which means they perform well on those tests but not so well outside them.
  2. This creates the false impression that the models are broadly better than they really are. It’s easy to look at a Hugging Face leaderboard and assume the ranking applies across everything. It also explains why GPT-4 continues to outperform all of the open-source models in everyday use, despite their leaderboard scores, and why everyone continues to use it.
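A toy example makes the leaderboard problem visible. The task (adding two integers), the datasets and both “models” below are hypothetical: one simply memorizes the benchmark’s answers, while the other genuinely computes the answer but has a deliberate flaw on larger sums. Scored only on the benchmark, the memorizer wins; on held-out questions, it collapses.

```python
# Hypothetical task: add two integers. Each item is ((a, b), expected_sum).
BENCHMARK = [((2, 3), 5), ((10, 7), 17), ((4, 4), 8)]
HELD_OUT = [((6, 9), 15), ((1, 12), 13), ((20, 5), 25)]

# "Leaderboard-tuned" model: memorizes benchmark answers and guesses 0 otherwise.
memorized = {question: answer for question, answer in BENCHMARK}


def overfit_model(question):
    return memorized.get(question, 0)


# General but imperfect model: really adds, with a hypothetical off-by-one flaw above 15.
def general_model(question):
    total = question[0] + question[1]
    return total if total < 16 else total + 1


def accuracy(model, dataset):
    return sum(model(q) == a for q, a in dataset) / len(dataset)


for name, model in [("overfit", overfit_model), ("general", general_model)]:
    print(f"{name}: benchmark {accuracy(model, BENCHMARK):.0%}, "
          f"held-out {accuracy(model, HELD_OUT):.0%}")
```

The memorizer scores 100% on the benchmark and 0% on held-out questions, while the imperfect general model scores about 67% on both. Real benchmark over-tuning is subtler than outright memorization, but the ranking inversion is the same phenomenon.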

This is not to say benchmarking and evaluation are bad. They’re just misunderstood, and as a result, people extrapolate erroneously. In fact, benchmarking and evaluation are necessary, particularly for domain-specific projects like ours: within our domain, we can show that what we’ve built outcompetes models x, y and z. We cannot claim our model is useful outside of this context, but that is fine, because we’re not claiming anything beyond that.

Open-Source VS Closed-Source

Also not a myth, but a series of misconceptions.

The first confusion relates to what is being open-sourced: the data or the model? Notice that very, very few groups open-source their datasets. In fact, I’m not sure I’ve seen a single major “open-source” model that also open-sourced its full training dataset. This is because doing so comes with a whole host of legal implications.

What they do willingly open-source are the weights and biases, i.e. the model’s parameters. This is great, but outside of a few data scientists around the world, it doesn’t mean a lot to most people. Very few are going to print out the parameters and check for themselves.

This is not to disparage open source at all. It’s extremely important, because it means that at least some people can check. In this context, though, closed source is not so different for most users: it basically means you don’t get to see what you wouldn’t understand anyway. In other words, there is nothing you would do with the weights and biases.

But, and this is very important, where open source shines is that it enables anybody, anywhere to take the current model (depending on its open-source license) and adapt it. They can re-train it, fine-tune it and really turn it into something new. That’s precisely what we’ve done with the Satoshi suite of models, and with the upcoming “Code-Satoshi”.
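What open weights make possible can be sketched at toy scale. Below, a hypothetical “released” model with a single weight and a single bias is fine-tuned on new domain data with plain gradient descent, starting from the published values rather than from scratch. The data, learning rate and step count are assumptions chosen for illustration; fine-tuning a real language model follows the same principle, only with billions of parameters and dedicated tooling.

```python
# Hypothetical published parameters of a one-feature linear model: y = w * x + b.
released = {"w": 2.0, "b": 0.0}

# Hypothetical domain data following y = 2x + 3 (the released model lacks the offset).
domain_data = [(x, 2.0 * x + 3.0) for x in range(-5, 6)]


def mse(params, data):
    # Mean squared error of the model's predictions on the data.
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)


def fine_tune(params, data, lr=0.01, steps=500):
    w, b = params["w"], params["b"]  # start from the released values, not from scratch
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return {"w": w, "b": b}  # a new set of parameters; the released dict is untouched


tuned = fine_tune(released, domain_data)
print(f"loss before: {mse(released, domain_data):.3f}, after: {mse(tuned, domain_data):.6f}")
print(f"tuned parameters: w={tuned['w']:.3f}, b={tuned['b']:.3f}")
```

The point is not the arithmetic but the starting line: because the weights are available, anyone can pick up where the original authors left off and adapt the model to their own domain.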

The most important thing, once again, is application and honesty: does it do what it says on the label? In other words, if you want to build something more proprietary, just tell people what it does and do not pretend it is “unbiased.” This is again where open source shines, because there are a few great magicians out there who can check whether the ingredients are really there, and can bring such things to light.

The final note here is on crowd-sourcing. If we can successfully work out how to build these models with the help of the crowd, then of course they should be fully open-sourced and become “utilities”, so to speak. This is our mission with the Satoshi suite of models, and something Max Webster from VC firm Hivemind ventures discussed in his essay earlier this year. He specifically wrote about ways that Bitcoin and the Lightning Network can help open-source models win. As bitcoiners, we appreciate the open-source nature of the Bitcoin and Lightning Network code. It’s a fantastic read, as is a related post from the team at Turing Post.

If the convergence of AI and Bitcoin is a rabbit hole you want to explore further, you should probably read NEXUS: The First annual Bitcoin <> AI Industry Report. It contains loads of interesting data and helps sort the real from the hype. You will also learn how we leverage Bitcoin to crowd-source the human feedback necessary to train our open-source language model.

