Aardvark Daily
The world's longest-running online daily news and commentary publication, now in its 30th year. The opinion pieces presented here are not purported to be fact but reasonable effort is made to ensure accuracy. Content copyright © 1995 - 2025 to Bruce Simpson (aka Aardvark), the logo was kindly created for Aardvark Daily by the folks at aardvark.co.uk
We're told that AI has been trained on the whole sum of human knowledge.
The web-crawlers which feed AI have sucked up almost every page that exists on the web and handed it over to the training datasets that are used to make AI "smart".
On the face of it, this sounds fantastic.
One of the huge hurdles we started to face (in the pre-AI era) was that we simply had too much information to handle.
A Google search on something would often turn up links to thousands of pages, all of which contain relevant information. There's just no way a human can sort through that data and condense out all knowledge found within.
Fortunately for us, this is where AI excels.
The same simple search query, when fed to AI, will produce a summary of key facts, based on the consolidation and analysis of hundreds or even thousands of those relevant pages.
That's fantastic.
However, there's a huge weakness which is intrinsic to this modern way of accessing our knowledge.
Despite the fact that AI has also been trained on some printed material, the vast majority of its datasets suffer from quite significant temporal bias.
What's that?
Basically, if something happened prior to the 1990s (when the internet started becoming popular) then AI probably doesn't know about it -- unless it's historically significant and has since been published online.
Aside from those "significant" events, AI is pretty much unaware of most of the 20th century and that means it will forever be limited in coming up with robust reasoning and results in many cases.
For example, there is a raft of intellectual and engineering knowledge locked away in the experiences and documents of the 20th century. Huge tracts of this data are simply unknown to AI because they do not make up part of its training datasets. This would mean that when given a complicated problem to solve in a related area, AI may be completely ignorant of simple, elegant solutions from 60 years ago that are still valid today. Instead, it will likely try to reinvent the wheel, often with substandard results.
Similarly, when you've been trained on a world which is filled with high technology and where there's a dependence on that technology, the ability to think of low-tech solutions which might be a better fit in some situations is likely to be diminished.
This lack of historical information may also make it impossible for AI to spot cyclic trends that take place over the span of many decades or even centuries. Being aware of such trends may be a key element of coming up with effective solutions to problems.
Human knowledge is often handed down from generation to generation by word of mouth -- AI has no access to this unwritten history.
I ran some queries through Google's Gemini and it admitted that it was likely to hallucinate. Although, for example, it has assimilated the pages of magazines such as a 1950 edition of Popular Mechanics, that data has been OCRed (so may be inaccurate) and none of the drawings, plans, diagrams or schematics have been included. Typically, access to those visual elements is a critical part of "understanding" and interpreting what has been written, therefore AI can't reliably use that information -- in fact it may, to a degree, corrupt its thinking.
So AI is best treated like a 20-something savant. It knows a lot about the world today but has a very narrow understanding (often a misunderstanding) of what went on before it was born.
What is it they say? Those who ignore history are doomed to repeat it?
Carpe Diem folks!