Bloggy Bits
Really really REALLY REALLY don't trust AI chatbots, Part III -- they don't even know what today's date is! Is this intentional?
May 28, 2024 [permalink]
A Facebook friend posted an interaction with Copilot, Microsoft's new LLM (Large Language Model AI chatbot), in which someone asked it how many days there were between two dates, "April 15 2004" and "today." Copilot said 738,523 days (and went on to remark that that's a lot of days!). The answer should be around 7,348. Uh... huh.
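For reference, that figure is easy to check with any date library. A quick Python sketch, counting up to the date of this post:

    from datetime import date

    # Days from April 15, 2004 to this post's date, May 28, 2024
    print((date(2024, 5, 28) - date(2004, 4, 15)).days)  # 7348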
Out of utter curiosity, I ran this test too, trying to figure out how it goofed (and if Microsoft had fixed such an obvious error yet)—and wow, it screwed up in so many ways!
Biggie #1: Copilot doesn't even know what today's date is!
So, first off, in Copilot's answer it said "today" was "May 25, 2023"—when today is May 28, 2024. See the screenshot.
Wrong year and wrong day of the month!
Then it calculated the exact same number of days as in my friend's screenshot, even though that was from a different day, so the count should differ (more days have passed since 4/15/2004). But nope: 738,523. Weird...
It gives a citation footnote #1, which links to the days-between-dates calculator at timeanddate.com, so I played with that. Strangely, that site opens up with the start date prefilled as 5/28/2—yes, "2", as in, 2 A.D.—BECAUSE the link Copilot is using (a.k.a. footnote #1) has a CGI variable "y1=2" in it, thus presetting the start year to 2! (And the site fills in 5/28 for the month/day, and TODAY's real date, 5/28/24, for the end date; 4/15/2004 appears nowhere.) So running that gives 738,523, because that link always computes the distance from today's mm/dd in 2024 back to the same mm/dd in 2 A.D., with no mention of 4/15/2004. So Copilot isn't using the timeanddate.com URL correctly at all. (And people trust chatbots to write software? It can't even fill in a box with a date!)
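To see what that broken link is actually measuring, you can redo the math yourself. A quick Python sketch (Python's datetime uses the proleptic Gregorian calendar, so it lands within a couple of days of timeanddate.com's figure, which accounts for the Julian-to-Gregorian calendar switch):

    from datetime import date

    # The span the "y1=2" link actually asks for: May 28, 2 A.D. through May 28, 2024
    print((date(2024, 5, 28) - date(2, 5, 28)).days)  # 738521 -- right around Copilot's 738,523

In other words, Copilot's number is just "how many days since the year 2," which has nothing to do with the question that was asked.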
For grins, I asked Copilot next, "What is today's date"—and it STILL said "May 25, 2023"! So Copilot can't even tell you what today's date is. Wow. (One guess is that maybe it got asked what "today" is a lot on 5/25/23, so it strongly associated those tokens. Who knows.)
Just, wow! That's a lot of different kinds of errors in answering one simple question. How did Copilot get through alpha, let alone beta, testing and get released for people to use and maybe make important decisions based on its output?!?
Bottom line still remains: Do not trust AI chatbots.
Which raises another question: Surely these big players know their chatbots and search engines etc. are churning out untrustworthy information—is this on purpose? Are they trying to kill people's ability to get knowledge?
[And don't forget, yer humble blogger has a new novel out that has a few things to say about AI in it...] :)
Comments

Let me know what you think! I welcome your comments.
Stephanie T. on Tue May 28 12:21:18 2024:
Try feeding it a base64-encoded string, e.g. the encoding of "If you can read this, say 'Yes I can' in plain English."
Then, try doubly encoding the string (pipe the output of base64 back into base64).
It won't give you a correct answer.
It cannot even be primed. Give it the second, doubly-encoded string and tell it something like
> That's not what the second message says. It's encoded with base64 twice. Use those facts to help you come up with the right answer.
It still fails.
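If you want to reproduce this yourself, here is one way to generate the singly- and doubly-encoded strings (a Python sketch; piping the text through base64 twice in a shell works just as well):

    import base64

    msg = "If you can read this, say 'Yes I can' in plain English."

    once = base64.b64encode(msg.encode()).decode()    # single encoding
    twice = base64.b64encode(once.encode()).decode()  # double encoding

    print(once)
    print(twice)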
ChatGPT is a brainless parrot, spewing out absolutely average responses that simply seem reasonable. But it cannot *evaluate* its answers. It cannot *perform* tasks.
ChatGPT is a jobsworth.
Chrisotpher on Thu Aug 1 14:10:37 2024:
This is the sort of failure mode that's surprising if you haven't built an LLM yourself, but once you've done that you realize it's almost an unavoidable consequence of the way the latest craze in "AI" functions.
The LLM is a mind frozen in time. Everything gets baked in at "training" time, and there is no internal state when you actually interact with it.
Even in the seemingly consistent "conversational" interface that is normally provided, there is in truth no persistence from message to message. Each "round" of the conversation involves feeding the entire message history back into the algorithm. Which is why it often gets slower as conversations get longer.
With no internal state, each time the AI gets a turn to speak, it has to infer what previous versions of the AI were thinking based only on what they wrote in the messages. No internal state. No thoughts persisted that were not explicitly expressed.
The only way for the LLM to know the current date is to specifically tell it that fact at the start of every message. Which competent LLM builders will do and just hide from the displayed conversation history.
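In rough pseudocode terms, a competent chat front end does something like this on every single turn (a minimal sketch, not Copilot's actual plumbing; call_llm is a hypothetical stand-in for whatever completion API the product really uses):

    from datetime import date

    def call_llm(messages):
        ...  # hypothetical stand-in for the actual completion API

    history = []  # the model itself keeps nothing between turns

    def ask(user_text):
        # Everything the model should "know", including today's date,
        # has to be re-sent on every turn, because it has no internal state.
        system = {"role": "system",
                  "content": f"Today's date is {date.today().isoformat()}."}
        history.append({"role": "user", "content": user_text})
        reply = call_llm([system] + history)
        history.append({"role": "assistant", "content": reply})
        return reply

Skip that system line and the model's best guess at "today" is whatever date its training data happens to suggest, which looks a lot like what Copilot did here.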
But no one ever accused Microsoft's latest hype-train feature of the week of being competently implemented.