I've been playing around with some of these new AIs, and it occurs to me that they're kind of a topical thing to post about. Not my usual topic, but then my usual recently has been not saying anything, so this probably counts as an improvement on that. I splurged for the paid version of ChatGPT, and I've been really impressed with some of the answers I've gotten when I use GPT-4. I've been significantly less impressed with any of the other models I've used, including GPT-3.5, Bard, and the various models on NovelAI.
GPT-4 seems really helpful on scholarly questions. So far, I haven't caught it making any mistakes on historical issues; if you've read that GPT-4 hallucinates less than other models, that definitely matches my experience. In fact, I haven't caught it hallucinating at all, while when I've asked similar questions of Bard or GPT-3.5, hallucinations were relatively common. For example, I asked both GPT-3.5 and GPT-4 about Moritz Schlick's murder. In that specific case GPT-4 didn't tell me anything I didn't already know (or anything you couldn't get from Wikipedia), but it also didn't get anything wrong, while GPT-3.5 wrongly said that Schlick was Jewish and that his murderer was executed for the crime.
GPT-4 is also pretty good at generating book recommendations, at least in one sense. When I tell it what I have liked and why, it often recommends other things that I've also read and liked. I've had far more limited success (but still some success!) in getting it to recommend things I haven't heard of but might like, but here again GPT-4 seems a big improvement over its predecessors: GPT-3.5 sometimes recommended texts that don't exist, and GPT-4 hasn't yet made that mistake even once in my experiments.
One rather niche task: I'd read about someone else getting LLMs to engage in a rap battle, and the results sounded impressive, so I asked GPT-4 to compose a rap battle between Heidegger and Carnap. What it produced was genuinely entertaining. If you've ever found it amusing to wonder what a rap battle between some historical figures might have been like, GPT-4 can definitely help you out with that.
There are already other LLMs with internet access, persistent memory, or extremely large contexts. I think the WGA is probably right to be concerned about AI being able to replace a lot of what its members do; from what I've seen, combining a model of GPT-4's strength with some of those already existing features could probably handle most human writing tasks, even without any further innovations. GPT-4 didn't help me write any of this, but it probably could have, and it might even have made it more interesting.