ChatGPT Does Not Mean Anything, Yet

Myungjun Kim

Florida State University

Given the recent hype around large language models (LLMs), there are surprisingly few principled approaches to the meaning of LLM outputs. I explore the implications of one of the rare principled approaches, the Williamson-Cappelen-Dever (“W-C-D”) theory of AI metasemantics. I argue that, assuming the W-C-D theory, LLM outputs currently do not mean anything. I use GPT-4 as my prime example.
  Cappelen and Dever (2021) propose a theory of AI metasemantics, building on Williamson’s (2007) principle of knowledge maximization. According to Williamson, the correct theory of reference is the one that maximizes the interpretee’s knowledge. While adopting this idea, Cappelen and Dever point out that what matters is not what the AI knows but what we, its users, come to know as we use it. They thus conclude that the correct semantic theory of AI outputs is the one that maximizes the interpreter’s knowledge.
  Does the user’s knowledge increase by ascribing meaning to LLM outputs? I present an argument for a negative answer, backed by empirical evidence. Recent studies suggest that GPT-4 is surprisingly unreliable in certain types of situations. 1) Challenges: Wang et al. (2023) show that even when GPT-4 initially produces a correct statement, it is likely to (wrongly) accept its negation when challenged with absurdly invalid arguments. 2) Applications: Arkoudas (2023) shows that even when GPT-4 provides correct statements about a mathematical theory, it is prone to fail absurdly when asked to apply that theory to simple problems.
  Assuming Williamson’s (2000) safety condition for knowledge, if the user is unaware of the situations in which GPT-4 is unreliable, she does not gain knowledge from it even when it produces correct outputs, because she could too easily have formed a false belief on a similar basis, namely by relying on GPT-4 in one of its unreliable situations. Given GPT-4’s black-box nature and the fact that research on its performance has only begun, it is extremely plausible that further unknown ‘unreliability partitions’ exist. Until those partitions are discovered, beliefs formed by using GPT-4 will not satisfy the safety condition for knowledge, so ascribing meaning to GPT-4 outputs does not increase the user’s knowledge. On the W-C-D theory, then, GPT-4 outputs do not mean anything, as of now.
  Lastly, I argue that for future LLMs to mean something, greater transparency is called for regarding research on their performance, including the results of proprietary studies.

Chair: Horia Lixandru

Time: September 11th, 17:40 – 18:10

Location: SR 1.005 (online)

