I wrote a lot of essays during my time at University College Dublin, and I thought it would be fun to share my favorites with the world. These have been taken directly from my assignments and are minimally edited for clarity.

No One’s Home: Large Language Models, Reason, Thought & Bullshit

Written as my final for PHIL 20640

Everyone’s heard of them, everyone’s using them, and some even think they’re conscious: Large Language Models (LLMs) such as GPT-4o (and their chat wrappers like ChatGPT) have completely taken over public discourse, raising interesting questions about the nature of reason, thought, and bullshit.

In this essay I claim that LLMs are capable of thought (in a sense) but incapable of reasoning, and that the transformer architecture as it currently exists can never gain the latter ability; it merely mimics human capabilities. First I examine what intent, reference, bullshit, thought, and meaning mean in a human context and how these definitions will be used in my later argument. I then consider each item in turn and examine to what extent LLMs possess it. Once I have stated my argument I examine various objections and refute them. Finally I conclude by considering implications for the LLMs and chatbots of today and potential futures for chatbot-like systems, ones that may even come to bear the qualities of human reasoning and thought.

I will now set out the definitions to be used throughout my paper. Many people today ascribe true human emotiveness and reasoning to LLMs, relying on “folk psychology” to do so. This is the first important question to consider, and it determines whether an LLM has what’s called a propositional attitude, or in layman’s terms beliefs, desires, and an urge to perform intentional actions to achieve them1. This is the building block for figuring out whether an LLM can have the intention to do something, a core part of reasoning: one must have intent for there even to be an incentive to reason about anything. In his essay, Frankish makes the case that LLMs do have beliefs and intent, but only insofar as they play a “chat game”, similar to a chess computer outputting moves it “thinks” will best contribute to said game1. While the fact that LLMs play this game does mean they have intent, it is not the same intent that humans have. Humans have multiple competing intentions ranging from survival to reproduction to socialization, and these create a highly dynamic set of “superdesires” backed by true commitments1 that LLMs lack. This difference puts LLM intentions in a strange place with regard to the overall concept of intent.

While an LLM may have (very singular) intent when it generates tokens, do the tokens it generates with said intent refer, that is, do they succeed in grounding said tokens (and the words they compose) in real-world objects? I argue they do, through a process I will call “referential inheritance”. This is similar in principle to Mandelkern’s argument that language models’ tokens have a “natural history” that grounds the tokens in the language2. I take this argument further, arguing that this inheritance takes place during both the training and inference processes. During training, each token is encoded as a vector, and each vector is inherently a reference to the history of the word it refers to. Similarly, during inference a model may encounter a word that isn’t in its vocabulary. This is fine, however, since the tokenizer breaks the word down into smaller and smaller pieces (eventually single characters if necessary), and the model inherits the meaning from the user’s input via that method. The LLM doesn’t actually recognize the word-to-world relationship of each token, but it utilizes it implicitly anyway when it creates output that humans (with word-to-world relationships) read and act upon. Therefore, the output of an LLM always refers.
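To make this fallback concrete, here is a minimal sketch of greedy longest-match subword splitting, the kind of mechanism the “referential inheritance” argument leans on. The vocabulary and the example word are invented purely for illustration; real tokenizers (BPE, WordPiece, and the like) learn their merge rules from data rather than using a hand-written list.

```python
# Toy illustration only: an out-of-vocabulary word is greedily split into the
# longest known subword pieces, falling back to single characters if needed.
VOCAB = {"refer", "ential", "inherit", "ance"} | set("abcdefghijklmnopqrstuvwxyz")

def tokenize(word: str) -> list[str]:
    pieces, i = [], 0
    while i < len(word):
        # Take the longest vocabulary entry matching at position i; the single
        # letters in VOCAB guarantee the loop always makes progress.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

print(tokenize("referentialinheritance"))
# ['refer', 'ential', 'inherit', 'ance'] -- each piece carries its own history
```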

Now I will consider the actual content of these references. LLMs can produce useful information in a large number of cases (due to the fact that they refer and have a good enough sense of the game they intend to play), but there has always been the problem of “hallucinations”, or instances where an LLM generates false information. This terminology wrongly characterizes these occurrences as products of perceptual aberrations; instead, calling LLM outputs “bullshit” is more representative of what they produce, regardless of truthfulness3. Part of Hicks’ argument hinges on LLMs not having intent in order to classify all of their output as bullshit; however, I have shown above that LLMs do in fact have intent, limited as it might be. I still believe LLMs generate only bullshit because their intent, to contribute to the “chat game”, is fundamentally indifferent to truth. The only objective of the game is to contribute to the conversation in a reasonable manner1; truthfulness is neither tested nor selected for. Therefore, all of their output is still bullshit, specifically because said output lacks any concern for the truth.

With all of the preliminary terms defined, I will now move on to thought and reasoning. There is a clear dependency between the two terms: reasoning operates on thoughts. To start, I will define thought, then build upon that definition for reasoning. I will take thought to mean an internal symbolic representation that can be manipulated or communicated. This is a pragmatic and very loose definition by design: stricter interpretations require consciousness and other higher-order operations whose presence is hard enough to determine in humans, let alone LLMs, so I consider those definitions unnecessary compared to the more pragmatic one most people would recognize as “thinking”. Anthropic has recently released a paper titled On the Biology of a Large Language Model, investigating the abilities of one of their models, Claude 3.5 Haiku. The Multi-step Reasoning and Entity Recognition and Hallucinations sections (ignoring the names, which are misnomers once my assertion is applied) are particularly illuminating in the area of thought: Multi-step Reasoning reveals the chain of processing Claude executes to create an answer to a simple prompt4, and the process involves manipulating symbols (token representations) through transformer layers, the core of LLMs, and outputting the result (a form of communication). Similarly, in Entity Recognition and Hallucinations, Claude uses various “circuits” to decide whether to attempt an answer to a prompt4, similar to how I would describe thoughts being influenced by other thoughts. Its output may be bullshit, but that doesn’t matter for whether the thoughts themselves exist, and I have shown they do.

LLMs do think, in their own, very non-human way. However, this does not mean they can reason. I will define reasoning as the general ability to prioritize premises, goals, strategies, conclusions, and other cognitive items in order to suit the agent’s needs1. When doing so, the agent must evaluate the truth of each thought to make sure it contributes to the agent’s goals. The latter condition is necessary because without it there would be no way for an agent to know the best path to take. This is also a relatively pragmatic definition, one applicable not just to humans but also to animals. In fact, there isn’t even a need to take a purely Dennettian approach here; animals that only meet Popperian criteria for problem solving still meet my definition1. Under this definition LLMs fail on architectural grounds: while it can be said that LLMs have ideas, they are unable to convert these ideas into true processes of reasoning because it is impossible for them to evaluate the truthfulness of any of them or to revise said thoughts when challenged. This is due to how transformers work: they do not operate on truth conditions, only abstract vector relationships that have grounding solely via inherited reference. This does not mean that transformers as a building block prevent reasoning; transformers could well be part of a new machine learning architecture that does pass the bar for thought and reasoning. However, a system composed solely of transformers and smaller supporting networks (which is what LLMs are) cannot meet it.
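To make “abstract vector relationships” concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of a transformer layer. The shapes and random inputs are illustrative, not taken from any particular model; the point is simply that nothing in the computation mentions a truth condition, only similarity scores and weighted averages over vectors.

```python
import numpy as np

def attention(Q, K, V):
    """Each output row is a similarity-weighted average of the value vectors."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # pairwise dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                  # mix value vectors by similarity

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # four token embeddings of dimension 8
out = attention(tokens, tokens, tokens)  # self-attention: vectors in, vectors out
print(out.shape)                         # (4, 8)
```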

I have shown that while LLMs can think, they cannot, and in fact never can, reason. This assertion gives a philosophical basis for the problem of model collapse, the issue where training a model on text generated by another model leads to significant degradations in the new model’s output quality5. In simple terms: garbage in, garbage out. This makes sense given my premise: the fact that LLMs can’t reason results in a fundamental inability to create human-quality writing for training, since human-quality writing requires reasoning to tie all the thoughts presented together. The problem of model collapse alone could spell the end of rapidly increasing LLM capabilities and force a reevaluation of current machine learning methods6.
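A toy numerical sketch can illustrate the “garbage in, garbage out” dynamic under the simplified picture suggested in footnote 6 (an aggressive collapse toward the mean of the distribution). This is not the cited paper’s experiment; it just fits a Gaussian to data, resamples from the fit while preferring likely outputs, and repeats, watching the long tail disappear.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=10_000)          # stand-in for human-written text

for gen in range(5):
    mu, sigma = data.mean(), data.std()           # "train" a model on the current data
    print(f"generation {gen}: sigma = {sigma:.3f}")
    samples = rng.normal(mu, sigma, size=10_000)  # "generate" the next training set
    # Low-temperature-style preference for likely outputs: keep only samples
    # within one standard deviation of the mean, clipping the long tail.
    data = samples[np.abs(samples - mu) < sigma]
```

The printed sigma shrinks every generation, mirroring the loss of the unlikely-but-deliberate word choices that human writers make.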

Now I will consider some objections to my assertion. First I will cover the way my definitions are constructed. The definitions of “thought” and “reasoning” differ in breadth because while “thought” is a very broad term that could be applied to a wide range of mental states in both humans and animals, “reasoning” is loaded with a type of “smartness” and goal-direction that necessitates more precise consideration. Now that I have clarified my definitions, I will move on to other objections.

Functionalism is a philosophy that holds that mental states are defined solely in terms of their functional roles, or what they accomplish for the agent. I will make the distinction that I am specifically talking about a strong functionalist perspective on the issue of thought and reasoning in LLMs. Some of my earlier definitions do involve functionalist principles (specifically my definition of thought), but this does not mean functionalism can be ascribed to the operation of an entire LLM. For example, while some parts of the human brain are reflex-based, that does not imply the entire human brain operates on reflex alone. One could take the functionalist position to mean that since LLMs produce output that is sometimes indistinguishable from a human’s, they must have the same mental states (and therefore reasoning capabilities) as a human. This is a tempting thought, but it falls flat when one considers model collapse and my assertion’s support for it. If there truly were no difference between the content an LLM generates and what a human creates, there would be no problem feeding LLM-generated content into new models, but this doesn’t work. Additionally, there is doubt as to whether it is even possible for an LLM to have a mental state at all7, which only strengthens my argument further. Therefore, functionalism fails to explain LLMs and cannot grant them the ability to reason.

I will next consider “reasoning” models like OpenAI’s o4-mini alongside the scaling problem, since they are similar in nature. One could make the claim that reasoning is right there in the name, but this is a misnomer. The way these models “reason” fails to meet the definition of reasoning posed earlier. Instead, they “reason” by outputting intermediate textual steps to tweak the probabilities such that the model performs better on certain tasks, something that falls short of human reasoning. Specifically, all these models do is follow their singular intent to play the “chat game” with slightly tweaked parameters, nudging them to be more verbose in their outputs; it is not a fundamentally different game that could be taken as a real reasoning process. Similarly, scaling an LLM won’t magically create a model that can reason. All scaling does is give more predictive power to the same architecture, in other words allowing it to play the “chat game” with more skill; it does not imbue LLMs with the framework of truth-seeking and prioritization necessary for reasoning. Therefore, “reasoning” models don’t actually reason, and my assertion still stands.
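A small sketch can make the “same game, more verbose” point clearer. The `complete()` function below is a hypothetical stand-in for any next-token completion call, not a real API; the only difference between the two prompts is the self-generated intermediate text the model conditions on.

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a next-token completion call (not a real API)."""
    raise NotImplementedError

direct_prompt = "Q: What is 17 * 24?\nA:"

reasoning_prompt = (
    "Q: What is 17 * 24?\n"
    "Think step by step before answering.\n"
    "A: First, 17 * 20 = 340. Then, 17 * 4 = 68. 340 + 68 ="
)

# Both calls would be the same operation: predict the next tokens given the text
# so far. The second merely conditions on extra intermediate tokens, shifting the
# output distribution rather than invoking a separate reasoning faculty.
# complete(direct_prompt)
# complete(reasoning_prompt)
```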

I have now shown that LLMs can think in virtue of their symbol manipulation, that they cannot reason because their output is bullshit produced by an architecture indifferent to truth, and that objections to my assertion do not hold. LLMs are great for a wide range of tasks, but they aren’t humans. When you really look inside the mind of an LLM, no one’s home.

Necessary Nuance: Why Metaphors are Not Paraphrasable

Written as my final for PHIL 31170

Metaphors are a crucial component of human communication, allowing people to express thoughts in an abstract manner different from normal language. In this essay, I will assert that metaphors are not paraphrasable without losing information, contradicting Lepore’s arguments in Against Metaphorical Meaning. I will start with an examination of what metaphor is and its use in human communication. I will then build on this to give a general framework explaining why it is impossible to paraphrase metaphors or reduce them to purely literal interpretations without a loss in meaning. Once I have established my argument I will consider its implications and refute objections to it. Finally, I will close with an overall view of the importance of my assertion.

I will now establish a ground-level understanding of the uses of metaphor in communication. I define metaphor as an utterance that the speaker intends the receiver to be able to decode given an understanding of the language and of shared environmental assumptions (generally known as a context). A metaphor would have no use to an isolated or inexperienced receiver, since the goal of a metaphor is to map information from one group of concepts to another in a flexible manner8, something that requires a shared context of the concepts to be mapped. I will introduce a metaphor to understand metaphors in this regard: when a speaker utters a metaphor, they produce a linguistic equation with unfilled variables that the receiver must fill with their own experiences. The receiver may choose any set of values for the variables that satisfies the equation, and in doing so understands the metaphor9. This filling of variables also literalizes or concretizes the metaphor, collapsing the possibilities for different interpretations down to one. In formal terms, consider a speaker who produces a metaphor m. A receiver can literalize m by recognizing it as a metaphor, providing some context c (usually different for each receiver), then deriving a proposition p.
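This formal step can be restated compactly; the notation below is my own, introduced purely for illustration:

```latex
% Literalization as a two-place function: receiver r supplies their own
% context c_r to the metaphor m and derives a literal proposition p_r.
p_r = \mathrm{lit}(m, c_r)
```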

Now that I have created a base view of metaphor and its uses, I will define paraphrasing and its general usefulness in communication. Taking from Camp’s work, I define paraphrasing as an utterance that captures the speaker’s propositions (of which multiple can be packed into a single utterance) literally and explicitly9. This definition ensures a paraphrasing utterance functions how most would expect it to: it conveys the main information of the original utterance without dropping any key points or adding new unfounded implications. It’s important to note the inclusion of “literally” here: it is possible to paraphrase a metaphor into another metaphor, since the utterances stay in the same realm of interpretability. I am specifically talking about literal paraphrasing in this essay because non-literal paraphrasing is trivial and not philosophically interesting; the important question is whether metaphors are at their core literal or non-literal devices.

I have defined all terms relevant to my assertion, which I will now present: metaphors cannot be paraphrased because doing so fundamentally discards information, or, more precisely, interpretability, from the utterance. This assertion follows from the aforementioned definitions: a metaphor is a fundamentally non-literal device that yields literal interpretations when a receiver applies their context to it. This classification comes from how linguistic communities process metaphors: a mutual understanding between a speaker and receiver that the content of an utterance may be “grounded” in multiple different valid literal utterances. While a metaphor produces literal propositions when used in communication, the metaphor itself is a container for arbitrarily many literal propositions, of which the receiver reaches in and grabs one through the process of filling in the “equation” with their own context. Formally, the paraphraser would have to introduce their own context to apply to the metaphor to create a paraphrase, but this means that no one else could apply their own context to the original metaphorical equation.
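In the same illustrative notation as before, the contrast can be put as follows: the metaphor carries the whole family of literalizations, while a paraphrase keeps only the one fixed by the paraphraser’s own context:

```latex
% A metaphor's content is the family of literalizations over all admissible
% contexts C; a paraphrase fixes one context c* and keeps a single member.
\mathrm{content}(m) = \{\, \mathrm{lit}(m, c) \mid c \in C \,\}
\qquad
\mathrm{paraphrase}(m) = \mathrm{lit}(m, c^{*}) \ \text{for one chosen } c^{*} \in C
```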

Now that I have stated my core assertion, I will expand on the effects of paraphrasing across the lifecycle of a metaphor to clarify why paraphrasing is unreasonable. When a metaphor is first coined, the linguistic community is not accustomed to it, and therefore the range of interpretations a receiver can choose for said metaphor is vast. Statistically, people would choose very different interpretations of the same metaphor, keeping its meaning highly contextual. The ill effects of paraphrasing at this stage are clear: paraphrasing a novel metaphor turns it into a set of words that means something completely different, without a logical basis for doing so; if concretization to a single meaning were the purpose, the single literal proposition should have been used instead. If the metaphor survives this initial period, more people will receive the metaphor in conversation and the linguistic community will converge on a smaller set of interpretations. It is important to note there is still more than one interpretation available, and some receivers could still take “out of left field” novel interpretations, but they would be the exception in a state of general societal adoption. At this point it would still be premature to paraphrase a metaphor due to the loss in flexibility. Metaphors in this stage are still useful for nondeterministic conversational communication where some contextual interpretation is warranted. In some cases, speakers may converge on one interpretation of a metaphor with such certainty that it collapses and becomes literal, otherwise known as a dead metaphor. This can be considered the ultimate paraphrase of the metaphor, but the result isn’t a metaphor anymore due to the singular, literal proposition it represents. I have now shown that at every point in the lifecycle of a live metaphor, avoiding paraphrase is important in keeping the metaphor’s utility in communication.

It’s worth mentioning the justification for separating literal and non-literal utterances. In general, there are multiple ways to decode an utterance: literally, approximately, hyperbolically, and metaphorically, all existing on a spectrum8. While it may be possible for some utterances to be interpretable as either hyperbole or metaphor, it is impossible for a metaphor to be interpreted as a literal utterance (or vice versa) without breaking the system of lexical pragmatics people use.

I will now move on to considering implications of and objections to my assertion. Aside from the trivial implication of segmenting metaphor into its own category of utterance, my assertion fits in with and complements the general understanding of metaphorical interpretation. Camp makes a similar argument in her essay on metaphor, although she doesn’t go as far as I do in pushing for metaphors as fully irreducible to literal utterances. We agree that the main utility of metaphor is providing a conceptual map, and that it plays a crucial part in understanding fundamentally abstract concepts such as new psychological phenomena9. However, Camp argues that there isn’t a meaningful way to understand the conditions of fulfilling a metaphor, specifically that “we ourselves don’t fully understand what those conditions are”9. I argue it is possible to understand those conditions by reflecting on the metaphor’s contribution to the overall conversational ledger10. If the conversational ledger is preserved in a manner that allows the conversation to continue, then it is highly likely that a valid context was supplied to the metaphor; otherwise, something has probably gone awry. My assertion also accounts for the definition of new terms in a public language: a speaker creates an equation that approximates their internal understanding of the new concept, and the linguistic community slowly transforms said metaphor into a concretization. Similarly, my assertion works with Wilson’s placement of metaphor on an interpretation spectrum, since the literalization of a metaphor happens after the recipient processes the utterance as a metaphor. With these implications covered, I will now look at objections to my assertion.

The objections I will be covering come from Lepore’s Against Metaphorical Meaning. The core of Lepore’s argument is that while metaphors can be used to induce a recipient into finding similarities between concepts, metaphorical meaning cannot be used to convey propositional content. Specifically, they argue a metaphor fails to contribute content to the conversational record or grant a cooperative understanding of the speaker’s intent to communicate a proposition to an audience10. Lepore’s insistence on this argument comes from a view of conversation and communication that requires singular, shared meanings in order to contribute to the record, but this baseline assumption, that literal communication is perfect and has no room for interpretation, ignores the complexities of real life. I argue that metaphors do contribute to the conversational record through the process of concretization. Formally, when a speaker produces a metaphor m, the recipient must recognize m as a metaphor to derive a concretization p, which I will concede is an extra step, but this still ends with a proposition p being added to the conversational record. The special nature of the metaphor is that the speaker intends there to be no “single correct answer” when using it, but this does not mean it cannot add to a conversation or that it prevents an act of cooperative understanding. If two people were talking about their love lives and one person produced a metaphor to describe the devastation they recently felt, and the recipient applied their own context, it would still add to the conversational record, just not in a deterministic manner (due to the variations in context a recipient might apply in the concretization process). When one paraphrases a metaphor, it is concretized into a single proposition, and therefore it is no longer able to contribute to the conversational record in the same manner.

This core point undermines the rest of Lepore’s paper. In §2.1, Lepore agrees with Davidson in claiming that “if a metaphor has speaker meaning, it should be possible to express it literally”10, implying a metaphor does not have speaker meaning. This position is incorrect because metaphors can be and are expressed literally; they just require context from the receiver. Requiring that a single canonical meaning be communicated from speaker to receiver to add to the conversational record is an unnecessary restriction on the medium of communication. In §3.3, Lepore argues metaphors can be misunderstood only if the receiver misses their point10. This reasoning implicitly assigns metaphors a more stringent standard of “missing the point” than literal propositions, but when the standards are equal the argument no longer works. If someone were to mishear or misinterpret a literal proposition (for example, due to the wide variety of meanings that words can have), it would have the same negative conversational effect as when a receiver applies incorrect contextual hints to a metaphor. In either case, an incorrectly derived proposition is internalized by the receiver. If Lepore’s point were true, it would be easy to assign a paraphrase to a metaphor due to the narrow standard of what is “valid” for a metaphor, but given my assertion, Lepore’s point is false.

I have argued for the irreducibility of metaphors due to their special function as interpretive tools, ones that rely on speaker-receiver contextual flexibility while still contributing to the conversational record. Paraphrasing concretizes a metaphor into a single proposition, removing the ability of the receiver to use their own context to engage with it. I have considered Lepore’s possible objections to my claim and addressed them. By preserving this flexible interpretability and resisting paraphrase, metaphor keeps its role in facilitating nuanced communication and understanding.

Footnotes

1. Frankish, Keith. “What Are Large Language Models Doing?” Anna’s AI Anthology: How to Live with Smart Machines?, Xenomoi Verlag, 2024, pp. 55–78.

2. Mandelkern, Matthew, and Tal Linzen. “Do Language Models’ Words Refer?” Computational Linguistics, vol. 50, no. 3, 2024, pp. 1191–1200, https://doi.org/10.1162/coli_a_00522.

3. Hicks, Michael Townsen, et al. “ChatGPT Is Bullshit.” Ethics and Information Technology, vol. 26, no. 2, June 2024, https://doi.org/10.1007/s10676-024-09775-5.

4. Lindsey, et al. “On the Biology of a Large Language Model.” Transformer Circuits, 2025.

5. Shumailov, Ilia, et al. The Curse of Recursion: Training on Generated Data Makes Models Forget. 2024, https://arxiv.org/abs/2305.17493.

6. 2026 Jack here: another explanation, closer to the probabilistic account of model collapse, may be an aggressive collapse to the mean of token distributions, cutting off the long tail generated by humans with reasoning abilities choosing unlikely words to meet their goals.

7. Block, Ned. “Troubles with Functionalism.” Minnesota Studies in the Philosophy of Science, vol. 9, 1978, pp. 261–325.

8. Wilson, Deirdre, and Robyn Carston. “A Unitary Approach to Lexical Pragmatics: Relevance, Inference and Ad Hoc Concepts.” Pragmatics, 2007, pp. 230–259, https://doi.org/10.1057/978-1-349-73908-0_12.

9. Camp, Elisabeth. “Metaphor and That Certain ‘Je Ne Sais Quoi.’” Philosophical Studies, vol. 129, no. 1, May 2006, pp. 1–25, https://doi.org/10.1007/s11098-005-3019-5.

10. Lepore, Ernie, and Matthew Stone. “Against Metaphorical Meaning.” Topoi, vol. 29, no. 2, 19 Jan. 2010, pp. 165–180, https://doi.org/10.1007/s11245-009-9076-1.