
What does it mean for AI to ‘know’ something?

Evelyn Grevelink | August 2025

Explore embedding space to learn how AI simulates knowledge.

To ask what it means for AI to “know” something is to look beyond its outputs and into the strange, mathematical space where it makes its predictions. Understanding that process can change how you relate to AI, not as a source of embodied truth but as a sophisticated system whose probabilistic predictions you can navigate with more awareness and careful intent. Having this technical understanding creates space for more balanced thinking and invites you to engage in thoughtful inquiry: rather than accepting broad, sweeping claims about AI, you can admit uncertainty and ask more precise questions about when, how and under what conditions AI might augment or reshape human capability. This first blog in our four-part series on AI concepts begins with foundational terms like “tokens” and “embeddings” before exploring “embedding space,” a high-dimensional space that acts as a model’s internal representation of language. It’s here that relationships between words are encoded, guiding the model’s predictions in ways that are statistical rather than meaning-based. We also glimpse the role of memory, not as recollection but as part of the underlying hardware architecture that quietly shapes what these systems can hold and express.

“The moral of Snow White is to never eat ...”

This playful twist from author Lemony Snicket reframes the ending of Snow White to expose how strange and rigid our inherited lessons can become when taken to their logical extreme. Even left unfinished, this sentence reveals more than it withholds. Its shape is so familiar that the next word seems to rise automatically, not so much from conscious recall as from something more embedded, something shaped by fairytales in childhood. Apples, of course! Or perhaps, poisoned apples? As a reader, you may feel a quiet certainty, a reflexive sense of how the sentence ought to end, even though you know apples are not really the moral of that story. And that reflex raises deeper questions: If a language model can produce the same ending, or one close to it, what exactly is it drawing from, and what kind of “knowing” does that represent?

If we provide that incomplete sentence to a large language model (LLM), it might finish it with “food from strangers,” “a poisoned apple” or simply “apples.” Each version seems reasonable. “Food from strangers” reflects the actual storyline, while “a poisoned apple” points to its most iconic danger and “apples” focuses on the infamous fruit. But none of these possible endings emerge from something experiential or lived. The LLM is not interpreting a story so much as it is identifying statistical patterns in the English language. That is, because LLMs generate text by sampling from a distribution of likely next tokens, results can vary depending on how the prompt is phrased, which model is used and even recent conversation history. So our examples above are illustrative, not deterministic.

Before we explore what it means for AI to know something through these statistical patterns, it’s worth considering what physical reality makes that knowing possible. What seems like a simple completion, “apples” or “food from strangers”, rests on a foundation as layered and complex as the stories we inherit. Every prediction emerges from numerical relationships that need physical infrastructure to exist. And these numbers require a home in circuits and memory systems that can store, calculate and process them at scale. Beneath the surface of what appears to be fluency lies hardware architecture largely abstracted from our view yet essential to every seemingly effortless response a model generates. While this blog focuses on the conceptual journey from text to prediction, along the way we also glimpse the physical reality that makes it all possible, with particular attention to the advanced memory technologies that form part of the foundational hardware enabling model intelligence.

What is the journey from text to prediction?

Tokens

When AI encounters a word, it neither reflects on its meaning as a human might nor does it recall experiences, memories or sensations. Instead, it begins the process of interpreting text by converting words into tokens, which you can think of as the smallest units of text the LLM can process. A token might be a full word like “snow,” part of a word like “app,” or even a punctuation mark, letter or space. When we enter a phrase like “The moral of Snow White is to never eat …” into an AI system and ask what might come next, the text is first prepared into tokens. While the exact tokenization varies by model, it might look something like this:

[“The” | “ moral” | “ of” | “ Snow” | “ White” | “ is” | “ to” | “ never” | “ eat”]

Note that we’re simplifying this process for illustration. Real tokenization relies on complex, often proprietary algorithms. Because these short words are so common, they are unlikely to be broken apart when tokenized. Notice the space before each internal word; it marks the word’s place within the sentence rather than the start of a new thought.1 Tokenization is just one stage in a broader transformation from text to prediction. To make sense of how language becomes legible to machines, it helps to think of the process as a kind of metamorphosis. Much like a caterpillar becomes a butterfly, before human language (in this case, input as text) becomes something a machine can interpret, it undergoes a series of structured steps where meaning is translated into mathematical relationships. For simplicity, we describe this progression as moving from text to tokens to embeddings. However, it’s much more nuanced and involves intermediate steps like token IDs and other representations that help bridge the gap between the richness of natural language and machine understanding.
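To make the idea concrete, here is a minimal, purely illustrative sketch of whitespace-style tokenization in Python. The toy_tokenize helper is hypothetical; real LLM tokenizers use learned subword methods such as byte-pair encoding and often split words into smaller pieces.

```python
# A minimal sketch of whitespace-style tokenization, purely for illustration.
# Real LLM tokenizers are learned from data and can split words into sub-word
# pieces; this toy version only keeps the leading space that marks a word's
# position inside the sentence.
import re

def toy_tokenize(text: str) -> list[str]:
    # Capture each word together with the space that precedes it, so
    # " moral" and "moral" remain distinct tokens, as in the example above.
    return re.findall(r" ?[^\s]+", text)

tokens = toy_tokenize("The moral of Snow White is to never eat")
print(tokens)
# ['The', ' moral', ' of', ' Snow', ' White', ' is', ' to', ' never', ' eat']
```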

Embeddings

Tokens are then mapped to embeddings, which are mathematical representations that capture both the semantic (ideas that words represent) and syntactic (the roles that they play in sentences) relationships between words.2 Importantly, these relationships are not explicitly programmed; they are learned from patterns in how words appear and reappear across training data (which may include literature, articles, transcripts and other sources). These embeddings are then projected into a high-dimensional space created during training. This space, known as embedding space, is a mathematical structure that encodes the embedding model’s internal representation of language. It’s definitely not a dictionary of definitions; rather, it’s a statistical map of proximity and word association. Meaning, words that often appear together or serve similar functions cluster near each other. “Pirouette” and “arabesque” might, for example, linger near each other, as might “apples” and “caramel,” but not because the model understands ballet or dessert pairings. These words simply appear near one another in similar contexts.
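As a rough illustration of what “near each other” means, here is a minimal sketch with made-up four-dimensional vectors. A real model learns these values during training and uses hundreds or thousands of dimensions; the words and numbers below are assumptions chosen only to show how similarity is measured.

```python
# A minimal sketch, with invented 4-dimensional vectors, of how an embedding
# table maps words to vectors and how "closeness" is measured.
import numpy as np

embedding_table = {
    "apples":    np.array([0.8, 0.1, 0.6, 0.2]),
    "caramel":   np.array([0.7, 0.2, 0.5, 0.3]),
    "pirouette": np.array([0.1, 0.9, 0.2, 0.8]),
    "arabesque": np.array([0.2, 0.8, 0.1, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embedding_table["apples"], embedding_table["caramel"]))    # high
print(cosine_similarity(embedding_table["apples"], embedding_table["pirouette"]))  # lower
```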

Simulated knowledge

This explanation brings us to a key distinction. For a human, the word “apple” might evoke layered meaning from a lifetime of memories: a neighbor’s tree, the weight of a pound of apples, a sweet taste on your tongue. Our understanding emerges from culture, sensation and lived experience. But for AI, “apple” doesn’t live in memory this way. Its meaning is entirely positional, inferred from relationships within embedding space and based on patterns learned from enormous amounts of text. AI may not “understand” in the way humans do, but it simulates understanding well enough to be quite useful. By “simulate,” we mean the LLM can generate responses that resemble knowledge without suggesting that the AI system possesses understanding or awareness in a human sense. And if we want to better appreciate how AI generates what appears to be knowledge, we’ll need to take a closer look at embedding space.

Explore embedding space


What do we mean by ‘embedding space’?

When you write a prompt for an AI chatbot, it may feel like you’re giving instructions in plain English (or the language of your choice). But what’s actually happening is far more abstract. Your words aren’t interpreted for meaning in the human sense; they’re transformed into high-dimensional vectors within an embedding space. This space can span hundreds or even thousands of dimensions, and the model’s response is guided not by understanding but by the vector relationships within it.3 This process has caused a profound change in the way we interact with machines. Whether through typing or voice, our inputs now resemble natural conversation more than code. Computer scientist Andrej Karpathy captured this shift with a quip: “The hottest new programming language is English.” It’s a clever way of pointing to a deeper transformation: how language itself has become a kind of interface, especially with large language models. In this context, English doesn’t behave like ordinary speech; it functions more like structured input. But although AI appears to “speak” English, what it’s actually doing is far more complex.

A house of echoes

Embedding space captures how words tend to co-occur statistically across the full spectrum of human expression the model has access to, which includes everything from news articles and essays to Reddit threads and tabloid headlines. Meaning, in this space, is not defined by rules of grammar or any deliberate intent; it’s defined by how often words tend to appear together, echoing where humans have placed those words next to other words in the past. The statistical nature of prediction may also help explain why AI struggles to hold tension in a sentence: it’s always in search of the most probable next word, not the most intellectually provocative one. Depending on the model, the output may be more or less inclined to deliver a stream of short, punchy sentences that all resolve too quickly. At least for now, AI cannot really hold its breath. Trained to complete, the model often finishes too quickly, resolving ideas prematurely and avoiding ambiguity or contradiction, precisely the things that separate humans from machines. This tendency is something to keep in mind next time AI seemingly agrees with you, happily providing you with the closure and certainty you long for. The model is not “thinking” about what to withhold or sustain so that tension can build and complexity emerge; it’s just trying to be statistically coherent and aligned with its training data.
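The counting idea behind that echo can be sketched in a few lines. The tiny corpus and the sentence-level co-occurrence window below are invented for illustration and bear no resemblance to how production models are trained, but they show how word associations become countable statistics.

```python
# A minimal sketch of co-occurrence counting: which words tend to appear
# together in the same sentence? The corpus is made up for illustration.
from collections import Counter
from itertools import combinations

corpus = [
    "snow white bit the poisoned apple",
    "never eat a poisoned apple from strangers",
    "the apple pie was sweet and crisp",
]

co_occurrence = Counter()
for sentence in corpus:
    words = sorted(set(sentence.split()))
    # Count every unordered pair of words that appear in the same sentence.
    for a, b in combinations(words, 2):
        co_occurrence[(a, b)] += 1

# Words that repeatedly show up near "apple" echo how humans have used it.
print([pair for pair, count in co_occurrence.items() if "apple" in pair and count > 1])
```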

n-dimensional tensors: how AI approximates meaning

The embeddings that make up this space are represented as tensors, a way to mathematically represent relationships across many dimensions. A tensor can be a one-dimensional list (like a vector), a two-dimensional table of rows and columns (like a matrix) or a higher-dimensional object, stretching into three, four or n dimensions. In the case of language models, each embedding is a one-dimensional tensor, which is a vector containing hundreds or thousands of numerical values. These numbers are not arbitrary. Each one represents a distinct dimension in a high-dimensional space, capturing subtle patterns in meaning, syntax and context.
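The different tensor ranks are easy to see in code. This is a minimal sketch using NumPy; the array sizes (768 dimensions, 10 tokens, a batch of 4) are arbitrary choices, not the dimensions of any particular model.

```python
# A minimal sketch of tensor ranks using NumPy arrays; the sizes are arbitrary.
import numpy as np

vector = np.zeros(768)           # 1-D tensor: one embedding with 768 values
matrix = np.zeros((10, 768))     # 2-D tensor: a sequence of 10 token embeddings
batch  = np.zeros((4, 10, 768))  # 3-D tensor: a batch of 4 such sequences

print(vector.ndim, matrix.ndim, batch.ndim)  # 1 2 3
print(batch.shape)                           # (4, 10, 768)
```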

Figure 1: 2D interpretation of embedding space with flat organic, irregular shapes.

2D to 3D: A conceptual walkthrough

To make this space more accessible, let’s start with a simplified version: a two-dimensional embedding space. Imagine two axes that capture statistical proximity, relationships we can’t directly interpret. Along one axis, “apple” might sit near “fruit,” “huntsman,” “jealousy” and “bite.” Along another, it might be close to “red,” “sweet” and “crisp” (Figure 1). These groupings don’t reflect conceptual categories like “fairytale” or “flavor” but rather the frequency with which these words appear together in language. This idea echoes linguist J.R. Firth’s famous insight: “You shall know a word by the company it keeps.” Meaning, in this context, emerges from use, not from definitions.
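To ground the two-dimensional picture, here is a minimal sketch with invented coordinates; real embedding coordinates are learned during training and are never hand-assigned like this.

```python
# A minimal sketch of the 2-D picture in Figure 1, with invented coordinates.
import math

coords_2d = {
    "apple": (0.9, 0.2), "fruit": (0.8, 0.3), "bite": (0.7, 0.1),
    "red": (0.2, 0.9), "sweet": (0.3, 0.8), "crisp": (0.25, 0.85),
}

def nearest(word: str) -> str:
    # Find the closest other word by straight-line (Euclidean) distance.
    return min((w for w in coords_2d if w != word),
               key=lambda w: math.dist(coords_2d[word], coords_2d[w]))

print(nearest("apple"))  # 'fruit': it sits closest in this toy space
```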

Now, let’s add a third dimension. “Apple” might appear near “knowledge,” “temptation” and “snake,” hinting at its biblical associations (Figure 2).4 These are different contexts in which “apple” appears, and the model captures them as statistical patterns, not symbolic meaning.

Figure 2: 3D interpretation of embedding space with spherical organic, irregular shapes (visualization concept guided by the TensorBoard Embedding Projector, which helps explore and understand embedding layers).4

4D to 512 dimensions: adding complexity

If we imagine adding a fourth dimension, such as time, and then train the embedding model on both pre- and post-1980 data, “apple” might find itself surrounded by different words, depending on the era (Figure 3). By the 1980s, “apple” begins to co-occur with “computer,” “technology” and two guys, both named Steve. Before that, in a more agricultural Cupertino, “apple” doesn’t yet have a strong cultural connection to the area. Its associations remain rooted in orchards and citrus fruit, not yet powerfully connected to the icons of a Fortune 100 company. This change illustrates how language distributions are not fixed but dynamic; they evolve over time as cultural forces act on them.
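That era effect can be sketched with two invented embedding tables standing in for models trained on pre-1980 and post-1980 text. The words and two-dimensional vectors are assumptions for illustration only.

```python
# A minimal sketch of era-dependent neighbors: the same word lands near
# different words depending on which era of text the embeddings reflect.
import numpy as np

def nearest_neighbor(word, table):
    # Cosine similarity against every other word in the same table.
    v = table[word]
    others = {w: vec for w, vec in table.items() if w != word}
    return max(others, key=lambda w: v @ others[w] /
               (np.linalg.norm(v) * np.linalg.norm(others[w])))

pre_1980  = {"apple": np.array([0.9, 0.1]), "orchard": np.array([0.8, 0.2]),
             "computer": np.array([0.1, 0.9])}
post_1980 = {"apple": np.array([0.2, 0.9]), "orchard": np.array([0.9, 0.1]),
             "computer": np.array([0.3, 0.8])}

print(nearest_neighbor("apple", pre_1980))   # 'orchard'
print(nearest_neighbor("apple", post_1980))  # 'computer'
```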

n-dimensions

While adding time as a fourth dimension reveals cultural shifts in language use, real embedding spaces operate on a far greater scale. It’s easy for us to picture a point in two or three dimensions like a location on a map or an object in space. But embeddings exist in hundreds or thousands of dimensions, which quickly becomes impossible to visualize. To make this idea more concrete, Table 1 shows how each word is represented as a list of numbers, where the number of dimensions defines the size of the embedding. These values don’t convey any meaning on their own, but they do position the word within the model’s embedding space. For example, “apple” might be represented as a vector like [0.2, 0.3, 0.2, 0.5, …, 0.8] across n dimensions, capturing a numeric representation of “apple.”
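In code, that long list is simply an array of length n. This sketch generates random values purely as a stand-in; a real model's values are learned during training.

```python
# A minimal sketch of Table 1's idea: each word is just a list of n numbers.
# These values are invented; a real model's are learned during training.
import numpy as np

n = 512
apple_embedding = np.random.default_rng(0).uniform(-1, 1, n)

print(apple_embedding[:5])    # the first few of the 512 coordinates
print(apple_embedding.shape)  # (512,), i.e., one point in 512-dimensional space
```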

Figure 3: 4D interpretation of embedding space with time, representing how the distribution of language is shaped by cultural forces.

So, when we say that the words “apple” and “bite” are each represented by n values, we are describing vectors that position each word as a point in n-dimensional space. If we return to our example, “The moral of Snow White is to never eat …”, the model solves this not by understanding the story but by calculating what is statistically most likely to follow. And in this case, “apples” emerges as a probable next word (always among other possibilities) because the math insists it belongs there.
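Here is a minimal sketch of that last step, sampling a next token from a probability distribution. The candidate words and their probabilities are invented; a real model scores its entire vocabulary at every step.

```python
# A minimal sketch of sampling a next token from a probability distribution.
# The candidates and probabilities below are invented for illustration.
import random

next_token_probs = {
    " apples": 0.46,
    " food": 0.30,       # as in "food from strangers"
    " a": 0.14,          # as in "a poisoned apple"
    " anything": 0.10,
}

words = list(next_token_probs)
weights = list(next_token_probs.values())

# Sampling means the most likely word usually wins, but not always,
# which is why the same prompt can end differently from run to run.
for _ in range(5):
    print(random.choices(words, weights=weights, k=1)[0])
```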

How do we move from math to memory?

Understanding embeddings as mathematical representations in n-dimensional space, where each dimension captures relationships between words, helps explain how AI simulates understanding. But these coordinates aren’t just theoretical; they must be stored in physical memory. Each embedding contains hundreds or thousands of numerical values, represented in 4- to 32-bit precision depending on the application. GPT-3, for example, uses over 12,000 dimensions, while smaller models might use 768 or 1024.5 The more dimensions an embedding has, the larger its size and the more numbers it must store. And each number takes up space. This need creates a fundamental constraint: as representations grow more sophisticated, they require more physical space in memory. Beyond storing embeddings, models must also manage intermediate computations in a key-value cache (KV cache), a kind of short-term memory that allows the model to recall previous interactions without recalculating from scratch. And as the model processes more tokens, it needs to store more temporary data in its KV cache, which in turn increases overall memory use.

All these things add up: the embeddings, their sizes and the cache. As models take in more tokens and store more relationships, their memory needs swell, and not just in size but also in complexity. It’s here that compute and memory architecture begins to shape what the model can and cannot do.
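A rough, back-of-envelope calculation shows how quickly this adds up. Every figure below (vocabulary size, precision, sequence length, layer count) is an assumption chosen for illustration, not the specification of any particular model.

```python
# A rough sketch of why these numbers add up; all figures are illustrative.
vocab_size   = 50_000   # number of tokens with a stored embedding
n_dims       = 12_288   # embedding dimensions (GPT-3-scale, per the text)
bytes_per_fp = 2        # 16-bit precision

embedding_table_bytes = vocab_size * n_dims * bytes_per_fp
print(f"Embedding table: {embedding_table_bytes / 1e9:.2f} GB")

# The KV cache grows with every token the model keeps "in mind".
n_layers, seq_len = 96, 8_192
kv_cache_bytes = 2 * n_layers * seq_len * n_dims * bytes_per_fp  # keys + values
print(f"KV cache at {seq_len} tokens: {kv_cache_bytes / 1e9:.2f} GB")
```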

Table 1: Each token converted to an embedding is defined by n dimensions, where n might be 512 or more. (These aren’t real model values; the example is illustrative, showing how each word is represented by a long list of numbers.)

The question is not just how much a model can compute but also how quickly it can access what it already knows. To meet these needs, improvements in memory systems have become increasingly important. One such example is high-bandwidth memory (HBM), originally developed for scientific computing and now adapted to support AI applications. Today, it plays a quiet but powerful role in AI systems, helping models work more efficiently with large volumes of data. Often, it’s not compute alone that slows models down; they’re also limited by the speed of memory access, or how quickly a model can reach back and retrieve what it already knows.
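A simple way to see why bandwidth matters is to ask how long it takes just to stream a model's weights out of memory for each generated token. The sizes and bandwidth below are illustrative assumptions, not product specifications.

```python
# A rough sketch of memory-bound generation: to produce each new token, the
# model must read its weights (and cache) from memory. Figures are illustrative.
weights_gb         = 350    # e.g., a ~175B-parameter model at 16-bit precision
bandwidth_gb_per_s = 3_000  # aggregate memory bandwidth of a hypothetical accelerator

# Lower bound on time per token if every weight is read once per token.
seconds_per_token = weights_gb / bandwidth_gb_per_s
print(f"~{seconds_per_token * 1000:.0f} ms per token, before any compute")
```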

What’s coming up next?

The memory challenge only deepens when we look beyond embeddings to the role of context. The more text the model needs to hold onto, the more memory it requires. If you’ve ever seen a message like “you have 3 responses left,” you know that the model is nearing the maximum context length for that conversation (measured in tokens), and you’ll soon need to start a new thread. This constraint doesn’t just affect what AI systems can do; it also shapes how they handle and represent context itself. That’s a topic we’ll explore in the next part of this blog series.

Technical contributors

Systems Performance Engineer

Felippe Vieira Zacarias

Felippe is a systems performance engineer at Micron Technology. He works with the data center workload engineering team to provide an end-to-end system view of how data center workloads use the memory hierarchy. Felippe has deep expertise in high-performance computing and workload analysis and previously served as a research engineer at a renowned supercomputing center. He holds a Ph.D. in computer architecture from the Universitat Politècnica de Catalunya.

Ecosystem Development Manager

Shanya Chaubey

Shanya helps manage ecosystem development for high-bandwidth memory in cloud memory and AI applications at Micron Technology. In addition to cultivating strong relationships across the technology ecosystem, she combines technical expertise in AI, market intelligence and data engineering to help CMBU anticipate and adapt to rapidly evolving AI workloads. With a foundation in mechanical engineering and a master’s degree in data science from the University of Colorado Boulder, she thrives at the intersection of rigorous technical analysis, emerging AI architectures and strategic vendor collaboration.

1. Schneppat, J.-O. (n.d.). Whitespace tokenization. Schneppat AI. Available at https://schneppat.com/whitespace-tokenization.html
2. Google Machine Learning Education. (n.d.). Embeddings: Embedding space and static embeddings. Machine Learning Crash Course. Available at https://developers.google.com/machine-learning/crash-course/embeddings/embedding-space
3. Tennenholtz, G., et al. (2024). Demystifying embedding spaces using large language models. Conference paper for ICLR 2024. Available at https://www.cs.toronto.edu/~cebly/Papers/2991_demystifying_embedding_spaces_.pdf
4. TensorFlow. (n.d.). Embedding Projector. Available at https://projector.tensorflow.org/
5. Li, C. (June 3, 2020). OpenAI’s GPT-3 language model: A technical overview. Lambda. Available at https://lambda.ai/blog/demystifying-gpt-3

Content Strategy Marketing Lead

Evelyn Grevelink

Evelyn leads content strategy for the strategic marketing team in Micron Technology's Cloud Memory Business Unit (CMBU). She is passionate about serving as a bridge between engineering and marketing teams through creative, strategic storytelling. Evelyn specializes in writing compelling narratives and creating design illustrations to convey complex concepts in large language models, AI and advanced memory technologies. She holds a bachelor's degree in physics from California State University, Sacramento.