How do LLMs' "brains" "work": Tokens and Vectors visualised

Large Language Models (LLMs) like ChatGPT or Claude impress with their ability to create human-like text. But how do these complex systems actually work? The answer lies in how they process language: through tokens and vectors. To make these abstract concepts more tangible, I had an interactive 3D dashboard developed (No Code, with Claude) that visualises the four basic steps of LLM processing.


Tokens & Vectors Interactive: A Window into the AI World

The dashboard offers you an intuitive 3D visualisation (HTML plus Three.js) to make complex AI concepts tangible. With just one click, you can switch between four different views:

[Interactive dashboard: "LLM Concept Dashboard: Token Vectorisation and Sequence Processing". Dimensions: 512. Select a step to visualise the LLM processing steps.]

The four processing steps at a glance:

  • Tokenisation: Text is broken down into small units (short words, syllables, parts of words); on output, text is assembled from the same units
  • Vectorisation: Tokens are converted into mathematical representations – points in an n-dimensional space
  • Attention: The model calculates the relationships between all tokens and weighs how meaningful the connections between them are
  • Sequence Processing: The information flows through the neural network – this becomes your output
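The tokenisation step above can be sketched in a few lines. The mini-vocabulary and the greedy longest-match rule below are invented purely for illustration; real tokenisers (e.g. BPE) learn far larger vocabularies from data:

```python
# Toy tokeniser: greedily split each word into the longest known
# sub-units from a hypothetical mini-vocabulary (illustration only).
VOCAB = ["bitte", "helfe", "beim", "einer", "bank", "bau", "mir", "en"]

def tokenise(text: str) -> list[str]:
    """Greedy longest-match split of each word into vocabulary pieces."""
    tokens = []
    for word in text.lower().split():
        while word:
            # longest vocabulary entry that prefixes the remaining word,
            # falling back to a single character for unknown material
            match = max((v for v in VOCAB if word.startswith(v)),
                        key=len, default=word[0])
            tokens.append(match)
            word = word[len(match):]
    return tokens

print(tokenise("Bitte helfe mir beim Bau einer Bank"))
# → ['bitte', 'helfe', 'mir', 'beim', 'bau', 'einer', 'bank']
```

Note how "Bauen" would come out as two tokens ("bau" + "en") – exactly the kind of sub-word splitting real tokenisers perform.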

The steps in detail:

  • Token View: Here you see (randomly) coloured spheres representing individual tokens. In the world of LLMs, tokens are the building blocks of language – they can represent individual letters, parts of words, or whole words.
  • Vector View: This shows arrows pointing in different directions. Each arrow represents a vector – a mathematical representation of the meaning of a token in a high-dimensional space (usually 512-4,096 dimensions). Similar words become similar vectors: "bank" as a place to sit would be positioned near "chair" and "furniture", whilst "bank" as a financial institution would be near "money" and "finance". The vector's position in space determines the semantic meaning.
  • Attention: Here, vectors and tokens are related to each other. An example is the word "bank". "Please help me build a bank" could mean that someone wants to build a garden bench (in which case you'd be in the realm of carpentry or DIY) – or help with constructing a bank building, which would involve architecture and structural engineering. If the LLM isn't well trained, it might a) not recognise the context and b) mask missing information through hallucination – instead of "admitting" knowledge gaps or asking follow-up questions. More on this shortly.
    The attention mechanism works like a searchlight: for each token, the model calculates how strongly it should "pay attention" to all other tokens in the sequence. This allows it to recognise, for example, that "he" in "The man went to the doctor. He was ill." refers to "man", not "doctor". (In a larger context, this could be different, as the "man" could also be a "doctor" trying to help another doctor – if the human expresses themselves confusingly, the LLM can struggle.)
  • Combined, Sequenced View: Here you see tokens and vectors together, illustrating how LLMs translate language into mathematical concepts.
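The attention step can be sketched as scaled dot-product attention in plain Python. The 2-dimensional vectors here are made up for illustration; real models use hundreds to thousands of dimensions plus learned query/key/value projections:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends to all keys,
    producing a weighted mix of the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# one query that strongly matches the first of two keys,
# so the output is dominated by the first value vector
print(attention([[10.0, 0.0]],
                [[10.0, 0.0], [0.0, 10.0]],
                [[1.0, 0.0], [0.0, 1.0]]))
```

This is the "searchlight": the softmax weights say how strongly the query token attends to each other token.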

How different LLM outputs can be

  • In the image, you can see what happens when an LLM has comprehension problems. The test uses Mistral's small model (Mistral 7B), which inherently doesn't handle German well. Prompt: "Bitte helfe mir beim Bau einer Bank" (Please help me build a bank). The LLM thinks I want to establish a financial institution. First, it suggests I should "trace over" (?) a plan. Later it talks about "buyers" and "settlers" (Huh? Are we in the Wild West?).
  • In this image, you can see how Claude Opus 4 with "Thinking" mode activated responds to the same question "Bitte helfe mir beim Bau einer Bank" (Please help me build a bank): First, it recognises the ambiguity of the question. Then the LLM makes the assumption that one probably wants to build a bench – which is correct in most cases. However, the LLM asks a follow-up question – it knows it could be wrong. The LLM also recognises that the request was in German and therefore the user probably expects measurements in centimetres (rather than inches).
  • And what does ChatGPT do, in this case the LLM o4-mini (with Reasoning set to "Medium")? The LLM is completely certain that I want to build a bench and provides a very detailed guide. The possibility that I might want to construct a financial institution isn't considered at all.
  • So you can see that these three LLMs produce three very different results. This is partly due to the default settings chosen by their operators.

Tokens, Vectors: Why is this important?

Understanding tokens and vectors is important to grasp how LLMs like Claude work. Only by understanding the underlying principles can one approach their use with the right expectations.

  • Tokens are the units into which text is broken down. They allow the model to digest language in manageable pieces.
  • Tokens per second: LLMs require tremendous computing power. As a normal user of online LLMs, you don't notice this because you're using LLMs running in the cloud on extremely fast computers. You get a better feel for this when running an AI on your laptop. For Intel-Windows users, Intel's "AI Playground" is interesting for running (smaller) LLMs locally. You'll be surprised how slow it suddenly becomes, and how poor the results sometimes are. Sometimes the computer delivers only a few tokens per second, typing about as slowly as a human hunting and pecking with two fingers.
  • Vectors are how these tokens are represented in the "brain" of the model. They capture subtle meanings and relationships between words. Vectors are also the foundation for RAG (Retrieval-Augmented Generation) – "chatting with your own documents". In this process, documents are broken down into text sections, vectorised, and stored in a vector database. When a query is made, the system searches for similar vectors and adds the found text passages as context to the answer. This allows the LLM to access information that wasn't in its training data.
    Vector databases like qdrant are used to process the data. Only with these – and other technical solutions – does a) fuzzy similarity search become possible and b) can the system generate answers grounded in the retrieved passages. A (relatively) simple explanation of vector search is available from qdrant: https://qdrant.tech/documentation/overview/vector-search/
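The retrieval step of RAG can be sketched as a nearest-vector lookup. The text chunks and their 3-dimensional "embeddings" below are invented for illustration; a real system would obtain the vectors from an embedding model and store them in a vector database such as qdrant:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# toy "vector database": text chunks with made-up embeddings
STORE = [
    ("Garden benches are usually built from wood.", [0.9, 0.1, 0.0]),
    ("A bank branch needs a vault and counters.",   [0.1, 0.9, 0.0]),
]

def retrieve(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query."""
    ranked = sorted(STORE, key=lambda item: cosine(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([0.8, 0.2, 0.0]))
# → ['Garden benches are usually built from wood.']
```

The retrieved passage is then prepended to the prompt as context – that is the "augmented" part of RAG.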

By converting tokens into vectors, an LLM can capture the nuances of language and understand them. This is why models like Claude are able to generate contextually appropriate and nuanced responses.

LLMs have billions of parameters that represent the model’s „knowledge“. The vocabulary (token set) usually comprises 50,000-100,000 different tokens. The short-term memory (Context Window) is the number of tokens the model can process simultaneously – in modern models, this is 32,000 to 200,000 tokens per input – sometimes up to 1 million or more. However, when the context window is fully utilised, output quality tends to decrease – you can recognise this when the LLM makes typing errors, etc.
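As a back-of-envelope sketch of this context-window arithmetic, assuming the common rule of thumb of roughly four characters per token for English text (actual counts vary by tokeniser and language):

```python
def rough_token_count(text: str) -> int:
    """Crude token estimate: ~4 characters per token (rule of thumb only)."""
    return max(1, len(text) // 4)

context_window = 200_000            # tokens, as in some modern models
document = "x" * 600_000            # stand-in for a ~600,000-character report
tokens = rough_token_count(document)
print(tokens, tokens <= context_window)  # → 150000 True
```

So a 600,000-character document would fit into a 200,000-token window – but, as noted above, quality tends to drop as the window fills up.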

Conclusion

Understanding tokens and vectors is key to consciously working with LLMs. Tokens determine how precisely the model understands your input, whilst vectors capture the semantic relationships. Knowing these fundamentals allows you to use LLMs more effectively, better assess their limitations, and formulate more meaningful prompts. The differences between the models show: Not every LLM is equally well-suited for every purpose.

Stefan Golling

About the Author

Stefan Golling, Cologne, Germany. Has worked since 1998 as a copywriter and creative director in (network) agencies, and since 2011 freelances as a German freelance copywriter, marketing freelancer, creative consultant, etc., e.g. in international projects.
