AI Series: Getting to Know Large Language Models

We live in an age where machines are beginning to read, write, translate, summarize, and even debate. The technology behind all of this is artificial intelligence. This series of articles takes that technology apart layer by layer: from how computers learn to understand human language, to what actually happens when you type a question into ChatGPT or Claude, to why these models can seem so “intelligent.” Along the way, we will cover the basics, how to use AI, how to deploy AI locally, and even how to train LLMs.

Artificial Intelligence (AI) is a field of computer science that aims to create systems capable of imitating human intelligence, from learning, reasoning, and problem solving to understanding language. Simply put, AI is our attempt to make machines “think.”

Natural Language Processing (NLP) is a branch of AI that allows computers to understand, interpret, and generate human language. NLP bridges the gap between how humans communicate (natural language) and how machines process information.

🧠 Machine learning

A branch of AI in which systems learn from data without being explicitly programmed. In general, more (and better) data leads to a better-performing system.

👁 Computer vision

The ability of machines to “see” and interpret images or videos, as in facial recognition systems.

💬 Natural language processing

A branch of AI that focuses on the interaction between computers and human language – written and spoken.

🤖 Deep learning

A subfield of ML that uses deep neural networks to recognize complex patterns.

Every time you use Google Translate, type into a search engine, or talk to a virtual assistant like Siri, NLP is working behind the scenes.

Common NLP tasks include sentiment analysis (positive/negative), text classification, machine translation, document summarization, information extraction, and text generation of the kind ChatGPT and Claude perform.

Popular AI Examples

In recent years, the AI ecosystem has developed rapidly. Here are some of the most well-known and widely used AI models and systems today:

🤖 ChatGPT
OpenAI · GPT-4o

🔷 Claude
Anthropic · Sonnet 4

✨ Gemini
Google DeepMind

🦌 Llama
Meta AI · Open Source

🐋 DeepSeek
DeepSeek · China

⚡ Grok
xAI · Elon Musk

Besides language models, there are also well-known AI systems in other fields: DALL·E and Midjourney for image generation, Sora and Runway for video generation, DeepMind's AlphaFold for protein structure prediction, and GitHub Copilot for coding assistance.

The conversational AI models mentioned above are all LLMs (Large Language Models). This is what we will discuss in more depth in the next section.

What is text-to-vector conversion?

Computers cannot understand words directly; they only understand numbers. So how can a computer “understand” text? The answer lies in converting text into vectors, a process called text embedding or word embedding.

A vector is a mathematical representation of a word or phrase as a point in high-dimensional space, where words with similar meanings sit close together.

Imagine a 2D coordinate map, but in hundreds or thousands of dimensions. The words “King” and “Queen” are close to each other because their meanings are similar. The words “Apple” and “Orange” are also close to each other. The relationship between words can even be calculated mathematically:

Examples of word vector relationships

vector("King") − vector("Man") + vector("Woman") ≈ vector("Queen")

This is the magic of word embeddings: meaning stored in numbers!
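To see the arithmetic in action, here is a minimal sketch in Python. The three-dimensional vectors are hand-made for illustration (real embeddings are learned from data and have hundreds of dimensions), and the `analogy` helper is a hypothetical name, not part of any library.

```python
import math

# Hand-made toy vectors, assumed for illustration only.
# The three dimensions loosely mean: royalty, maleness, femaleness.
VECTORS = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    # Exclude the input words so the answer is a new word.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, VECTORS[w]))

print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

With learned embeddings the match is only approximate, which is why the formula uses ≈ rather than =.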

Popular techniques for producing word embeddings include Word2Vec (Google, 2013), GloVe (Stanford), and, most recently, the Transformer-based embeddings used in modern LLMs. The more sophisticated the model, the richer and more precise its vector representations.

Similarity search

Finds documents most relevant to a user’s question based on vector proximity. Used in search engines and RAG.

Vector database

A specialized database for storing vectors and searching them quickly. Examples: Pinecone, Weaviate, Chroma, Qdrant.
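The two ideas above can be sketched together in a few lines of Python. `TinyVectorStore` is a hypothetical in-memory stand-in, not the API of Pinecone, Weaviate, Chroma, or Qdrant, and the document vectors are made up; a real system would get them from an embedding model and use an index instead of scanning every item.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self._items.append((doc_id, vector))

    def search(self, query_vector, top_k=2):
        """Return the ids of the top_k documents closest to the query."""
        scored = [
            (cosine_similarity(query_vector, vector), doc_id)
            for doc_id, vector in self._items
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [doc_id for _, doc_id in scored[:top_k]]

store = TinyVectorStore()
store.add("doc-cooking", [0.9, 0.1, 0.0])   # made-up embedding
store.add("doc-football", [0.0, 0.9, 0.2])  # made-up embedding
store.add("doc-baking", [0.8, 0.0, 0.3])    # made-up embedding

results = store.search([1.0, 0.0, 0.1], top_k=2)
print(results)  # the two food-related documents rank first
```

In a RAG setup, the documents returned by a search like this are pasted into the prompt so the model can answer from them.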

What is an LLM?

A Large Language Model (LLM) is an AI model trained on huge amounts of text data to understand and generate human language. The word “Large” refers to two things: the size of the training data and the number of parameters in the model.

Almost all modern LLMs are built on the Transformer architecture, which Google researchers introduced in the paper “Attention Is All You Need” (2017). The key innovation is the self-attention mechanism: the model's ability to consider the context of the entire text at once, rather than word by word in sequence.
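As a rough illustration of self-attention, the sketch below computes scaled dot-product attention in plain Python. It makes a strong simplifying assumption: the queries, keys, and values are all the raw input vectors, whereas a real Transformer first multiplies the input by learned weight matrices and runs several attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    output = []
    for query in x:
        # Score this token against every token in the sequence, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in x
        ]
        weights = softmax(scores)
        # The new representation is a weighted mix of all token vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, x)) for i in range(d)]
        output.append(mixed)
    return output

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
attended = self_attention(tokens)
for row in attended:
    print([round(v, 3) for v in row])
```

Every output row depends on all input rows at once, which is the "entire context simultaneously" property described above.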

How an LLM works, in simple terms: it receives input text → converts it into tokens → processes them through a deep stack of Transformer layers → predicts the next token based on probability → repeats until the answer is complete.
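The predict-and-repeat loop can be sketched with a toy example. The probability table below is invented and stands in for a trained model, and it looks only at the previous token, while a real LLM conditions on the whole context. The sketch also always picks the most likely token (greedy decoding); production systems usually sample instead.

```python
# Hypothetical next-token probabilities standing in for a trained model.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.8, "<end>": 0.2},
    "dog": {"sleeps": 0.6, "<end>": 0.4},
    "sleeps": {"<end>": 1.0},
}

def generate(max_tokens=10):
    """Generate tokens one at a time until <end> or the length limit."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        next_token = max(probs, key=probs.get)  # greedy: take the most likely
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker

print(generate())  # ['the', 'cat', 'sleeps']
```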

The LLM training process has two main stages: pre-training (learning from huge unlabeled datasets) and fine-tuning (adapting the model to specific tasks, often with RLHF, Reinforcement Learning from Human Feedback, to make the model safer and more useful).

Pre-training

The model learns from billions to trillions of text tokens drawn from the internet, books, code, and other sources. It is very expensive: training a frontier model can cost tens or even hundreds of millions of dollars.

Fine-tuning

The model is tuned to follow instructions, be polite, and avoid harmful content.

RLHF

Humans provide feedback on the model’s responses to train the model to better match human preferences.

Inference

The stage where a trained model is used to answer a new question. This is what happens every time you chat with a model.

What is a token in an LLM?

An LLM does not process text character by character or word by word. The smallest unit it processes is called a token. A token can be a whole word, part of a word, a punctuation mark, or even a space.

Example of sentence tokenization

The sentence (in Indonesian): “Kecerdasan buatan mengubah dunia!” (“Artificial intelligence is changing the world!”)

Ke | cer | dasan | bu | atan | meng | ubah | dunia | !

9 tokens for the sentence above (may vary depending on the tokenizer used)
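To make the idea of subword tokens more concrete, here is a toy tokenizer in Python. It is only a sketch: the vocabulary is hypothetical, and it uses simple greedy longest-match, whereas the tokenizers behind real LLMs typically use learned schemes such as byte-pair encoding.

```python
# Hypothetical subword vocabulary, chosen by hand for this example.
VOCAB = {"token", "ization", "un", "believ", "able", "is", " ", "!"}

def tokenize(text, vocab=VOCAB):
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens

pieces = tokenize("tokenization is unbelievable!")
print(pieces)
print(len(pieces), "tokens")
```

Here the word "unbelievable" is not in the vocabulary, so it is split into "un", "believ", and "able", and the spaces and the exclamation mark become tokens of their own.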

As a rule of thumb, 1 token ≈ 0.75 words in English, so 100 tokens ≈ 75 words. Indonesian and non-Latin-script languages tend to need more tokens per word because tokenizers are typically optimized for English.

Context window

The maximum number of tokens the model can process at one time. For example, Claude offers a 200,000-token context window and GPT-4 Turbo offers 128,000 tokens.

API costs

LLM APIs are billed per token, for both input (questions) and output (answers). The longer the text, the higher the cost.
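Both points can be turned into quick back-of-the-envelope arithmetic. The sketch below uses the 0.75 words-per-token rule of thumb for English; the prices are made-up placeholders rather than any provider's real rates, and it assumes that input and output tokens share the same context window.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text

def estimate_tokens(word_count):
    """Rough token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)

def fits_context(input_tokens, output_tokens, context_window):
    """Assumes input and output tokens share one context window."""
    return input_tokens + output_tokens <= context_window

def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Prices are in dollars per one million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

prompt_tokens = estimate_tokens(75)    # about 100 tokens
reply_tokens = estimate_tokens(300)    # about 400 tokens

# Placeholder prices, not real rates: $3 per million input, $15 per million output.
cost = estimate_cost(prompt_tokens, reply_tokens, 3.0, 15.0)

print(prompt_tokens, reply_tokens)
print(fits_context(prompt_tokens, reply_tokens, context_window=200_000))
print(f"${cost:.4f}")
```

Output tokens are often priced higher than input tokens, so long answers tend to cost more than long questions.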

Want to count tokens? Try platform.openai.com/tokenizer or tiktokenizer.vercel.app. Paste your text and see how the model splits it into tokens!

What are parameters in an LLM?

If tokens are the “food” an LLM consumes, then parameters are the “intelligence” it possesses. Parameters are the numerical weights stored in the model's neural network, the result of a training process that can take months and use thousands of GPUs.

Each parameter stores a bit of “knowledge” about language, world facts, logic, or patterns. When the model answers your question, millions or even billions of these parameters interact mathematically to produce the answer.
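To get a feel for where parameter counts come from, the sketch below counts the weights in a tiny fully connected network. The layer sizes are arbitrary examples; an LLM has far larger layers and other kinds of components, but the counting principle is the same.

```python
def dense_layer_params(inputs, outputs):
    """One weight per input-output pair, plus one bias per output."""
    return inputs * outputs + outputs

def mlp_params(layer_sizes):
    """Total parameters of a fully connected network with the given layer sizes."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# A toy network: 4 inputs, one hidden layer of 8 units, 2 outputs.
total = mlp_params([4, 8, 2])
print(total)  # (4*8 + 8) + (8*2 + 2) = 58
```

Widening the layers to thousands of units and stacking many of them is how models reach billions of parameters.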

Size comparison of popular models

Models compared in the chart: Llama 3.2 (small), Llama 3.1 (medium), DeepSeek R1, GPT-4 (estimate). B = billion parameters · T = trillion parameters.

However, more parameters don't always mean better. Smaller models trained on high-quality data with advanced fine-tuning techniques can outperform larger ones. This is the principle behind the success of efficient models such as Mistral 7B and Microsoft's Phi-3.

A model with 7 billion parameters needs roughly 14 GB of GPU VRAM to run, because each parameter takes 2 bytes at 16-bit precision. A 70B model needs around 140 GB. This is why the largest LLMs are usually run in the cloud or on multi-GPU servers rather than on a regular laptop.
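The memory figures follow from simple multiplication, sketched below. The estimate covers the model weights only; it is an assumption-laden lower bound, since activations and the attention cache need extra memory on top. The 4-bit line shows why quantized models can fit on smaller hardware.

```python
def weight_memory_gb(parameters, bytes_per_parameter):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_parameter / 1e9

fp16_7b = weight_memory_gb(7e9, 2)     # 16-bit weights: 2 bytes each
fp16_70b = weight_memory_gb(70e9, 2)
int4_7b = weight_memory_gb(7e9, 0.5)   # 4-bit quantized weights: half a byte each

print(f"7B at 16-bit:  {fp16_7b:.1f} GB")   # 14.0 GB
print(f"70B at 16-bit: {fp16_70b:.1f} GB")  # 140.0 GB
print(f"7B at 4-bit:   {int4_7b:.1f} GB")   # 3.5 GB
```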

That’s all for this article. I hope it’s useful, and happy coding!
