Transformer AI Models Explained: Why Can Your ChatGPT “Understand” Bahasa Malaysia?
It’s amusing to remember that, not long ago, if you asked your device to summarise a lengthy email, you would have received just the first two lines back. The device wasn’t much help at all. Today you can ask ChatGPT to “translate this ‘Manglish’ text into ‘proper’ English for my boss”, and it actually gets the overall feel and vibe of the writing, not just the exact words. How did so much change? The major difference is not simply ever-increasing amounts of data; it was the arrival of Transformer AI models. Don’t let the name fool you, though, it is much less complex than it sounds. The best way to explain all of this is over a nice cup of “teh tarik” at a “mamak”!
Wait, so what exactly is a Transformer?

Before 2017, most AI that attempted to comprehend language behaved like a patient but forgetful person. Envision reading a lengthy message from your aunt about her neighbour’s cat, then a recipe, and then suddenly, “can someone pick up my kids?”. By the time the AI reads the word “kids”, it has already forgotten the cat and the recipe. This was the dilemma with earlier AI models such as RNNs and LSTMs: they did not have prolonged attention spans. In 2017, Google released a paper called “Attention Is All You Need”, which introduced a new method of language comprehension called the Transformer architecture.
Rather than scanning text from left to right, Transformers look at all the words in a sentence at once. An internal mechanism called self-attention then determines which words carry the most weight for understanding the sentence as a whole. For instance, in the sentence “He gave me that old kayu cabinet from his tok kedai”, a Transformer does not think about the word “kayu” by itself; it connects it to “old cabinet”, and connects “he gave me” to “tok kedai”. In this sense, it builds relationships between terms. So when people talk about Transformer AI models, they really mean AI that finally has a proper grasp of context.
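To make that concrete, here is a minimal sketch of self-attention in Python using NumPy. The sentence and its embedding numbers are made up for illustration, and a real Transformer would use learned query, key and value projections; this just shows the “every word looks at every other word” idea.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a toy sentence.

    X: (seq_len, d) matrix, one row per word embedding.
    Here queries, keys and values are all just X itself (no learned
    projections), to keep the core idea visible.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)   # how much each word "looks at" every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ X              # each word becomes a blend of the whole sentence

# Toy 4-word "sentence" with made-up 3-dimensional embeddings.
sentence = np.array([
    [1.0, 0.0, 0.0],   # "he"
    [0.0, 1.0, 0.0],   # "gave"
    [0.9, 0.1, 0.0],   # "kayu"
    [0.0, 0.0, 1.0],   # "cabinet"
])
print(self_attention(sentence))
```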
How does attention actually work?
Now let’s make it really practical. You know when you’re in a meeting with five other people and you’re writing down just the important bits? You skip the “erms”, the side chats, and the people repeating themselves, and keep the key points. Multi-head attention works in a similar manner, but eight or more times simultaneously. One “head” may focus on grammar, another on names, another on emotion or tone. The model then weighs what each head found to interpret the sentence as a whole. That’s why GPT, Claude, or any other LLM can handle requests like “reword this email to sound more professional”, “turn this 10-page PDF into 3 bullet points for my boss”, or “explain cloud computing to me like I’m 15 years old”.
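Here is a rough sketch of the multi-head idea, building on the NumPy example above: split the embedding into slices, let each slice compute its own attention pattern, then stitch the results back together. Real models also learn a separate projection matrix per head, which is omitted here.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads=2):
    """Each head gets its own slice of every word's embedding and
    computes its own attention pattern over the sentence."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        head = X[:, h * d_head:(h + 1) * d_head]   # this head's slice of every word
        scores = head @ head.T / np.sqrt(d_head)   # this head's own attention pattern
        outputs.append(softmax(scores) @ head)
    return np.concatenate(outputs, axis=-1)        # combine all the heads' views

X = np.random.rand(4, 8)   # 4 words, 8-dimensional embeddings
print(multi_head_attention(X, num_heads=2).shape)  # (4, 8)
```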
The model learned which portions of a sentence are critical from millions of examples. One thing people don’t consider is that, by itself, the Transformer does not know the order of words. If you toss the words into a bag, “Ali eats fish” and “Fish eats Ali” look the same, which causes confusion. To remedy this, the Transformer employs positional encoding, a method of attaching a position number to each word before it is processed, similar to a seat number in a cinema. You can look at all the seats at once and still know who sits where. If word order were unknown to the AI, it would conclude that “I love you” is the same as “you love me”. That would be problematic.
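For the curious, this is roughly what the sinusoidal positional encoding from the “Attention Is All You Need” paper looks like in code; the dimensions here are tiny just so the output is readable.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Each position gets a unique pattern of sine/cosine values -
    the 'seat number' that is added to the word embedding before
    attention ever sees it."""
    positions = np.arange(seq_len)[:, None]             # 0, 1, 2, ...
    dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)    # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)    # odd dimensions use cosine
    return pe

# "Ali eats fish" and "Fish eats Ali" now produce different inputs,
# because each word's embedding is shifted by its seat number.
print(positional_encoding(seq_len=3, d_model=4))
```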
But how do Transformer AI models learn to write so well?

Every Transformer model begins as a blank slate; it knows nothing. During pretraining, the model is not yet asked to understand or produce language; it merely processes vast amounts of text (books, webpages, and articles) in order to learn to predict the next word of an input sentence. Billions of repetitions of this process teach it the patterns, grammar, facts, and biases within language, but at this stage it has not developed any real sense of how to follow instructions or produce useful responses.
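You can watch this next-word game for yourself. A minimal sketch, assuming the Hugging Face transformers library and the small public gpt2 checkpoint (the prompt is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The pretraining objective in action: given some text, the model
# outputs a probability for every possible next token.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Nasi lemak is my favourite", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Probabilities for whatever token comes right after the prompt.
next_token_probs = logits[0, -1].softmax(dim=-1)
top = next_token_probs.topk(5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```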
Fine-tuning is the next stage in developing a Transformer language model. After pretraining is complete, the model is trained on a smaller, structured dataset of questions and answers or dialogue, which teaches it how to provide useful responses, follow instructions, and interact with users more naturally than just predicting the next word. Together, pretraining gives modern large language models (LLMs) their general knowledge, and fine-tuning gives that knowledge a practical application.
Transfer learning is an important part of this process. Once a Transformer has been trained in one knowledge domain, it can be quickly adapted to others. For example, a model trained mostly on English can be readily fine-tuned to perform well on Malay legal documents, even though it has never seen that type of data before. The model does not begin again from zero; it builds on its prior learning, much like how, if you can drive one vehicle, you can quickly learn to drive another.
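As a rough sketch of what that reuse looks like in practice, here is transfer learning in miniature, again assuming Hugging Face transformers and the gpt2 checkpoint. The two “Malay legal” strings and the hyperparameters are placeholders, not a real recipe:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Start from pretrained weights (all the "driving skill" is already
# there), then nudge them on a tiny domain-specific dataset.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # NOT from scratch
model.train()

examples = [  # pretend Malay legal text, purely illustrative
    "Klausa 5: Penyewa hendaklah membayar deposit sebelum berpindah masuk.",
    "Klausa 6: Perjanjian ini boleh ditamatkan dengan notis bertulis.",
]
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    # Same next-word objective as pretraining, just on the new domain.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```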
Why do bigger models sometimes feel smarter?
AI models often get hyped based on size, particularly the number of parameters. Parameters are like adjustment knobs that get twisted during training: the more of them you have, the more finely the model can be tuned. Think of a convenience store versus a shopping mall. The mall is big and powerful, but also slower and more expensive to run, while a small shop is fast and efficient within a very limited range. It turns out bigger is not always better; large, costly generative models will sometimes blurt out long-winded answers to direct questions, while a little model that costs you a coffee a day can answer better, especially if you care about saving tokens. So what about benchmarks? They matter: perplexity, BLEU and ROUGE scores, human scoring and others all help measure how well a model actually performs.
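As a toy illustration, perplexity is just the exponential of the model’s average “surprise” over a test text; the probabilities below are made up:

```python
import math

# probs = probability the model assigned to each actual next word.
probs = [0.5, 0.1, 0.25, 0.05]                    # made-up example values
avg_log_loss = -sum(math.log(p) for p in probs) / len(probs)
print(math.exp(avg_log_loss))  # ~6.3; lower means the model predicts the text better
```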
Many everyday NLP tasks, simple things like translation, work perfectly well with smaller purpose-built models. Where the big Transformers shine is long text: their self-attention can run across an entire document and back again, keeping track of what has already been said, so the text a generative AI produces stays connected from start to finish.
So does the AI actually understand what it’s saying?

AI doesn’t understand language like we do. When I tell you “Nasi Lemak is yum”, it conjures up images and memories: a smell, a taste, perhaps a favourite moment when you ate it. Not vectors and probability patterns. To you, the word “bank” means something; to models like ChatGPT, it is just a cluster of numbers.
Words are represented as contextual embeddings, which is just a way of saying that the “meaning” of a word depends on its surrounding words. Is it “bank” as in the side of a river, or a money bank? The model resolves this from patterns in its training data, not true understanding. This is why AI isn’t sentient, or even aware. It is one of the most advanced pattern recognisers ever built, yet it doesn’t even know what a river is; it has simply learnt which words tend to appear together around terms like “Nasi Lemak”, and it makes correlations and adjusts its scores accordingly.
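You can see contextual embeddings in action with a small sketch, assuming Hugging Face transformers and the bert-base-uncased checkpoint: the same word “bank” gets noticeably different vectors in the two sentences.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence, word):
    """Return the contextual vector the model assigns to `word`
    inside this particular sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # one vector per token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index(word)]

river = embedding_of("he sat on the bank of the river", "bank")
money = embedding_of("she deposited cash at the bank", "bank")

# Well below 1.0: the two "bank"s are not the same point in space.
print(torch.cosine_similarity(river, money, dim=0))
```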
The performance leap behind the latest round of “really amazing AI” comes down to the attention mechanism. It enables the model to learn relationships between words across a sentence, or indeed entire paragraphs, and that is what gives its responses this feeling of coherent context awareness. So when it summarises your notes, or even writes them from a prompt, there is nothing magical about it: the maths is now very powerful, and the system is very good at context. Which is more than enough for me.