We Use ChatGPT Every Day, But Do We Really Understand Large Language Models? Here They Are, Explained in Simple Words
If you’ve been on the Internet lately, you’ve likely come across the viral screenshots. A friend asks an AI to generate a breakup message, or an extremely formal resignation letter that is actually quite rude. Or perhaps you’ve used it yourself to rewrite work emails when you really didn’t feel like being professional today. We’re all engaging with this technology now.
It’s all around us: on our phones, in our browsers, in our cars. But most people still treat it as some form of digital wizardry, or worse, as a brain in a box with independent thought. The reality is actually more compelling, and more straightforward, than you might think! So grab a cup of coffee and let’s dissect how large language models work, stripping away the corporate jargon and the science fiction panic.
It’s Just Fancy Autocomplete, Really

I realize this may sound like I’m diminishing the technology. But bear with me! When you type “I’m going to the” on your phone, it suggests “mall,” “office,” and “bathroom.” Each suggestion is determined by how frequently people have typed those words together in the past. Now imagine your phone had access to vastly more data and a vastly better ability to find patterns in that data. That’s one of the simplest ways to describe large language models to your non-techy friend.
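The phone-autocomplete idea can be sketched in a few lines. This is a toy sketch with a made-up mini-corpus, not how a real keyboard works, but the principle is the same: count which word follows “the” most often, and suggest the winners.

```python
from collections import Counter

# A tiny made-up "typing history" standing in for real usage data.
corpus = (
    "i am going to the mall . i am going to the office . "
    "i am going to the bathroom . i am going to the mall ."
).split()

# Count which word follows "the" anywhere in the corpus.
following = Counter(
    corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == "the"
)

# The top suggestions are simply the most frequent continuations.
print(following.most_common(3))  # [('mall', 2), ('office', 1), ('bathroom', 1)]
```

A real LLM does something far more sophisticated than raw counting, but “predict the likely continuation from past patterns” is the shared core idea.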
Large language models (LLMs) are not conscious beings with emotions, no matter how convincing a romantic letter an LLM produces. They are prediction tools that calculate probabilities. An LLM takes your input (the “prompt”) as a starting point and runs a statistical analysis across all the text it has ingested from the web to determine which word is most likely to come next. It appends that word and repeats the process until the response is complete. This is commonly referred to as Natural Language Generation (NLG), which sounds cool but simply means an AI producing text from the patterns it learned during training.
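That “predict, append, repeat” loop is the whole generation process. Here is a minimal sketch of it, with a hypothetical hand-written probability table standing in for the actual trained network:

```python
# A stand-in for the real network: given the words so far, return a
# probability distribution over possible next words. (These probabilities
# are invented for illustration; a real model computes them.)
def next_word_probs(words):
    if words[-1] == "cup":
        return {"of": 0.9, "holder": 0.1}
    if words[-1] == "of":
        return {"coffee": 0.7, "tea": 0.3}
    return {"<end>": 1.0}

def generate(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(words)
        # Pick the most probable word (real models usually sample instead).
        word = max(probs, key=probs.get)
        if word == "<end>":
            break
        words.append(word)  # append, then predict again from the longer text
    return " ".join(words)

print(generate("a cup"))  # a cup of coffee
```

Every chatbot reply you’ve ever read was built one predicted token at a time by a loop shaped like this.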
How It Learned to ‘Read the Room’
So, how does a machine get so good at this guessing game? There’s no magic to it, just a lot of study. Developers use an architecture known as the Transformer, and it was a breakthrough invention. Before the Transformer, AI would read a sentence one word at a time and largely forget the earlier words by the time it reached the end (which was very frustrating).
The Transformer’s key trick is attention. The attention mechanism lets the model analyze the entire sentence at once, so it can work out that “bank” means something different depending on whether a nearby word is “river” or “money.” Once developers have built this attention-based Transformer architecture, they feed it basically the entire internet: billions of words from books, Reddit, news articles, and Wikipedia.
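To make “attention” less abstract, here is a bare-bones sketch of scaled dot-product attention, the computation at the heart of the Transformer. The three word vectors are made-up numbers chosen so that “bank” sits closer to “money” than to “river” in this particular sentence:

```python
import numpy as np

def attention(queries, keys, values):
    """Each position mixes in information from every other position,
    weighted by how relevant they are to each other."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # relevance of every word to every word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ values, weights

# Toy 2-d vectors for "money", "river", "bank" (invented for illustration).
x = np.array([[1.0, 0.0],   # money
              [0.0, 1.0],   # river
              [0.9, 0.1]])  # bank, leaning toward "money" in this context

out, w = attention(x, x, x)
print(w[2])  # the "bank" row attends more to "money" than to "river"
```

The weights in row 2 show “bank” borrowing most of its meaning from “money,” which is exactly the disambiguation described above, just done with arithmetic instead of intuition.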
Training a Transformer happens in two phases: pre-training and fine-tuning. Pre-training is the bulk of the effort: the model simply reads for an extended period and plays a “fill in the blank” game until it has picked up grammar, facts, and reasoning patterns. Fine-tuning comes next: the model is shaped into a helpful user/assistant format rather than the randomness of an internet troll. This is where the model learns its utility.
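The “fill in the blank” game needs no labeled data at all, which is what makes pre-training on the whole internet feasible. A rough sketch of how raw text becomes training exercises:

```python
# Every prefix of every sentence becomes a "guess the next word" exercise.
text = "the cat sat on the mat"
words = text.split()

examples = [(words[:i], words[i]) for i in range(1, len(words))]
for context, target in examples:
    print(" ".join(context), "->", target)
# the -> cat
# the cat -> sat
# the cat sat -> on
# ...
```

One short sentence yields five exercises; billions of words yield an effectively endless curriculum, no human graders required.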
Through pre-training and fine-tuning, the model builds contextual embeddings: a mathematical trick that turns each word into a list of coordinates (a vector). Words like “King” and “Queen” end up with coordinates very close together, while “Apple” (the fruit) lands far away from “Apple” (the company).
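“Close together” has a precise meaning here: a small angle between vectors, measured by cosine similarity. A sketch with hypothetical 3-dimensional embeddings (real models use hundreds or thousands of dimensions, and the numbers below are invented):

```python
import math

# Made-up 3-d embeddings for illustration only.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.12],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1.0
print(cosine(embeddings["king"], embeddings["apple"]))  # much lower
```

“Related” words pointing in similar directions is what lets the model treat “King” and “Queen” as neighbors while keeping the two “Apples” apart.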
Large Language Models, Explained: The ‘Aha’ Moment

Now for the cool stuff: model parameters and scale. When people say GPT-3 has 175 billion parameters, those parameters are like 175 billion little knobs and dials inside it that can be tuned to change its output. That’s an incredible number! And at that scale, something unexpected happened: the AI began to do things no one had directly taught it.
That’s where zero-shot and few-shot learning come into play. Zero-shot means being asked to do something you have never been shown an example of. For instance, “Translate this English sentence into Malay,” and it just does it, because it has absorbed the concept of translation from context. Few-shot means being given a couple of examples first and then a new case to handle the same way. Show the AI two sarcastic tweets, then ask it to write a third, and it nails the job.
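In practice the difference is just how the prompt is written. A sketch of both styles (the example tweets are invented for illustration):

```python
# Zero-shot: just the instruction, no examples.
zero_shot = "Translate this English sentence into Malay: Good morning."

# Few-shot: show a couple of labeled examples, then leave the last
# answer blank for the model to complete by continuing the pattern.
few_shot = "\n".join([
    "Tweet: Another Monday. Thrilling. -> sarcastic",
    "Tweet: Oh great, it's raining again. -> sarcastic",
    "Tweet: Wow, my train is delayed. How original. ->",
])

print(few_shot)
```

No retraining happens in either case; the examples live entirely in the prompt, and the model’s usual “predict the continuation” behavior does the rest.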
So, all of a sudden, this tool became much more than a grammar checker. It started to look like something that could actually reason. That’s why so many people began using GPT-3 for NLP (Natural Language Processing) applications that many of us could not even dream of before. Writing code? Yes. Getting therapy? Apparently. Planning a D&D campaign? Absolutely.
But It’s Not Perfect (The Hallucination Problem)
Remember: the model only predicts the next likely word; it does not know what is true. It also tries to please its user, and that combination produces falsehoods. Ask for a quote from a book that does not exist, and the AI will invent one written in a style that could pass for that author’s, delivered with total confidence. The quote is fabricated, but nothing in the output signals any doubt.
There are also ethical and bias issues to consider. The AI learns from humans, because humans produced the internet data it trains on. Humans have their own biases, so when the training data is biased in some way, the model learns that bias too. Reducing those biases is a high priority for the industry at present. Finally, cost is an important issue, both for training and for deployment (actually running the AI). Operating these models requires large server farms that consume a significant amount of electricity; it’s not free. There are real physical costs behind every query.
So, What Now?

Think of LLMs as extremely efficient pattern matchers. They don’t contain true thoughts, but they imitate human thinking well enough to have changed how the world functions today. Vectrain has witnessed these changes in action, finding that organizations using these tools with their workforce were 10 times more productive than those that don’t. Ultimately, large language models should not replace humans, but assist them with mundane tasks (e.g., drafting an email, summarizing a report) so that humans can focus on more creative work. Large language models will be part of everyday life, and now is a great time to explore them!