
Techopedia Explains Large Language Model (LLM)

Large language models typically have a transformer-based architecture. This type of AI architecture uses self-attention mechanisms to calculate a weighted sum for an input sequence and dynamically determine which tokens in the sequence are most relevant to each other.

Both few-shot and zero-shot approaches require the AI model to have good inductive bias and the ability to learn useful representations from limited (or no) data. Large language models are used for few-shot and zero-shot scenarios when there is little or no domain-tailored data available to train the model.

Most LLMs are pre-trained on a large, general-purpose dataset that is similar in statistical distribution to the task-specific dataset.
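
To make the self-attention idea above concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The matrix names, dimensions, and use of a single attention head are illustrative assumptions, not the layout of any particular model.

```python
# Minimal single-head self-attention sketch (illustrative assumptions, not a real LLM).
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) token embeddings; w_*: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ v                               # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 8)
```

Because each softmax row sums to 1, every output vector is a weighted sum over the value vectors, with the weights expressing how relevant each token is to the token being processed.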


LLMs are trained on immense amounts of data and use self-supervised learning to predict the next token in a sentence, given the preceding context. Some of the most successful LLMs have hundreds of billions of parameters.
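
As a rough illustration of that self-supervised objective, the sketch below uses a hypothetical two-layer PyTorch toy model (not a real LLM): the training targets are simply the input tokens shifted one position, and the loss is the cross-entropy of the predicted next token.

```python
# Toy sketch of next-token prediction (hypothetical model and sizes, for illustration only).
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
toy_model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))

tokens = torch.randint(0, vocab_size, (1, 16))   # a "sentence" of 16 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # targets are the inputs shifted by one position

logits = toy_model(inputs)                       # (1, 15, vocab_size) scores for the next token
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                  # gradients are used to update the model's parameters
```

No human labels are needed: the "answer" at every position is just the token that actually comes next in the raw text, which is what makes the objective self-supervised.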

The label “large” refers to the number of values (parameters) the model can change autonomously as it learns.
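
Since "large" is defined by the parameter count, one way to see that number for a PyTorch model is to sum its trainable tensors. The toy model below is hypothetical and has only a few thousand parameters; production LLMs reach hundreds of billions.

```python
# Counting a model's trainable parameters (toy example, not a production LLM).
import torch.nn as nn

toy_model = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 100))
num_params = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(f"{num_params:,} trainable parameters")    # 100*32 + 32*100 + 100 = 6,500
```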
