icon: LiNotebookTabs
Title: Not So Typical Intro to LLMs
We think you probably have already heard a thousand times about what an LLM is, so we won’t overload you with all the definitions again. If there is one key thing to understand about Large Language Models (LLMs), it is this: they are LARGE neural network models designed to predict the next token in a sequence based on the preceding tokens. That’s the essence of their functionality.
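The next-token loop can be sketched with a toy stand-in for the model. Everything here (the tiny vocabulary, the scoring rule) is purely illustrative; a real LLM computes the token scores (logits) with a neural network of billions of parameters, but the generation loop has the same shape:

```python
import math

# Toy sketch of next-token prediction. The "model" is a stub that scores
# each vocabulary token given the context; all names are illustrative.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_logits(context):
    # Hypothetical scoring rule: favour tokens not yet in the context.
    return [1.0 if tok not in context else -1.0 for tok in VOCAB]

def softmax(logits):
    # Convert raw scores into a probability distribution over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_new_tokens=3):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))  # distribution over next token
        best = max(range(len(VOCAB)), key=lambda i: probs[i])  # greedy pick
        tokens.append(VOCAB[best])  # feed the prediction back in and repeat
    return tokens
```

Greedy decoding (always taking the most probable token) is just one strategy; real systems often sample from the distribution instead, which is what temperature and top-p settings control.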
The popularity of LLMs comes from their versatility and effectiveness. They handle tasks such as translation, summarisation, sentiment analysis, and information extraction well. We will learn more about these use cases along the way.
While there are quite a few differences between Open Source and Closed Source Models, there is no definitive answer as to which is better or worse. We highlight the following as some key considerations:

| What you prioritize the most | Which is generally preferred |
|---|---|
| Quick development and industrial-grade quality | Closed Source Models |
| Minimal infra setup and in-depth technical knowledge | Closed Source Models |
| Low Running Costs* | Closed Source Models |
| Avoiding the continuous effort to update the models | Closed Source Models |
| Privacy: no data can be sent out | Open Source Models |
| Need to adapt the architecture of the LLM | Open Source Models |
| No reliance on external vendors | Open Source Models |
When it comes to quality, which most of us care about the most, the majority of open-source LLMs still perform worse than GPT-3.5 and GPT-4 on standard benchmarks.
💡 Don't worry about understanding how to interpret the benchmarks table. These benchmarks are used to evaluate the capabilities of language models in understanding, reasoning, and problem-solving in various domains.
Here is the models' performance on various tasks:
Ever since the start of the LLM hype, you may have seen plenty of discussions around “Fine-tune your Private LLaMA/Falcon/Another Popular LLM”, “Train Your Own Private ChatGPT”, “How to Create a Local LLM” and others.
However, very few people will tell you why you need it. Are you really sure you need your own self-hosted LLM?
To illustrate this further, let’s consider the cost of hosting a LLaMA-2–70B model on both AWS and GCP. It’s worth noting that most companies employ smaller model versions and fine-tune them for their tasks. However, in this example, we intentionally chose the largest version because it’s a model that can match the quality of GPT-3.5 (yes, not GPT-4).
Inference with LLaMA-2–70B on GCP is estimated to cost approximately $40k to $60k per month.
However, don't get us wrong: this doesn't mean self-hosting is never feasible or reasonable. For lower usage, in the realm of 10,000 to 50,000 requests per day, it is likely cheaper to use managed services where the models are hosted by companies (e.g., OpenAI, Anthropic, or Google). But beyond a certain usage level, the cost of self-hosting LLMs drops below that of managed services. See the image below.
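The break-even argument above is simple arithmetic. The sketch below makes it concrete; note that all prices here are assumptions chosen for illustration (they are not actual vendor pricing, which changes frequently):

```python
# Illustrative break-even sketch. Both constants below are assumptions
# for the sake of the arithmetic, not real vendor prices.
MANAGED_COST_PER_1K_TOKENS = 0.002    # assumed API price, $ per 1k tokens
SELF_HOSTED_FIXED_MONTHLY = 50_000.0  # assumed GPUs + DevOps/ML support, $/mo

def managed_monthly_cost(requests_per_day, tokens_per_request=1_000):
    # Pay-per-token: cost scales linearly with traffic.
    tokens_per_month = requests_per_day * 30 * tokens_per_request
    return tokens_per_month / 1_000 * MANAGED_COST_PER_1K_TOKENS

def cheaper_option(requests_per_day):
    # Self-hosting is (roughly) a fixed monthly cost, so there is a
    # traffic level past which it undercuts the pay-per-token API.
    managed = managed_monthly_cost(requests_per_day)
    return "managed" if managed < SELF_HOSTED_FIXED_MONTHLY else "self-hosted"
```

Under these assumed numbers, tens of thousands of requests per day cost only hundreds of dollars a month via an API, while at around a million requests per day the pay-per-token bill overtakes the fixed self-hosting cost; the crossover point shifts with the actual prices you plug in.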
The LLM community believes that in the near future, we will witness a significant increase in the accuracy of new models, including the open-source models, thanks to the active involvement and support of the community.
*The cost estimates for LLaMA-2–70B, including server costs and additional expenses for DevOps and ML engineering support, are rough approximations and should be used as a guideline rather than a definitive forecast.