Title: Not So Typical Intro to LLMs

  • Not So Typical Intro to LLMs
  • Prompt Engineering
  • Formatting Prompt in Python
  • Hands-on Walkthrough and Tasks

Large Language Model(s)

We think you probably have already heard a thousand times about what an LLM is, so we won’t overload you with all the definitions again. If there is one key thing to understand about Large Language Models (LLMs), it is this: they are LARGE neural network models designed to predict the next token in a sequence based on the preceding tokens. That’s the essence of their functionality.
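
Since next-token prediction is the essence, it is worth seeing it once in code. Below is a minimal sketch using the Hugging Face transformers library with the small GPT-2 model; the prompt and the model choice are illustrative assumptions, not part of this course's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used here only because it is small and public.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The prediction for the NEXT token lives in the last position's logits.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))  # likely " Paris"
```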

The popularity of LLMs is due to their versatility and effectiveness. They handle tasks such as translation, summarisation, sentiment analysis, and information extraction remarkably well. We will learn more about these use cases along the way.
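
As a quick taste of these use cases, here is a hedged sketch using transformers pipelines; the default models each pipeline downloads are illustrative choices, not recommendations.

```python
from transformers import pipeline

# Sentiment analysis with the pipeline's default model.
sentiment = pipeline("sentiment-analysis")
print(sentiment("This course is surprisingly fun!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]

# Translation, again with the default model for this task.
translate = pipeline("translation_en_to_fr")
print(translate("Large Language Models predict the next token."))
```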

Comparison of model parameter counts. Just look at how big GPT-3 is. Nobody knows the size of GPT-4, since its details are not disclosed.



Open Source and Closed Source Models

  • Closed-source Large Language Models:
    • These are proprietary models whose weights and source code are not publicly released; you typically access them through the vendor's managed API (e.g., OpenAI's GPT models).
  • Open-source Large Language Models:
    • These are developed in a collaborative, public manner, with the source code (and usually the weights) freely available.
      • You can host these models yourself, usually on a server or a powerful machine.
      • A great place to find these models is Hugging Face's Hub, which provides thousands of pre-trained models in 100+ languages for deep learning frameworks like PyTorch and TensorFlow. A minimal loading example follows this list.
    • Popular Open Source models include LLaMA-2 and Falcon, both of which come up later in this section.
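
As promised above, here is a minimal sketch of loading an open-source model from Hugging Face's Hub; the model id is an assumed example, and any causal LM on the Hub works the same way.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed example id; browse https://huggingface.co/models for alternatives.
model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package and enough GPU/CPU memory.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is a token?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```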



A Bird's-eye View of the Differences

While there are quite a few differences between open-source and closed-source models, there is no definitive answer as to which is better or worse. We highlight the following key considerations:

| What you prioritize the most | Which is generally preferred |
| --- | --- |
| Quick development and industrial-grade quality | Closed Source Models |
| Minimal infra setup and no need for in-depth technical knowledge | Closed Source Models |
| Low Running Costs* | Closed Source Models |
| Avoiding the continuous effort to update the models | Closed Source Models |
| Privacy: no data can be sent out | Open Source Models |
| Need to adapt the architecture of the LLM | Open Source Models |
| No reliance on external vendors | Open Source Models |

When it comes to quality, which most of us care about the most, the majority of open-source LLMs still perform worse than GPT-3.5 and GPT-4 on standard benchmarks.

💡 Don't worry about how to interpret the benchmarks table. These benchmarks evaluate the capabilities of language models in understanding, reasoning, and problem-solving across various domains.

Benchmarking of GPTs and open-source models (source)

Here is the models' performance on various tasks:

Task category benchmarks on MT-Bench (source)



A Quick Peek into Self-Hosting Costs

Ever since the start of the LLM hype, you may have come across many discussions along the lines of "Fine-tune your Private LLaMA/Falcon/Another Popular LLM", "Train Your Own Private ChatGPT", "How to Create a Local LLM" and others.

However, very few people will tell you why you need it. Are you really sure you need your own self-hosted LLM?

To illustrate this further, let's consider the cost of hosting a LLaMA-2-70B model on both AWS and GCP. It's worth noting that most companies employ smaller model versions and fine-tune them for their tasks. However, in this example we intentionally chose the largest version, because it is a model that can match the quality of GPT-3.5 (yes, not GPT-4).

Comparison of LLaMA-2-70B-chat deployment costs on two Cloud Service Providers (CSPs): Amazon Web Services (AWS) and Google Cloud Platform (GCP)
  • ✦ Additionally, let's include the following extra expenses on top of the server cost:
    • Payment for DevOps specialists who will handle server setup, load balancing, and monitoring.
    • Payment for ML engineers responsible for model preparation, maintenance, and fine-tuning.
    • Optionally, one-time payment for dataset collection and annotation for fine-tuning.

This is estimated to be approximately $40k to $60k per month on GCP for LLaMA-2-70B inference.
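
To see roughly where such a figure comes from, here is a back-of-the-envelope sketch in Python; every number below is an assumption for illustration, not a real CSP quote.

```python
# All figures are ASSUMED for illustration; check current CSP pricing yourself.
GPU_INSTANCE_HOURLY_USD = 40.0  # assumed rate for a multi-GPU instance serving a 70B model
HOURS_PER_MONTH = 24 * 30

server_cost = GPU_INSTANCE_HOURLY_USD * HOURS_PER_MONTH  # $28,800
devops_support = 8_000    # assumed monthly DevOps cost (setup, load balancing, monitoring)
ml_engineering = 12_000   # assumed monthly ML engineering cost (maintenance, fine-tuning)

total = server_cost + devops_support + ml_engineering
print(f"Estimated monthly cost: ${total:,.0f}")  # $48,800, within the $40k to $60k range
```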

However, don't get us wrong: this doesn't mean self-hosting is never feasible or reasonable. For lower usage, in the realm of 10,000 to 50,000 requests per day, it might be cheaper to use managed services where the models are hosted by companies (e.g., OpenAI, Anthropic, or Google). But past a certain usage level, the cost of self-hosting LLMs becomes lower than using managed services. See the image below.

Schematic comparison of OpenAI GPT-3.5 and self-hosted LLMs
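
The break-even logic behind the chart can be sketched in a few lines; both prices below are illustrative assumptions, not actual OpenAI or CSP rates.

```python
# Compare a pay-per-request managed API against a fixed self-hosting bill.
API_COST_PER_REQUEST_USD = 0.002   # assumed average cost per managed-API request
SELF_HOSTED_MONTHLY_USD = 50_000   # assumed fixed monthly self-hosting cost

def cheaper_option(requests_per_day: int) -> str:
    api_monthly = requests_per_day * 30 * API_COST_PER_REQUEST_USD
    return "managed API" if api_monthly < SELF_HOSTED_MONTHLY_USD else "self-hosted"

for rpd in (10_000, 50_000, 1_000_000):
    print(f"{rpd:>9,} requests/day -> {cheaper_option(rpd)}")
# Low volumes favour the managed API; past the break-even point, self-hosting wins.
```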

The LLM community believes that in the near future, we will witness a significant increase in the accuracy of new models, including the open-source models, thanks to the active involvement and support of the community.

Disclaimer:
  • ✦ The information provided above is intended for illustrative purposes only and is based on a set of assumptions that may not apply to all scenarios.
    • The cost estimates for deploying LLaMA-2–70B, including server costs and additional expenses for DevOps and ML engineering support, are rough approximations and should be used as a guideline rather than a definitive forecast.
    • Actual costs can vary significantly based on a variety of factors such as specific cloud service provider rates, the scale of deployment, and the extent of usage.
    • We strongly advise readers to conduct a detailed cost analysis based on their unique requirements and to consult financial and technical experts for a more accurate, personalized estimate before making any decisions regarding self-hosting Large Language Models (LLMs).