Large language models (LLMs), like ChatGPT, have attracted significant public attention. Many companies are now exploring opportunities to incorporate similar functionality into their products, but with a focus on adding domain expertise.
This can be achieved through transfer learning, a technique where an existing state-of-the-art model such as GPT-3 is fine-tuned for a specific use case using domain-specific data. For example, a use case may require generating medical notes with a specific style and format. Transfer learning allows the refinement of LLMs using proprietary datasets to produce outputs that align with these requirements.
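The core idea of transfer learning can be illustrated with a toy sketch: a "pretrained" feature extractor stays frozen, and only a small task-specific head is fitted on the domain data. Everything below (the random projection standing in for a foundation model, the synthetic "domain dataset") is a hypothetical stand-in, not a real fine-tuning pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" layer: a fixed random projection standing in
# for the features a foundation model would produce.
W_frozen = rng.normal(size=(16, 4))

def features(x):
    # Frozen forward pass -- never updated during fine-tuning.
    return np.tanh(x @ W_frozen)

# Synthetic stand-in for a proprietary, domain-specific dataset.
X = rng.normal(size=(200, 16))
true_head = rng.normal(size=(4,))
y = features(X) @ true_head + 0.01 * rng.normal(size=200)

# "Fine-tuning": fit only the small head via least squares,
# leaving W_frozen untouched.
F = features(X)
head, *_ = np.linalg.lstsq(F, y, rcond=None)

mse = np.mean((F @ head - y) ** 2)
print(f"training MSE: {mse:.4f}")
```

Real LLM fine-tuning updates far more parameters and uses gradient descent rather than least squares, but the division of labor is the same: the expensive pretrained component is reused, and only a comparatively cheap adaptation step runs on the proprietary data.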
Although transfer learning is not a novel concept, the recent surge in LLM popularity has sparked discussions on how to effectively train and deploy LLMs, leading to the emergence of LLMOps.
LLMOps focuses on the operational capabilities and infrastructure required to fine-tune existing foundation models and deploy the refined models as part of a product. To anyone familiar with the MLOps movement, LLMOps may not introduce entirely new concepts, but it represents a sub-category with specific requirements for fine-tuning and deploying these kinds of models.
Foundation models such as GPT-3, with its 175 billion parameters, are enormous and require vast amounts of data and computation to train. According to Lambda Labs, training GPT-3 on a single NVIDIA Tesla V100 GPU would take approximately 355 years. Fine-tuning these models does not demand the same scale of data or computation, but it remains a non-trivial task. The key is infrastructure that can run GPU machines in parallel and handle massive datasets.
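The 355-year figure can be sanity-checked with back-of-the-envelope arithmetic. The inputs below (roughly 3.14e23 FLOPs of total training compute and about 28 TFLOPS of throughput on a V100) are the assumptions behind Lambda Labs' estimate, not exact measurements:

```python
# Back-of-the-envelope check of the single-V100 training time for GPT-3.
total_flops = 3.14e23        # estimated total training compute for GPT-3
v100_flops_per_sec = 28e12   # ~28 TFLOPS on a Tesla V100
seconds_per_year = 365 * 24 * 3600

years = total_flops / (v100_flops_per_sec * seconds_per_year)
print(f"~{years:.0f} years on a single V100")
```

The result lands in the mid-350s, matching the published estimate and making the case for massively parallel GPU infrastructure concrete.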
The inference phase of these large models requires a different scale of computing resources than traditional ML models. In addition, inference may involve a chain of models and other safeguards to ensure the best possible output for end users.
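A minimal sketch of what such an inference chain might look like: a generator model produces several candidates, and a second model reranks them before anything reaches the user. The `generate` and `score` functions here are hypothetical placeholders for real model calls, not an actual API.

```python
def generate(prompt, n=3):
    # Placeholder for an LLM call returning n candidate completions
    # of varying length.
    return [f"{prompt}: " + "elaboration " * (i + 1) for i in range(n)]

def score(candidate):
    # Placeholder for a reranker or safety model; here, longer is
    # simply scored higher.
    return len(candidate)

def chained_inference(prompt):
    # Step 1: draft candidates; step 2: rerank and return the best.
    candidates = generate(prompt)
    return max(candidates, key=score)

best = chained_inference("Summarize the patient visit")
print(best)
```

In production, each link in the chain is another model invocation (or a content filter), which is why serving LLMs often costs more per request than serving a single traditional model.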
As mentioned earlier, LLMOps shares similarities with MLOps, and therefore, the landscape is somewhat familiar. However, many existing MLOps tools designed for specific use cases may not be suitable for fine-tuning and deploying LLMs. For example, a Spark environment like Databricks, which works well for traditional ML, may not be the optimal choice for fine-tuning LLMs.
In general, the current LLMOps landscape includes the tooling shown in the diagram below. Please note that the diagram is for illustrative purposes only, and the list is not exhaustive.
In the long term, it is uncertain whether the term LLMOps will continue to be widely used. Nevertheless, its emergence serves as a reminder that the field of machine learning is rapidly evolving, with new use cases continually emerging.