RAG Versus Fine-Tuning
Introduction

Let’s talk about RAG versus fine-tuning. They’re both powerful ways to enhance the capabilities of large language models, and today you’re going to learn about their strengths, their use cases, and how to choose between them.

One of the biggest challenges with generative AI right now is enhancing the models while also dealing with their limitations. For example, I recently asked my favorite LLM a simple question: who won the UEFA Euro 2024 championship? While this might seem like a simple query, there’s a catch: the model wasn’t trained on that specific information, so it can’t give me an accurate, up-to-date answer. At the same time, these popular models are very general-purpose, so how do we specialize them for specific use cases and adapt them for enterprise applications? Your data is one of the most important assets you can work with, and in the field of AI, techniques such as RAG and fine-tuning let you supercharge the capabilities your application delivers. So in the next few minutes we’re going to learn about both of these techniques, the differences between them, and where you can start using them. Let’s get started.

Retrieval Augmented Generation (RAG)

Let’s begin with retrieval augmented generation, which is a way to increase the capabilities of a model by retrieving external, up-to-date information, augmenting the original prompt given to the model, and then generating a response using that added context. This is really powerful because, thinking back to the Euro 2024 example, the model didn’t have the information in context to provide an answer, and that’s one of the big limitations of LLMs.
But RAG mitigates this. Instead of getting an incorrect or possibly hallucinated answer, we work with what’s known as a corpus of information: databases, PDFs, documents, spreadsheets, anything relevant to our specific organization or the knowledge we need to specialize in. Now when a query comes in, a component known as a retriever pulls the correct documents and the context relevant to the question, then passes that knowledge, along with the original prompt, to the large language model. Combining its pre-trained knowledge with that contextualized information, the model gives us a response grounded in our data. This is really powerful, because we can get better responses that draw on our proprietary and confidential information without doing any retraining of the model. It’s a great and popular way to enhance a model’s capabilities without any fine-tuning.

Fine-Tuning

As the name implies, fine-tuning involves taking a large language foundation model and specializing it in a certain domain or area. We provide the model with labeled, targeted data, and after training we have a specialized model for a specific use case: one that talks in a certain style, or with a tone that represents our organization or company. Then, when the model is queried by a user or any other client, we get a response with the tone, output, or domain specialty we want. This matters because we’re essentially baking that context and intuition into the model itself: it becomes part of the model’s weights, rather than being supplemented on top with a technique like RAG.
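The retrieve, augment, generate flow described above can be sketched in a few lines of Python. Everything here is illustrative: the toy corpus, the naive keyword-overlap retriever (production systems typically use vector embeddings and a vector database), and the prompt template. The augmented prompt would then be sent to whatever LLM you use.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by naive keyword overlap with the query (toy retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, docs):
    """Augment the original question with the retrieved context."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# A tiny corpus standing in for your organization's documents.
corpus = [
    "Spain won UEFA Euro 2024, beating England 2-1 in the final.",
    "The Euro 2024 tournament was hosted by Germany.",
    "Our refund policy allows returns within 30 days.",
]

docs = retrieve("Who won Euro 2024?", corpus)
prompt = build_prompt("Who won Euro 2024?", docs)
```

Note that the base model is never modified: the answer quality improves only because the relevant facts now travel inside the prompt.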
Strengths and Weaknesses of RAG and Fine-Tuning

Okay, so we understand how both of these techniques can enhance a model’s output and performance, but let’s look at their strengths and weaknesses in some common use cases, because the direction you choose can greatly affect a model’s performance, the accuracy of its outputs, compute cost, and much more.

Let’s begin with retrieval augmented generation. Because we’re working with a corpus of information, RAG is perfect for dynamic data sources such as databases and other repositories, where we want to continuously pull information and keep it up to date for the model to use. And because the retriever passes information into the prompt as context, RAG really helps reduce hallucinations; it can also cite the sources of that information, which matters in systems where we need trust and transparency. But consider the whole system: an efficient retrieval pipeline is critical for selecting the data that fits into the model’s limited context window, and maintaining that pipeline is something you need to plan for. And remember that this approach supplements information on top of the model: we’re not enhancing the base model itself, just giving it the relevant, contextual information it needs.

Fine-Tuning Strengths and Limitations

Fine-tuning is a little different because we’re actually baking that context and intuition into the model, which gives us greater influence over how the model behaves and reacts in different situations.
Is it an insurance adjuster? Can it summarize documents? Whatever we want the model to do, we can use fine-tuning to help with that process. And at the same time because
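The labeled, targeted data that fine-tuning consumes is typically shaped into prompt/completion pairs before training. Here is a minimal sketch of that preparation step, assuming a generic JSONL format; the field names "prompt" and "completion" are a common convention used for illustration, not a specific vendor’s required schema.

```python
import json

# Labeled, targeted examples for a document-summarization specialty.
labeled_data = [
    ("Summarize: The quarterly report shows revenue grew 12%.",
     "Revenue grew 12% this quarter."),
    ("Summarize: The outage lasted two hours and affected EU users.",
     "A two-hour outage affected EU users."),
]

# One JSON object per line, the shape many fine-tuning jobs expect.
records = [
    {"prompt": p, "completion": c}
    for p, c in labeled_data
]
jsonl = "\n".join(json.dumps(r) for r in records)
```

A training run over files like this is what updates the model’s weights, in contrast with RAG, where the same examples would only ever appear inside a prompt.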



