10 January 2024 / 6:00 PM / ATLAS Building, CU Boulder
Launching a Large Language Model (LLM) like GPT-4, which powers ChatGPT, involves at least three distinct processes: training, fine-tuning, and optimizing. Each has its own purpose and methodology in the development of AI models.
At a high level, training establishes the foundational knowledge of the LLM, fine-tuning adapts it to specific tasks or domains, and optimizing enhances its performance and efficiency for practical use. Each process is crucial in developing an LLM that is both powerful and applicable to real-world tasks. This meeting will focus on steps 2 and 3: fine-tuning and optimizing LLMs.
Our speaker, Mark Hennings, will cover what fine-tuning is (and isn’t), when to use it, its benefits, and its limitations. Mark will also cover how to optimize LLM performance from a broader perspective, showing how fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG) can work together to improve results.
Mark is the founder of Entry Point AI, a modern platform for fine-tuning large language models. He’s a serial entrepreneur, an Inc. 500 alumnus, and a self-taught developer who is passionate about UX and democratizing AI.
Notes
This meeting discussed various techniques for optimizing and fine-tuning large language models (LLMs), including prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. The presenter, Mark Hennings, explained each technique and how they can be used together or separately to improve LLM outputs. Some key topics discussed included reducing hallucinations, preventing harmful outputs, connecting LLMs to traditional software, and narrowing an LLM’s scope to specialized tasks through fine-tuning. There was also discussion around bias in training data and synthetic data, as well as legal and ethical considerations around certifying AI systems.
Some of the specifics discussed include the following:
- Prompt engineering: This involves carefully crafting the input prompt/context to steer model behavior. Techniques include priming, few-shot examples, and “chain of thought” reasoning (a minimal prompt sketch follows this list).
- Retrieval-augmented generation (RAG): This supplements the prompt with relevant external knowledge by searching a text corpus for chunks whose embeddings are similar to the query and including those chunks in the context. This can reduce hallucinations and allows referencing real-time or proprietary data (see the retrieval sketch after this list).
- Inference parameters: Settings like temperature, top-p/k, and frequency/repetition penalties affect which tokens the model selects during output generation (a sampling sketch follows this list).
- Function calling: Models can recommend actions/functions for an application to take based on the prompt, like calling APIs. This gives models more capabilities but requires careful control over which functions they can access (an example tool definition follows this list).
- Fine-tuning: Further training a model on domain-specific examples narrows its behavior and bakes in desired formatting, style, and capabilities. Task tuning creates specialists for very focused use cases (a training-data sketch follows this list).
- Measuring input diversity, for example by computing pairwise cosine similarity between input embeddings, to evaluate how varied a training set really is (a similarity sketch follows this list).
- The differences between RAG and fine-tuning, with RAG acting more as a wrapper around the LLM rather than modifying the model itself.
- Diminishing returns from adding more fine-tuning examples, especially if they are too similar to existing ones; new examples that cover under-served cases are the most impactful.
- Appropriate model sizes for tasks, with larger models generally better for complex writing but smaller models sufficient for classifiers or specialized tasks.
- Bias in models, including bias in pre-training data, synthetic training data, and challenges around certifying “unbiased” models when real-world data contains biases.
- Practical workflows for fine-tuning, including identifying and removing unnecessary data attributes and focusing on examples that teach desired behaviors rather than facts.
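
A minimal sketch of the prompt-engineering ideas from the list above, assuming a plain text-completion interface; the task, wording, and example are hypothetical and only illustrate priming, a few-shot example, and a chain-of-thought cue.

```python
# Minimal prompt-engineering sketch: priming, a few-shot example, and a
# chain-of-thought cue assembled into one prompt string (task is hypothetical).
def build_prompt(question: str) -> str:
    priming = "You are a careful assistant that answers math word problems."
    few_shot = (
        "Q: A box holds 12 eggs. How many eggs are in 3 boxes?\n"
        "A: Each box holds 12 eggs, so 3 boxes hold 3 * 12 = 36. Answer: 36\n"
    )
    cot_cue = "Think step by step, then give the final answer."
    return f"{priming}\n\n{few_shot}\n{cot_cue}\nQ: {question}\nA:"


print(build_prompt("A pack has 8 pens. How many pens are in 5 packs?"))
```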
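
A toy retrieval-augmented generation loop, assuming you already have embeddings for the query and corpus from some embedding model (not shown). Real systems would typically use a vector database rather than brute-force search, and the prompt wording is just one possibility.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb: np.ndarray, corpus_embs: list, corpus_texts: list, k: int = 3) -> list:
    """Return the k corpus chunks whose embeddings are most similar to the query."""
    scores = [cosine_sim(query_emb, emb) for emb in corpus_embs]
    top = np.argsort(scores)[::-1][:k]
    return [corpus_texts[i] for i in top]

def build_rag_prompt(question: str, chunks: list) -> str:
    """Prepend retrieved context so the model answers from supplied facts."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```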
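
A sketch of how temperature and top-p (nucleus) filtering change token selection, operating directly on a vector of raw logits. The parameter values are arbitrary; real inference stacks implement this inside the serving layer.

```python
import numpy as np

def sample_token(logits, temperature: float = 0.8, top_p: float = 0.95, seed: int = 0) -> int:
    """Sample one token id from raw logits with temperature and nucleus (top-p) filtering."""
    rng = np.random.default_rng(seed)
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Keep the smallest set of highest-probability tokens whose total mass >= top_p.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

print(sample_token([2.0, 1.5, 0.3, -1.0]))  # tiny 4-token "vocabulary"
```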
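
An illustrative tool/function definition in the JSON-schema style that several LLM APIs accept. The function name, fields, and workflow here are hypothetical, and the exact request format varies by provider.

```python
# Hypothetical function the application exposes to the model. The model only
# recommends a call (a name plus JSON arguments); application code validates
# the arguments and performs the actual API call.
get_order_status = {
    "name": "get_order_status",
    "description": "Look up the shipping status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Internal order identifier."}
        },
        "required": ["order_id"],
    },
}
```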
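
A sketch of what a small task-tuning dataset might look like, written as JSONL prompt/completion pairs. The classification task is made up, and the exact record format (prompt/completion pairs vs. chat-style messages) depends on the platform or trainer you use.

```python
import json

# Hypothetical task-tuning examples: classify support tickets into fixed labels.
# Fine-tuning teaches the desired behavior and output format, not new facts.
examples = [
    {"prompt": "Ticket: My invoice total looks wrong.\nLabel:", "completion": " billing"},
    {"prompt": "Ticket: The app crashes when I upload a photo.\nLabel:", "completion": " bug"},
    {"prompt": "Ticket: How do I change my email address?\nLabel:", "completion": " account"},
]

with open("train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```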
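
One way to put a number on input diversity, as mentioned in the list: average pairwise cosine similarity across the embeddings of your fine-tuning inputs, where lower average similarity suggests a more varied dataset. The embeddings are assumed to come from any embedding model; the random vectors below are stand-ins.

```python
import numpy as np

def mean_pairwise_similarity(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all pairs of input embeddings.
    Lower values indicate a more diverse set of fine-tuning inputs."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                               # full pairwise similarity matrix
    upper = sims[np.triu_indices(len(embeddings), k=1)]    # unique pairs, diagonal excluded
    return float(upper.mean())

# Example with random stand-in embeddings (real ones come from an embedding model).
print(mean_pairwise_similarity(np.random.default_rng(0).normal(size=(5, 8))))
```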