Small is ...
Many of us are focused on the largest and "best" LLMs available, and with good reason. However, I want to point out that there is also a quiet revolution at the other end of the spectrum with smaller models. For example, Microsoft recently announced Phi-2 (2.7B) and Google announced Gemini Nano (1.8B & 3.25B).
Though they won't compete with their larger brethren on broad benchmarks, they can:
- achieve surprisingly good performance on targeted tasks
- require less computation, so they can be deployed at the "edge":
  - in your phone or other device
  - possibly in your browser
  - in your CDN
  - in your modest data center
- incur less latency, so they're a better fit for targeted interactive use cases where response speed is important (see the sketch after this list)
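To make the latency point concrete, here is a minimal Python sketch of loading a small model locally with Hugging Face transformers and timing a single response. The model id and prompt are placeholders, not recommendations; swap in whatever ~1-3B open model fits your task and license.

```python
# Minimal sketch: run a small model locally and measure end-to-end latency.
# "some-small-open-model" is a placeholder id, not a real model.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-small-open-model"   # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize this support ticket in one sentence: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.time()
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
print(f"latency: {time.time() - start:.2f}s")
```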
Unfortunately, these two particular model families are not open source, but I expect we'll soon see announcements of high-performing open-source models at ~3B parameters or fewer, with permissive business-use licenses.
Two key points from all of this:
- The key to getting these models to perform well is data quality, not just volume.
- Training a small model is within reach of many organizations and can yield a real competitive advantage.
Many organizations have years (if not decades) of proprietary data. If that data can be accessed, cleaned up, filtered, and vetted, it could be a great starting point for a proprietary model and a real competitive advantage.
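As a rough illustration of what "cleaned up, filtered, and vetted" might look like in practice, here is a toy Python sketch that drops short fragments and exact duplicates from a raw export. The file names, fields, and thresholds are all assumptions you would replace with your own pipeline.

```python
# Toy cleanup pass over a raw internal export: drop tiny fragments and exact
# duplicates, keep the rest as training text. Paths and fields are illustrative.
import hashlib
import json

def clean(records):
    seen = set()
    for rec in records:
        text = rec.get("text", "").strip()
        if len(text) < 200:                      # drop short fragments
            continue
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                       # exact-duplicate removal
            continue
        seen.add(digest)
        yield {"text": text, "source": rec.get("source", "unknown")}

with open("raw_export.jsonl") as src, open("training_corpus.jsonl", "w") as dst:
    raw = (json.loads(line) for line in src)
    for row in clean(raw):
        dst.write(json.dumps(row) + "\n")
```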
Phi-2 reportedly took 14 days to train on 96 A100 GPUs, so at current hourly GPU rental rates a rough estimate would be 14 days * 24 hours * 96 GPUs * ~$2.06/GPU-hour ≈ $66k. Of course, that is only part of the total cost involved, but it is within reach of many organizations and projects.
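Here is the same back-of-the-envelope estimate as a few lines of Python, so you can plug in your own numbers; the $2.06/hour figure is just an assumed rental rate and excludes storage, networking, staff time, and reruns.

```python
# Back-of-the-envelope GPU rental cost; the hourly rate is an assumption.
days, gpus, hourly_rate = 14, 96, 2.06
gpu_hours = days * 24 * gpus                 # 32,256 GPU-hours
print(f"~${gpu_hours * hourly_rate:,.0f}")   # ~$66,447
```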
Plus, if a base open-source model is released that is close to your use case, you may be able to fine-tune it for tens of dollars.
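If such a base model appears, a parameter-efficient approach like LoRA is one way to keep the fine-tuning bill small. Below is a minimal, hypothetical sketch using Hugging Face transformers, peft, and datasets; the model id, data file, target modules, and hyperparameters are placeholders to adapt to your actual setup.

```python
# Hypothetical sketch: LoRA fine-tuning of a small open base model on your own
# data. Model id, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "some-open-3b-model"                      # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters; only a small fraction of the
# weights are trained. target_modules depend on the base architecture.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Expect one JSON object per line with a "text" field (placeholder path).
data = load_dataset("json", data_files="training_corpus.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4,
                           logging_steps=50),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("ft-out/adapter")          # saves adapter weights only
```

Because only the small adapter is trained, a single modest GPU is often enough, which is what keeps the cost so low.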
So, you may not be ready to train your own model today, but it may be useful to start thinking about it and exploring your organization's data and possible use cases.
What model would you like to have available today?