19th December 2024
It’s been a while! The last edition of this newsletter was way back in February, which, I must admit, is a bit rubbish of me. I did manage to write the second edition of my book and have it published during that time, but, as the year draws to a close, I thought it was about time I got back to my trusty typewriter. This edition is really a review of the year, which has been phenomenally busy for anyone who is working in AI right now, with a bit of a look forward at what 2025 might bring.
And I promise more regular editions in 2025!
The Year in AI
When will the AI hype ever end? Two and a bit years after the launch of ChatGPT, the new functionality, product releases, and marketing spin have continued unceasingly and, it could be argued, have actually accelerated during 2024. And it is all about Generative AI. There is no room in people’s heads for all of the AI that came before (image recognition, prediction, clustering, etc) even though these AI technologies are still delivering huge value in a reliable and robust way. As an AI adviser, this is what 2024 has felt like:
So, with the understanding that there is still a load of good work being done, and to do, with these ‘narrow’ AI models, let’s look at the key (and heavily biased toward GenAI) themes of 2024.
The biggest difference between 2023 and 2024 is that there is now more than one big kid on the block. The Large Language Models (LLMs) developed by Anthropic (Claude), Google (Gemini), and Mistral are all up there with the capabilities of OpenAI’s latest model (GPT-4o). I regularly switch between them, or try each of them for different versions of answers, and there is very little obvious difference. My ‘go to’ model, though, tends to be Claude 3.5 Sonnet, which seems to give ‘warmer’ answers than the rather colder GPT-4o. For an even softer approach, and more for personal than business questions, Inflection’s Pi model can be useful (it has been trained to be more responsive to emotional cues). Meta has also released bigger and better versions of their LLaMa model, but I’ll talk more about ‘open source’ models further down.
And then there is CoPilot. This was the year that it finally started being a bit useful. Most organisations I speak to are trialling a handful of licences, and, it must be said, the word I hear most about their experiences is ‘underwhelming’. This hasn’t been helped by Microsoft’s incessant marketing claims that CoPilot will solve all of your problems and make your whole organisation so much more productive. Most of the M365 use cases are marginal, saving users a little bit of time here and there, and in tasks that, in most cases, could be done in ChatGPT (and potentially for free). The killer use case for me, though, is the Teams transcription and summary, which does a pretty good job and saves loads of time. (Although more open tools like Otter allow you to search across all your conversations.) For the more technical, GitHub Copilot (which acts as a coding assistant) and Copilot for Azure are godsends and are now part of my normal working practices.
CoPilot does lead the way in integrating LLMs into enterprise applications, which has been another big 2024 trend. It is now almost expected that any major enterprise app should have an LLM somewhere doing something, and the vendor will generally charge you a premium for it. Some of these can be useful (I use Notion, and the AI search across all my notebooks can sometimes unearth just the right thing), but they are all inherently siloed to their own data. CoPilot can only look at Microsoft data, Apple Intelligence can only look at Apple data, but, unless I build it myself, nothing looks across all my email accounts. Most software vendors live in this fantasy land where each person only uses their own products, but a lot of us have M365, iCloud and Gmail, for example, and we are the ones that an LLM across everything would really help.
As well as the integration trend, a few of the LLM vendors have developed their own desktop apps, allowing direct access to the model without having to use a browser. The ChatGPT one is particularly good as it can sit quietly in your taskbar until needed. It also takes voice commands, searches the internet and connects directly to some other apps. Another good desktop app is from Perplexity, which is actually the best tool to use if you do any sort of research. Behind the scenes, it uses an LLM of your choice (GPT-4o, Claude 3.5, LLaMa 3.1, etc) but layers its own retrieval system over the top, constantly indexing the internet, which means its answers are much more up-to-date than the base models’. It also thinks through your question in more detail and provides citations to the source documents. I use Perplexity extensively in my work.
Another research tool I use is NotebookLM from Google (so it has Gemini at its back-end). The app allows you to upload up to 50 PDFs, which you can then ask questions about. It’s like a simple, user-friendly RAG tool (which I’ll talk about more in a sec). I’ve also used this to help me complete research papers: I upload a corpus of relevant papers plus my own and ask Gemini to tell me what I have missed. It usually finds one or two little gems that I can add in. NotebookLM also has a very quirky feature that lets you automatically create a podcast from the documents you have uploaded. It’s a little American at the moment (although UK voices are coming) but will be great for quickly creating audio content from an interview transcript, for example. (I’ll create one for the newsletter.)
Beyond the major vendors, there is now a really long tail of LLM-enabled applications on the market. Whilst there are a few diamonds in there, many are simply wrappers around ChatGPT that offer very little more but at a price. Many are also opaque about how they use your data (some cynics might suggest that the apps have been built primarily for data collection, with their outward functionality just a secondary consideration).
Some of those useful little apps are to do with LLM development, such as LangChain. This is a great open-source tool for developers to help build LLM-enabled applications for enterprises, and it has actually made its way into some of the big platforms, like Microsoft Azure. One of LangChain’s core uses is to build Retrieval Augmented Generation (RAG) applications - this is where the LLM is pointed at a specific corpus of data so that it can answer questions only on that content: its language comes from the internet, but its knowledge comes from your corpus of data. This is akin to some of those enterprise apps with built-in LLMs but with major advantages: you can have multiple sources of data, you can control the prompting much more, you can control how your prompt data is used and you can choose whichever LLM you want. All of this means that built (rather than bought) LLM apps will be more accurate, more relevant, more transparent and cheaper to run. Most of the work I have been involved with this year, beyond AI strategy work, is building LLM-enabled applications for clients, especially chatbots.
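For the curious, the RAG pattern itself is simple enough to sketch in a few lines of plain Python. This is a toy illustration of the ‘retrieve, then ground the prompt’ idea, not LangChain’s actual API (real systems use embeddings and a vector store rather than word overlap, and the function names and example corpus here are my own invention):

```python
# Toy sketch of Retrieval Augmented Generation (RAG):
# 1) retrieve the most relevant documents from your own corpus,
# 2) build a prompt that grounds the LLM's answer in them.
# Real systems (e.g. LangChain) use embeddings and a vector store
# instead of this crude word-overlap scoring.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus documents sharing the most words with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble a prompt whose knowledge comes only from the retrieved docs."""
    context = "\n".join(retrieve(query, corpus))
    return (
        "Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on UK bank holidays.",
    "Support tickets are answered within one business day.",
]
prompt = build_prompt("What is the refund policy?", corpus)
# `prompt` would now be sent to whichever LLM you have chosen.
```

The key point is the last line: the language model is interchangeable, which is exactly the flexibility that built (rather than bought) apps give you.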
Development has been made a lot easier this year as the big cloud platforms (Azure, AWS, GCP, etc) have become much better at supporting LLMs. Last year, the tools were in their infancy and would change every few weeks, but now they are more stable, user-friendly, and collaborative. For example, Microsoft’s Azure AI Studio lets you build chatbots with RAG connections to data on your enterprise cloud and deploy them through multiple channels, including Teams.
The greater availability of different LLMs now provides more than just freedom of choice. Some vendors are more open about how they use your data than others (clue: don’t go by the company name), and can have different ultimate ‘missions’ that may or may not align with your values. But, as I hinted earlier, one of the most welcome changes this year has been the option of being able to select capable open source models.
Open source models are not strictly open source - they are ‘open weight’. This means that the code that was used to develop the models is hidden, but the weights of the model (i.e. how all the different words relate to each other) are available. This makes fine-tuning of these models (i.e. creating specific versions based on your own data) much easier and more transparent. The king of the open-weight models is currently Meta’s LLaMa 3.1 405B version, which is almost as capable as ChatGPT. But it is with the smaller models where the really interesting stuff can happen. Models with fewer than 100 billion parameters (typically 70B, 32B or 7B) can be downloaded and run on ‘domestic’ computers. For example, I have LLaMa 3.1 8B running on my MacBook Pro (using Ollama). This makes it very easy to connect to my own data, but, crucially, it is very private. My source data and my prompting data stay on my laptop - in fact, I can turn off the WiFi on my laptop, and the model will still work. I can up the performance of the model by fine-tuning if needed. For those who still need the capabilities of the bigger models, there is now a halfway house option - Microsoft (and others) offer ‘serverless’ open-weight models (such as LLaMa 405B), which means you access it via an API and just pay per token used, whilst the cloud provider takes care of hosting the (820 GB) model.
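To give a flavour of how simple the local setup is: Ollama runs a small server on your laptop (on port 11434 by default) and accepts plain HTTP requests, so talking to a local model is a few lines of standard-library Python. A minimal sketch, assuming you have already pulled a model with `ollama pull llama3.1` (the model tag is an assumption - substitute whichever you have installed):

```python
# Minimal sketch of querying a locally-hosted open-weight model via
# Ollama's REST API (it listens on localhost:11434 by default).
# Nothing leaves your machine: model, prompt and answer all stay local.

import json
import urllib.request

def build_request(prompt: str, model: str = "llama3.1") -> dict:
    """Construct the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for the whole answer in one response.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(prompt: str, model: str = "llama3.1") -> str:
    """Send the prompt to the local Ollama server and return its answer."""
    body = json.dumps(build_request(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires Ollama running locally):
# print(ask_local_model("Summarise RAG in one sentence."))
```

Because the endpoint is just local HTTP, the same few lines work with the WiFi switched off, which is the whole privacy point.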
And finally, over 2024, we have also seen advances in the image and video generation capabilities of AI. Most of this has resulted in lots of really poor, clearly AI-generated images populating websites and social media. Part of this is the technology’s fault and part the users’ execution. But one of the most interesting announcements, only a few weeks ago, was the general availability (in the US at least) of Sora, OpenAI’s highly capable video generation model. In my job, and for those I advise, I don’t use video generation much at all, but I am really curious how this will develop for others, especially considering the risk it poses to a whole industry of creative professionals.
The Year Ahead In AI
One of my predictions for 2024 was the rise of ‘agentic LLMs’, and this has certainly been one of the key developments we have seen in the second half of the year. Agentic LLMs are where the model, or models, put much more effort into considering the question that is being asked and how it should be answered. Sometimes, there is a ‘manager’ LLM that plans the whole process and then brings in different LLMs or services to complete it. This is typically used in more complex situations because it takes longer to come up with an answer and will use many more tokens. Developers can build agentic workflows using tools like LangChain, but the big guys are also getting in on the act, with the release of a number of specialist models, such as OpenAI’s o1, which have these processes ‘built in’ (Perplexity also uses this approach). So far, the results have been very mixed, especially at the more complicated end of the processes (with the usual gap between the hyped examples from the vendors and the actual experience of enterprises). But 2025 will certainly see this branch of the technology get better and better and, at some point, it will become the default approach for any querying of LLMs. And this all leads naturally to the ‘personal assistant’ model that Sam Altman is pushing so hard (especially when you combine it with the Small Language Models that can fit on smartphones). We may not get that far in 2025, but we will be a few more steps along the path.
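The ‘manager plus specialists’ structure can be sketched in a few lines. This is a deliberately toy illustration of the agentic pattern - the planner here is a hard-coded stand-in for the manager LLM, and the tools are invented placeholders, not any real framework’s API:

```python
# Toy sketch of the agentic pattern: a 'manager' plans the steps,
# then dispatches each step to a specialist tool or model,
# accumulating intermediate results along the way.

def plan(question: str) -> list[str]:
    """Stand-in for the manager LLM: decide which tools to call, in order."""
    steps = ["search"]
    if any(ch.isdigit() for ch in question):  # crude signal that maths is needed
        steps.append("calculate")
    steps.append("summarise")
    return steps

# Placeholder specialist tools; in a real system each would be an
# LLM call or an external service.
TOOLS = {
    "search": lambda q, notes: notes + ["(search results for: " + q + ")"],
    "calculate": lambda q, notes: notes + ["(calculation done)"],
    "summarise": lambda q, notes: notes + ["(summary of " + str(len(notes)) + " notes)"],
}

def run_agent(question: str) -> list[str]:
    """Execute the plan step by step, threading the notes through each tool."""
    notes: list[str] = []
    for step in plan(question):
        notes = TOOLS[step](question, notes)
    return notes

notes = run_agent("What was UK GDP growth in 2024?")
# The final notes would be handed back to an LLM to draft the answer.
```

Even in this toy form you can see why agentic workflows burn more tokens: every step is another model or service call, which is the cost trade-off mentioned above.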
Of course, at some point, all this technology will need to start making (profitable) money for the vendors. At the moment, it is the ‘shovel and pick salesmen’ who are getting the profit, but Microsoft, OpenAI, Google, etc, will be looking for the promised returns. This will be particularly acute for those vendors who have taken on lots of investment (which is most of them). While Google (obviously) and OpenAI (surprisingly) are starting to look at potential ad revenue, Microsoft will need to start seriously monetising CoPilot very soon. If and how they do that (which is unlikely to come from the £30 per user per month stream) will greatly affect how the product develops. It will certainly get better at doing what Microsoft claims it can do, but will it end up just being bundled as part of Office? That would be my bet.
I also think we will see Generative AI being used outside its core use cases of language and image generation. We are seeing transformer models (one of the core technologies underpinning GenAI) being used for image recognition tasks, and there are GenAI applications now which can generate protein structures.
One of the big arguments against AI, and GenAI in particular, is the enormous amount of energy that is consumed to train and use the models. Whilst awareness has risen this year, I think 2025 will be the year this issue really comes to the fore. Vendors and users will no longer be able to ignore this fact, especially as demand grows and grows. My clients are already challenging me (quite rightly) to build applications that are proportional to the task at hand so that they use the minimum amount of energy possible. Why use a Ferrari to drive to the shops when a Fiat 500 will do?
This is where the Small Language Models come in. Rather than being a peripheral case, they will become the default choice when building GenAI applications. Only the more complex applications will turn to the likes of OpenAI’s o1. And this will, of course, create another revenue challenge for the vendors. While 2024 saw some existential challenges for the big players (remember OpenAI almost imploding?), this coming year will be a big test for them and their investors.
As is traditional, I have put together a Spotify playlist of some of my favourite tracks from 2024. It’s an eclectic mix, as ever. The band English Teacher get the honour of two tracks in the list, since their record ‘This Could Be Texas’ was my LP of the year. (They were also amazing to see live last month). Hope you enjoy the playlist!
One highlight for me this year was the publication of the second edition of my book, The Executive Guide to Artificial Intelligence, which has been massively updated to include everything GenAI.
And that only leaves me with the lovely task of wishing you and your families a very merry Christmas and a happy New Year. I will see you all, hopefully well-rested, in 2025…