Hello Friends,
The power went off in Charles' house yesterday so we went to bed very early. When it came back on at about 1:00 in the morning he woke up and could not get back to sleep. So he started writing this week's AI for the Rest of Us newsletter at around 3:00am armed with a cup of Earl Grey. He is sorry if it feels a bit sleepier than usual.
This week Hannah was back in her natural habitat stomping up and down the mountains of Sapporo, some of which still had snow at the summit. Hokkaido, Japan’s north island, was absolutely beautiful but she’s since flown south to Fukuoka where she’s writing this newsletter from a coffee shop impatiently waiting for a storm to pass over so that she can go and explore the coast. This isn’t supposed to be a travel journal but it’s starting to feel like one! Skip ahead for AI content!
On Saturday Charles went to see his brother-in-law and brother-in-law’s wife in a production of Come From Away. He doesn’t, as a rule, go a bundle on musicals but thought this was terrific, with a glorious Celtic-influenced score. It is based on the true story of what happened in the small town of Gander, Newfoundland, on 11 September 2001. When US airspace was closed following the attacks, 38 planes were diverted to Gander's airport, effectively doubling the town's population overnight with nearly 7,000 stranded passengers from around the world. The show follows the five days that followed, telling the stories of both the bewildered travellers and the locals who housed, fed, and looked after them with extraordinary generosity. It's a rare piece of post-9/11 storytelling that focuses not on the attacks themselves but on human kindness — just the kind of uplifting story we need at the moment.
Newsletter readers have just one more week to grab a ticket for Agent Craft with a whopping 50% off. Tickets are just £48 with code AI4NEWS, reserved exclusively for YOU, our lovely readers. Since last week we have added two more incredible speakers to our line-up:
- Shaun Smith is a Maintainer of MCP and specialises in Open Source Agents at Hugging Face
- Foluso is a Founding Engineer at Georgie AI, an enterprise security and governance platform for agents
Check out the full agenda here.
We hope to see some of you there on Friday June 12th.
Hannah & Charles
What's Hannah reading this week?
There is a lot going on in the world of cyber-security right now: the ripple effects of AI Coding Agents are hitting security teams like a tsunami. It was hard to keep up before; it feels impossible now.
A couple of weeks ago NIST announced that they would no longer be enriching all CVEs (Common Vulnerabilities and Exposures) - there are simply too many new CVEs for them to keep up with. Security researchers have coined the term Vulnpocalypse in response to the enormous volume of new vulnerabilities, both AI-discovered and AI-authored. Cameron Walters explains the impact in simple terms in his recent post.
“What I'm telling every team I work with right now: rethink your vuln management program from the triage logic up, because severity reclassification processes and lifecycle assumptions built on NVD enrichment data are no longer reliable.”
In response to Anthropic’s Mythos Model and Project Glasswing, an A-Team of cyber-security experts and CISOs have pulled together a practical response to help security professionals orient themselves to the huge changes that are happening. The “AI Vulnerability Storm”: Building a “Mythosready” Security Program is 24 pages of expert advice and well worth a read for anyone who works in technology, not just cyber-security professionals. The meat of the document begins on page 16 with the risk register, and a list of priority actions starts on page 19.
For me, the woman with the supply chain security start-up, this advice on vulnerability management particularly resonated.
Long-term, there is no alternative to building a permanent Vulnerability Operations (VulnOps) function, staffed and automated like DevOps, but for autonomous vulnerability research and remediation.
VulnOps is a new term to me but I’ll certainly be using it going forwards. BIMP is essentially a fully automated VulnOps platform for container base images! It’s simply not possible to put humans on the critical path for vulnerability response anymore and I hope BIMP will help teams automate their way out of a huge pile of security toil.
In other security news we’ve seen some of the most successful AI Coding platforms suffer from major security incidents. Both Vercel and Lovable have found themselves in a pickle, both with the vulnerabilities themselves and their response to them. Sergio Visinoni lays it out better than I ever could in his blog post Lovable, Lovableed, Lovabad.
Security is hard and we want to believe that the software creators we rely on are doing their best. We lose faith when the response to issues does not reflect the importance of the issue, or potential user impact. If we accept that every company is now a software company, then every company is now vulnerable to automated attacks. That means every company needs their security incident response runbook at the ready.
From my experience, very few have this in place. Everyone needs not only an incident response plan to remediate the issue but also the PR playbook, because a security incident does not have to be a disaster, though you can certainly turn it into one!
Another ripple effect we’re seeing in the world of software development is the unprecedented reliability issues at GitHub. Mitchell Hashimoto wrote about how he is moving Ghostty off GitHub despite a lifetime spent on the platform.
For the past month I've kept a journal where I put an "X" next to every date where a GitHub outage has negatively impacted my ability to work. Almost every day has an X. On the day I am writing this post, I've been unable to do any PR review for ~2 hours because there is a GitHub Actions outage. This is no longer a place for serious work if it just blocks you out for hours per day, every day.
Reading this post made me reflect on the importance of GitHub as the home of so much of the world’s code. We need GitHub. It is critical global infrastructure, and it is struggling under the weight of millions (if not billions) of AI Agents spewing out code. I hope the engineering team at GitHub are doing OK under this immense strain - HugOps all round!
In my career I’ve introduced a lot of teams to the practices of Site Reliability Engineering, and now more than ever I believe teams need to get their SLOs, Error Budgets and Error Budget Policies in place. Reliability is fundamental, and your users will notice if you prioritise speed above stability. An Error Budget Policy lays out what a team should do if they have failed to meet their availability and reliability targets (SLOs), and I think everyone should have one; it’s more important today than ever.
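To make the idea concrete, here is a minimal sketch (mine, not from any SRE playbook) of how an SLO target translates into an error budget, and the point at which an Error Budget Policy would kick in. The function names and the 30-day window are illustrative assumptions.

```python
# Minimal sketch: an SLO target implies a fixed allowance of
# downtime (the error budget); spending it all triggers the policy.

def error_budget_minutes(slo_target: float, window_minutes: int = 30 * 24 * 60) -> float:
    """Allowed downtime over the window for a given SLO (e.g. 0.999)."""
    return window_minutes * (1 - slo_target)

def budget_remaining(slo_target: float, downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative = budget blown)."""
    budget = error_budget_minutes(slo_target)
    return (budget - downtime_minutes) / budget

# A 99.9% SLO over a 30-day window allows ~43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))   # 43.2
# Two hours of outages blows the budget entirely: time to invoke the
# policy, e.g. pause feature work and prioritise reliability.
print(budget_remaining(0.999, 120) < 0)        # True
```

The point of writing the policy down in advance is that the response ("what do we do when the budget is spent?") is agreed before anyone is under pressure.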
What’s Charles reading this week?
I joined my friend and long-term collaborator Anne Currie on her excellent new podcast, Asynchronous and Unreliable, to look at how we communicate the increasingly complex world we’re building. There are a lot of my thoughts on AI here. I talk about why AI-generated copy is so joyless to edit; compare AI music to "something that sounds like the sort of thing that somebody who didn't know what music sounded like but had read about it in a book might produce," (*cough* Hannah *cough*); how at roughly a billion users, the societal effects of LLMs are enormous, with documented cases of people experiencing psychosis or being encouraged toward self-harm through interactions with chatbots; and I make a fairly pointed aside about the industry's attitude toward training data.
Anne and I also talk about AI and sustainability. I slightly push back on Hannah Ritchie's argument that individual LLM use doesn't meaningfully affect your carbon footprint. It is technically true, but risks missing the point, which is that we’re building out the infrastructure at staggering speed and that has an impact at scale. But how much impact is hard to be sure about since the companies building the models are not required to disclose it. This results in wildly varied estimates as to how serious AI’s contribution to climate change actually is. Back in December we mentioned an example from ‘Empire of AI’, a bestseller focussing on water consumption, which appears to be off by several orders of magnitude (the book has since been revised).
Last weekend Damien Gayle reported in the Guardian that the UK Government’s Department of Science, Innovation and Technology (DSIT) has revised its carbon estimates:
…energy use by AI datacentres in the UK could cause the emission of up to 123m tonnes of carbon dioxide (CO₂) – about as much as generated by 2.7 million people – over the next 10 years.
That latest figure replaces a previous estimate – since deleted – that claimed emissions would reach a maximum of 0.142m tonnes of CO₂ in a single year.
I don’t blame them. Getting this stuff right in the absence of reliable data is extremely hard. Nevertheless DSIT’s revised 10 year estimate is just shy of two full orders of magnitude higher. Oops. The revision seems to be the result of an investigation from Carbon Brief.
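The size of the revision is easy to check on the back of an envelope, assuming the previous per-year maximum is scaled across the full ten years (the figures below are just the ones reported above):

```python
# Back-of-the-envelope check on DSIT's revision: previous maximum of
# 0.142m tonnes CO2 per year, scaled to 10 years, versus the revised
# 10-year total of 123m tonnes.
import math

old_estimate_mt = 0.142 * 10   # previous estimate over 10 years, million tonnes
new_estimate_mt = 123.0        # revised 10-year total, million tonnes

ratio = new_estimate_mt / old_estimate_mt
print(round(ratio))            # ~87x higher
print(round(math.log10(ratio), 2))  # ~1.94 - just shy of two orders of magnitude
```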
The Musk vs Altman trial got underway this week, which is one of those trials where I want them both to lose. The judge in the case is Judge Yvonne Gonzalez Rogers, who also presided over the Epic v. Apple case.
The consensus view, at least from the reports I’ve read, is that Musk was ill-prepared and the whole thing is petty and spiteful, so it is perhaps worth reiterating that this is a real federal lawsuit with the entire future of the biggest startup in history at stake. Reporting from the courtroom in Oakland for the Verge, Elizabeth Lopatto describes Musk as appearing unfocused and uncharming. “This is not the first time I’ve seen Musk in court. During his defamation suit, he turned on the charm and the jury responded by finding him not guilty. Today he looked adrift and unprepared. The only times he showed real animation were when he was bragging about how much he’d done for OpenAI,” she writes. Matteo Wong for The Atlantic says, “The trial makes the AI boom seem sordid and small. In his sworn deposition, Altman wrote that Musk used to message him complaints that he wanted more credit for the success of OpenAI and took offense at not being included in an anniversary photo.”
Back in February, Elon Musk arranged for SpaceX to buy his AI lab xAI, which after launching a frontier model last year lost all of its founders, with Musk subsequently saying it needs to be rebuilt from scratch. Part of that turnaround was apparently exploring a merger with the French lab Mistral, and now there’s a deal to ‘partner’ with Cursor for $10bn, with an option to buy the company later in the year for $60bn. Cursor has also been struggling — it was the super-hot AI coding story this time last year, but Anthropic is now far ahead, and Cursor is ultimately dependent on OpenAI and Anthropic models. The Information has reported that Cursor had struggled to raise from VCs, not least because it had over 20% negative gross margins last quarter, though it turned positive more recently.
Microsoft finally managed to ship a version of Copilot that can interact with your documents in Office. You might think Microsoft would be first to do this with their own office suite, but you’d be wrong. Anthropic launched this in January after a preview last October, and OpenAI launched the same in March. Microsoft: yesterday's technology, tomorrow.
OpenAI and Microsoft jointly announced an amended agreement that will allow OpenAI to go beyond Microsoft’s Azure and “serve all its products to customers across any cloud provider”. The announcement states that Microsoft will continue to have a license for OpenAI’s IP and models through 2032 and that Azure will remain the “primary cloud partner” for OpenAI during that time (should Microsoft continue to be able to honour that). But Microsoft’s licence “will now be non-exclusive,” the announcement says, letting OpenAI make its models available through other major cloud providers going forward. The first of these is Amazon, with OpenAI's top models officially available on Amazon Web Services' Bedrock managed inference and agent platform.
Something that comes up again and again in my consulting work is the need for a common language — what Domain Driven Design calls ubiquitous language. In the course of writing my most recent sponsored article for The New Stack and Kin Lane’s new venture Naftiko, we talked about JSON Schema which, he suggested, is less a technical standard than a communication tool — a way for teams to agree on what they actually mean when they say "address" or "invoice" or "healthcare record". That shared vocabulary, encoded into something machines can enforce, is also something that enterprises need as they try to wrangle AI into reliable workflows. Because LLMs are inherently unpredictable, the organisations doing best with AI integration are those who did the unglamorous work first: cleaning up their data, defining their terms, and getting everyone — both human and machine — on the same page.
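As a hypothetical illustration of that idea: a JSON Schema for "address" pins down what the team means by the word, and machines can enforce it. The field names below are my own invention, and a real system would use a proper JSON Schema validator library rather than the hand-rolled check sketched here.

```python
# Illustrative only: a schema as shared vocabulary. "address" now has
# an agreed, machine-checkable definition rather than a vague one.
ADDRESS_SCHEMA = {
    "type": "object",
    "required": ["street", "city", "postcode", "country"],
    "properties": {
        "street":   {"type": "string"},
        "city":     {"type": "string"},
        "postcode": {"type": "string"},
        "country":  {"type": "string"},
    },
}

def check_address(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the doc conforms."""
    problems = [f"missing: {k}" for k in ADDRESS_SCHEMA["required"] if k not in doc]
    for key, rule in ADDRESS_SCHEMA["properties"].items():
        if key in doc and rule["type"] == "string" and not isinstance(doc[key], str):
            problems.append(f"wrong type: {key}")
    return problems

ok = {"street": "1 High St", "city": "London", "postcode": "N1 9GU", "country": "GB"}
print(check_address(ok))                  # []
print(check_address({"city": "London"}))  # ['missing: street', 'missing: postcode', 'missing: country']
```

Once the schema exists, it can sit in the pipeline wherever data crosses a boundary, including in front of anything an LLM produces.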
Updates
Agent Craft
Use code AI4NEWS for a massive 50% off for the next 2 weeks
Next Meetup May 14th
Join our next meetup in London
AI DevCon
1-2 June, London & Virtual. Newsletter readers get 30% off with discount code AIFTROU30 - thanks Tessl!
Follow us on LinkedIn
Bite-sized nuggets of AI learning!
Follow us on BlueSky
Bite-sized nuggets of AI learning!
Catch Up On The Conference
Subscribe now and don't miss all the latest recordings!