Anthropic's quest for better, more explainable AI attracts $580M

Less than a year ago, former OpenAI VP of research Dario Amodei founded Anthropic, intending to perform research in the public interest on making AI more reliable and explainable. Its $124 million in funding was surprising then, but nothing could have prepared us for the company raising $580 million so soon afterward.

“With this fundraise, we’re going to explore the predictable scaling properties of machine learning systems, while closely examining the unpredictable ways in which capabilities and safety issues can emerge at-scale,” said Amodei in the announcement.

His sister Daniela, with whom he co-founded the public benefit corporation, said that having built out the company, “We’re focusing on ensuring Anthropic has the culture and governance to continue to responsibly explore and develop safe AI systems as we scale.”

There’s that word again — scale. Because that’s the problem category Anthropic was formed to examine: how to better understand the AI models increasingly in use in every industry as they grow beyond our ability to explain their logic and outcomes.

The company has already published several papers looking into, for example, reverse engineering the behavior of language models to understand why and how they produce the results they do. Something like GPT-3, probably the most well-known language model out there, is undeniably impressive, but there’s something worrying about the fact that its internal operations are essentially a mystery even to its creators.

As the new funding announcement explains it:

“The purpose of this research is to develop the technical components necessary to build large-scale models which have better implicit safeguards and require less after-training interventions, as well as to develop the tools necessary to further look inside these models to be confident that the safeguards actually work.”

If you don’t understand how an AI system works, you can only react when it does something wrong, such as exhibiting bias in recognizing faces or tending to draw or describe men when asked about doctors and CEOs. That behavior is baked into the model, so the solution becomes filtering its outputs rather than preventing it from having those incorrect “notions” in the first place.

It’s sort of a fundamental change to how AI is built and understood, and as such requires big brains and big computers, neither of which is particularly cheap. No doubt that $124 million was a good start, but apparently the early results were promising enough to make Sam Bankman-Fried lead this enormous new round, joined by Caroline Ellison, Jim McClave, Nishad Singh, Jaan Tallinn and the Center for Emerging Risk Research.

Interesting to see none of the usual deep tech investors in that group — but of course Anthropic isn’t aiming to turn a profit, which is kind of a deal-breaker for VCs.

You can keep up with Anthropic’s latest research here.
