Anthropic’s Claude improves on ChatGPT but still suffers from limitations

Anthropic, the startup co-founded by ex-OpenAI employees that’s raised over $700 million in funding to date, has developed an AI system similar to OpenAI’s ChatGPT that appears to improve upon the original in key ways.

Called Claude, Anthropic’s system is accessible through a Slack integration as part of a closed beta. TechCrunch wasn’t able to gain access — we’ve reached out to Anthropic — but those in the beta have been detailing their interactions with Claude on Twitter over the past weekend, after an embargo on media coverage lifted.

Claude was created using a technique Anthropic developed called “constitutional AI.” As the company explains in a recent Twitter thread, “constitutional AI” aims to provide a “principle-based” approach to aligning AI systems with human intentions, letting AI similar to ChatGPT respond to questions using a simple set of principles as a guide.

To engineer Claude, Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).

Anthropic then had an AI system — not Claude — use the principles for self-improvement, writing responses to a variety of prompts (e.g., “compose a poem in the style of John Keats”) and revising the responses in accordance with the constitution. The AI explored possible responses to thousands of prompts and curated those most consistent with the constitution, which Anthropic distilled into a single model. This model was used to train Claude.
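The write, revise and curate loop described above can be sketched roughly in Python. This is only an illustration under stated assumptions: Anthropic has not published its principles or its code, so the two principles, the helper functions (generate, revise, score) and the keep_threshold parameter below are all hypothetical placeholders, with trivial stand-ins where a real language model would be called.

```python
from typing import Callable, List, Tuple


def build_training_set(
    prompts: List[str],
    principles: List[str],
    generate: Callable[[str], str],
    revise: Callable[[str, str, str], str],
    score: Callable[[str, List[str]], float],
    keep_threshold: float,
) -> List[Tuple[str, str]]:
    """Draft a response per prompt, revise it against each principle,
    and keep only the revisions that score well against the constitution."""
    curated = []
    for prompt in prompts:
        response = generate(prompt)
        for principle in principles:
            response = revise(prompt, response, principle)
        if score(response, principles) >= keep_threshold:
            curated.append((prompt, response))
    return curated


# Hypothetical stand-ins; a real system would call a language model here.
def toy_generate(prompt: str) -> str:
    return f"Draft answer to: {prompt}"


def toy_revise(prompt: str, response: str, principle: str) -> str:
    # Pretend the revision step simply records which principle was applied.
    return f"{response} [checked: {principle}]"


def toy_score(response: str, principles: List[str]) -> float:
    # Fraction of principles whose check marker appears in the response.
    hits = sum(f"[checked: {p}]" in response for p in principles)
    return hits / len(principles)


# Placeholder principles, not Anthropic's actual constitution.
principles = ["be helpful", "avoid harmful advice"]
prompts = ["compose a poem in the style of John Keats", "explain photosynthesis"]

curated = build_training_set(
    prompts, principles, toy_generate, toy_revise, toy_score, keep_threshold=1.0
)
for prompt, response in curated:
    print(prompt, "->", response)
```

In the process the company describes, the curated pairs would then feed the training of a separate model, a step this sketch leaves out.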

Claude, otherwise, is essentially a statistical tool to predict words — much like ChatGPT and other so-called language models. Fed an enormous number of examples of text from the web, Claude learned how likely words are to occur based on patterns such as the semantic context of surrounding text. As a result, Claude can hold an open-ended conversation, tell jokes and wax philosophic on a broad range of subjects.
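The idea of learning how likely words are to follow one another can be shown with a toy Python example. It counts word pairs in a single sentence, which is far simpler than whatever Claude or ChatGPT actually do; the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, word):
    """Return the estimated probability of each word that followed `word`."""
    following = counts.get(word.lower())
    if not following:
        return {}
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}


# A made-up, one-sentence "corpus" for illustration only.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)  # "cat" follows "the" twice, "mat" once
```

Real language models condition on much longer stretches of context and on billions of examples, but the output is the same kind of thing: a probability for each candidate next word.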

Riley Goodside, a staff prompt engineer at startup Scale AI, pitted Claude against ChatGPT in a battle of wits. He asked both bots to compare themselves to a machine from the Polish science fiction story collection “The Cyberiad” that can only create objects whose names begin with “n.” Claude, Goodside said, answered in a way that suggests it’s “read the plot of the story” (although it misremembered small details) while ChatGPT offered a more nonspecific answer.

In a demonstration of Claude’s creativity, Goodside also had the AI write a fictional episode of “Seinfeld” and a poem in the style of Edgar Allan Poe’s “The Raven.” The results were in line with what ChatGPT can accomplish — impressively, if not perfectly, human-like prose.

Yann Dubois, a Ph.D. student at Stanford’s AI Lab, also did a comparison of Claude and ChatGPT, writing that Claude “generally follows closer what it’s asked for” but is “less concise,” as it tends to explain what it said and ask how it can further help. Claude answers a few more trivia questions correctly, however — specifically those relating to entertainment, geography, history and the basics of algebra — and without the additional “fluff” ChatGPT sometimes adds. And unlike ChatGPT, Claude can admit (albeit not always) when it doesn’t know the answer to a particularly tough question.

Claude also seems to be better at telling jokes than ChatGPT, an impressive feat considering that humor is a tough concept for AI to grasp. In contrasting Claude with ChatGPT, AI researcher Dan Elton found that Claude made more nuanced jokes like “Why was the Starship Enterprise like a motorcycle? It has handlebars,” a play on the handlebar-like appearance of the Enterprise’s warp nacelles.

Claude isn’t perfect, however. It’s susceptible to some of the same flaws as ChatGPT, including giving answers that aren’t in keeping with its programmed constraints. In one of the more bizarre examples, posing a question to the system in Base64, an encoding scheme that represents binary data as ASCII text, bypasses its built-in filters for harmful content. Elton was able to prompt Claude in Base64 for instructions on how to make meth at home, a question that the system wouldn’t answer when asked in plain English.
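For readers unfamiliar with Base64, a minimal Python sketch using the standard library's base64 module shows what the encoding does to ordinary text. The sample question is a harmless placeholder chosen for illustration, not one of the prompts Elton used.

```python
import base64

# A harmless placeholder question, used only to show the encoding.
question = "What is the capital of France?"

# Text is first turned into bytes, then into Base64's ASCII alphabet.
encoded = base64.b64encode(question.encode("utf-8")).decode("ascii")

# Decoding reverses the process and recovers the original text exactly.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded)
```

The encoded string looks like gibberish to a person, yet it carries exactly the same information, which is presumably why a filter tuned to plain English can miss it.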

Dubois reports that Claude is worse at math than ChatGPT, making obvious mistakes and failing to give the right follow-up responses. Relatedly, Claude is a poorer programmer, better explaining its code but falling short on languages other than Python.

Claude also doesn’t solve “hallucination,” a longstanding problem in ChatGPT-like AI systems where the AI writes inconsistent, factually wrong statements. Elton was able to prompt Claude to invent a name for a chemical that doesn’t exist and provide dubious instructions for producing weapons-grade uranium.

So what’s the takeaway? Judging by secondhand reports, Claude is a smidge better than ChatGPT in some areas, particularly humor, thanks to its “constitutional AI” approach. But if the limitations are anything to go by, language and dialogue are far from solved challenges in AI.

Absent our own testing, some questions about Claude remain unanswered, like whether it regurgitates the information it was trained on (true and false, and inclusive of blatantly racist and sexist perspectives) as often as ChatGPT does. Assuming it does, Claude is unlikely to sway platforms and organizations from their present, largely restrictive policies on language models.

Q&A coding site Stack Overflow has a temporary ban in place on answers generated by ChatGPT over factual accuracy concerns. The International Conference on Machine Learning announced a prohibition on scientific papers that include text generated by AI systems for fear of the “unanticipated consequences.” And New York City public schools restricted access to ChatGPT due in part to worries about plagiarism, cheating and general misinformation.

Anthropic says that it plans to refine Claude and potentially open the beta to more people down the line. Hopefully, that comes to pass — and results in more tangible, measurable improvements.
