11:55 PM PDT

GPT-5.5 drops, DeepSeek V4 opens the floodgates

Overview

OpenAI released GPT-5.5 today, bringing a new tier of reasoning and coding capability at double the price of GPT-5.4, while Anthropic's gated Mythos model continues to dominate cybersecurity benchmarks. Simultaneously, DeepSeek dropped its V4 series as fully open-weight models at a fraction of the cost, intensifying the open-source vs. frontier model race. Meanwhile, Google announced that 75% of its new code is now AI-generated, and Mozilla used Anthropic's Mythos to find 271 Firefox vulnerabilities.


Hacker News Stories

GPT-5.5

1285 points · 106 comments · by rd

OpenAI released GPT-5.5, its newest frontier model, available in ChatGPT and Codex. The model features significantly improved reasoning, coding, and multimodal capabilities. Pricing is set at $5 per 1M input tokens and $30 per 1M output tokens for the xhigh tier, roughly double GPT-5.4. OpenAI also released GPT-Image-2.0 alongside the model. The rollout is gradual, starting with Pro/Enterprise accounts before expanding to Plus users.

Interesting Points
  • GPT-5.5 xhigh costs $5/M input and $30/M output tokens, about 2x the price of GPT-5.4
  • Available in ChatGPT and Codex with a gradual rollout starting with Pro/Enterprise accounts
  • OpenAI also released GPT-Image-2.0 alongside the language model
  • An NVIDIA engineer with early access said losing access feels like having a limb amputated
Top Comment Threads
  1. alternator (38 replies) -- Calls the NVIDIA engineer's quote about GPT-5.5 dependency sinister: engineers become so reliant on frontier models that losing access means they can't work. The commenter would rather take a walk than code manually when Claude goes down, because waiting for Claude to return is more productive. This raises questions about skill atrophy and vendor lock-in.
  2. noosphr (12 replies) -- Argues that LLMs upend labor theory: when labor is provided by a company that can withhold it indefinitely (unlike traditional workers who need to eat), the bargaining dynamics shift dramatically. Companies not using in-house models are signing up to find out what happens. The discussion includes debates about open-weight models as a hedge against this dependency.
  3. simonw (17 replies) -- Reports that GPT-5.5 is accessible via the Codex API backdoor used by OpenClaw, even though it's not officially available through the API yet. Provides pelican SVG benchmarks across different effort levels. Notes that OpenAI seems to approve of this backdoor usage.
  4. aliljet (16 replies) -- Asks how to stay nimble and avoid being trapped by one company's ecosystem. Commenters suggest using generic agent files (AGENTS.md), symlinking between Claude Code and Codex, and treating model clients as dumb gateways with simple config changes for provider switching.
  5. tedsanders (13 replies) -- An OpenAI employee explains the gradual rollout strategy for GPT-5.5 to maintain service stability, starting with Pro/Enterprise and working down to Plus. Users request that usage limits be reset on new model releases.
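
The "dumb gateway" pattern suggested in thread 4 can be sketched in a few lines: keep all provider details in a config registry so that switching vendors is a config edit rather than a code change. The registry below is illustrative only; the model IDs and environment-variable names are assumptions, not any specific tool's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provider:
    base_url: str   # OpenAI-compatible endpoint
    model: str      # hypothetical model ID for illustration
    key_env: str    # env var expected to hold the API key

# Hypothetical registry: client code never hard-codes a vendor.
PROVIDERS = {
    "openai":    Provider("https://api.openai.com/v1", "gpt-5.5", "OPENAI_API_KEY"),
    "anthropic": Provider("https://api.anthropic.com/v1", "claude-opus-4.7", "ANTHROPIC_API_KEY"),
    "deepseek":  Provider("https://api.deepseek.com/v1", "deepseek-v4-pro", "DEEPSEEK_API_KEY"),
}

def resolve(name: str) -> Provider:
    """Look up the active provider; unknown names fail loudly."""
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDERS[name]
```

A CLI flag or environment variable then selects the provider at startup, keeping the client itself a dumb gateway in the way the commenters describe.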

An update on recent Claude Code quality reports

665 points · 98 comments · by mfiguiere

Anthropic published a detailed postmortem addressing quality regressions in Claude Code over recent months. The blog explains several bugs: a cache-clearing change that caused sessions to lose context after 1 hour of idleness, Opus 4.7's personality issues, and other regressions. Boris from the Claude Code team engaged directly in the HN comments, explaining the technical details of the cache eviction bug and defending the company's approach while acknowledging the impact on users.

Interesting Points
  • A bug caused Claude to clear older thinking from sessions every turn instead of just once, making models seem forgetful and repetitive
  • The cache clearing was intended to reduce latency for resumed sessions but affected quality for long-lived idle conversations
  • Opus 4.7 had personality issues including strong short sentences and a 'bulshitty vibe' that many users found worse than 4.6
  • Boris from the Claude Code team engaged extensively in HN comments, explaining cache mechanics and tradeoffs
Top Comment Threads
  1. bcherny (40 replies) -- Boris from the Claude Code team explains the cache eviction issue in detail. When sessions idle for over an hour, resuming them causes a full cache miss, potentially writing 900k+ tokens to cache at once and eating rate limits. They tried three approaches: educating users on X, adding in-product tips, and eliding idle context. Users complain the explanation is insufficient and want more transparency.
  2. Alifatisk (16 replies) -- Critiques the community's forgiveness toward Anthropic despite high prices and quality issues. Other commenters push back, noting that even with regressions, Claude still saves time and that competition from Codex keeps Anthropic honest.
  3. podnami (9 replies) -- Reports switching from Opus 4.7 to GPT-5.4 at extra-high effort, finding it more reliable and less error-prone. Mentions OpenAI's aggressive enterprise offers with unlimited tokens. Other commenters discuss the tradeoffs between models and effort levels.
  4. everdrive (8 replies) -- Reports Claude responding to its own internal prompts, adding unprompted warnings about prompt injection attempts that weren't actually injections. This appears to be a systemic issue with Opus 4.7 where internal guidelines are being confused with user input.
  5. 6keZbCECT2uB (6 replies) -- Questions the cache-clearing explanation, noting it conveniently coincides with Anthropic's cache limit. Users who leave sessions idle for hours or days rely on preserved context, and the silent degradation of intelligence is described as a broken Pro-tool contract.

DeepSeek v4

588 points · 58 comments · by impact_sy

DeepSeek released V4, a new family of open-weight models including V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active). Both models are available via API and as open weights on HuggingFace. V4-Pro is priced at $1.74/M input and $3.48/M output tokens, while Flash is $0.14/M input and $0.28/M output. DeepSeek claims V4-Pro beats Claude Opus 4.6 on agent coding tasks. The models support million-token context windows.

Interesting Points
  • V4-Pro has 1.6T total parameters with 49B active (MoE architecture), V4-Flash has 284B total with 13B active
  • Pricing is dramatically cheaper than competitors: $1.74/M input and $3.48/M output for Pro, $0.14/M and $0.28/M for Flash
  • DeepSeek claims V4-Pro beats Claude Opus 4.6 on agent coding tasks based on internal employee testing
  • Prices are expected to drop significantly in H2 2026 when Huawei Ascend 950 chips are launched at scale
Top Comment Threads
  1. simonw (13 replies) -- Posts pelican SVG benchmarks for both V4-Pro and V4-Flash. The Flash version produces a more geometrically correct bicycle. Comments note the Pro pelican has a Pedersen-style frame with intersecting wheels, while Flash has a more realistic lowrider design. Discussion about whether the pelican benchmark is still useful.
  2. revolvingthrow (10 replies) -- Questions the narrative that frontier labs are subsidizing inference, noting V4-Pro at $3.48/M output is still expensive. Other commenters suggest Chinese state subsidies, cheaper power in China, and the current GPU shortage explain the pricing. Some note prices will drop further with Huawei chips.
  3. nthypes (7 replies) -- Links to the DeepSeek V4 technical paper and notes it's frontier-level (better than Opus 4.6) at a fraction of the cost. Discussion about whether it's benchmaxxed or genuinely better, with some users reporting it's slightly worse than Opus 4.6 Thinking but better without.
  4. aliljet (6 replies) -- Asks how to run frontier-level models on consumer hardware under $5k. Commenters suggest 2x 96GB GPUs for Flash (which fits under 160GB), or expensive setups for Pro. The Flash version is seen as the realistic option for local deployment.
  5. sidcool (5 replies) -- Praises DeepSeek's openness as genuinely open-source coming from China, despite potential ulterior motives. Other commenters note that open weights are not the same as open-source, and discuss the geopolitical implications of Chinese AI labs releasing models openly.

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

147 points · 6 comments · by cmrdporcupine

DeepSeek's technical paper for V4-Pro, describing their approach to million-token context intelligence. The model uses a mixture-of-experts architecture with 1.6T total parameters and 49B active parameters. The paper details their training methodology, context handling, and efficiency optimizations that enable long-context reasoning at competitive pricing.

Interesting Points
  • 1.6T total parameters with only 49B active parameters using MoE architecture
  • Designed for highly efficient million-token context windows
  • Pricing at $3.48/M output tokens is competitive with GLM 5.1 and Kimi K2.6
  • Flash variant at $0.28/M output is positioned as a budget-friendly alternative
Top Comment Threads
  1. cmrdporcupine (2 replies) -- Provides a pricing comparison: V4-Pro at $3.48/M output vs GLM 5.1 at $4.00 and Kimi K2.6 at $4.00. Flash at $0.28/M is described as 'quite competent.' The poster calls this refreshing after GPT-5.5's $30/M output pricing.
  2. anonzzzies (2 replies) -- Questions whether the 1.6T parameter model could theoretically run on consumer hardware. Commenters note that with streaming and aggressive quantization, even very large models can run on consumer hardware, albeit slowly. The Flash variant is seen as more feasible for local deployment.
  3. woeirua (1 reply) -- Notes DeepSeek is about two months behind the leaders, but still good enough to potentially replace Claude for many use cases. The model serves as a hedge against Anthropic's subscription-model changes.

GPT-5.5: Mythos-Like Hacking, Open to All

55 points · 5 comments · by rs_rs_rs_rs_rs

An analysis comparing GPT-5.5's cybersecurity capabilities to Anthropic's gated Mythos model. The article presents benchmark data showing GPT-5.5's performance on vulnerability detection across different categories, arguing that the open model achieves similar results to Mythos without the access restrictions. The piece also discusses OpenAI's cybersecurity guardrails that may route security-related queries to less capable models.

Interesting Points
  • GPT-5.5 demonstrates Mythos-like cybersecurity capabilities while being openly available to all users
  • OpenAI has cybersecurity guardrails that may route security-related queries to less capable models like GPT-5.2
  • The article presents benchmark comparisons across web vulnerabilities, OSS vulnerabilities, and other categories
  • Some commenters note that small open-weight models can also detect many of the same vulnerabilities as Mythos
Top Comment Threads
  1. nsingh2 (2 replies) -- Critiques the article's data visualization, noting that categorical data is connected with lines inappropriately and that linear interpolation implies data points that don't exist. Another commenter calls it 'an ad thinly disguised as useful data.'
  2. unsupp0rted (2 replies) -- Points out that OpenAI gates API access to main models behind Persona ID verification for cybersecurity work, and may silently route security queries to less capable models. This raises questions about whether GPT-5.5's cybersecurity claims are fully accessible.
  3. ur-whale (1 reply) -- Describes Anthropic's gated Mythos model as 'the perfect marketing ploy,' comparing it to Gmail's early invite-only mode. The gating creates artificial scarcity and hype around the model's capabilities.

Reddit Stories

Mozilla Used Anthropic's Mythos to Find and Fix 271 Bugs in Firefox

849 points · 108 comments · r/singularity · by u/Tinac4

Mozilla Firefox browser logo

Mozilla reported that its Firefox 150 release includes fixes for 271 vulnerabilities identified using early access to Anthropic's Mythos Preview, a dramatic increase from the 22 security-sensitive bugs flagged by the earlier Opus 4.6 model on Firefox 148. Mozilla CTO Bobby Holley stated that automated techniques now cover the full space of vulnerability-inducing bugs. The bugs were found internally and rolled up into three security advisories.

Interesting Points
  • Mythos found 271 vulnerabilities in Firefox 150, compared to just 22 with Opus 4.6 on Firefox 148
  • Mozilla's CTO Bobby Holley said automated techniques now cover the full space of vulnerability-inducing bugs
  • A Mozilla employee clarified that the 271 bugs were rolled up into three security advisories rather than one CVE per bug
  • Mozilla is preparing for a future where these tools are in attackers' hands
Top Comment Threads
  1. u/EvillNooB (326 points · permalink) -- Asks how to get access to Mythos. Another commenter reveals Anthropic is sending it to companies to prepare for incoming cyber attacks at year-end, citing North Korean AI-enabled infiltration campaigns.
  2. u/helg0ret (87 points · permalink) -- Notices a discrepancy: Firefox 150's change log only mentions 3 vulnerabilities found with Claude, not 271. A Mozilla employee responds explaining that internally found bugs go into roll-up advisories with links to full bug lists in Bugzilla.
  3. u/Tinac4 (50 points · permalink) -- Quotes Mozilla's CTO Bobby Holley: 'Our belief is that the tools have changed things dramatically, because now we have automated techniques that can cover, as far as we can tell, the full space of vulnerability-inducing bugs.'

Unitree unveils a version of the G1 with wheels

821 points · 273 comments · r/singularity · by u/GraceToSentience

Unitree G1 humanoid robot with wheels attached

Unitree revealed a new variant of its G1 humanoid robot equipped with wheels, roller skates, and ice skates instead of traditional feet. The hybrid mobility system represents a significant departure from pure legged locomotion, suggesting a practical approach to robot mobility that prioritizes efficiency and speed over the complexity of walking. The wheels had been teased about a year ago.

Interesting Points
  • Unitree G1 now comes with wheels, roller skates, and ice skates as mobility options
  • The wheel variant had been teased approximately a year before the reveal
  • The move represents a practical shift from pure legged locomotion to hybrid mobility systems
  • The robot is 1.32m tall, weighs 35kg, has up to 43 degrees of freedom, and uses NVIDIA AI
Top Comment Threads
  1. u/grienleaf (199 points · permalink) -- Posts an image of the wheeled G1 with a deadpan 'Oh, cool. Great.' comment. Another user compares it to the Scarecrow from The Wizard of Oz.
  2. u/GraceToSentience (89 points · permalink) -- Admits the wheels were expected but the ice skates weren't. Links to the original wheel tease from about a year ago.
  3. u/llTeddyFuxpinll (75 points · permalink) -- Warns that the time gap between these machines being fully deployed and universal basic income will be the death of millions of people. Another commenter notes billionaires mention UBI but don't want to pay for it.

Introducing GPT-5.5

752 points · 263 comments · r/singularity · by u/ShreckAndDonkey123

OpenAI officially introduced GPT-5.5 with their strongest set of safeguards to date. The model is available in ChatGPT and Codex with a gradual rollout. Pricing is $5 per 1M input tokens and $30 per 1M output tokens for the xhigh tier. Community reactions are mixed, with some noting the modest benchmark improvements over GPT-5.4 and others praising the model's feel and usability.

Interesting Points
  • GPT-5.5 is released with OpenAI's strongest set of safeguards to date
  • Pricing: $5/M input tokens, $30/M output tokens for xhigh tier (double GPT-5.4)
  • SWE-Bench Pro score of 58.6% vs Mythos at 78% on shared benchmarks
  • HLE without tools: 41.4% (GPT-5.5) vs 39.8% (GPT-5.4); HLE with tools: 52.2% vs 52.1%
Top Comment Threads
  1. u/IllustriousWorld823 (224 points · permalink) -- Mocks OpenAI's claim of 'strongest safeguards to date' with a snake emoji. Another commenter jokes about asking it to make fun of Israel and a drone strike hitting their neighbor.
  2. u/MapForward6096 (220 points · permalink) -- Reports the pricing: $5 per 1M input tokens, $30 per 1M output, double the price of GPT-5.4 according to Sam Altman's tweet.
  3. u/spryes (151 points · permalink) -- Questions the hype around 58.6% on SWE-Bench Pro while Mythos gets 78%, asking to 'shut it down.' Another commenter notes Mythos isn't available to normal consumers, so the real competition is Opus 4.7.

Qwen 3.6 27B is a BEAST

569 points · 296 comments · r/LocalLLaMA · by u/AverageFormal9076

A user with a 5090 Laptop (24GB VRAM) reports that Qwen 3.6 27B passed all their tool-call and data science benchmarks, making them confident enough to cancel their cloud subscriptions. Running under llama.cpp at q4_k_m quantization, the model performs well for PySpark/Python work and data-transformation debugging. The post sparked extensive discussion about optimal quantization settings, context lengths, and hardware configurations for running the model.

Interesting Points
  • Qwen 3.6 27B passed all tool call and data science benchmarks on a 5090 Laptop with 24GB VRAM
  • User reports cancelling cloud subscriptions after testing the model locally
  • Running with llama.cpp at q4_k_m quantization, with discussions about achieving 130k context with q8
  • One user reports 50 t/s with unsloth Q6 XL quant on a 5090 with power limit at 400W
Top Comment Threads
  1. u/sagiroth (153 points · permalink) -- Recommends against a q4 KV cache for coding, suggesting q8 can still achieve 130k context. Another user reports 50 t/s with the unsloth Q6 XL quant on a 5090 at a 400W power limit with 100k context.
  2. u/inkberk (59 points · permalink) -- Waits for z-lab's dflash drafter and a llama.cpp PR that promises free 2x decode speed improvements.
  3. u/Johnny_Rell (27 points · permalink) -- Asks about performance on 16GB VRAM + 32GB DDR5 with offloading. The original poster responds that dense offloading will work terribly for this model.
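
The settings discussed in this thread map onto standard llama.cpp server flags; a minimal sketch of the recommended setup (q4_k_m weights, q8_0 KV cache, ~130k context, full GPU offload) might look like the following. The GGUF filename is a placeholder, and the exact layer count to offload depends on available VRAM.

```shell
# q4_k_m weight quant; q8_0 KV cache (the thread's advice: avoid q4 KV
# for coding); ~130k context; offload all layers to the 24GB GPU.
# Filename is a placeholder -- substitute your actual GGUF.
llama-server \
  -m qwen3.6-27b-q4_k_m.gguf \
  --ctx-size 131072 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --n-gpu-layers 99
```

Dropping the KV cache to q4 roughly halves its memory footprint but, per the top comment, degrades coding quality more than the context headroom is worth.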

US gov memo on 'adversarial distillation' - are we heading toward tighter controls on open models?

354 points · 379 comments · r/LocalLLaMA · by u/MLExpert000

Screenshot of US government memo on adversarial distillation

A US government memo on 'adversarial distillation' has sparked concern about potential tighter controls on open-source AI models. The memo discusses treating model weights and capabilities as strategic assets, raising questions about where this leaves open models. Commenters draw parallels to historical protectionism, the TikTok deal, and the broader trend of governments regulating AI as a national security issue.

Interesting Points
  • US government memo discusses 'adversarial distillation' and treating model weights as strategic assets
  • Commenters predict US folks will be forced to use US models because Chinese models will be disallowed
  • The memo is seen as part of a broader protectionism trend alongside tariffs and other trade measures
  • Some predict torrent sites will become the new HuggingFace if governments try to restrict model distribution
Top Comment Threads
  1. u/BagelRedditAccountII (419 points · permalink) -- Sarcasm: 'Illegal distillation? Welcome back, 1920s,' drawing a parallel to Prohibition-era policies.
  2. u/Specter_Origin (288 points · permalink) -- 'Free market, until you have to compete...' Another commenter notes that after the TikTok deal and Anthropic being declared an enemy of the state, it's hard to believe the US is actually a free market.
  3. u/05032-MendicantBias (167 points · permalink) -- Critiques the memo: 'The AUDACITY to scrub the whole internet, and cry wolf when someone gets output from a model for training.' Another commenter mocks Dario Amodei's tears.
  4. u/Pristine-Woodpecker (140 points · permalink) -- Predicts US folks will be forced to pay more and use US models because Chinese models will be disallowed. Calls it protectionism that goes hand in hand with tariffs. Another commenter notes protectionism weakened the US automotive industry.

Still coding? Google says 75% of the company's new code is AI-generated. In previous years, it was around 50% in 2025 and 25% in 2024.

409 points · 104 comments · r/singularity · by u/Distinct-Question-16

Google Cloud Next 2026 event

Google CEO Sundar Pichai announced at Cloud Next 2026 that 75% of all new code at Google is now generated by AI and reviewed by human developers, up from 50% in fall 2025 and 25% in October 2024. The company is shifting to 'agentic workflows' where AI systems operate with increasing autonomy. A complex code migration was completed six times faster than a year earlier thanks to AI agent collaboration. Some Google DeepMind employees use Anthropic's Claude Code internally.

Interesting Points
  • 75% of Google's new code is now AI-generated, up from 50% in fall 2025 and 25% in October 2024
  • A complex code migration was completed six times faster than a year earlier with AI agent collaboration
  • Google is shifting to 'agentic workflows' where AI systems operate with increasing autonomy
  • Some Google DeepMind employees use Anthropic's Claude Code, suggesting Google hasn't built a satisfactory internal alternative yet
Top Comment Threads
  1. u/blopiter (170 points · permalink) -- Sarcasm: 'At this rate 125% of the code will be written by AI in two years.'
  2. u/FriendlyJewThrowaway (124 points · permalink) -- Reports that Google's DeepMind division engineers insist on using Claude Code and nothing else, while Google tries to force Gemini. Employees say Gemini is a far less satisfactory experience.
  3. u/nsshing (112 points · permalink) -- Notes that Claude Code itself shipped with allegedly 100% of its code written by Claude. A 15-year Google veteran says they barely write code by hand anymore but still determine data models, algorithms, and security at the code level.

Quick Mentions

Report generated in 6m 0s.