# The master Claude Projects playbook, March 2026

Claude Projects has evolved from a paid-only feature into the central hub for configuring Claude across every plan tier, now supporting custom skills, multi-agent research, connectors to 50+ external tools, and a memory system that learns from your conversations. **This playbook covers the full configuration surface as of March 6, 2026** — from project creation and Opus 4.6 model selection through skill authoring, vibe coding integration, memory design, and session optimization. The most significant recent changes include the February free-tier expansion, the Opus 4.6 and Sonnet 4.6 releases, the March 3 skill-creator evaluation update, and the memory import tool launch.

---

## Section 1: How Claude Projects work right now

Projects are self-contained workspaces on claude.ai, each with its own chat history, knowledge base, custom instructions, and memory scope. They exist to give Claude persistent context without requiring you to re-explain your domain in every conversation.

**Creation and configuration flow.** Navigate to claude.ai/projects or hover over the left sidebar, click "+ New Project," and enter a name and description. A critical detail often missed: **Claude does not see the project name or description** — these exist purely for your own organization. On Team and Enterprise plans, you can set visibility (private vs. organization-wide). Once created, the knowledge base panel appears on the right side. Click "+" to upload documents, code files, or text snippets. You can also move standalone chats into a project via the dropdown arrow → "Add to project," and star projects for quick sidebar access. Each chat within a project is independent — context does not flow between chats unless information lives in the knowledge base. [Source type: Official Anthropic Help Center]

**Project instructions** are per-project custom directives that Claude follows in every chat within that project.
Set them via the project configuration page. Unlike the name and description, **instructions are visible to Claude** as system-level context. Anthropic's guidance is to keep instructions concise and focused on essentials — general project context, key guidelines, and Claude's role. Task-specific details belong in individual chats. Instructions interact with two other personalization layers: profile preferences (account-wide) and styles (formatting and delivery). These three layers can be combined or used independently. [Source type: Official Anthropic docs]

**Knowledge base files** support a broad range of formats: PDF, DOCX, TXT, RTF, ODT, HTML, EPUB, JSON, CSV, XLSX (requires the Analysis Tool), common code file extensions (.py, .js, .ts, .html, .css), images (JPEG, PNG, GIF, WebP up to 8,000×8,000 pixels), and audio (MP3, WAV in project workspaces). The hard limit is **30 MB per file** with unlimited files. The effective constraint is the **200,000-token context window** — a visual indicator shows knowledge base capacity usage as a percentage. On paid plans, when the knowledge base approaches the context limit, Claude **automatically activates RAG mode**, expanding effective capacity up to **10x** by retrieving only relevant sections rather than loading everything. This transition is seamless and requires no configuration. Well-named files improve RAG retrieval accuracy. Both instructions and knowledge base files consume context window tokens, and project content is cached so reuse doesn't count against usage limits. [Source type: Official Anthropic docs]

### Key limitations to know about

The **200K-token context window** is shared across instructions, files, and conversation history (500K on Enterprise with Sonnet 4.5). Projects are fully isolated from each other — no cross-project access. There is no cross-chat context sharing within a project unless information is in the knowledge base. The 30 MB file limit is lower than competitors' (ChatGPT allows 512 MB).
Some formats like .rst require renaming (e.g., .rst.txt). PDFs over 100 pages may only get text extraction. No folder uploads exist — files must be uploaded individually. Free users cannot access RAG mode, making the context window a hard cap. [Source type: Multiple official and community sources]

### What changed in January–March 2026

The most consequential change: **Projects became available to all users, including free accounts**, in February 2026. Free users get up to 5 projects. Previously, Projects were paid-plan-only. This expansion also opened artifacts, file creation, and app connectors to free users. Other milestones: Claude Cowork launched January 12 (initially Max-only, expanded to all Pro subscribers by January 16), interactive connectors were announced January 26, **Opus 4.6 released February 5**, **Sonnet 4.6 released February 17** with a 1M-token context window in beta, Enterprise Cowork with "Deep Connectors" launched February 24, and the memory import tool launched around March 3 alongside memory expansion to free users. [Source type: Anthropic release notes, TechRadar, Dataconomy]

⚠️ **Contradiction flag:** Some older sources (gamsgo.com and similar) still state "Free users cannot create or save projects." This is outdated. Official Anthropic documentation and multiple independent reports confirm free users now have Projects (up to 5).

### Pricing and tier access

| Feature | Free | Pro ($20/mo) | Max 5x ($100/mo) | Max 20x ($200/mo) | Team ($25–30/seat) | Enterprise |
|---|---|---|---|---|---|---|
| Projects | ✅ (max 5) | ✅ Unlimited | ✅ Unlimited | ✅ Unlimited | ✅ + sharing | ✅ + sharing |
| RAG for projects | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Memory | ✅ (new) | ✅ | ✅ | ✅ | ✅ | ✅ (admin-controlled) |
| Opus model access | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Context window | 200K | 200K | 200K | 200K | 200K | 200K (500K Sonnet 4.5) |
| Usage vs. Pro | ~1/5 | 1x | 5x | 20x | 1.25x–6.25x | Custom |
| Project sharing | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |

Pro is $20/month ($17–18 annual). Max is monthly-only. Team Standard seats are ~$25–30/seat/month, with Premium seats available for power users. Enterprise is now available self-serve as of February 2026. [Source type: Official Anthropic docs, independent comparison sites]

---

## Section 2: Model selection and feature configuration

### Opus 4.6 arrived February 5 as Anthropic's new flagship

Claude Opus 4.6 is a hybrid reasoning model — it can answer normally or use extended thinking, toggled per request. Its API model ID is `claude-opus-4-6` (pinned: `claude-opus-4-6-20260205`). API pricing is **$5/million input tokens, $25/million output**, with up to 90% savings via prompt caching. The context window is 200K standard with a **1M-token beta** (requiring the `context-1m-2025-08-07` beta header), and the maximum output is **128K tokens** — doubled from 64K in prior Opus models.

To select Opus 4.6 in Projects, click the **model selector dropdown** at the bottom of the chat interface and choose Opus 4.6. It requires a paid plan (Pro, Max, Team, or Enterprise). Free users are limited to Sonnet 4.5. In Claude Code, use `/model opus` to toggle. Opus 4 and 4.1 have been deprecated and removed — users pinned to them are auto-migrated.

Benchmark performance is substantial: **65.4% on Terminal-Bench 2.0** (industry-leading agentic coding), **72.7% on OSWorld** (best computer-use model), leading scores on Humanity's Last Exam and BrowseComp, **90.2% on BigLaw Bench** (highest of any Claude model), and METR time horizons of 50% at 14 hours 30 minutes. In a cybersecurity test, it found **22 security vulnerabilities in Firefox** (14 high-severity) and wrote a successful browser exploit with minimal hand-holding. An agent teams preview built a 100K-line C compiler from scratch.

For most production workloads (~80%), **Sonnet 4.6 at cheaper pricing remains optimal**.
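The pricing figures above make cost estimation simple arithmetic. As a minimal sketch (using the $5/$25 per-million rates quoted for Opus 4.6, and ignoring the up-to-90% prompt-caching discount):

```python
# Cost estimate for one uncached Opus 4.6 API call, using the per-token
# prices quoted above ($5/M input, $25/M output). Prompt caching, which
# can cut input cost by up to 90%, is deliberately ignored in this sketch.

OPUS_46_INPUT_PER_M = 5.00    # dollars per million input tokens
OPUS_46_OUTPUT_PER_M = 25.00  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single uncached request."""
    return (input_tokens * OPUS_46_INPUT_PER_M
            + output_tokens * OPUS_46_OUTPUT_PER_M) / 1_000_000

# e.g. a 150K-token project knowledge base plus a 4K-token answer:
print(round(request_cost(150_000, 4_000), 4))  # → 0.85
```

At those rates, a full 200K-token context costs about $1 per turn before caching, which is why cached project content matters so much.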
Opus 4.6 is justified for massive codebases, dense legal/financial documents, zero-tolerance pipelines, long-horizon autonomous tasks, and 1M-context needs. [Source type: Official Anthropic blog, AWS, benchmark databases, independent reviews]

### Extended thinking shifted to adaptive mode

The most important architectural change with Opus 4.6 is the move from binary extended thinking to **Adaptive Thinking** (`thinking: {type: "adaptive"}`). Claude now dynamically decides when and how much to think based on query complexity, controlled by an **effort parameter** with four levels: `low`, `medium`, `high` (the default for Opus 4.6), and `max`. No beta header is required. Interleaved thinking (between tool calls) is automatically enabled. In internal evaluations, Anthropic found that **adaptive thinking reliably outperforms manual extended thinking**. Manual extended thinking (`thinking: {type: "enabled", budget_tokens: N}`) is deprecated on Opus 4.6 but still functional on Sonnet 4.6 and earlier models. The minimum budget is 1,024 tokens; diminishing returns set in above 32K for most tasks.

To toggle in the claude.ai interface: click the "Search and tools" button (lower left), then switch the "Extended thinking" toggle. **Toggling mid-conversation starts a new chat.** In Claude Code, keywords like "think," "megathink," and "ultrathink" trigger different thinking depths (these only work in the CLI, not the web or API).

**When extended thinking helps:** complex multi-step reasoning, architecture decisions, debugging, security audits, legal/financial analysis, multi-step agentic workflows. **When it hurts:** community research documents **up to 36% performance degradation on certain "intuitive" task types**, analogous to humans overthinking simple problems. Anthropic's official guidance: "If you're finding that the model is overthinking, we recommend dialing effort down from high to medium."
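A minimal request-payload sketch for adaptive thinking, built from the parameters described above. The top-level placement of `effort` is an assumption on my part; verify field names against the current Messages API reference before relying on this shape.

```python
# Sketch of an adaptive-thinking request body (field placement of
# `effort` is an assumption, not confirmed API shape).

VALID_EFFORT = {"low", "medium", "high", "max"}

def build_request(prompt: str, effort: str = "high") -> dict:
    """Assemble a Messages-style payload using adaptive thinking."""
    if effort not in VALID_EFFORT:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-6",         # pinned: claude-opus-4-6-20260205
        "max_tokens": 4_096,
        "thinking": {"type": "adaptive"},   # replaces budget_tokens on Opus 4.6
        "effort": effort,                   # low | medium | high (default) | max
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Audit this auth flow for privilege-escalation paths.")
```

Dialing `effort` down to `"medium"` is the documented remedy when the model overthinks simple tasks.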
⚠️ **Contradiction flag:** The 36% degradation figure comes from a community GitHub guide, not official Anthropic documentation. Anthropic's language is more measured: extended thinking "can add cost and latency on simpler ones." The exact magnitude is not officially quantified. [Source type: Official Anthropic API docs, Claude Help Center, community research]

### Web search now features dynamic filtering

As of the February 17 Sonnet 4.6 launch, web search includes a **dynamic filtering** upgrade (`web_search_20260209`): Claude writes and executes code to post-process and filter search results before they enter the context window, improving accuracy and reducing token waste. Web search is available on all plans, including free. When enabled, Claude can also fetch full content from specific URLs and return images directly in conversations. API code execution is free when used with web search.

To enable on claude.ai: click the "+" button or slider icon in the chat input, locate "Web search," and toggle it on. On Team/Enterprise, an Owner must first enable web search at the workspace level. Current limitations include high token intensity (a single query can consume significant context), the model deciding autonomously when to search (steerable via prompt instructions), and searches powered by an undisclosed third-party provider. [Source type: Official Anthropic API docs, Claude Help Center]

### Research mode operates as a multi-agent system

Research goes far beyond basic web search. Claude operates as a multi-agent system: a lead agent orchestrates several sub-agents running in parallel, each exploring a different part of the problem space. Sub-agent results flow back for synthesis. A dedicated citations agent ensures proper sourcing. Extended thinking activates automatically. Typical runs take **5–15 minutes standard, up to 45 minutes for deep investigation**, accessing **hundreds of sources** (one benchmark test: 261 sources in ~6 minutes). Research is available on paid plans only.
**Web search must be toggled on** for Research to function. Invoke it by clicking the "Research" button at the bottom left of the chat interface. It integrates with connectors — when connected, Research can pull from Gmail, Google Calendar, and Google Docs. Best practice: be highly specific in research prompts; define parameters, constraints, and output format. Enterprise users with Google Drive Cataloging get RAG-based retrieval across their document ecosystem. [Source type: Official Anthropic blog, Claude Help Center, ByteByteGo architecture analysis]

### Connectors now span 50+ integrations

Connectors use the **Model Context Protocol (MCP)** standard. Free users get access to directory connectors (Notion, Slack, Google Workspace); Pro and above add custom MCP connectors and interactive apps. Setup: go to Settings → Connectors, browse the directory (50+ integrations, with new ones added weekly), click "+" next to the desired connector, and complete OAuth.

**Google Drive integration** allows searching and retrieving files (Docs, Sheets, Slides, PDFs, plain text), adding Docs to chats by pasting URLs, viewing permissions, and saving Claude-generated files directly to Drive. Google Docs added to projects sync directly from Drive (always the latest version). Limitations: text extraction only (embedded images are not processed), the connector for Projects is only available in private projects (disabled for shared ones), and Google Drive Cataloging (RAG-based indexing) is Enterprise-only. Other notable connectors include Slack (interactive since January 26), Asana and Figma (interactive since January 2026), Linear, Jira, Monday.com, Notion, GitHub, Stripe, and Canva. Paid plans can add unlimited custom connectors via remote MCP server URLs.
[Source type: Official Claude Help Center, independent connector guides]

---

## Section 3: Custom skills deserve your attention

### The progressive disclosure architecture powers skills

Skills, launched October 16, 2025, are organized folders of instructions, scripts, and resources that Claude discovers and loads dynamically. They are available to Pro, Max, Team, and Enterprise users across claude.ai, Claude Code, and the API. The architecture operates across three levels:

**Level 1 — Metadata.** The name and description from YAML frontmatter are pre-loaded into Claude's system prompt at startup (~100 words per skill). This is always in context and is what Claude uses to decide when a skill should activate.

**Level 2 — SKILL.md body.** Read only when the skill is deemed relevant, loaded into context on demand.

**Level 3 — Bundled resources** (scripts/, references/, assets/). Loaded or executed only as needed. Scripts can be executed without loading their code into context — only the output enters the context window.

A key technical insight: **the skill selection mechanism has no algorithmic routing or intent classification**. No regex, no keyword matching, no ML-based intent detection. The system formats all available skills into text embedded in the Skill tool's prompt, and Claude's language model makes the selection decision purely through its forward pass.
[Source type: Official Anthropic engineering blog, platform docs]

### SKILL.md format and YAML frontmatter requirements

Every skill requires a `SKILL.md` file at the root of a folder structure:

```
skill-name/
├── SKILL.md (required)
│   ├── YAML frontmatter (between --- markers)
│   └── Markdown instructions body
└── Bundled Resources (optional)
    ├── scripts/     Executable code for deterministic tasks
    ├── references/  Docs loaded into context as needed
    └── assets/      Files used in output (templates, icons)
```

The **name** field has a maximum of **64 characters**, must use kebab-case (`[a-z0-9-]+`), cannot start or end with a hyphen, and cannot contain "anthropic" or "claude." This becomes the `/slash-command` in Claude Code.

⚠️ **Contradiction flag on description length:** The official best practices page at platform.claude.com states a maximum of **1,024 characters**, and the validation script (`quick_validate.py`) enforces this limit. However, the Claude Help Center article "How to create custom Skills" states a **200-character maximum**. This discrepancy likely reflects different contexts (Help Center guidance for claude.ai uploads vs. the API/Claude Code spec). Recommend treating 1,024 as the technical limit and 200 as a best-practice target for conciseness.

Optional frontmatter fields include `compatibility` (required tools), `allowed-tools` (limits available tools), `context: fork` (for subagent execution), `agent` (subagent type: Explore, Plan, or custom), and `disable-model-invocation: true` (the skill can only be invoked via its /slash-command).

Packaging for upload: compress the skill folder as a `.zip` file, then upload via Settings → Customize → Skills → "+" button.
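The frontmatter constraints above are mechanical enough to check in a few lines. This is a sketch mirroring the documented rules, not Anthropic's actual `quick_validate.py`:

```python
import re

# Checks the documented SKILL.md frontmatter rules: kebab-case name,
# <= 64 chars, no edge hyphens, no "anthropic"/"claude", description
# <= 1,024 chars. A sketch, not the official validator.

NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # kebab-case, no edge hyphens

def validate_frontmatter(name: str, description: str) -> list[str]:
    errors = []
    if len(name) > 64:
        errors.append("name exceeds 64 characters")
    if not NAME_RE.fullmatch(name):
        errors.append("name must be kebab-case with no leading/trailing hyphen")
    if any(word in name for word in ("anthropic", "claude")):
        errors.append('name cannot contain "anthropic" or "claude"')
    if len(description) > 1024:
        errors.append("description exceeds 1,024 characters (aim for ~200)")
    return errors

print(validate_frontmatter("pdf-form-filler", "Fills PDF forms."))  # → []
print(validate_frontmatter("-claude-helper", "x" * 2000))           # three errors
```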
[Source type: Official Anthropic docs, GitHub repo, DeepWiki]

### The March 3 skill-creator update adds evaluation and benchmarking

The skill-creator — a meta-skill for creating skills — received a major update on **March 3, 2026**, confirmed by an official Anthropic blog post titled "Improving skill-creator: Test, measure, and refine Agent Skills." The update introduces six capabilities:

**Evals (automated testing).** Authors define test prompts and describe expected outputs. The skill-creator runs the prompts through Claude with the skill loaded, reporting pass rate, elapsed time, and token usage. Test cases save as JSON in `evals/evals.json`.

**Benchmark mode** provides standardized assessment using eval sets, tracking pass rate, time, and tokens with mean ± standard deviation and delta comparisons.

**Multi-agent parallel evaluation** runs evals in parallel with independent agents, each in a clean context, eliminating context bleed.

**Comparator agents (blind A/B testing)** evaluate two skill versions without knowing which output came from which, removing subjective bias.

The most innovative addition is **description optimization.** The system generates 20 eval queries (mixing should-trigger and should-not-trigger), splits them 60/40 into train/test sets, evaluates the current description (running each query 3x for a reliable trigger rate), uses Claude with extended thinking to propose improvements, and iterates up to 5 times, selecting the best candidate by test score. Anthropic tested this across their document-creation skills and **improved triggering on 5 of 6 public skills**.

The skill-creator now operates in four modes — Create, Eval, Improve, and Benchmark — using four composable sub-agents: Executor, Grader, Comparator, and Analyzer.
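The description-optimization loop above can be sketched schematically. Here `triggers()` and `propose()` stand in for real Claude calls (a trigger check and a rewrite proposal); everything else follows the documented recipe — 60/40 split, 3x runs per query, up to 5 iterations, best test score wins:

```python
import random
import statistics

# Schematic of the description-optimization loop. `triggers(desc, query)`
# and `propose(desc, train)` are stand-ins for real Claude calls; this
# sketches only the documented control flow, not Anthropic's implementation.

def trigger_rate(description, queries, triggers, runs=3):
    """Mean trigger accuracy over `runs` repetitions of each query."""
    scores = []
    for query, should_trigger in queries:
        hits = sum(triggers(description, query) == should_trigger
                   for _ in range(runs))
        scores.append(hits / runs)
    return statistics.mean(scores)

def optimize(description, queries, triggers, propose, iterations=5):
    """60/40 train/test split; keep whichever candidate scores best on test."""
    random.shuffle(queries)
    split = int(len(queries) * 0.6)
    train, test = queries[:split], queries[split:]
    best, best_score = description, trigger_rate(description, test, triggers)
    for _ in range(iterations):
        candidate = propose(best, train)   # Claude proposes a rewrite
        score = trigger_rate(candidate, test, triggers)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Holding out a test set is what keeps the loop from overfitting the description to the 12 training queries.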
[Source type: Official Anthropic blog, GitHub repo]

### Solving the undertriggering problem

Anthropic explicitly acknowledges in the skill-creator SKILL.md: **"Currently Claude has a tendency to 'undertrigger' skills — to not use them when they'd be useful."** The recommended mitigation is writing "pushy" descriptions. Instead of a factual description like "How to build a simple fast dashboard to display internal data," append explicit trigger conditions: "**Make sure to use this skill whenever the user mentions dashboards, data visualization, internal metrics, or wants to display any kind of company data, even if they don't explicitly ask for a 'dashboard.'**"

Best practices for effective descriptions: include both what the skill does and specific triggers/contexts for when to use it. Write in third person (inconsistent POV causes discovery problems). Be specific and include key terms — Claude must choose from potentially 100+ skills. Include specific user phrases that should trigger the skill. Note that Claude only consults skills for tasks it cannot easily handle on its own, so simple one-step queries may not trigger even with perfect descriptions.

For the SKILL.md body: keep it under **500 lines** for optimal performance. Use imperative form. Explain the "why" rather than heavy-handed MUSTs — Claude has good theory of mind. Include input/output examples. Avoid time-sensitive information. Use consistent terminology. Prefer one-level-deep references over deeply nested file references. [Source type: Official Anthropic skill-creator SKILL.md, platform best practices docs]

### What separates effective skills from ineffective ones

**Effective skills** contain actionable content (step-by-step instructions, specific scripts, concrete examples), clear structure (numbered workflows, conditional paths), appropriate degrees of freedom matched to task fragility, feedback loops (run validator → fix → repeat), and bundled utility scripts that save tokens and ensure consistency.
The "plan-validate-execute" pattern catches errors early by producing verifiable intermediate outputs. Progressive disclosure works well: SKILL.md acts as a table of contents pointing to detailed reference files loaded on demand.

**Ineffective skills** contain narrative or explanatory content that wastes tokens on things Claude already knows, vague descriptions ("Helps with documents"), too many options without default recommendations, deeply nested file references (Claude may only `head -100` referenced files), magic numbers without justification, Windows-style paths, and time-sensitive instructions. [Source type: Official Anthropic complete guide PDF, platform docs]

### Skills compose naturally when multiple are active

Claude can load multiple skills simultaneously and automatically identifies which ones are needed. Skills "stack together" in a single conversation thread — no user intervention required. In practice: uploading a CSV might activate a data-analysis skill, then saying "turn this into a deck" activates a presentation-builder skill in the same thread. Skills compose with MCP connectors (MCP connects data; skills teach what to do with it), with subagents in Claude Code (subagents access and use skills like the main agent), and with Projects (Projects provide persistent knowledge; skills provide portable procedures). While **skills cannot explicitly reference other skills**, Claude uses multiple skills together automatically. [Source type: Official Claude Help Center, Zapier guide]

### Three installation methods plus the marketplace

**Claude.ai/Desktop — ZIP upload.** Package the skill folder as a .zip, navigate to Settings → Customize → Skills → "+", and upload. Custom skills are private to individual accounts. Enterprise/Team admins can provision organization-wide.

**Claude Code — multiple paths:** copy to `~/.claude/skills/` (personal) or `.claude/skills/` (project, shareable via git). The plugin marketplace supports `/plugin install` commands.
**API — programmatic management** via the `/v1/skills` endpoint, uploading ZIP files and managing versions through the Claude Console. The Agent SDK provides the same support in TypeScript and Python. Additionally, community tools like `npx agent-skills-cli add`, Vercel's `add-skill` CLI, and SkillsMP (skillsmp.com) offer alternative installation paths. Agent Skills was published as an **open standard** on December 18, 2025 at agentskills.io, designed to be portable across platforms. [Source type: Official docs, Claude Code docs, GitHub]

---

## Section 4: Vibe coding is becoming agentic engineering

### The current tool landscape has stratified into three tiers

**Tier 1 — AI-native code editors** target developers. Cursor (a VS Code fork) is the default for developers who vibe code, with Cloud Agents, background agents, Bugbot for PR review, and MCP support for up to 40 tool integrations. Parent company Anysphere is valued at **$29.3 billion** and has surpassed **$1B in annualized revenue**. Claude Code is Anthropic's terminal-native agentic coding assistant featuring Plan Mode, sub-agents, Agent Teams (shipped with Opus 4.6), Skills 2.0, Hooks, a 6-level CLAUDE.md configuration hierarchy, and voice mode rolling out to ~5% of users. METR reports that **~4% of GitHub commits are now authored by Claude Code**. Windsurf (formerly Codeium, acquired by OpenAI for $3B in December 2025) introduced the Cascade agent. GitHub Copilot holds steady at $10/month.

**Tier 2 — full-stack vibe coding platforms** target non-developers. Lovable (CEO Anton Osika targeting $1B revenue by late 2026), Bolt.new (full-stack web app generation from descriptions), v0 (Vercel; tops TechRadar's 2026 list and has blocked 100,000+ insecure deployments), and Replit (browser-based coding with autonomous agent deployment).
**Tier 3 — platform-specific tools.** Wix Harmony (announced January 21, 2026) combines vibe coding with visual editing through the Aria AI agent, targeting enterprise-grade applications with 99.99% uptime. New 2026 entrants include OpenClaw (autonomous local agent), Devin (fully autonomous AI software engineer, $20–200/month), and GLM-5 from Zhipu AI (a foundation model explicitly designed "to transition the paradigm of vibe coding to agentic engineering," arXiv, February 17, 2026). [Source type: TechRadar, official product docs, community reviews, arXiv]

### From "vibe coding" to "agentic engineering" — what actually changed

Andrej Karpathy coined "vibe coding" in a February 2025 X post: "It's not really coding — I just see stuff, say stuff, run stuff, and copy paste stuff." By February 2026, the terminology had shifted. On **February 4, 2026**, Addy Osmani (Google) published an influential blog post defining the spectrum: vibe coding (YOLO) ↔ AI-assisted engineering ↔ agentic engineering (disciplined). Karpathy subsequently posted that his "current favorite is 'agentic engineering.'"

The practical distinction: **vibe coding** means "going with the vibes and not reviewing the code" — useful for prototypes and personal scripts. **Agentic engineering** means "AI does the implementation, human owns the architecture, quality, and correctness." This involves starting with a plan/spec/design doc before prompting, reviewing every diff with PR-like rigor, testing relentlessly ("the single biggest differentiator," per Osmani), and owning the codebase through docs, version control, CI, and monitoring.

The emerging standard is the **PEV Loop**: Plan → Execute → Verify. This replaces "prompt and hope." Multi-agent orchestration uses specialized agents — author, tester, reviewer, security scanner — working in coordination.
A new concept gaining traction is **"cognitive debt"** (Andrew Hunt) — the accumulated cost of poorly managed AI interactions and context loss — replacing technical debt as the primary 2026 threat. [Source type: Primary sources — Karpathy X posts, Osmani blog; independent reporting — The New Stack, IBM]

### Best practices have converged on four patterns

**Spec-driven / PRD-first development.** Write a design doc or PRD before prompting anything. Red Hat (February 17, 2026): "Spec-driven development flips the relationship between instructions and code. Instead of treating prompts as throwaway tasks, you treat specifications as the authoritative blueprint." When something breaks, refine the spec and regenerate rather than debugging the code directly. This prevents "functionality flickering" — the phenomenon where AI fills in unspecified details differently each generation.

**Rules files as configuration backbone.** CLAUDE.md (6-level hierarchy with @imports), .cursorrules / .cursor/rules/*.mdc, AGENTS.md (an emerging cross-tool standard supported by Claude Code, Codex, Gemini CLI, and Builder.io), .github/copilot-instructions.md, and .windsurfrules. Best practice: maintain one canonical standards doc and compile it to each tool's format. Many teams simply symlink: `ln -s AGENTS.md CLAUDE.md`.

**Scaffolding-before-walls pattern.** Build progressive complexity — structure first, then fill in. Keywords Studios: "The era of the mega prompt is over; the era of strategic decomposition has arrived." Integrate one component, test it, and review it before moving to the next. Mandatory diff reviews and unit tests between AI-driven changes.

**Git checkpoint discipline and untrusted-code treatment.** `git commit -m "Working state before AI refactor"` before major AI changes. Automated security scanning (Snyk, Semgrep, CodeQL) at every integration point. Fresh-context critique pattern: start new sessions for review rather than reviewing in the same context that generated the code.
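The "one canonical standards doc, compiled to each tool's format" practice can be sketched in a few lines. This copies an AGENTS.md into the per-tool rules files named above (filenames are the conventional ones; teams that prefer symlinks use `ln -s AGENTS.md CLAUDE.md` instead):

```python
from pathlib import Path

# A minimal sketch: mirror one canonical AGENTS.md into the tool-specific
# rules files. Filenames follow the conventions listed above; adjust to
# whichever tools your team actually runs.

TOOL_RULES_FILES = [
    "CLAUDE.md",                        # Claude Code
    ".cursorrules",                     # Cursor (legacy single-file form)
    ".github/copilot-instructions.md",  # GitHub Copilot
    ".windsurfrules",                   # Windsurf
]

def compile_rules(repo: Path, canonical: str = "AGENTS.md") -> list[Path]:
    """Copy the canonical rules doc into each tool-specific location."""
    text = (repo / canonical).read_text()
    written = []
    for rel in TOOL_RULES_FILES:
        target = repo / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
        written.append(target)
    return written
```

Running this in a pre-commit hook keeps every assistant reading the same standards without hand-synchronizing five files.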
[Source type: Practitioner blogs — Osmani, Red Hat; tool documentation; community consensus]

### Failure modes have specific, documented evidence

**Security vulnerabilities — the 2.74x finding.** Two studies converge on this number. Veracode's 2025 GenAI Code Security Report tested 100+ LLMs and found that **AI-generated code contains 2.74x more vulnerabilities than human-written code**, with 45% of samples containing OWASP Top 10 vulnerabilities and an 86% failure rate on XSS. CodeRabbit's State of AI vs Human Code Generation Report (December 2025, analyzing 470 GitHub PRs) found AI PRs averaged 10.83 issues vs. 6.45 for human PRs (**1.7x more**), with XSS vulnerabilities specifically at **2.74x** and insecure object references at 1.91x. Apiiro found in Fortune 50 enterprises: **322% more privilege escalation paths**, 153% more design flaws, and a 40% jump in secrets exposure.

⚠️ **Contradiction flag:** The "2.74x" appears in two different contexts — Veracode's overall vulnerability rate across 100+ LLMs and CodeRabbit's specific XSS finding. Sources frequently conflate these.

**The scaling wall / 70% wall.** Projects work as MVPs but can't scale. Red Hat: "So many vibe-coded projects hit a wall around the three-month mark. The codebase has grown beyond anyone's ability to hold it in their head." AI-generated code is "brittle and poorly organized under the hood — inconsistent structure, minimal comments, ad-hoc logic."

**The confident stranger problem.** AI generates code that "looks reasonable on the surface but lacks proper error handling, introduces security vulnerabilities, breaks existing functionality, or creates unmaintainable architecture." A METR study participant compared AI to "a new contributor who doesn't yet understand the codebase." The code is syntactically valid but architecturally purposeless.
**Dependency poisoning via hallucinated packages (slopsquatting).** A 2025 academic study (UT San Antonio/Virginia Tech/University of Oklahoma) analyzed 576,000 code samples from 16 LLMs: **19.7% of recommended packages didn't exist**. Open-source models hallucinated at 21.7%, commercial models at 5.2%. Critically, **58% of hallucinated packages appeared repeatedly** across runs, making them predictable attack targets. The PhantomRaven campaign (detected 2026) planted 126 malicious npm packages achieving 86,000+ downloads. An npm worm discovered in 2026 steals tokens and API keys and injects malicious MCP servers into Claude Code, Cursor, and VS Code using a 48-hour "time bomb" delay.

**Context loss in long sessions.** The agent starts, modifies files, its mental model diverges from disk state, it hallucinates, calls non-existent functions, and corrupts the codebase. Cursor has a documented 2026 bug where file-locking conflicts between the Agent Review Tab and the editor cause code changes to silently revert. [Source type: Research reports — Veracode, CodeRabbit, Apiiro; security research — BleepingComputer, Trend Micro; academic study]

### The METR finding reveals a 43-point perception gap

METR's original randomized controlled trial (published July 10, 2025, with data from February–June 2025) studied 16 experienced open-source developers across 246 real tasks on large repos (average 22K+ stars, 1M+ lines). Using primarily Cursor Pro with Claude 3.5/3.7 Sonnet, **developers using AI took 19% longer** (95% CI: +2% to +39%). The critical finding: before the study, developers expected a 24% speedup; after experiencing the slowdown, they **still believed AI sped them up by 20%** — a **43-point gap** between perception and reality. The follow-up study (published February 24, 2026) expanded to 57 developers and 800+ tasks but encountered severe methodological problems.
**30–50% of developers refused to submit some tasks** because they didn't want to work without AI, and one developer completed zero tasks in the AI-disallowed condition. Raw results for the original cohort: an estimated speedup of -18% (CI: -38% to +9%); for new developers: -4% (CI: -15% to +9%). METR's conclusion: "We believe it is likely that developers are more sped up from AI tools now — in early 2026 — compared to our estimates from early 2025." But selection effects make the data an "unreliable signal," and they are redesigning the study. [Source type: Primary research — METR.org; independent analysis — IBM, TechCrunch]

### The AI Slopageddon is reshaping open-source governance

The term describes what happens when AI-generated contributions flood open-source projects. Key incidents in January–February 2026: **Ghostty** (Mitchell Hashimoto) implemented a zero-tolerance policy — submitting bad AI code means a permanent ban ("This is not an anti-AI stance. This is an anti-idiot stance"). **tldraw** (Steve Ruiz) auto-closes all external PRs after discovering his own AI-generated issues were being fed into other people's AI tools, creating "AI slop all the way down." **cURL** (Daniel Stenberg) shut down its bug bounty program after 6 years and $86K in payouts because ~20% of 2025 submissions were AI slop; in January 2026, he received 7 submissions in 16 hours, none identifying genuine vulnerabilities.

On February 26, 2026, RedMonk's Kate Holterhoff compiled the AI policies of 70 open-source organizations (Linux Foundation, Apache, Eclipse, Linux Kernel, Gentoo, curl, Matplotlib), publishing an interactive visualization. The fundamental question, posed by Steve Ruiz: "In a world of AI coding assistants, is code from external contributors actually valuable at all?"
[Source type: Primary analysis — RedMonk, Kate Holterhoff; direct statements from maintainers]

### Post-mortem patterns for stabilizing vibe-coded MVPs

Common findings in vibe-coded MVP post-mortems: inconsistent structure, minimal comments, ad-hoc logic, no test coverage, no documentation, "functionality flickering" between regenerations, and missing security controls.

Stabilization follows a consistent pattern: **spec-then-regenerate** (don't fix code directly — refine the spec and regenerate), **incremental validation** (integrate one component → test → review → next), **automated quality gates** (security scanning and unit test generation at every integration point), **treat AI code as untrusted** (same rigor as third-party code from unknown contributors), **fresh context reviews** (start new AI sessions for security review, separate from coding sessions), and **git discipline** (checkpoint before every major AI change with meaningful commit messages). The architecture recovery philosophy from Red Hat: "Use the vibes to explore. Use specifications to build."

[Source type: Practitioner guides, community consensus, Red Hat]

### Claude Code as the agentic development backbone

Claude Code operates as a terminal-native agentic assistant with these key capabilities as of March 2026: **Plan Mode** (read-only analysis toggled with Shift+Tab), **sub-agents** (quick focused workers for parallel tasks), **Agent Teams** (experimental, shipped with Opus 4.6 — multiple coordinated sessions with a shared task list and mailbox system where teammates communicate directly), **Skills 2.0** (on-demand loading consuming only ~2% of context budget), **Hooks** (lifecycle automation before/after Claude's actions), **Plugins** (marketplace with dev-workflows, governance plugins), **Commands** (saved prompts in .claude/commands/), and **Voice Mode** (rolling out to ~5% of users, closing the 3.7x gap between speaking and typing speed).
Pricing after the January 2026 Team pricing update: Claude Code is included with Pro at $20/month, Team Standard seats are down to $20/month (from $40), and Max 5x is $100/month. Average API usage is ~$6/dev/day, with 90% of users below $12/day. Many teams use Claude Code alongside Cursor ($40/month combined) — Claude Code for architectural and terminal-heavy work, Cursor for visual editing and tab completions.

[Source type: Official Claude Code docs, community comparison articles]

### What changed in the last 2–4 weeks specifically

**February 4:** Addy Osmani publishes "Agentic Engineering," crystallizing the vibe coding → agentic engineering shift. **February 12:** Cursor 2.4 releases with autonomous agents that plan and execute for hours without human intervention. **February 17:** GLM-5 paper on arXiv — a foundation model explicitly designed for the vibe coding to agentic engineering transition; also Red Hat's "uncomfortable truth" article on spec-driven development. **February 24:** METR follow-up study acknowledging selection bias. **February 26:** The New Stack and RedMonk articles on industry response to the terminology shift and open-source AI policies. **March 3:** Wix launches ChatGPT app via OpenAI Apps SDK + Wix MCP. Ongoing: growing npm supply chain attacks targeting AI-assisted workflows (PhantomRaven, npm worm with MCP injection), and community momentum toward AGENTS.md as a cross-tool standard.

[Source type: Dated primary sources and articles]

---

## Section 5: Memory entries and how they actually work

### The 24-hour synthesis cycle and project isolation

Memory is now available on **all Claude plans** (expanded to free users ~March 2026) across web, Desktop, and Mobile. Chat search (RAG-based past conversation retrieval) remains paid-only. Claude automatically summarizes conversations and creates a synthesis of key insights, **updated every 24 hours**. This synthesis provides context for every new standalone conversation.
Deleted conversations are removed from the synthesis on the next cycle.

**Project-scoped isolation is absolute.** Each project has its own separate memory space and dedicated project summary. Project conversations contribute only to that project's memory; standalone chats contribute to the general memory pool. The main user memory summary covers non-project chats only. Project A's memory cannot inform Project B, and general memory cannot inform project-scoped chats.

**Recency bias** is a documented characteristic: old conversations fade from auto-generated memory. Community analysis describes it as "a living synthesis, not a permanent record."

[Source type: Official Anthropic Help Center, community analysis]

### Two types of memory serve different purposes

**Auto-generated memory** runs in the background every ~24 hours, analyzing conversation history and synthesizing key insights about your role, projects, professional context, communication preferences, technical preferences, and ongoing work. It is designed to focus on work-related topics and may not retain personal details unrelated to work.

**User-directed memory edits** are accessible via Settings → Capabilities → "View and edit memory" (pencil icon) or by telling Claude directly what to remember in a chat. These take **immediate effect** — no waiting for the daily synthesis. Community-observed limits suggest a cap of **30 edits at 200 characters each**, though this is not confirmed in official documentation.

The interaction model: auto-generated memory handles patterns and evolving context broadly. User-directed edits handle permanent truths you cannot afford to have fade. Information Claude needs everywhere (role, company, core tools) → user-directed edits. Project-specific context → that project's knowledge base and instructions. Information that evolves naturally → let auto-generated memory handle it. Temporary context → use the chat itself; don't burn an edit slot.
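The routing guidance above can be condensed into a single decision rule. A hypothetical sketch (the function name and boolean flags are invented for illustration; the routing targets mirror the playbook's recommendations):

```python
def route_memory(fact: str, *, universal: bool, project_scoped: bool,
                 evolving: bool, temporary: bool) -> str:
    """Route a piece of context to the storage layer this playbook recommends."""
    if temporary:
        return "chat"  # keep it in the conversation; don't burn an edit slot
    if project_scoped:
        return "project knowledge base / instructions"
    if universal and not evolving:
        return "user-directed memory edit"  # permanent truths that must not fade
    return "auto-generated memory"  # let the 24-hour synthesis track it
```

The ordering matters: temporary and project-scoped facts are excluded first, so only stable, everywhere-relevant facts consume the limited edit slots.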
[Source type: Official Anthropic docs, community analysis — limitededitionjonathan.substack.com]

⚠️ **Contradiction flag:** The 30-edit / 200-character limits are reported by a well-informed community member but not explicitly stated in official Anthropic documentation. Treat as community-observed behavior, not officially confirmed.

### The memory import tool launched around March 3

Visit **claude.com/import-memory**, copy the provided extraction prompt, paste it into your current AI provider (ChatGPT, Gemini, Grok, Copilot), copy the output, and paste it into Claude's memory settings at claude.ai/settings/capabilities. For ChatGPT specifically, you can also navigate to Settings → Personalization → Manage Memories and copy entries directly. The feature is **experimental** and still in active development. Imported memories merge with existing Claude memories (no overwriting), may take up to 24 hours to fully integrate, and Claude may not retain imported personal details unrelated to work.

[Source type: Official Anthropic support article, Cybersecurity News reporting]

### Memory processing order and priority

Official documentation does not explicitly define a processing order, but community analysis and architectural evidence establish this effective priority: **Layer 0** — system instructions (project instructions, styles), always present. **Layer 1** — user memories, distilled facts stored in XML format. **Layer 2** — conversation history, a rolling window of recent messages. **Layer 3** — the current message. User-directed edits take immediate effect; auto-generated memory updates every 24 hours. Within a project, project instructions + knowledge base + project memory all provide context. For Claude Code, CLAUDE.md files load in precedence order: managed policy → enterprise → project → user → auto memory → session, with more specific scopes taking precedence. Memory, conversation history, and instructions all compete for the same context window.
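The effective stack reads naturally as plain prompt assembly. This is a conceptual sketch, not Anthropic's implementation; the layer names follow the community analysis above, and the `<memories>` XML wrapper mirrors the reported storage format:

```python
def assemble_context(system_instructions: str, user_memories: list[str],
                     history: list[str], current_message: str,
                     history_window: int = 20) -> str:
    """Illustrate the Layer 0-3 priority stack as simple prompt assembly."""
    memories_block = "<memories>\n" + "\n".join(user_memories) + "\n</memories>"
    recent = history[-history_window:]  # rolling window: older turns fall away
    return "\n\n".join([
        system_instructions,   # Layer 0: always present
        memories_block,        # Layer 1: distilled facts in XML
        "\n".join(recent),     # Layer 2: recent conversation history
        current_message,       # Layer 3: the live request
    ])
```

The rolling window in Layer 2 is also a compact way to see why recency bias exists: anything outside the window simply never reaches the model unless it was distilled into a lower layer first.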
[Source type: Community analysis — rajiv.com/blog; official Claude Code docs for CLAUDE.md hierarchy]

---

## Section 6: Configuration patterns that work in practice

### Project instruction design follows a clear hierarchy

Anthropic's official prompt engineering guide recommends a **4-block pattern**: Instructions (what to do), Context (background information), Task (specific request), Output Format (expected structure). Use XML tags (`<instructions>`, `<context>`, `<task>`) to separate sections unambiguously. Setting a role in the system prompt is "the most powerful way to use system prompts." The template: who you are (one line), what success looks like, constraints, uncertainty handling ("If unsure, say so"), and output format.

A critical finding from independent research: **frontier thinking LLMs can follow approximately 150–200 instructions reliably**. Claude Code's built-in system prompt already uses ~50, so your custom instructions compete for a limited instruction-following budget. As instruction count increases, quality degrades **uniformly** — Claude doesn't just ignore newer instructions; it follows all instructions less reliably.

Practical limits: keep root instructions to **50–100 lines** or under 2,000 tokens. For each line, ask "Would removing this cause Claude to make mistakes?" If not, cut it. Claude Code wraps CLAUDE.md content with a system reminder saying "this context may or may not be relevant," meaning Claude will ignore instructions it deems irrelevant — the more non-universally-applicable content you include, the more likely Claude is to ignore everything.
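The 4-block pattern and the 2,000-token budget are easy to operationalize. A sketch, using a rough chars/4 heuristic rather than a real tokenizer (the function names are illustrative):

```python
TOKEN_BUDGET = 2_000  # the playbook's suggested cap for root instructions

def build_prompt(instructions: str, context: str, task: str,
                 output_format: str) -> str:
    """Assemble the 4-block pattern with XML tags separating each block."""
    return (
        f"<instructions>\n{instructions}\n</instructions>\n"
        f"<context>\n{context}\n</context>\n"
        f"<task>\n{task}\n</task>\n"
        f"<output_format>\n{output_format}\n</output_format>"
    )

def rough_token_count(text: str) -> int:
    """Crude estimate: ~4 characters per token for English prose."""
    return len(text) // 4

def within_instruction_budget(instructions: str) -> bool:
    return rough_token_count(instructions) <= TOKEN_BUDGET
```

A check like `within_instruction_budget` belongs in CI next to your CLAUDE.md: it catches instruction-file bloat the same way a line-length linter catches style drift, before the instruction-following budget silently degrades.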
[Source type: Official Anthropic prompt engineering guide; HumanLayer independent research citing arXiv paper]

Common mistakes to avoid: one massive project for everything (create separate projects per workflow), skipping custom instructions entirely, generic project names like "Project 1," using LLMs as linters ("never send an LLM to do a linter's job"), auto-generating instructions with /init (CLAUDE.md is the highest-leverage configuration point — manually craft every line), and stuffing code style guidelines into instructions (Claude is an in-context learner that follows patterns from your codebase).

[Source type: Community consensus, independent guides]

### Multi-tool workflows use CLAUDE.md as the integration backbone

CLAUDE.md bridges Claude Code, Projects, and GitHub. It supports @imports for modular organization, hierarchical structure (root CLAUDE.md for universal rules, subdirectory files for local rules), and version control so the whole team inherits it. Claude Code GitHub Actions (anthropics/claude-code-action) enables @claude mentions in PRs and issues to trigger automated analysis, review, and implementation. Claude respects CLAUDE.md guidelines when creating PRs.

The recommended multi-tool workflow pattern: **one task per session** scoped narrowly, **plan before code** using Plan Mode, **parallel sessions** (run 2–4 Claude Code terminals for independent tasks, each with its own 200K context), **subagents for isolation** (each gets fresh 200K context, returning only summaries to your main session), and **git worktrees for parallel branches**. A proven pipeline: configure three specialized agents — Architect (reviews design), Builder (implements from approved plans), QA (writes tests) — and pass all features through sequentially.

[Source type: Official Claude Code docs, Morph guide, community workflows on GitHub]

### Session management determines output quality

Context window management is where many users fail silently.
The 200K-token context window is shared — 80% typically consumed by file reads and tool results, only 20% by messages. Claude 4.5+ models have context awareness, receiving token budget updates after each tool call. Auto-compaction activates for paid users with code execution enabled, summarizing earlier messages when approaching limits.

**When to start fresh:** different task, conversation compressed, context degradation signs, 60–70% context used, or you've corrected Claude 2+ times on the same issue. **When to continue:** same task, immediate follow-up, conversation still short. Use the "handoff technique" before ending: have Claude write a dense summary of key decisions, constraints, current state, and next steps, then paste this into the new conversation with "picking up where we left off."

Prefer **30–45 minute focused sessions**. A 2-hour session often reaches 2–3 compactions, progressively diluting summary quality. Compact proactively at 70% capacity using `/compact` — don't wait for auto-compression at 95%. **Never work past 85% context usage** — hallucination risk increases significantly. Delegate to subagents (each gets a fresh 200K context), disable unused tools and connectors (web search and Research are token-intensive), and turn off extended thinking when not needed.

[Source type: Official Claude API docs, community guides — Toolpod, Morph, SFEIR Institute]

### Power-user patterns the community has discovered

**Friction-driven memory.** After a work session, ask Claude: "Looking at our conversation so far, are there any patterns in how I've corrected your outputs? What preferences aren't captured in your instructions?" Claude drafts memory entries based on observed patterns. This iterative refinement produces better instructions than trying to write them all upfront. "The system prompt is your hypothesis. Project memory is your evidence."
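The session thresholds above (compact at 70%, never work past 85%, restart after two repeated corrections on the same issue) collapse into a simple decision rule. A sketch with the community-sourced cutoffs hard-coded; nothing in Claude enforces these numbers:

```python
def session_action(context_used_pct: float,
                   corrections_on_same_issue: int = 0) -> str:
    """Map context usage and correction count to the playbook's session advice."""
    if context_used_pct >= 85 or corrections_on_same_issue >= 2:
        return "start fresh (write a handoff summary first)"
    if context_used_pct >= 70:
        return "compact now with /compact"
    return "continue"
```

Note that repeated corrections trigger a restart regardless of context usage: a session that keeps making the same mistake at 50% capacity is already degraded, and a handoff summary into a fresh conversation is cheaper than a third correction.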
**The progressive disclosure file pattern.** Instead of cramming everything into CLAUDE.md, keep task-specific docs in separate files (building_the_project.md, running_tests.md, code_conventions.md, service_architecture.md). Include a brief index in CLAUDE.md and tell Claude to read the relevant files before starting. Prefer pointers over copies — use file:line references rather than code snippets that go stale.

**The stop hook review pattern.** Use a Claude Code stop hook to trigger a separate subagent that reviews modified files before returning control. This subagent checks for naming problems ("helper"/"utils" should be domain names), logic leaking from the domain model, and dangerous default values.

**Breaking out of loops.** When Claude tries the same approach repeatedly: clear the conversation, simplify the task, show instead of tell by writing a minimal example yourself, or reframe the problem ("Implement as a state machine" instead of "handle these transitions"). The meta-skill is recognizing early that you're in a loop.

[Source type: Community — AI Maker Substack, Level Up Coding/Medium, Reddit r/ClaudeAI]

---

## Conclusion: What this playbook makes clear

Three themes emerge across all six sections. First, **the configuration surface area has expanded dramatically** — Projects now integrate skills, connectors, memory, research mode, and multiple model options into a single workspace, but this complexity demands deliberate design. The 150–200 instruction limit, the undertriggering problem, and the context window competition mean that less is often more. Second, **the tooling is maturing from "prompt and hope" to systematic engineering** — the skill-creator's evaluation framework, the PEV loop, spec-driven development, and the shift from vibe coding to agentic engineering all point toward the same destination: AI does the implementation, humans own the architecture and quality.
Third, **the failure modes are now well-documented and preventable** — the 2.74x security vulnerability finding, the METR perception gap, slopsquatting, and the AI Slopageddon are not reasons to avoid AI-assisted development but rather specific risks with specific mitigations. The teams that will succeed are those who treat their Claude Project configuration with the same rigor they apply to their CI/CD pipeline: version-controlled, tested, and continuously refined.