Twitter AI Evaluation (legacy)
Thursday, April 23, 2026
Quick Insight
This is about the fundamental shift from UI-first to agent-first software design, where most interactions will happen through AI agents via APIs/MCPs rather than traditional interfaces. The author argues we're moving from "User → Interface → Database" to "User → Agent → Software's Agent → Database" and provides concrete examples from Ramp and Salesforce.
Actionable Takeaway
Add MCP servers to your side projects and fintech platform - especially the print-on-demand automation and webhook systems. Design APIs with agent-friendly descriptions and proactive context (like Notion's enhanced markdown spec approach) rather than just human-readable documentation.
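To make "agent-friendly descriptions and proactive context" concrete, here is a minimal sketch in plain Python. It does not use any MCP SDK; it only builds a dictionary in the name/description/inputSchema shape that MCP tool listings use. The `describe_tool` helper, the `create_print_order` tool, and all of its fields are hypothetical and exist only for illustration.

```python
import json


def describe_tool(name, summary, when_to_use, params, output_notes):
    """Build a tool descriptor whose description tells an agent when and how to call it."""
    description = " ".join(
        [summary, f"Use when: {when_to_use}", f"Output: {output_notes}"]
    )
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {
                p["name"]: {"type": p["type"], "description": p["description"]}
                for p in params
            },
            # Only parameters explicitly flagged as required are listed here.
            "required": [p["name"] for p in params if p.get("required", False)],
        },
    }


# Hypothetical tool for a print-on-demand side project.
create_order_tool = describe_tool(
    name="create_print_order",
    summary="Create a print-on-demand order for one design.",
    when_to_use="the user has confirmed the design ID and the shipping country.",
    params=[
        {"name": "design_id", "type": "string",
         "description": "ID returned by a prior design lookup.", "required": True},
        {"name": "quantity", "type": "integer",
         "description": "Units to print, 1 to 500.", "required": True},
        {"name": "note", "type": "string",
         "description": "Optional plain-text note; markdown is not rendered."},
    ],
    output_notes="JSON with order_id and status; status starts as 'pending'.",
)

print(json.dumps(create_order_tool, indent=2))
```

The point of the sketch is that constraints an agent would otherwise discover by failing (valid ranges, formatting limits, preconditions) are stated in the description up front, which is the pattern the Notion example illustrates.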
Related to Your Work
Your webhook integrations and analytics dashboards at the fintech startup should be designed for agent consumption first. Instead of just building credit-card-linked offers through traditional APIs, consider how AI agents would need to interact with transaction data, merchant partnerships, and user preferences programmatically.
Thread/Source Worth Reading
The full article provides detailed examples of how Notion and Slack handle agent interactions differently - Notion proactively provides formatting specs while Slack requires manual formatting fixes. The piece cuts off mid-sentence discussing observability at Ramp's MCP launch, but the existing content has solid tactical advice on agent-first design patterns.
Quick Insight
This is about structuring AI agent skills hierarchically (atoms → molecules → compounds) rather than as flat dependency graphs to improve reliability and leverage. The key insight is that your brain's "RAM" for managing agents is the bottleneck - you should drive fewer, higher-level compound tasks that orchestrate many atomic operations rather than micromanaging low-level tasks.
Actionable Takeaway
Audit your current AI workflows and restructure them into this three-tier hierarchy. Start by identifying your most reliable, single-purpose automations (atoms), then build explicit workflows that chain 2-10 atoms together (molecules), and finally create high-level orchestrators for entire processes like "run customer onboarding flow" or "deploy feature end-to-end."
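One way to picture the three tiers is as plain function composition. The sketch below is an assumption about how the hierarchy could map onto code, not the article's implementation; the event fields (`id`, `amount_cents`) and every function name are made up for illustration.

```python
from typing import Callable

Event = dict


# Atoms: single-purpose steps that are easy to test in isolation.
def validate(event: Event) -> Event:
    if "id" not in event or "amount_cents" not in event:
        raise ValueError("event is missing id or amount_cents")
    return event


def transform(event: Event) -> Event:
    return {"id": event["id"], "amount": event["amount_cents"] / 100}


def notify(event: Event) -> Event:
    return {**event, "notified": True}


def molecule(*atoms: Callable[[Event], Event]) -> Callable[[Event], Event]:
    """Chain atoms into one explicit workflow that runs them in order."""
    def run(event: Event) -> Event:
        for atom in atoms:
            event = atom(event)
        return event
    return run


# Molecule: the whole webhook-to-dashboard flow as a single unit.
webhook_to_dashboard = molecule(validate, transform, notify)


def process_batch(events: list[Event]) -> dict:
    """Compound: the one high-level task a person actually drives."""
    done, failed = [], []
    for event in events:
        try:
            done.append(webhook_to_dashboard(event))
        except ValueError as exc:
            failed.append({"event": event, "error": str(exc)})
    return {"done": done, "failed": failed}


report = process_batch([{"id": "evt_1", "amount_cents": 1250}, {"id": "evt_2"}])
print(report)
```

You only call `process_batch`; the atoms stay small enough to verify individually, which is where the reliability comes from.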
Related to Your Work
This directly applies to your webhook processing and analytics pipeline work. Instead of having agents handle individual webhook validation, data transformation, and notification tasks separately, build molecules that orchestrate the entire webhook-to-dashboard flow, freeing your mental bandwidth to focus on higher-level product decisions and strategy.
Thread/Source Worth Reading
Yes, the linked article provides concrete examples and deeper explanation of the atom/molecule/compound framework. It includes practical guidance on structuring workflows and managing agent reliability that goes well beyond the typical "AI agents are cool" content. Worth reading for implementation details.
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
Quick Insight
This is announcing "Flipbook" - a prototype that renders UI directly from AI models as pixels rather than using traditional HTML/CSS/layout engines. It's essentially AI-generated interfaces streamed in real-time, which could completely bypass traditional web development if it works at scale.
Actionable Takeaway
Watch for the actual prototype demo or early access. If they release anything testable, experiment with it for simple dashboard interfaces in your side projects to see whether AI-generated UIs could replace custom frontend work.
Related to Your Work
This could potentially replace the frontend development work you do for analytics dashboards and webhook admin interfaces. Instead of building React/Svelte components, you might describe what you want and let the AI generate the interface - especially useful for your web agency tools, where clients want quick custom dashboards.
Thread/Source Worth Reading
This is 1/5 of a thread, so the remaining tweets likely contain technical details, demo footage, or architectural explanations that would be valuable for understanding feasibility and current limitations.
Quick Insight
This is about Alibaba's Qwen 3.6 model delivering Claude Opus-level performance while running locally on consumer hardware for $2/hour instead of $300k/year in API costs. The author tested it for coding tasks and found it capable enough to replace expensive closed-source models for most development work, potentially changing the economics of AI-powered coding tools.
Actionable Takeaway
Rent an RTX Pro 6000 on RunPod for ~$2/hour and test Qwen 3.6 with Ollama against your current AI coding workflow. Compare costs and performance for your Chrome extension and automation projects where you're currently hitting rate limits or paying high API fees.
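Before comparing against the $300k/year figure, it helps to annualize the ~$2/hour rate. The arithmetic below uses only the hourly rate quoted in the post; the two usage schedules (always on, and 8 hours a day for 250 working days) are assumptions, and the result covers GPU rental only.

```python
def annual_gpu_cost(rate_per_hour: float, hours_per_day: float, days_per_year: int = 365) -> float:
    """Yearly rental cost for a GPU billed by the hour."""
    return rate_per_hour * hours_per_day * days_per_year


always_on = annual_gpu_cost(2.0, 24)    # rented around the clock, all year
workday = annual_gpu_cost(2.0, 8, 250)  # rented only during working hours

print(always_on, workday)  # prints: 17520.0 4000.0
```

Even the always-on case comes to $17,520/year, so the comparison depends far more on whether the model's output quality holds up for your tasks than on the rental rate.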
Related to Your Work
For your print-on-demand automation and AI-powered dev workflows, this could eliminate the bottleneck of expensive API calls while maintaining code quality. Your fintech webhook integrations could benefit from local AI that doesn't send proprietary code to external providers, addressing IP concerns.
Thread/Source Worth Reading
Yes - the linked article provides detailed benchmarks, setup instructions for RunPod, and actual performance comparisons between Qwen 3.6, Claude Opus, and GPT models. It includes practical cost analysis showing potential $300k/year savings per engineer and specific examples of coding task performance.
Quick Insight
This is a deep-dive analysis of OpenAI's new GPT Image 2 model, with 6 specific prompting techniques extracted from community research and 5 working examples. The key breakthrough is that this model actually "thinks" through prompts rather than just doing text-to-image matching, making it significantly more capable for design work.
Actionable Takeaway
Switch to GPT-5.4 Thinking mode and test the exact prompting patterns provided - especially the aspect ratio hack (include exact pixel dimensions) and double-quoting text for better typography rendering in any design work for side projects.
Related to Your Work
For your web agency tools and Chrome extensions, this could automate mockup generation and design assets. The UI mockup prompts could generate realistic app screenshots for marketing pages, and the brand identity capabilities could help with client work automation.
Thread/Source Worth Reading
Yes, absolutely worth the full read. The article provides 5 copy-paste prompts that are immediately usable, plus technical details on model switching and prompt structure that aren't documented elsewhere. The community research aggregation approach (/last30days) is also interesting for staying current on AI capabilities.
Quick Insight
This is a philosophical critique of the "taste" narrative in tech - the idea that humans' primary value in an AI world is curating/selecting AI output rather than creating. The author argues this is a "demotion" from humanity's historical role as co-creators, using examples from Renaissance art patronage where patrons and artists collaborated intimately on creation itself.
Actionable Takeaway
When building AI features, structure them as collaborative tools rather than pure generators that you just approve/reject. Design workflows where you specify constraints, provide feedback during generation, and iteratively shape the output - not just curate finished results.
Related to Your Work
This directly applies to your AI-powered dev workflows and side projects. Instead of building tools that generate code/content for you to review, build ones that let you guide the AI's process - specify requirements, provide iterative feedback, and co-create the solution rather than just picking from AI suggestions.
Thread/Source Worth Reading
The linked article is a thoughtful long-form piece contrasting modern "taste-based" AI collaboration with historical patron-artist relationships. Worth reading for the philosophical framework, though it's more conceptual than tactical. The Renaissance bottega examples are particularly insightful for rethinking human-AI collaboration patterns.
Today we’re introducing two big steps for health at OpenAI:
- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks
We’re excited about what this can unlock for care. ❤️
Quick Insight
OpenAI is launching ChatGPT for Clinicians (free) and a new healthcare benchmark for evaluating AI performance on clinical tasks. This signals OpenAI's push into vertical-specific AI applications, which is relevant for anyone building industry-focused AI tooling or considering how to adapt general AI models for specialized use cases.
Actionable Takeaway
Study how OpenAI is positioning and packaging ChatGPT for a specific vertical (healthcare) - this could inform how you package AI features for fintech use cases or build vertical-specific versions of your dev tools for different agency niches.
Related to Your Work
Direct parallel to your fintech platform work - if OpenAI can create a clinician-specific version, there's likely an opportunity to build fintech-specific AI tooling (compliance-aware code generation, financial data analysis, fraud detection workflows) that could differentiate your platform or spawn new side projects.
Thread/Source Worth Reading
The linked content likely contains details about the clinical ChatGPT features and HealthBench benchmark methodology. Worth skimming to understand OpenAI's vertical specialization approach, but probably not technical enough to warrant a full read unless you are actively exploring healthcare-adjacent opportunities.
8k stars!
Quick Insight
This is just a celebration post about a project hitting 8k GitHub stars with a link. Without knowing what the linked project is, this is pure noise - just someone excited about their repo's popularity with zero context about what it actually does or why it matters.
Actionable Takeaway
Nothing actionable without knowing what the project is. The link would need to be checked to determine if there's any actual value here.
Related to Your Work
Can't determine relevance without knowing what the 8k-star project actually is. Could be anything from a useful dev tool to a toy project.
Thread/Source Worth Reading
The linked content needs to be checked to evaluate if it's worth reading. Star count alone doesn't indicate quality or relevance - plenty of mediocre projects get stars for being first-to-market or having good marketing.
Quick Insight
Levie is arguing that as AI agents become commoditized, competitive advantage will shift to "context engineering" - how well companies feed their proprietary data, processes, and tribal knowledge to AI systems. It's a solid thesis: everyone gets the same AI lawyer, but the one with better context about your business will perform better.
Actionable Takeaway
Start documenting and structuring your fintech platform's tribal knowledge now - customer behavior patterns, fraud detection insights, integration quirks, business rules. Build this into searchable formats that future AI agents can consume as context.
Related to Your Work
Your webhook integrations and analytics dashboards are generating valuable context data about merchant behavior and offer performance. This operational intelligence could become a competitive moat when fed to AI agents that help optimize campaigns or detect anomalies.
Thread/Source Worth Reading
Yes, worth reading. The linked article dives deep into context graphs and practical challenges of getting AI agents access to enterprise data across systems. It addresses real implementation problems like data permissions, governance, and connecting disparate systems - directly relevant to fintech compliance and data management.
# The Context Layer
The impact of AI on how organizations work is just beginning. Today humans use LLMs by pasting in text and asking for feedback. Even by next year, this will seem archaic and antiquated.
AI is going to turbo-charge the productivity of information work. Skilled humans, with AI at their beck and call, will be 10-100x more productive than they are today. It's not even hard to imagine this - software that would have taken me many weeks or months to build can now be done in days. Next year - what takes days, will take hours. SaaS, as we have known it, is dead. This is a great re-wiring of labor.
In the 90s - much of the discussion about information work naturally revolved around the internet. Oracle famously published their vision of "e-business transformation". "We wanted one unified system with one global database. With all our information in one place, we could easily access and share information. We'd make better decisions, groups would cooperate, and IT costs would go down." -- sounds familiar.
When I look at Chroma's customer base - I already see this happening. This "context layer" is all an org's data and context being unified to empower agents and AI transformation. It makes sense that we would see the largest AI adoption happen first in internal productivity. First it came for coding - and now it will come for all other information work. It's relatively easier to accelerate work that is already being done through the hands of expert humans than risk "putting AI in front of customers".
The market will demand that every Fortune 5000 company in the next 3 years build a context layer and transform their businesses with AI. If they don't - they will die.
In 2023, I said, "Large models are extremely powerful and just getting more and more so. This will change how all questions are asked and answered. Every piece of data on earth will have multiple embeddings associated with it to make it interpretable to large and tiny models. That means every org in the world is going to be spinning up a new kind of DB inside their organization in the next 5 years." - we are right on schedule.
The context layer will be the hive-mind of how organizations of the future live and breathe. It will make them faster, more responsive, and way more efficient. Chroma will be the central nervous system of how it all gets done. Models will come and go - but data is forever.
Quick Insight
Jeffrey Huber (Chroma CEO) argues that AI will create 10-100x productivity gains through a "context layer" - unified organizational data that feeds AI agents. He's essentially pitching that every company needs a vector database strategy or they'll become obsolete, which is predictably self-serving but not necessarily wrong.
Actionable Takeaway
Build a simple context layer for your own side projects - start with a vector database (Chroma, Pinecone, or Supabase's pgvector) to store embeddings of your project docs, code, and customer data. Test how much faster you can build features when AI has full context of your existing work.
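To see the retrieval idea before committing to a vector database, here is a toy sketch using only the Python standard library. Word counts stand in for real embeddings, and an in-memory list stands in for Chroma, Pinecone, or pgvector. The `ContextStore` class and the sample documents are hypothetical; a real setup would swap in an embedding model and one of those databases.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ContextStore:
    """In-memory stand-in for a vector database collection."""

    def __init__(self):
        self._docs = []

    def add(self, doc_id: str, text: str) -> None:
        self._docs.append((doc_id, text, embed(text)))

    def query(self, question: str, k: int = 2):
        """Return the k documents most similar to the question."""
        q = embed(question)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


store = ContextStore()
store.add("webhooks", "Webhook retries use exponential backoff and stop after five attempts.")
store.add("offers", "Card-linked offers are matched to transactions by merchant ID.")
store.add("deploy", "Deploys run from the main branch after tests pass.")

top = store.query("How many times are webhook retries attempted?", k=1)
print(top)
```

The retrieved text is what you would paste (or have an agent inject) into the prompt as context; the database choice only changes how well and how fast the ranking works at scale.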
Related to Your Work
Your fintech platform probably has scattered data across webhook logs, user analytics, transaction patterns, and support tickets. A context layer could let you build AI agents that automatically debug webhook issues, generate compliance reports, or identify user behavior patterns - turning weeks of manual analysis into minutes.
Thread/Source Worth Reading
No links provided - this is a standalone thread summarizing Chroma's positioning strategy.
Quick Insight
The tweet introduces a new workflow using ChatGPT Images 2.0 to generate website mockups, then feeding those images to code generation models to build the actual frontend. The author claims this produces better visual design than pure text-to-code prompting because the image model has superior "visual taste" compared to GPT's frontend generation abilities.
Actionable Takeaway
Try the two-step process on your next Chrome extension or web agency client project: generate a mockup image in ChatGPT first, then use that image as reference when prompting Claude/Codex to write the actual HTML/CSS/JavaScript code.
Related to Your Work
For your web agency tools side project, this could streamline client mockup-to-code workflows. Instead of going back-and-forth on design iterations in code, generate visual mockups first to get design approval, then convert to Astro/Svelte components.
Thread/Source Worth Reading
The linked article provides a specific "taste skill" prompt template and two concrete methods for implementation. The GitHub repo contains the actual prompt engineering templates that could be worth testing, though the writing quality suggests this might be more hype than substance.
Quick Insight
Garry Tan is linking to a detailed post about "skillify" - a methodology for making AI agents more reliable by turning failures into permanent structural fixes with testable skills. The author criticizes LangChain for providing testing tools without opinionated workflows, then demonstrates how to build deterministic scripts that prevent agents from repeating the same mistakes.
Actionable Takeaway
Implement the "skillify" approach in your AI integrations: when an agent fails, write a markdown skill file that teaches the proper process, then have the agent generate deterministic code to handle that type of task going forward. Start with your most common AI workflow failures.
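A rough sketch of what the "deterministic code" half could look like, assuming the failure was an agent miscategorizing merchants. This is not the article's code: the rule table, merchant names, and `fake_agent` placeholder are all invented for illustration, and the markdown skill file itself is not shown.

```python
from typing import Callable, Optional

# Each rule records a past agent failure as a deterministic fix.
# Merchant patterns and categories are made up for this example.
RULES: dict[str, str] = {
    "uber eats": "food_delivery",  # agent once labeled this "transport"
    "amazon fresh": "grocery",     # agent once labeled this "retail"
}


def deterministic_category(merchant: str) -> Optional[str]:
    """Return a category from the rule table, or None if no rule matches."""
    key = merchant.strip().lower()
    for pattern, category in RULES.items():
        if pattern in key:
            return category
    return None


def categorize(merchant: str, agent: Callable[[str], str]) -> str:
    """Try the deterministic path first; call the agent only for unseen cases."""
    fixed = deterministic_category(merchant)
    return fixed if fixed is not None else agent(merchant)


def fake_agent(merchant: str) -> str:
    # Placeholder for a real model call.
    return "unknown"


# Regression cases: one per past failure, so a fix stays fixed.
REGRESSIONS = [("Uber Eats #4521", "food_delivery"), ("AMAZON FRESH", "grocery")]
results = [categorize(m, fake_agent) == expected for m, expected in REGRESSIONS]
print(all(results))
```

Each new failure adds one rule and one regression case, so the agent's judgment is only exercised on inputs that have never gone wrong before.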
Related to Your Work
This directly applies to your webhook integrations and analytics work where AI agents might handle data processing or customer queries. Instead of tweaking prompts when agents mess up offer categorization or transaction analysis, you could build testable skills that create deterministic fallback paths for common edge cases.
Thread/Source Worth Reading
Yes - the linked article is substantial and practical. It shows two real failure cases with complete code examples of the "skillify" methodology, including how to structure skills as markdown procedures and generate deterministic scripts. Much more concrete than typical AI reliability content.
Quick Insight
This is a deep dive into Recursive Language Models (RLMs) - a new AI paradigm where models treat their own prompts as programmable environments they can inspect and recursively query. It's significant because it merges reasoning and tool use into a single abstraction, potentially solving long-context limitations and enabling more sophisticated AI workflows.
Actionable Takeaway
Experiment with RLM concepts in your AI-powered dev workflows - try building a system where your AI agents can recursively break down and process different parts of their context (like segmenting large codebases or complex financial data processing tasks).
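As a starting point for that experiment, the sketch below shows only the recursion skeleton: split a context that is too large, query each piece, and combine the answers. It is not the RLM framework from the article, where the model itself decides how to inspect and query its prompt; here the split strategy is hard-coded and `count_failed` is a stand-in for a model call. The log format is invented for illustration.

```python
from typing import Callable


def recursive_query(
    context: str,
    ask: Callable[[str], int],
    combine: Callable[[int, int], int],
    max_chars: int,
) -> int:
    """Split the context on line boundaries until each piece fits, then combine answers."""
    if len(context) <= max_chars:
        return ask(context)
    lines = context.splitlines(keepends=True)
    if len(lines) < 2:
        return ask(context)  # a single long line cannot be split further here
    mid = len(lines) // 2
    left = recursive_query("".join(lines[:mid]), ask, combine, max_chars)
    right = recursive_query("".join(lines[mid:]), ask, combine, max_chars)
    return combine(left, right)


def count_failed(chunk: str) -> int:
    # Stand-in for a model call that answers a question about one chunk.
    return sum(1 for line in chunk.splitlines() if "status=failed" in line)


# Synthetic transaction log: every fourth transaction fails.
log = "\n".join(f"txn {i} status={'failed' if i % 4 == 0 else 'ok'}" for i in range(20))

total = recursive_query(log, count_failed, lambda a, b: a + b, max_chars=80)
print(total)  # prints: 5
```

Splitting on line boundaries keeps each record intact, which matters for any question whose answer depends on whole records rather than raw character counts.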
Related to Your Work
This directly applies to your fintech platform's webhook processing and analytics dashboards - RLMs could help AI agents better handle complex, multi-part financial transactions by recursively analyzing different transaction components, user patterns, and offer matching logic without hitting context window limits.
Thread/Source Worth Reading
Yes, the linked article is comprehensive and technical. It provides the full RLM framework definition, empirical results from 6 months of research, practical limitations, and implementation starting points. Essential reading if you're serious about advanced AI agent architectures.
Quick Insight
Shpigford is sharing a Claude AI workflow that enhances the built-in Plan mode by automatically pulling in fresh documentation and web search results before coding. The key insight is that Claude's training data has gaps, especially for recent updates, so feeding it current docs via Context7 MCP and live search makes it significantly more effective for real development tasks.
Actionable Takeaway
Install Context7 MCP for Claude and create a similar /research skill that automatically gathers latest docs and search results before starting any coding task. Test this on your next webhook integration or dashboard feature to see if it reduces the back-and-forth debugging cycle.
Related to Your Work
This directly applies to your fintech platform work where you're constantly integrating with third-party APIs that have evolving documentation. Having Claude pull the latest Stripe, Plaid, or partner API docs before generating webhook handlers could prevent those frustrating "this endpoint changed last month" debugging sessions.
Thread/Source Worth Reading
Yes, worth reading. The linked article provides the actual /research skill implementation and explains the Context7 vs web scraping trade-offs for JS-heavy developer sites. It includes practical details on setup and shows real workflow improvements for feature development.
**Error:** Could not evaluate this article. OpenRouter API error (402): {"error":{"message":"This request requires more credits, or fewer max_tokens. You requested up to 64000 tokens, but can only afford 63005. To increase, visit https://openrouter.ai/settings/keys and create a key with a higher total limit","code":402,"metadata":{"provider_name":null}},"user_id":"user_2wV3dQAoVqfIROjawS8ZzEcvi2o"}
opus 4.7 seems to have a much better time in claude code if you run without most of the system prompt (claude --system-prompt ".")
**Error:** Could not evaluate this article. OpenRouter API error (402): {"error":{"message":"This request requires more credits, or fewer max_tokens. You requested up to 64000 tokens, but can only afford 63005. To increase, visit https://openrouter.ai/settings/keys and create a key with a higher total limit","code":402,"metadata":{"provider_name":null}},"user_id":"user_2wV3dQAoVqfIROjawS8ZzEcvi2o"}
This Japanese developer found the real lever in Claude Code before everyone else. He installs "Find Skills", describes what he wants to do, and the system suggests the perfect skills from among hundreds. His automated YouTube system is taking off thanks to it.
**Error:** Could not evaluate this article. OpenRouter API error (402): {"error":{"message":"This request requires more credits, or fewer max_tokens. You requested up to 64000 tokens, but can only afford 63005. To increase, visit https://openrouter.ai/settings/keys and create a key with a higher total limit","code":402,"metadata":{"provider_name":null}},"user_id":"user_2wV3dQAoVqfIROjawS8ZzEcvi2o"}