What Are AI Agents and Where Do SEO Skills Fit In?

An AI agent does not wait for a question. It perceives inputs, decides what to do next, calls tools, inspects the results, and loops until the task is done or its permissions run out. AI Agent Skills fill that tool-use layer with actual capability: discrete, callable modules that each give an agent a specific function it can invoke without human intervention. Understanding the full architectural stack, from perception and memory through planning and orchestration, tells you which skills an agent can actually reach, how it decides between them, and whether their outputs survive a session reset or vanish with the context window. The clean component diagram turns out to be an approximation. The boundary between "skills you install" and "skills the agent generates" is already dissolving, routing skills is harder than building them, and the planning layer is not fixed architecture but a swappable capability. All of that matters before you buy or build a single skill.

What Is an AI Agent and How Is It Different from a Chatbot or a Simple LLM Call?

An AI agent is an autonomous software system that perceives inputs, reasons over them, and takes multi-step actions toward a goal without continuous human instruction. That definition extends Russell and Norvig's foundational description of an agent as anything that perceives its environment and acts upon it, and it is what separates an agent from the two things it gets confused with most often.

A chatbot responds. It takes a message, produces a reply, and stops. No loop, no memory that persists, no action in the world beyond text output. A single LLM call is even simpler: one prompt in, one completion out, no state, no tool invocation, no awareness of what happened before or what needs to happen next. Both are useful. Neither is an agent.

What makes an agent an agent is the loop. Perception brings in data from an environment, including text, API responses, file contents, and sensor readings. A planning and reasoning engine decides what to do with that data. The tool-use layer executes the decision: calling an API, running code, querying a database, writing a file. The result feeds back into the next reasoning step. This continues across multiple steps until the goal is reached or the agent hits a constraint. AutoGPT made this loop visible to a mass audience in 2023, spawning an entire ecosystem of frameworks built around the same basic architecture.
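
The loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular framework implements it: a hypothetical `decide` function stands in for the LLM-backed reasoning engine, a plain dictionary of callables stands in for the tool-use layer, and `max_steps` models the constraint that ends the loop.

```python
from typing import Callable, Optional

# Hypothetical reasoning-engine output: (tool_name, tool_input), or None when done.
Decision = Optional[tuple[str, str]]


def run_agent(
    goal: str,
    decide: Callable[[str, list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> list[str]:
    """Perceive -> decide -> act -> observe, until done or out of budget."""
    observations: list[str] = []
    for _ in range(max_steps):  # the step budget is the agent's constraint
        decision = decide(goal, observations)
        if decision is None:  # the reasoning engine judges the goal reached
            break
        tool_name, tool_input = decision
        result = tools[tool_name](tool_input)  # tool-use layer executes
        observations.append(result)  # result feeds the next reasoning step
    return observations


# Toy example: a scripted "planner" that fetches a page, then counts its words.
def scripted_decide(goal: str, obs: list[str]) -> Decision:
    if not obs:
        return ("fetch", goal)
    if len(obs) == 1:
        return ("word_count", obs[0])
    return None


toy_tools = {
    "fetch": lambda url: f"contents of {url}",
    "word_count": lambda text: str(len(text.split())),
}

print(run_agent("example.com", scripted_decide, toy_tools))
```

Swapping `scripted_decide` for a model call is what turns this toy into an agent; the loop structure stays the same.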

AI Agent Skills sit inside that loop at the tool-use layer. A skill is not a prompt and not a plugin in the browser-extension sense. It is a discrete, callable capability module that the agent can invoke autonomously when the current task requires it, such as an SEO content audit, a SERP analysis, or a keyword clustering run. Without the agent loop, a skill has nowhere to run.

How Do Reactive Agents, Deliberative Agents, and Agentic LLM Systems Differ in Planning Depth?

The three agent types sit on a spectrum of planning depth, and where a system lands on that spectrum determines whether AI Agent Skills are useful to it at all.

Dimension	Reactive Agents	Deliberative Agents	Agentic LLM Systems
Planning depth	None, stimulus maps directly to response	Moderate, builds internal world model, plans before acting	Deep, multi-step planning, tool use, memory, feedback loops
Internal state	Minimal	Maintained world model	Rich: working memory, external stores, episodic traces
Skill-use capacity	Low to none	Moderate	Full
Latency	Lowest	Higher	Highest
Failure mode	Brittle on novel inputs	Slow on time-sensitive tasks	Can loop unproductively or over-plan

Reactive agents map perception directly to action. No internal model, no look-ahead, no skill registry. They're fast, and on latency-sensitive tasks such as real-time anomaly flagging or millisecond routing decisions, they outperform deliberative systems precisely because they skip the planning overhead. More planning is not always better.

Deliberative agents maintain a world model and plan sequences of actions before executing. They reason about future states. They're slower but more capable on tasks that require anticipating consequences.

Agentic LLM systems, the class that LangChain, CrewAI, and AutoGPT represent, extend deliberative planning into multi-step workflows with tool selection, multi-tier memory, verification steps, and adaptation over longer task horizons. The planning horizon is deepest here. This is the architecture class where AI Agent Skills become genuinely useful, because the agent has the reasoning capacity to select the right skill, invoke it, inspect the output, and decide what to do next.

What Are the Core Components of an AI Agent Architecture?

Four layers appear in every fully capable agent architecture. The tool-use layer, the direct integration point for AI Agent Skills, is the one most affected when new skills are added.

Perception handles inputs from the environment: text, structured data, API responses, file contents. The agent cannot reason about what it cannot perceive.

Memory is a multi-tier system, not a single component. Working memory holds the active context window, covering the current task, recent observations, and immediate skill results. Semantic memory persists learned facts across sessions. Episodic memory stores sequences of past interactions, the kind of memory stream that the Generative Agents research demonstrated could support reflection and planning over time. Procedural memory stores reusable "how-to" behavior, and this is the tier closest to AI Agent Skills themselves. Which tier a skill's output lands in determines whether the agent compounds knowledge across sessions or forgets everything on reset.

Planning decomposes goals into task sequences and prioritizes them. BabyAGI, the GitHub-based autonomous task management demonstration, splits task creation, prioritization, and execution into separate agents that run in a loop and can be composed or swapped out, showing that planning is not a monolithic core but a swappable capability. Effectively, planning is itself a skill.
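
To make the "swappable capability" point concrete, here is a minimal sketch in the spirit of BabyAGI's split between task creation, prioritization, and execution. It is not BabyAGI's code: each stage is simply a function passed in as a parameter, so any one of them can be replaced without touching the others. All function names are hypothetical.

```python
from collections import deque
from typing import Callable


def plan_and_run(
    objective: str,
    first_task: str,
    execute: Callable[[str], str],
    create_tasks: Callable[[str, str], list[str]],
    prioritize: Callable[[list[str]], list[str]],
    max_iterations: int = 10,
) -> list[tuple[str, str]]:
    """Run tasks from a queue; execution, creation, and ranking are all swappable."""
    queue = deque([first_task])
    log: list[tuple[str, str]] = []
    while queue and len(log) < max_iterations:
        task = queue.popleft()
        result = execute(task)
        log.append((task, result))
        # Derive follow-up tasks from the result, then re-rank the whole queue.
        queue.extend(create_tasks(objective, result))
        queue = deque(prioritize(list(queue)))
    return log


def toy_create(objective: str, result: str) -> list[str]:
    # Only the crawl step spawns follow-up tasks in this toy example.
    return ["check links", "analyze content"] if result == "done: crawl site" else []


log = plan_and_run(
    objective="SEO audit",
    first_task="crawl site",
    execute=lambda task: f"done: {task}",
    create_tasks=toy_create,
    prioritize=sorted,  # swap in any ranking function here
)
print([task for task, _ in log])
```

Replacing `sorted` with a different ranking function changes the plan without touching execution or task creation, which is the sense in which planning behaves like a pluggable skill.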

Tool Use is the execution layer: the skill registry, selection logic, invocation mechanism, and output parser. Adding a new AI Agent Skill changes this layer directly. The orchestration logic that governs skill selection, sequencing, and output routing, managed by frameworks like LangChain and CrewAI, sits here. This layer is where the agent transitions from reasoning to action, and it's where the architecture gets complicated in ways the clean diagram doesn't show.

The agent runtime environment constrains this entire stack before any of the agent's own reasoning engages. Google Cloud's agent runtime documentation makes this explicit: the runtime exposes or restricts available tools, APIs, and memory backends. Skill selection is therefore an infrastructure question before it's a reasoning question.

What Is an AI Agent Skill and How Does It Differ from a Tool or a Plugin?

An AI Agent Skill is a discrete, callable capability module that plugs into the agent's tool-use layer to give the agent a specific function it can invoke autonomously. The distinction from a tool and a plugin matters practically, not just terminologically.

A tool is an execution primitive. It runs a function, calls an API, queries a database, and returns a result. Tools provide capability. A skill provides judgment about how to use that capability, including the workflow, the sequencing logic, the validation steps, the conditional branching, and the output formatting. One architecture guide states it directly: an agent with only tools lacks domain expertise; an agent with only skills can't execute. Skills are the procedural layer that sits above the execution primitives.

A plugin, in the browser-extension sense, adds a discrete external integration. A skill is broader and more behavior-shaping. It encodes the task method, the decision rules, and the workflow structure the agent should follow when it needs to accomplish a specific type of work. Several implementations describe skills as self-contained packages, including a main instruction document, optional scripts, reference files, templates, and assets, loaded on demand when a task matches the skill's scope. The agent first sees only a lightweight index of skill names and descriptions, then pulls the full instructions only when relevant. This progressive disclosure keeps the context window lean and avoids wasting tokens on procedures that don't apply to the current task.
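
Progressive disclosure can be sketched as an index that holds only names and descriptions and fetches a skill's full instructions the first time a task needs them. The class names and the use of a loader callable (instead of, say, reading an instruction file from disk) are assumptions made for illustration, not any framework's package format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillEntry:
    name: str
    description: str  # the only part the agent sees up front
    load_instructions: Callable[[], str]  # full procedure, fetched lazily


@dataclass
class SkillIndex:
    entries: dict[str, SkillEntry] = field(default_factory=dict)
    loaded: dict[str, str] = field(default_factory=dict)  # skills pulled into context

    def register(self, entry: SkillEntry) -> None:
        self.entries[entry.name] = entry

    def index_text(self) -> str:
        """Lightweight index: one line per skill, no full instructions."""
        return "\n".join(f"{e.name}: {e.description}" for e in self.entries.values())

    def activate(self, name: str) -> str:
        """Load a skill's full instructions only when a task matches it."""
        if name not in self.loaded:
            self.loaded[name] = self.entries[name].load_instructions()
        return self.loaded[name]


index = SkillIndex()
index.register(SkillEntry(
    "content_audit", "Audit a page for thin or duplicate content",
    lambda: "Step 1: fetch the page. Step 2: compare against similar pages.",
))
index.register(SkillEntry(
    "serp_analysis", "Summarize the top-ranking results for a query",
    lambda: "Step 1: query the SERP. Step 2: extract titles and intents.",
))
print(index.index_text())        # only names and descriptions enter the context
index.activate("content_audit")  # full procedure loaded on demand
print(sorted(index.loaded))      # ['content_audit']
```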

For SEO work specifically, the skills worth focusing on, including keyword research, content auditing, SERP analysis, internal linking analysis, and on-page optimization, each encode a repeatable procedure that the agent can execute consistently across sessions, rather than relying on a fresh prompt each time. That repeatability is the commercial value. Exploring the full range of SEO skills for AI agents shows how that procedural packaging translates into deployable capability.

The more interesting boundary question is whether skills always come from a marketplace. Voyager, an LLM-powered agent operating in Minecraft, writes, verifies, stores, and reuses its own executable skill library through iterative in-context learning, accumulating capabilities at runtime without human curation. This is a research demonstration in a controlled game environment, not a production claim. But the architectural implication is real: the boundary between "skills you install" and "skills the agent generates" is already dissolving.

How Does the Tool-Use Layer Connect AI Agent Architecture to the Skills an Agent Can Call?

By the time an agent reaches the tool-use layer, the architecture has already done significant work: perception has ingested inputs, memory has surfaced relevant context, and the planning engine has decomposed the goal into actionable steps. The tool-use layer is where that reasoning becomes action. It maintains a registry of available skills with their schemas and descriptions, runs a selection mechanism that maps the current task to the right skill, executes the invocation, handles errors and retries, and routes the output back into the agent's memory or downstream steps.
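
A stripped-down version of that layer might look like the following. It is a sketch under simple assumptions: skills are plain Python callables, errors are ordinary exceptions, retries are a fixed count with no backoff, and output routing just appends to a list standing in for memory. Real frameworks add schema validation, timeouts, and permission checks.

```python
from typing import Any, Callable, Optional


class ToolLayer:
    """Minimal registry, invocation, retry, and output routing."""

    def __init__(self, max_retries: int = 2) -> None:
        # name -> (description, callable); descriptions would feed skill selection
        self.registry: dict[str, tuple[str, Callable[..., Any]]] = {}
        self.max_retries = max_retries
        self.memory: list[tuple[str, Any]] = []  # stand-in for working memory

    def register(self, name: str, description: str, fn: Callable[..., Any]) -> None:
        self.registry[name] = (description, fn)

    def invoke(self, name: str, **kwargs: Any) -> Any:
        if name not in self.registry:
            raise KeyError(f"skill not registered: {name}")
        _, fn = self.registry[name]
        last_error: Optional[Exception] = None
        for _ in range(1 + self.max_retries):  # first attempt plus retries
            try:
                result = fn(**kwargs)
            except Exception as err:  # this sketch retries on any exception
                last_error = err
                continue
            self.memory.append((name, result))  # route output back to memory
            return result
        raise RuntimeError(f"{name} failed after {self.max_retries} retries") from last_error


calls = {"n": 0}


def flaky_fetch(url: str) -> str:
    # Fails on the first call to simulate a transient network error.
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("timeout")
    return f"<html>{url}</html>"


layer = ToolLayer(max_retries=2)
layer.register("fetch_page", "Fetch raw HTML for a URL", flaky_fetch)
print(layer.invoke("fetch_page", url="example.com"))  # succeeds on the retry
```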

The connection between the tool-use layer and AI Agent Skills is a division of responsibility. The tool-use layer provides the execution infrastructure, including the callable functions, the API integrations, and the safety constraints. The skill provides the procedural knowledge, specifically the instructions for when and how to use that infrastructure effectively. Neither works without the other. An agent runtime environment like the one Google Cloud describes establishes the infrastructure boundary: what tools are even exposed to the agent, what memory backends are available, what API permissions are granted. The skill marketplace operates above that infrastructure layer, not below it.

This matters for anyone buying or building skills: a skill that requires a web search tool cannot run on an agent runtime that doesn't expose web access. Checking AI agent framework compatibility for SEO skills before purchasing is not optional; it's the first constraint the architecture imposes.
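
At its simplest, that check is a set comparison: the tools a skill needs must be a subset of the tools the runtime exposes. The `SkillManifest` type, its `required_tools` field, and the tool names below are hypothetical; real skill packages and runtimes declare their requirements in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillManifest:
    name: str
    required_tools: frozenset[str]  # hypothetical field declared by the skill


def missing_tools(skill: SkillManifest, runtime_tools: set[str]) -> set[str]:
    """Return the tools a skill needs that the runtime does not expose."""
    return set(skill.required_tools - runtime_tools)


serp_skill = SkillManifest("serp_analysis", frozenset({"web_search", "http_fetch"}))
runtime = {"http_fetch", "file_read"}  # this runtime grants no web search

gaps = missing_tools(serp_skill, runtime)
if gaps:
    print(f"{serp_skill.name} cannot run here; missing: {sorted(gaps)}")
```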

How Do Function Calling and the ReAct Pattern Differ as Skill Invocation Mechanisms?

Function Calling and the ReAct pattern are both ways an agent invokes a skill, but they operate through structurally different mechanisms and suit different task types.

Aspect	Function Calling	ReAct Pattern
Invocation style	Structured JSON dispatch, model selects a named function and emits arguments	Iterative thought-action-observation loop, model reasons, acts, observes, repeats
Planning	Minimal, typically one decision per step	Built-in multi-step planning across turns
Skill choice	Selected from a predefined, enumerated set	Chosen dynamically based on intermediate results
Best for	Deterministic, governed, low-latency tasks	Ambiguous, exploratory, multi-step tasks
Latency and cost	Faster and cheaper	Slower and more token-intensive
Failure mode	Inflexible if the needed skill wasn't anticipated	Can loop unproductively on simple tasks

OpenAI's function-calling specification is the clearest implementation of the structured dispatch model. The model maps user intent to a named function with a defined schema, emits a structured call payload, and the runtime executes it deterministically. The allowed skills are enumerated upfront, which makes the system easier to constrain, validate, and debug. Enterprise-oriented deployments favor this for governed workflows precisely because the skill space is bounded.
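
The structured-dispatch side can be sketched without any vendor SDK. The payload shape below, a function name plus a JSON-encoded arguments string, is modeled loosely on the general shape of function-calling responses; treat the field names, the `SCHEMAS` table, and the validation rules as illustrative assumptions rather than any provider's exact format.

```python
import json
from typing import Any, Callable

# Enumerated, bounded skill space: name -> (required argument names, handler).
SCHEMAS: dict[str, tuple[set[str], Callable[..., Any]]] = {
    "page_title": ({"url"}, lambda url: f"title of {url}"),
    "word_count": ({"text"}, lambda text: len(text.split())),
}


def dispatch(call: dict[str, str]) -> Any:
    """Validate a structured call against the enumerated set, then execute it."""
    name = call["name"]
    if name not in SCHEMAS:
        raise ValueError(f"unknown function: {name}")  # outside the allowed set
    required, handler = SCHEMAS[name]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    missing, extra = required - set(args), set(args) - required
    if missing or extra:
        raise ValueError(f"bad arguments for {name}: missing={missing}, extra={extra}")
    return handler(**args)  # deterministic execution by the runtime


print(dispatch({"name": "page_title", "arguments": '{"url": "example.com"}'}))
```

Because every allowed call is listed in one table, an unexpected function name or argument fails loudly at dispatch time, which is the property that makes this style easier to govern.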

The ReAct pattern, formalized in a 2022 paper by Yao, Zhao, Yu, and colleagues, works differently. The agent produces an explicit reasoning trace, a "thought," then issues an action, receives an observation, and loops. Skills are chosen dynamically based on what the intermediate observations reveal. This is the behavioral contract that explains how an agent decides which skill to invoke and when to stop trying. Without it, the tool-use layer is just a function registry with no principled dispatch mechanism.
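
In the ReAct paper's question-answering prompts, each step is plain text: a `Thought:` line, then an `Action:` line such as `Search[entity]`, with the environment's reply appended as an `Observation:`; a `Finish[answer]` action ends the episode. The sketch below parses and dispatches one such step. The regex and the toy skill table are simplifying assumptions, not the paper's code.

```python
import re
from typing import Callable, Optional

ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(model_output: str) -> Optional[tuple[str, str]]:
    """Extract (action_name, argument) from a Thought/Action step, if present."""
    match = ACTION_RE.search(model_output)
    return (match.group(1), match.group(2)) if match else None


def react_step(
    model_output: str, skills: dict[str, Callable[[str], str]]
) -> tuple[bool, str]:
    """Run one thought-action-observation step. Returns (finished, text)."""
    parsed = parse_action(model_output)
    if parsed is None:
        return False, "Observation: no valid action found; try again."
    name, arg = parsed
    if name == "Finish":  # the model signals it is done
        return True, arg
    if name not in skills:
        return False, f"Observation: unknown action {name}."
    return False, f"Observation: {skills[name](arg)}"


skills = {"Search": lambda q: f"top result for '{q}'"}
step = "Thought: I need the current top result.\nAction: Search[best crm software]"
print(react_step(step, skills))
```

The returned observation is what gets appended to the transcript before the model produces its next thought.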

Can an Agent Use Multiple Skills in a Single Reasoning Step?

An agent can invoke more than one skill within a single reasoning step when the task matches multiple capabilities simultaneously. In function-calling mode, some runtimes support parallel function calls in a single API response, where the model emits multiple structured calls and the runtime executes them concurrently. In ReAct mode, the agent can identify that a task requires sequential skill invocations and chain them across loop iterations without returning control to the user between steps.
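
When a runtime does execute several independent calls from one response concurrently, the pattern looks roughly like this. The sketch uses Python's standard `concurrent.futures` thread pool; the `(name, argument)` call format and the two toy skills are assumptions for illustration, and real orchestration frameworks implement this in their own ways.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

SKILLS: dict[str, Callable[[str], Any]] = {
    "serp_snapshot": lambda q: f"serp for {q}",
    "keyword_cluster": lambda q: sorted(set(q.split())),
}


def run_parallel(calls: list[tuple[str, str]]) -> list[Any]:
    """Execute independent skill calls concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        futures = [pool.submit(SKILLS[name], arg) for name, arg in calls]
        return [f.result() for f in futures]  # re-raises any skill exception


print(run_parallel([("serp_snapshot", "seo audit"), ("keyword_cluster", "seo audit seo")]))
```

This only works because the two calls don't depend on each other; dependent calls have to be chained across steps instead.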

Skill composability, the ability to combine discrete skills within a unified context, is one of the architectural advantages of modular skill design over monolithic prompts. An agent with a well-registered skill suite can internalize what would otherwise require a multi-agent setup, keeping coordination overhead low by running multiple specialized behaviors within one context. Whether parallel invocation is available depends on the orchestration framework; LangChain and CrewAI both support multi-skill workflows, but the specific execution model differs.

Does the MRKL Routing Problem Make Skill Selection Harder Than Building the Skills Themselves?

Routing is harder than building. The MRKL systems paper, Modular Reasoning, Knowledge, and Language, makes this explicit: the critical unsolved component in a modular neuro-symbolic architecture is not the individual expert modules but the router that knows which module, neural or symbolic, is the right one for a given sub-problem.

Building a capable keyword research skill is a tractable engineering problem. Ensuring the agent consistently routes the right query to that skill, rather than to a semantically adjacent but wrong skill, is harder. Routing failures are difficult to debug because they look like skill failures. The SkillOrchestra research found that overly fine-grained skills confuse weaker orchestrators, while overly coarse skills reduce precision, meaning the skill design space itself affects routing accuracy, not just the router's quality.
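
A toy router makes the failure mode visible. The sketch scores each skill by word overlap between the task and the skill's description, and declines to route when the top two scores are too close. Word-overlap scoring and the `margin` parameter are stand-ins chosen for readability; production routers more often rely on embeddings or the model itself, but the ambiguity problem is the same.

```python
from typing import Optional


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def route(task: str, skills: dict[str, str], margin: int = 1) -> Optional[str]:
    """Return the best-matching skill name, or None if the choice is ambiguous."""
    scores = sorted(
        ((len(tokens(task) & tokens(desc)), name) for name, desc in skills.items()),
        reverse=True,
    )
    if not scores or scores[0][0] == 0:
        return None  # nothing matches at all
    if len(scores) > 1 and scores[0][0] - scores[1][0] < margin:
        return None  # two skills look equally plausible: a likely misroute
    return scores[0][1]


skills = {
    "keyword_research": "find keyword ideas and search volume for a topic",
    "keyword_clustering": "group keyword lists into topic clusters",
    "serp_analysis": "analyze the top search results for a query",
}
print(route("find keyword ideas for running shoes", skills))  # keyword_research
print(route("keyword topic", skills))  # None: two adjacent skills tie
```

The second query shows the SkillOrchestra point in miniature: two semantically adjacent skills make a short task unroutable even though each skill is fine on its own.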

A skill that works perfectly in isolation can fail in production because the orchestration layer doesn't select it reliably. Framework selection matters here: LangChain's tool-selection mechanism, CrewAI's role-based delegation, and AutoGPT's planning loop each handle routing differently, and that difference affects which skills perform well on each platform.

Why Do Modular AI Agent Skills Outperform Monolithic Agent Prompts at Scale?

Monolithic agent prompts fail at scale for a specific reason: the context window fills with procedures that don't apply to the current task, which dilutes the signal for the ones that do. A single large prompt encoding keyword research, content auditing, SERP analysis, internal linking logic, and on-page optimization guidance simultaneously forces the model to reason through irrelevant instructions on every invocation. Token waste is the minor problem. The major problem is that the model's attention mechanism distributes across the entire prompt, reducing the effective weight given to the relevant procedure.

Modular AI Agent Skills fix this through progressive disclosure. Each skill is a self-contained package loaded on demand when a task matches its scope. The agent sees only skill names and descriptions until it needs the full procedure, then loads the complete instructions into working memory. This keeps the active context lean, improves task-specific reasoning quality, and makes the skill library scalable: you can add a new skill without touching any existing one.

The maintainability argument is equally strong. A monolithic prompt that needs updating requires careful surgery across the entire block, with risk of breaking unrelated behaviors. A modular skill can be versioned, tested in isolation, and updated without touching the rest of the agent's capability stack. Different team members can own different skills. The person who knows SERP analysis can improve that skill without needing to understand the internal linking skill at all.

Three statistics from the research literature define the performance boundaries here. Curated skills improved agent performance by an average of 16 percentage points across thousands of benchmark trajectories, compared with no-skills baselines, a gain large enough to justify the investment in proper skill design. Self-generated skills averaged slightly negative performance, marginally worse than no skills at all, which matters because several frameworks now advertise runtime skill generation as a feature. When the skill count climbed to 202 in one study, success rates dropped by up to 21 percentage points on average, showing that registry size is itself a performance variable. More skills is not better. Curated, domain-relevant, correctly routed skills are better.

How Does Skill Composition Differ from Prompt Chaining as an Agent Design Pattern?

Prompt chaining and skill composition look similar from the outside, both produce multi-step agent behavior, but they're structurally different in ways that matter for production systems.

Aspect	Skill Composition	Prompt Chaining
Core unit	Reusable skill with defined inputs, outputs, and policies	Prompt step producing text that feeds the next prompt
Persistence	Persistent across tasks and sessions	Temporary for one workflow
Contents	Instructions, tools, policies, context bundles, scripts	Prompt text and intermediate text outputs
Reuse	High, same skill runs across many agents and workflows	Low, chain is designed for one specific flow
Best fit	Production agent architecture with multiple capability domains	Linear multi-step reasoning or transformation tasks

Prompt chaining is a workflow pattern. It coordinates steps, but it doesn't create a reusable capability with its own execution policy. The chain is redesigned for each new workflow. Skill composition treats each capability as a named, durable unit that the orchestration layer can select, combine, and route outputs between. The difference becomes visible at scale: a prompt chain for a ten-step SEO audit is a custom script; a skill-composed SEO audit draws on registered skills that can also be invoked individually, combined differently for a different task, or updated without rebuilding the chain.

Can an Agent Write and Reuse Its Own Skills at Runtime Without Human Curation?

Voyager demonstrates that the answer is yes, with significant caveats about context. The Voyager agent, operating in Minecraft, writes executable skills, verifies them against task outcomes, stores them in a skill library, and retrieves and reuses them in later tasks, all through iterative in-context learning, without human curation of the library.

This is a research demonstration in a constrained game environment. The skill space in Minecraft is bounded, the verification mechanism is clear (did the agent accomplish the task?), and the consequences of a bad skill are low. Production SEO environments are none of those things. Deploying a self-generating skill system on a client's site without explicit review gates is not advisable regardless of how confident the framework documentation sounds.

But the architectural implication stands: the boundary between installed skills and generated skills is not a permanent wall. PraisonAI explicitly supports runtime skill creation and editing. Anthropic has described autonomous skill creation as a planned capability direction. The practical question for skill buyers is not whether self-generation will eventually work; it's whether the current agent runtime has the verification infrastructure to make self-generated skills trustworthy. Most don't yet.

How Do Agent Memory Types Interact with Skill Outputs Across Sessions?

Where a skill's output lands in the memory architecture determines whether it's useful once or useful indefinitely.

Working memory holds the immediate skill result in the active context window. It's available for the current reasoning step and gone on reset. An agent that writes all skill outputs only to working memory is not learning in any durable sense; it's re-executing the same skill from scratch every session.

Episodic memory stores sequences of past interactions and action traces. A skill output written here becomes part of the agent's history, available for retrieval in later sessions and usable as evidence about what worked before. The Generative Agents research demonstrated memory streams with retrieval, reflection, and planning layers that persist over time, showing that skill outputs have radically different utility depending on where they land.

Semantic memory persists learned facts: user preferences, domain knowledge, structured information extracted from prior runs. A keyword research skill that writes its findings to semantic memory means the agent doesn't re-derive the same keyword landscape on every invocation.

Procedural memory is the tier closest to AI Agent Skills themselves. It stores reusable how-to behavior, learned action patterns, and workflow logic. Skills, in many implementations, are procedural memory with progressive disclosure: the agent stores the procedure and retrieves it when the task matches.

For anyone building or buying skills, the practical question is: which memory tier does this skill write its outputs to? If the answer is working memory only, the skill is stateless and the agent starts fresh every session. That's fine for some tasks. For SEO workflows that compound knowledge over time, tracking a site's link profile or building a content gap analysis across months, you need skills that write to persistent memory tiers.
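
The difference between a stateless skill and a compounding one comes down to where the output is written. The sketch below models working memory as an in-process list that a session reset clears, and semantic memory as a JSON file that survives the reset. The class name, the `persist` flag, and the file-backed store are assumptions for illustration; a real agent would more likely use a database or vector store.

```python
import json
import os
import tempfile
from typing import Any


class AgentMemory:
    """Working memory is wiped on reset; semantic memory persists to disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.working: list[Any] = []

    def _load_semantic(self) -> dict[str, Any]:
        if not os.path.exists(self.path):
            return {}
        with open(self.path, encoding="utf-8") as fh:
            return json.load(fh)

    def write(self, key: str, value: Any, persist: bool = False) -> None:
        self.working.append(value)  # always visible to the current session
        if persist:  # only persisted outputs survive a reset
            facts = self._load_semantic()
            facts[key] = value
            with open(self.path, "w", encoding="utf-8") as fh:
                json.dump(facts, fh)

    def reset_session(self) -> None:
        self.working.clear()

    def recall(self, key: str) -> Any:
        return self._load_semantic().get(key)


store_path = os.path.join(tempfile.mkdtemp(), "semantic.json")
memory = AgentMemory(store_path)
memory.write("serp_snapshot", "top 10 results for running shoes")  # working only
memory.write("keyword_landscape", ["running shoes", "trail shoes"], persist=True)
memory.reset_session()
print(memory.working, memory.recall("keyword_landscape"), memory.recall("serp_snapshot"))
```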

Does Multi-Agent Skill Delegation Require a Different Memory Architecture Than Single-Agent Systems?

Multi-agent skill delegation requires a different memory architecture, and the difference is not minor. When one agent orchestrates other specialized agents as if they were skills, the pattern HuggingGPT uses when delegating sub-tasks to hundreds of specialized Hugging Face models, memory must handle shared state, role boundaries, synchronization, and handoffs across agents. Single-agent memory mainly helps one model remember context and prior outputs. Multi-agent memory is part of the coordination layer.

Microsoft Research's LEGOMem work found that orchestrator memory is critical for task decomposition and delegation, while fine-grained agent memory improves execution accuracy in individual workflow steps. EvoAgent combines skill learning with a three-layer memory architecture and hierarchical sub-agent delegation explicitly to support this pattern. The single-agent component model cannot describe this second architectural layer, and most introductory agent architecture explainers don't try.

For practical purposes: if you're running skills within a single LangChain or CrewAI agent, standard memory configuration applies. If you're building a multi-agent crew where different agents hold different skill sets and pass work between them, memory architecture becomes a design decision in its own right, not a default setting.

How Does an Agent Orchestration Layer Manage Multiple Skills Simultaneously?

LangChain, CrewAI, and AutoGPT each implement an orchestration layer that handles four distinct functions: skill registration, selection logic, invocation sequencing, and output routing.

Skill registration maintains a catalog of available skills with their schemas, descriptions, and input/output contracts. The orchestrator cannot select a skill it doesn't know about. This is why skill compatibility with a specific framework matters before installation: a skill registered incorrectly won't appear in the selection pool.

Selection logic is the routing problem at the orchestration level. The AgentSkillOS research describes organizing skills into a capability tree, recursively grouping broad capabilities into narrower ones, so the orchestrator can traverse the tree and prune irrelevant branches rather than scanning every registered skill on every task. Flat skill registries with dozens of entries create selection overhead that compounds as the library grows.
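
A capability tree can be sketched as nested nodes checked from the root down, so a whole branch is skipped as soon as it stops matching the task. This is not the AgentSkillOS implementation; the keyword-match test, the node names, and the example tree are hypothetical stand-ins for whatever relevance judgment a real orchestrator would make.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityNode:
    name: str
    keywords: set[str]
    children: list["CapabilityNode"] = field(default_factory=list)


def find_skills(node: CapabilityNode, task_words: set[str], visited: list[str]) -> list[str]:
    """Depth-first search that prunes any branch whose keywords miss the task."""
    visited.append(node.name)
    if not node.keywords & task_words:
        return []  # prune: nothing below this node is examined
    if not node.children:
        return [node.name]  # a leaf is a concrete skill
    found: list[str] = []
    for child in node.children:
        found.extend(find_skills(child, task_words, visited))
    return found


tree = CapabilityNode("root", {"keywords", "serp", "links", "content"}, [
    CapabilityNode("research", {"keywords", "serp"}, [
        CapabilityNode("keyword_research", {"keywords"}),
        CapabilityNode("serp_analysis", {"serp"}),
    ]),
    CapabilityNode("site_health", {"links", "content"}, [
        CapabilityNode("internal_linking", {"links"}),
        CapabilityNode("content_audit", {"content"}),
    ]),
])
visited: list[str] = []
matches = find_skills(tree, {"audit", "content", "quality"}, visited)
print(matches)  # ['content_audit']
print(visited)  # the research branch's leaves were never examined
```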

Invocation sequencing manages the order and parallelism of skill calls. Independent skills run in parallel; skills that depend on prior outputs run after the skills they depend on. The orchestration layer builds what amounts to a directed acyclic graph of skill executions for each task, with data dependencies as edges. This is what allows a complex SEO workflow (crawl the site, analyze the content, cross-reference keyword data, generate recommendations) to run as a coordinated sequence rather than a manually scripted chain.
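
Python's standard-library `graphlib` module (Python 3.9+) is enough to sketch the dependency-ordering part. Each skill lists the skills whose outputs it needs; `TopologicalSorter` then releases skills in batches whose dependencies are satisfied, and every skill within a batch could run in parallel. The skill names are hypothetical, including `fetch_keyword_data`, which is added here to show two independent starting points.

```python
from graphlib import TopologicalSorter

# skill -> set of skills whose outputs it depends on (the DAG's edges)
workflow = {
    "crawl_site": set(),
    "fetch_keyword_data": set(),
    "analyze_content": {"crawl_site"},
    "cross_reference_keywords": {"analyze_content", "fetch_keyword_data"},
    "generate_recommendations": {"cross_reference_keywords"},
}


def execution_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group skills into batches; skills within one batch are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependents are released
    return batches


for batch in execution_batches(workflow):
    print(batch)
```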

Output routing determines where skill results go: into working memory for the next reasoning step, into a persistent memory tier for later retrieval, or downstream to another skill or agent. This is the mechanism that makes skill composition coherent rather than just sequential.

Does Adding More Skills to an Agent Always Improve Its Performance?

Adding more skills does not reliably improve performance; past a certain point, it actively degrades it. Across thousands of benchmark trajectories, curated skills improved agent performance by an average of 16 percentage points. Self-generated skills averaged slightly negative, marginally worse than no skills. When one study increased the skill count to 202, average success rates dropped by up to 21 percentage points. In roughly 20% of tasks, providing a skill lowered performance compared with no skill at all.

The failure mode is not capability overload. It's routing confusion. More skills mean more candidates in the selection pool, more opportunity for the orchestrator to pick the wrong one, and more token overhead from skill descriptions that don't apply to the current task. The gains from adding a well-curated, domain-relevant skill are real. The gains from adding a marginally relevant skill to an already large library are often negative.

This is the argument for domain-specific skill marketplaces over general-purpose libraries. A focused set of SEO skills, correctly scoped and well-described, outperforms a sprawling library of loosely related capabilities on SEO tasks.

What Can an AI Agent Not Do Even With a Full Suite of Skills?

Skills expand what an agent can attempt. They don't eliminate the constraints that the underlying architecture and runtime impose.

An agent cannot guarantee factual accuracy. Hallucination is a property of the language model, not a gap in the skill library. A skill that instructs the agent to verify claims before outputting them reduces hallucination risk; it doesn't eliminate it. This is an architectural limitation, not a skill design problem.

An agent cannot act outside the tool permissions its runtime grants. If the runtime doesn't expose a publishing API, no skill can publish content. If web access isn't granted, no skill can retrieve live SERP data. The runtime is the ceiling, and skills operate below it.

An agent cannot self-modify its own architecture. It can write to memory tiers, generate new skill procedures in systems designed for that, and adapt its behavior within the bounds of its planning loop. It cannot rewrite its own transformer architecture, change its context window, or modify its instruction tuning.

An agent cannot durably learn without a memory tier that persists across sessions. An agent running on working memory alone resets completely on every invocation. Skill outputs written only to the context window are not retained. This is the most common misunderstanding in product claims about agent "learning": the learning claim requires a persistent memory backend, not just a capable skill.

Long-horizon autonomy remains genuinely fragile. Multi-day tasks, complex multi-source research workflows that require cross-referencing conflicting claims, and actions in production systems with complex identity and permission layers all expose failure modes that a full skill suite doesn't fix. The infrastructure for persistent memory, verification, and governance has to be in place before skill quality becomes the binding constraint.

How Does Understanding AI Agent Architecture Help You Choose the Right AI Agent Skills?

The architecture is a constraint map. It tells you which skills are reachable given your runtime environment, how they will be invoked given your framework's selection mechanism, and whether their outputs will persist given your memory configuration.

Before buying or building a skill, three questions resolve most of the architecture-driven risk. Does the target runtime expose the tools this skill requires? A SERP analysis skill that needs live web access fails silently on a runtime that doesn't grant it. Does the orchestration framework's selection logic reliably route tasks to this skill, or does it get confused by semantically similar skills already in the registry? Which memory tier does the skill write its outputs to, and does that match the persistence requirement for the workflow?

Framework compatibility is the most common point of failure when evaluating skill deployments. A skill built for LangChain's tool-use interface may not register correctly in CrewAI's role-based delegation model without adaptation. AutoGPT's planning loop selects tools differently than a ReAct-based system does. Checking framework compatibility before committing to a skill is the architectural hygiene step most buyers skip.

The routing problem doesn't go away with a better skill. It gets managed with a better-designed skill registry: clear descriptions, distinct capability scopes, and no overlapping function signatures to confuse the orchestrator. That's a design discipline, not a purchase decision. But it's the discipline that determines whether a curated skill library performs at the 16-percentage-point improvement level the research documents or slides toward the 21-percentage-point degradation that comes from an unmanaged, oversized registry. Start with the constraint map: identify the runtime, the framework, and the memory architecture, then select skills that fit those constraints with precision rather than breadth.
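
Part of that registry discipline can be checked mechanically. The sketch below flags pairs of skill descriptions whose word-level Jaccard similarity meets a threshold, as a rough proxy for "semantically similar skills the orchestrator may confuse." The 0.5 threshold and word-level comparison are arbitrary assumptions; a real lint would more likely compare embeddings.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def overlapping_skills(
    registry: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return skill pairs whose descriptions overlap enough to risk misrouting."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(sorted(registry.items()), 2):
        score = jaccard(desc_a, desc_b)
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


registry = {
    "content_audit": "audit page content for quality issues",
    "content_review": "review page content for quality issues",
    "internal_linking": "suggest internal links between related pages",
}
print(overlapping_skills(registry))  # [('content_audit', 'content_review', 0.71)]
```

A flagged pair is a candidate for merging the two skills or rewriting their descriptions so their scopes no longer overlap.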

Sources

Agentic AI: A Technical Overview. IBM, 2024. IBM Research / IBM Think.
Agent Runtime. Google Cloud, 2024. Google Cloud Architecture Center.
AI Agent Architecture: From Patterns to Governance. Galileo, 2024. Galileo Blog.
What are Components of AI Agents? IBM, 2024. IBM Think.
Agentic AI Architecture: Types, Components & Best Practices. Exabeam, 2024. Exabeam Explainers.
AI Agent Architecture 101: An enterprise guide. Akka, 2024. Akka Blog.
Agent Components. Prompt Engineering Guide, 2024. promptingguide.ai.
AI Agent Architecture. GeeksforGeeks, 2024. GeeksforGeeks.
Agent Architectures in AI. GeeksforGeeks, 2024. GeeksforGeeks.
What is an Intelligent Agent? Stuart Russell and Peter Norvig, 2021. Pearson / Artificial Intelligence: A Modern Approach.
ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, et al., 2022. arXiv / Princeton, Google Research, and collaborators.
Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, et al., 2023. arXiv / Meta AI and collaborators.
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. Ehud Karpas, Omri Abend, Yonatan Belinkov, et al., 2022. arXiv / AI21 Labs.
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face. Yongliang Shen, Kaitao Song, Xu Tan, et al., 2023. arXiv / Zhejiang University and Microsoft Research Asia.
Generative Agents: Interactive Simulacra of Human Behavior. Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, et al., 2023. arXiv / Stanford University.
Voyager: An Open-Ended Embodied Agent with Large Language Models. Guanzhi Wang, Yuqi Xie, Yunfan Jiang, et al., 2023. arXiv / NVIDIA and collaborators.
BabyAGI: Autonomous Task Management System. yoheinakajima, 2023. GitHub.