

The Journey to Adding a Scribe

As our plan-and-execute architecture evolved, we encountered a pivotal challenge: managing the ever-growing context passed to our Large Language Models (LLMs).

Initially, our system thrived on a straightforward workflow, but as tasks became more complex, we needed a way to efficiently handle and retain crucial information without overwhelming the LLMs (or us humans in the loop). This is where the Scribe node comes in, marking a significant milestone in our project's journey.

Why We Added a Scribe Node to Take Notes

In our initial setup, the planner and executor nodes worked together to formulate and execute plans based on a rather simple feedback loop. However, as the complexity of tasks increased, the need for a systematic way to capture, store, and condense information became evident.

The Scribe node was introduced to address this need. By automatically taking notes on every significant step and interaction, the Scribe preserves valuable information throughout the execution process. This not only provides a clear, high-level transcript but also serves as a reliable reference for future planning and decision-making.

Reducing Context Size for LLMs: Why It Matters

LLMs, while powerful, have inherent limitations in terms of context length. Feeding them excessively large contexts can lead to diminished performance, increased latency, and higher computational costs. With the Scribe node, we condense the essential information from the active context into structured notes. This keeps the context handed to the LLMs small, which helps maintain efficiency and accuracy. It also keeps the model from getting bogged down by redundant or irrelevant information, allowing it to focus on what's truly important.

Step-by-Step Guide to Implementing the Scribe Node

Integrating the Scribe node into our existing architecture involves a series of methodical steps. Below is a comprehensive guide to help you navigate this integration seamlessly.

1. Understanding the Scribe's Role

Before diving into the implementation, it's crucial to grasp the Scribe's responsibilities:

  • Note-Taking: Automatically record significant events, tool responses, and decisions made by the executor.
  • Context Management: Store notes in a structured format to reduce the active context size for the LLMs.
  • Facilitating Replanning: Provide the Replanner node with accurate and concise information to refine future plans.
2. Implementing the Scribe Function

The core of the Scribe node lies in its ability to process and store notes effectively. Here's how it's implemented in executor_and_scribe.py:

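from typing import Annotated

from langgraph.graph.message import add_messages
from typing_extensions import TypedDict

class State(TypedDict):
    notes: str  # markdown notes maintained by the Scribe
    messages: Annotated[list, add_messages]  # add_messages appends new messages to the list

First, we define a new State that includes a notes field. This is where the Scribe will store our notes.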

def scribe(state: State):
    messages = state.get("messages", [])
    if len(messages) < 2:
        # No tool has been called yet: keep the current notes unchanged
        return {"notes": state.get("notes", "")}
    mission = messages[0].content           # the first prompt holds the mission
    tool_call = messages[-2].tool_calls[0]  # the AI message that requested the tool
    tool_response = messages[-1].content    # the output the tool returned
    notes = state.get("notes") or f"The task is {mission}"
    return {"notes": llm.invoke(f""" You are tasked with taking notes of everything we learned about this linux system in a structured way.
                                Keep your notes containing only hard facts in markdown and prune them regularly to only keep relevant facts.
                                Try to stay within 25 Lines only write about things we know not about the task.
                                Here are your current notes:
                                {notes} 
                                Here is a tool we called {tool_call} 
                                which gave us this output {tool_response}""").content}

This function performs the following actions:

  • Extracting Information: Retrieves the mission statement (first prompt made), the last tool call, and the output that the tool call returned.
  • Generating Notes: Utilizes the LLM to format and update the notes, ensuring they remain concise and relevant.
  • Returning Updated State: Outputs the updated notes to be stored in the shared state.
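To exercise this extraction logic without calling a real model, we can drive a stripped-down copy of the scribe with stand-ins. This is a sketch under a few assumptions: FakeLLM, the SimpleNamespace messages, the hypothetical ssh_execute tool name and the shortened prompt replace the real LangChain chat model, message objects and full scribe prompt, and the model is passed in as a parameter instead of being the global llm.

```python
from types import SimpleNamespace


class FakeLLM:
    """Hypothetical stand-in for a chat model: records the prompt, returns a canned reply."""

    def __init__(self, reply: str):
        self.reply = reply
        self.last_prompt = None

    def invoke(self, prompt: str):
        self.last_prompt = prompt
        # Chat models return a message object whose text lives in .content
        return SimpleNamespace(content=self.reply)


def scribe_sketch(state: dict, llm) -> dict:
    """Simplified scribe: same extraction steps, shortened prompt (assumption)."""
    messages = state.get("messages", [])
    if len(messages) < 2:
        return {"notes": state.get("notes", "")}  # no tool call to take notes on yet
    mission = messages[0].content           # first prompt: the mission
    tool_call = messages[-2].tool_calls[0]  # AI message that requested the tool
    tool_response = messages[-1].content    # what the tool returned
    notes = state.get("notes") or f"The task is {mission}"
    prompt = (f"Here are your current notes:\n{notes}\n"
              f"Here is a tool we called {tool_call}\n"
              f"which gave us this output {tool_response}")
    return {"notes": llm.invoke(prompt).content}


# Hypothetical conversation: mission, AI tool request, tool output
messages = [
    SimpleNamespace(content="Become root"),
    SimpleNamespace(content="", tool_calls=[{"name": "ssh_execute", "args": {"command": "id"}}]),
    SimpleNamespace(content="uid=1001(lowpriv) gid=1001(lowpriv)"),
]
llm = FakeLLM("- Current user: `lowpriv`")
update = scribe_sketch({"messages": messages}, llm)
print(update["notes"])  # - Current user: `lowpriv`
```

Inspecting llm.last_prompt afterwards shows exactly which mission, tool call and tool output the scribe fed to the model.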

Let's break down the scribe prompt:

f"""You are tasked with taking notes of everything we learned about this linux system in a structured way.
Keep your notes containing only hard facts in markdown and prune them regularly to only keep relevant facts.
Try to stay within 25 Lines only write about things we know not about the task.
Here are your current notes:
{notes} 
Here is a tool we called {tool_call} 
which gave us this output {tool_response}"""

This prompt is used to generate the notes. Telling the LLM to stick to markdown and to prune the notes down to relevant facts keeps them from getting too long and cluttered. The markdown format also comes in handy later, when we present the notes to the user.
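Note that the 25-line limit is only a request in the prompt; the LLM may still return longer notes. If a hard limit matters, a cheap guard is to trim the returned notes in code. The helper below is a hypothetical addition of ours, not part of executor_and_scribe.py, and simply keeps the first max_lines lines:

```python
MAX_NOTE_LINES = 25  # mirrors the "stay within 25 Lines" request in the scribe prompt


def enforce_line_budget(notes: str, max_lines: int = MAX_NOTE_LINES) -> str:
    """Hypothetical guard: cap the scribe's notes at max_lines lines."""
    lines = notes.splitlines()
    if len(lines) <= max_lines:
        return notes  # within budget: return the notes unchanged
    # Over budget: keep only the first max_lines lines
    return "\n".join(lines[:max_lines])


long_notes = "\n".join(f"- fact {i}" for i in range(40))
trimmed = enforce_line_budget(long_notes)
print(len(trimmed.splitlines()))  # 25
```

Blind truncation can cut off facts at the end of the notes, so asking the LLM to prune again is the gentler option; the guard only caps the worst case.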

3. Wiring the Scribe into the Workflow

To ensure the Scribe operates seamlessly within our graph, we integrate it as a node in the state graph. There are multiple ways to go about this, but we decided to place the Scribe node right after the tools node.

Within the graph builder, this looks like:

graph_builder.add_node("scribe", scribe)
graph_builder.add_edge("tools", "scribe")
graph_builder.add_edge("scribe", "chatbot")

This setup ensures that after a tool is executed, the Scribe processes the outcome before returning control to the executor (the chatbot node).
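The effect of the two new edges on the control flow can be illustrated with a toy, framework-free simulation. The run_loop function and its needs_tool flags are hypothetical stand-ins for the compiled LangGraph graph and the chatbot's decision to call a tool; no LangGraph code is involved:

```python
def run_loop(chatbot_turns: list) -> list:
    """Toy simulation of chatbot -> tools -> scribe -> chatbot.

    chatbot_turns[i] (hypothetical) says whether the chatbot requests a tool
    on its i-th turn; the run ends on the first turn without a tool call.
    """
    trace = []
    for needs_tool in chatbot_turns:
        trace.append("chatbot")
        if not needs_tool:
            break  # no tool call: the chatbot answers and the graph ends
        trace.append("tools")   # the requested tool is executed
        trace.append("scribe")  # edge tools -> scribe: notes are updated
        # edge scribe -> chatbot: loop back for the next decision
    return trace


trace = run_loop([True, True, False])
print(trace)
# ['chatbot', 'tools', 'scribe', 'chatbot', 'tools', 'scribe', 'chatbot']
```

Every tool execution is followed by exactly one Scribe pass, so the notes are refreshed before the chatbot decides on its next action.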

4. Testing the Scribe Integration

After implementing the Scribe node, it's essential to validate its functionality. Run the agent and monitor the notes panel to ensure that notes are being captured and updated correctly. Here's a snippet from the main execution flow:

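from rich.live import Live
from rich.markdown import Markdown
from rich.panel import Panel

# graph, template, console and layout (a rich Layout with a "right" region)
# are set up earlier in src/executor_and_scribe.py
events = graph.stream(
    input={
        "messages": [
            ("user", template),
        ]
    },
    config={
        "configurable": {"thread_id": "1"}
    },
    stream_mode="values"
)

# Use Live to update the layout dynamically
with Live(layout, console=console, refresh_per_second=10):
    for event in events:
        if "notes" in event:
            # Update the notes content and the right panel
            notes_content = event["notes"]
            layout["right"].update(Panel(Markdown(notes_content), title="Notes"))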

This block ensures that the notes are displayed in real-time, providing a clear overview of the information being captured.
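With stream_mode="values", each event carries the complete state after a step, so unchanged notes show up again on events where only the messages changed. If we only want to redraw the panel when the notes actually change, a small generator can filter the stream. note_updates and the sample events are hypothetical illustrations shaped like our graph state:

```python
from typing import Iterable, Iterator


def note_updates(events: Iterable[dict]) -> Iterator[str]:
    """Hypothetical helper: yield the notes only when they differ from the last ones seen."""
    last = None
    for event in events:
        notes = event.get("notes")
        if notes is not None and notes != last:
            last = notes
            yield notes


# Made-up events shaped like full graph states
events = [
    {"messages": ["m1"]},
    {"messages": ["m1", "m2"], "notes": "- user: lowpriv"},
    {"messages": ["m1", "m2", "m3"], "notes": "- user: lowpriv"},
    {"messages": ["m1", "m2", "m3", "m4"], "notes": "- user: lowpriv\n- not root"},
]
updates = list(note_updates(events))
print(updates)  # ['- user: lowpriv', '- user: lowpriv\n- not root']
```

In the Live loop, `for notes_content in note_updates(events):` could then replace the membership check, provided nothing else in that loop needs the other events.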

You can watch the Scribe in action yourself by running python src/executor_and_scribe.py.

5. Managing Shared State with the Scribe

Let's move from our small example to a more complex one by integrating the Scribe into the plan_and_execute graph. Our shared state (PlanExecute) must accommodate the notes taken by the Scribe. Here's the updated structure:

import operator
from typing import Annotated, List, Tuple

from typing_extensions import TypedDict

class PlanExecute(TypedDict):
    input: str  # the initial user-given objective
    plan: List[str]
    past_steps: Annotated[List[Tuple], operator.add]
    response: str  # response from the agent to the user
    notes: str  # structured notes from the Scribe

By including the notes field, all nodes within the graph can access and update the notes as required.
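The fields in PlanExecute are merged differently when a node returns a partial update: past_steps is annotated with operator.add, so new entries are appended to the stored list, while fields without a reducer, such as notes, are overwritten by the latest value. The toy apply_update function below models this behaviour in plain Python; it is a simplified illustration, not LangGraph's implementation, and the sample values are made up:

```python
import operator

# Toy model: fields with a reducer combine old and new values, all others are overwritten
REDUCERS = {"past_steps": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Return a new state with a node's partial update merged in (simplified)."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # e.g. list concatenation
        else:
            merged[key] = value  # plain field: last write wins
    return merged


state = {"input": "become root", "past_steps": [], "notes": ""}
state = apply_update(state, {"past_steps": [("enumerate SUID binaries", "found python3.11")],
                             "notes": "- SUID: /usr/bin/python3.11"})
state = apply_update(state, {"past_steps": [("exploit python3.11", "root shell")],
                             "notes": "- root via python3.11"})
print(len(state["past_steps"]), state["notes"])  # 2 - root via python3.11
```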

6. Enhancing the Replanner with Scribe Notes

The Replanner leverages the notes to refine future plans. Here's an excerpt showcasing this integration:

replanner_prompt = ChatPromptTemplate.from_template(
    """For the given objective, come up with a simple step by step plan. \
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. \
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.

Your objective was this:
{input}

Your original plan was this:
{plan}

You have currently done the follow steps:
{past_steps}

Your notes are:
{notes}

Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that. Otherwise, fill out the plan. Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan.

If you were not able to complete the task, stop after 15 planning steps and give a summary to the user.
"""
)

Notice the inclusion of {notes} in the prompt, allowing the Replanner to make informed decisions based on the accumulated notes.
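Because the replanner template now references {notes}, every invocation needs a notes value. If the Scribe has not written any notes before the first replanning round, the key may be absent from the state and the template cannot be filled. One hedged way to guard against this is to default the field before formatting. The sketch uses plain str.format on a shortened template as a stand-in for ChatPromptTemplate (whose default template format is also f-string style), and build_replanner_input is a hypothetical helper:

```python
# Shortened stand-in for the replanner template (assumption, not the full prompt)
TEMPLATE = (
    "Your objective was this:\n{input}\n\n"
    "Your original plan was this:\n{plan}\n\n"
    "You have currently done the following steps:\n{past_steps}\n\n"
    "Your notes are:\n{notes}\n"
)


def build_replanner_input(state: dict) -> dict:
    """Hypothetical helper: copy the state and default notes if nothing was written yet."""
    prepared = dict(state)
    prepared.setdefault("notes", "(no notes yet)")
    return prepared


state = {"input": "become root", "plan": ["check sudo -l"], "past_steps": []}
prompt = TEMPLATE.format(**build_replanner_input(state))
print(prompt.splitlines()[-1])  # (no notes yet)
```

Formatting the raw state directly would raise a KeyError for notes, which is the failure the helper avoids.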

Conclusion

Let's take a look at the notes after we finished our run:

    # Linux System Notes

    ## User Information
    - Current user: `lowpriv`
    - User password: `trustno1`
    - User privilege: Low-privilege
    - User ID: `uid=1001(lowpriv)`
    - Group ID: `gid=1001(lowpriv)`
    - Groups: `1001(lowpriv)`

    ## Authentication
    - SSH login successful with provided credentials (`lowpriv` / `trustno1`).
    - User `lowpriv` is not root.

    ## Privilege Escalation
    - Successful privilege escalation to root using SUID binary `/usr/bin/python3.11`.
    - Command used: `python3.11 -c 'import os; os.setuid(0); os.system("/bin/bash")'`.

    ## SUID Binaries
    - Identified SUID binaries:
      - `/usr/bin/python3.11` (used for privilege escalation)
      - `/usr/bin/newgrp`
      - `/usr/bin/chfn`
      - `/usr/bin/gpasswd`
      - `/usr/bin/chsh`
      - `/usr/bin/passwd`
      - `/usr/bin/sudo`
      - `/usr/bin/mount`
      - `/usr/bin/su`
      - `/usr/bin/find`
      - `/usr/bin/umount`
      - `/usr/lib/openssh/ssh-keysign`
      - `/usr/lib/dbus-1.0/dbus-daemon-launch-helper`

    ## Next Steps
    - Verify system access and capabilities as root.
    - Document any additional findings or configurations.
The Scribe agent was able to reduce the vast output of our tool calls to a concise fact sheet. Depending on the recursion limit we set for our executor, this can be the result of several thousand lines of command-line output.

This structured approach to note-taking paves the way for more sophisticated and efficient planning mechanisms, setting the stage for future advancements in our journey to multi-step attack planning. As we continue to refine and expand our system, the Scribe will undoubtedly play a pivotal role in ensuring that our agents remain informed, agile, and capable of tackling increasingly complex tasks.

Adding Plan-and-Execute Planner

All sources can be found in our github history.

When using LLMs for complex tasks like hacking, a common problem is that they become hyper-focused on a single attack vector and ignore all others. They go down a "depth-first" rabbit hole and never leave it. I, as well as others, have experienced this.

Plan-and-Execute Pattern

One potential solution is the 'plan-and-solve' pattern (often also called the 'plan-and-execute' pattern). In this strategy, one LLM (the planner) is given the task of creating a high-level task plan based on the user-given objective. The task plan is processed by another LLM module (the agent or executor). Basically, the next step from the task plan is taken and forwarded to the executor to solve within a limited number of steps or time.

The executor's result is passed back to another LLM module (the replan module) that updates the task plan with the new findings and, if the overall objective has not been achieved already, calls the executor agent with the next task step. The replan and plan LLM modules are typically very similar to each other, as we will see within our code example later.
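Stripped of LLMs and LangGraph, the pattern boils down to a short loop. In this sketch, plan, execute and replan are hypothetical plain-Python callables standing in for the planner, executor and replanner modules; the replanner returns either an updated list of remaining steps or a final answer string, and max_rounds is an assumed upper bound on planning rounds:

```python
from typing import Callable, List, Tuple, Union

# Hypothetical replanner signature: (objective, current steps, past steps) -> new steps or answer
Replan = Callable[[str, List[str], List[Tuple[str, str]]], Union[List[str], str]]


def plan_and_execute(objective: str,
                     plan: Callable[[str], List[str]],
                     execute: Callable[[str], str],
                     replan: Replan,
                     max_rounds: int = 15) -> str:
    """Plan once, then alternate between executing the next step and replanning."""
    steps = plan(objective)
    past_steps: List[Tuple[str, str]] = []
    for _ in range(max_rounds):
        if not steps:
            return "No steps left to execute."
        task = steps[0]  # the executor only ever sees the next step
        past_steps.append((task, execute(task)))
        result = replan(objective, steps, past_steps)
        if isinstance(result, str):  # the replanner answered the user
            return result
        steps = result  # otherwise continue with the updated plan
    return f"Stopped after {max_rounds} rounds with {len(past_steps)} steps done."


# Toy stand-ins for the three LLM modules
def toy_plan(objective):
    return ["enumerate SUID binaries", "exploit a SUID binary"]


def toy_execute(task):
    return f"done: {task}"


def toy_replan(objective, steps, past_steps):
    remaining = steps[1:]
    return remaining if remaining else "Became root via a SUID binary."


answer = plan_and_execute("become root", toy_plan, toy_execute, toy_replan)
print(answer)  # Became root via a SUID binary.
```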

An advanced version is Gelei's Penetration Task Tree detailed in the pentestGPT paper.

Let's build a simple plan-and-execute prototype, heavily influenced by the plan-and-execute langgraph example.

The High-Level Graph

One benefit of documenting our journey in this blog is that we can explain the source code in a non-linear order instead of following the file from top to bottom.

Let's start with the overall graph as defined through create_plan_and_execute_graph:

graphs/plan_and_execute.py: Overall graph
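from langgraph.graph import END, START, StateGraph

# PlanExecute, Plan, Act, Response, planner_prompt and replanner_prompt
# are defined elsewhere in the same module (shown below)
def create_plan_and_execute_graph(llm, execute_step):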

    def should_end(state: PlanExecute):
        if "response" in state and state["response"]:
            return END
        else:
            return "agent"

    def plan_step(state: PlanExecute):
        planner = planner_prompt | llm.with_structured_output(Plan)
        plan = planner.invoke({"messages": [("user", state["input"])]})
        return {"plan": plan.steps}

    def replan_step(state: PlanExecute):
        replanner = replanner_prompt | llm.with_structured_output(Act)
        output = replanner.invoke(state)
        if isinstance(output.action, Response):
            return {"response": output.action.response}
        else:
            return {"plan": output.action.steps}

    workflow = StateGraph(PlanExecute)

    # Add the nodes
    workflow.add_node("planner", plan_step)
    workflow.add_node("agent", execute_step)
    workflow.add_node("replan", replan_step)

    # set the start node
    workflow.add_edge(START, "planner")

    # configure links between nodes
    workflow.add_edge("planner", "agent")
    workflow.add_edge("agent", "replan")
    workflow.add_conditional_edges("replan", should_end)

    return workflow

The overall flow is defined from workflow = StateGraph(PlanExecute) onwards. You can see the mentioned nodes, planner, agent (the executor) and replan, wired into a graph that follows the outline described in the introduction.

should_end is the exit condition: if the replanner does not call the sub-agent (agent), its only other option is to send a message to the initial human (stored in the response field). should_end detects this response and routes to END, which exits the graph.

Shared State

The shared state describes the data that is stored within the graph, i.e., the data that all our nodes will have access to. It is defined through PlanExecute:

graphs/plan_and_execute.py: Shared State
import operator
from typing import Annotated, List, Tuple

from typing_extensions import TypedDict

class PlanExecute(TypedDict):
    input: str # the initial user-given objective
    plan: List[str]
    past_steps: Annotated[List[Tuple], operator.add]
    response: str # response from the agent to the user

We store the following data:

  • input: the initially given user question/objective, i.e., "I want to become root"
  • response: the final answer given by our LLM to the user question
  • plan: a (string) list of planning steps that need to be performed to hopefully solve the user question
  • past_steps: a list of already performed planning steps. In our implementation this also contains a short summary (given by the execution agent) about the operations performed by the execution agent (stored for each past step).

Graph Nodes/Actions

planner and replan are implemented through plan_step and replan_step respectively. The agent (or executor) is passed in as execute_step function parameter as this allows us to easily reuse the generic plan-and-execute graph for different use-cases.

Planner

Let's look at the planner next. It is implemented as an LLM call that uses llm.with_structured_output to automatically parse the output into the Plan data structure:

graphs/plan_and_execute.py: Plan data structure
class Plan(BaseModel):
    """Plan to follow in future"""

    steps: List[str] = Field(
        description="different steps to follow, should be in sorted order"
    )

The output is thus a simple string list with the different future planning steps. The LLM prompt itself is defined as:

graphs/plan_and_execute.py: Planner Prompt
planner_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """For the given objective, come up with a simple step by step plan. \
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. \
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.""",
        ),
        ("placeholder", "{messages}"),
    ]
)

This is rather generic. The initial user question will be passed in as first message within {messages} and that's more or less it.

Replanner

The result of the replanner node's action/step will be one of the following:

graphs/plan_and_execute.py: Replanner data structure
class Response(BaseModel):
    """Response to user."""
    response: str

class Act(BaseModel):
    """Action to perform."""

    action: Union[Response, Plan] = Field(
        description="Action to perform. If you want to respond to user, use Response. "
        "If you need to further use tools to get the answer, use Plan."
    )

So it's either a user Response (consisting of a string) signalling that we have finished, or an updated Plan (the previously mentioned list of strings) which the executor will act upon next.
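with_structured_output takes care of turning the model's answer into an Act object. To make the Union concrete, here is a rough stdlib approximation of that decision, using dataclasses instead of pydantic. parse_act and the JSON shape are assumptions for illustration; the real parsing depends on the model provider and the LangChain version:

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Response:
    """Response to user (stdlib stand-in for the pydantic model)."""
    response: str


@dataclass
class Plan:
    """Plan to follow in future (stdlib stand-in for the pydantic model)."""
    steps: List[str]


def parse_act(raw: str) -> Union[Response, Plan]:
    """Hypothetical parser: map the JSON 'action' onto Response or Plan."""
    action = json.loads(raw)["action"]
    if "response" in action:
        return Response(response=str(action["response"]))
    if isinstance(action.get("steps"), list):
        return Plan(steps=[str(step) for step in action["steps"]])
    raise ValueError(f"action matches neither Response nor Plan: {action!r}")


act = parse_act('{"action": {"steps": ["Check sudo -l", "Inspect cron jobs"]}}')
print(type(act).__name__, act.steps)  # Plan ['Check sudo -l', 'Inspect cron jobs']
```

replan_step then only has to check isinstance(output.action, Response) to decide between finishing and handing the updated plan back to the executor.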

Let's look at the prompt:

graphs/plan_and_execute.py: Replanner Prompt
replanner_prompt = ChatPromptTemplate.from_template(
    """For the given objective, come up with a simple step by step plan. \
This plan should involve individual tasks, that if executed correctly will yield the correct answer. Do not add any superfluous steps. \
The result of the final step should be the final answer. Make sure that each step has all the information needed - do not skip steps.

Your objective was this:
{input}

Your original plan was this:
{plan}

You have currently done the follow steps:
{past_steps}

Update your plan accordingly. If no more steps are needed and you can return to the user, then respond with that. Otherwise, fill out the plan. Only add steps to the plan that still NEED to be done. Do not return previously done steps as part of the plan.

If you were not able to complete the task, stop after 15 planning steps and give a summary to the user.
"""
)

The prompt's input is the initial objective (input), the current plan containing all future high-level task steps (plan), and a list of previously executed planning steps (past_steps). In our implementation, each entry in past_steps also contains an LLM-derived summary of the actions performed by the executor while trying to solve the planning step, as well as its results. This should help the replan agent to better update subsequent plans.

We also tell the LLM to stop after 15 high-level task steps and give a final summary to the user. If the objective has been solved earlier, the LLM should detect this too and auto-magically stop execution.
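The 15-step limit lives only in the prompt, so it depends on the model following instructions. If a hard guarantee is wanted, the exit condition can also count past_steps. The variant below is our own hypothetical sketch, not the code in graphs/plan_and_execute.py; END is a local string standing in for LangGraph's END constant so the example runs on its own:

```python
END = "__end__"  # local stand-in for langgraph.graph.END
MAX_PLAN_STEPS = 15  # mirrors the "stop after 15 planning steps" instruction


def should_end(state: dict) -> str:
    """Exit when the replanner answered or when the step budget is exhausted."""
    if state.get("response"):
        return END
    if len(state.get("past_steps", [])) >= MAX_PLAN_STEPS:
        return END  # hard cap, independent of whether the LLM obeyed the prompt
    return "agent"


print(should_end({"past_steps": [("step", "result")] * 3}))  # agent
```

Hitting the cap this way ends the run without a response for the user; routing to a final summarizing node instead would preserve that summary.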

Agent/Executor

The executor node/function is passed into our generic graph as a callback function. This allows us to easily reuse the generic graph to solve different objectives with their respective specialized executor agents.

Let's start with our simple implementation:

plan_and_execute.py: Executor Function
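from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# conn, console, print_event, create_chat_tool_agent_graph, PlanExecute and the
# Ssh*Tool classes come from our project's own modules
llm2 = ChatOpenAI(model="gpt-4o", temperature=0)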
tools = [SshExecuteTool(conn), SshTestCredentialsTool(conn)]
llm2_with_tools = llm2.bind_tools(tools)

def execute_step(state: PlanExecute):
    plan = state["plan"]
    task = plan[0]

    template = PromptTemplate.from_template("""
    You are a low-privilege user {username} with password {password} on a linux system and your goal is to become the root user by executing commands on the remote system.

    To achieve this, focus upon {task}

    Do not repeat already tried escalation attacks. You should focus upon enumeration and privilege escalation. If you were able to become root, describe the used method as final message. Stop after 10 executions. If not successful until then, give a summary of gathered facts.
    """).format(username=conn.username, password=conn.password,task=plan[0])

    # create our simple graph
    graph_builder = create_chat_tool_agent_graph(llm2_with_tools, tools)
    graph = graph_builder.compile()

    events = graph.stream(
        {"messages": [("user", template)]},
        stream_mode='values'
    )

    agent_response = None
    for event in events:
        print_event(console, event)
        agent_response = event

    return {
        "past_steps": [(task, agent_response["messages"][-1].content)],
    }

We reuse our initial simple agent as the executor (the create_chat_tool_agent_graph call). The first lines of the snippet create a new connection to OpenAI (llm2) and configure some SSH-based tools (as mentioned in the original post) for our executor agent. This fully separates the executor's LLM connection, graph history, and supported tools from the LLM configuration used by the plan-and-execute graph, and would allow using different LLMs for the planner and the executor.

Starting with the graph.stream call, we execute our sub-agent and print its steps before returning the content of its final message as past_steps. Because past_steps is annotated with operator.add, this appends our agent's output (which includes a generated summary of its results) to past_steps within our shared state, where the replanner subsequently uses it to refine future planning steps.
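The return value is what ties the executor back into the shared state. The sketch below isolates that step with stand-in events: summarize_run is a hypothetical helper, and SimpleNamespace objects play the role of LangChain messages. It also guards against an empty stream, where the original code would fail because agent_response stays None:

```python
from types import SimpleNamespace
from typing import Iterable


def summarize_run(task: str, events: Iterable[dict]) -> dict:
    """Hypothetical helper: turn a sub-agent's streamed states into a past_steps update."""
    last_event = None
    for event in events:
        last_event = event  # with stream_mode='values', each event is the full state
    if last_event is None:
        return {"past_steps": [(task, "executor produced no output")]}
    # The last message of the final state holds the executor's summary
    return {"past_steps": [(task, last_event["messages"][-1].content)]}


# Made-up events standing in for the sub-agent's stream
events = [
    {"messages": [SimpleNamespace(content="prompt")]},
    {"messages": [SimpleNamespace(content="prompt"),
                  SimpleNamespace(content="Became root via SUID python3.11")]},
]
result = summarize_run("exploit SUID binaries", events)
print(result)
```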

Wiring it up and starting it

The only thing left is to wire everything up, provide the initial template, and output the occurring events (to see what our LLM agent is doing):

plan_and_execute.py: Starting our Agent
workflow = create_plan_and_execute_graph(llm, execute_step)
app = workflow.compile()
print(app.get_graph(xray=True).draw_ascii())

# now start our agent and give it our initial prompt ('get root')
template = PromptTemplate.from_template("""
You are a low-privilege user {username} with password {password} on a linux system and your goal is to become the root user by executing commands on the remote system.

Do not repeat already tried escalation attacks. You should focus upon enumeration and privilege escalation. If you were able to become root, describe the used method as final message.
""").format(username=conn.username, password=conn.password)

# start everything
events = app.stream(
    input = {"input": template },
    config = {"recursion_limit": 50},
    stream_mode = "values"
)

# output all occurring logs
for event in events:
    print_event(console, event)

And that's it! Enjoy your multi-agent driven plan-and-execute architecture!

Improvement Ideas

Before we move further with our exploration of offensive graphs, we might want to investigate logging and tracing options. As we are now starting subgraphs (and might even run subgraphs/agents in parallel), traditional console output becomes confusing to follow. Stay tuned!