This tutorial provides examples of how to perform retrieval using the PageIndex tree structure. Tree search enables intelligent navigation through document hierarchies to find relevant content. A simple strategy is to have an LLM agent conduct the tree search: the LLM analyzes the document tree structure and identifies the nodes relevant to the query.

Implementation

import json

def tree_search(query: str, tree_structure: dict) -> list[str]:
    """Ask an LLM to pick the tree nodes most likely to contain the answer."""
    prompt = f"""
    You are given a query and the tree structure of a document.
    You need to find all nodes that are likely to contain the answer.
    
    Query: {query}
    
    Document tree structure: {tree_structure}
    
    Reply in the following JSON format:
    {{
      "thinking": <your reasoning about which nodes are relevant>,
      "node_list": [node_id1, node_id2, ...]
    }}
    """
    
    response = llm.generate(prompt)  # llm: any client exposing generate()
    result = json.loads(response)
    return result['node_list']

# Example usage
relevant_nodes = tree_search(
    query="What were the key financial highlights in Q4?",
    tree_structure=pageindex_tree
)
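The example above assumes a live `llm` client and an existing `pageindex_tree`. To try the flow end to end without either, a stub client that returns canned JSON works; the stub class, the node ids, and the toy tree below are all illustrative, not part of the PageIndex API:

```python
import json

class StubLLM:
    """Stand-in for a real LLM client exposing the generate() interface used above."""
    def generate(self, prompt: str) -> str:
        # A real model would read the tree in the prompt; the stub returns a canned answer.
        return json.dumps({
            "thinking": "Q4 results usually live under the financial highlights section.",
            "node_list": ["0006"],
        })

llm = StubLLM()

# Toy tree in the spirit of a PageIndex structure (field names are illustrative)
pageindex_tree = {
    "node_id": "0000", "title": "Annual Report", "nodes": [
        {"node_id": "0006", "title": "Q4 Financial Highlights", "nodes": []},
    ],
}

response = llm.generate(
    f"Query: What were the key financial highlights in Q4?\nTree: {pageindex_tree}"
)
relevant_nodes = json.loads(response)["node_list"]
print(relevant_nodes)
```

Swapping the stub for a real client is then a one-line change, since only the `generate()` interface is assumed.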
In our dashboard and retrieval API, we use a combination of LLM tree search and value function-based Monte Carlo Tree Search (MCTS). More details will be released soon.
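The production MCTS is not public yet, but the general idea of value-guided search can be sketched with a much simpler stand-in: a value function scores each node's title against the query, and a best-first search expands the highest-value nodes within a fixed budget. The token-overlap heuristic, tree shape, and node ids below are all illustrative, not the production value function:

```python
import heapq

def value(query: str, title: str) -> float:
    """Toy value function: fraction of query tokens appearing in the node title."""
    q, t = set(query.lower().split()), set(title.lower().split())
    return len(q & t) / max(len(q), 1)

def best_first_search(query: str, root: dict, k: int = 2, budget: int = 8) -> list[str]:
    """Expand highest-value nodes first (up to budget); return the top-k node ids."""
    # Negate values because heapq is a min-heap; node_id breaks ties deterministically.
    frontier = [(-value(query, root["title"]), root["node_id"], root)]
    scored = []
    while frontier and len(scored) < budget:
        neg_v, node_id, node = heapq.heappop(frontier)
        scored.append((neg_v, node_id))
        for child in node.get("nodes", []):
            heapq.heappush(frontier, (-value(query, child["title"]), child["node_id"], child))
    scored.sort()
    return [nid for _, nid in scored[:k]]

tree = {
    "node_id": "0000", "title": "Annual Report", "nodes": [
        {"node_id": "0001", "title": "Risk Factors", "nodes": []},
        {"node_id": "0002", "title": "Q4 Financial Highlights", "nodes": []},
    ],
}
print(best_first_search("q4 financial highlights", tree))
```

A real MCTS would additionally balance exploration against exploitation and back up values from visited leaves; this sketch only shows the value-guided prioritization.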

Integrating User Preference or Expert Knowledge

Unlike vector-based RAG, where integrating expert knowledge or user preferences requires fine-tuning the embedding model, PageIndex lets you incorporate them by simply adding the relevant knowledge to the LLM tree search prompt.

Implementation Pipeline

Step 1: Preference Retrieval

When a query is received, the system selects the most relevant user preference or expert knowledge snippets from a database or a set of domain-specific rules. This can be done using keyword matching, semantic similarity, or LLM-based relevance search.
from typing import Optional

def retrieve_preferences(query: str, preference_db: dict) -> Optional[str]:
    """
    Retrieve relevant expert preferences based on the query.
    Can use keyword matching, semantic search, or LLM-based relevance.
    """
    # Example: simple keyword matching
    preferences = []
    query_lower = query.lower()
    
    for preference in preference_db.values():
        if any(keyword in query_lower for keyword in preference['keywords']):
            preferences.append(preference['text'])
    
    return '\n'.join(preferences) if preferences else None

# Example preference database
preference_db = {
    'ebitda': {
        'keywords': ['ebitda', 'adjustments', 'earnings'],
        'text': 'If the query mentions EBITDA adjustments, prioritize Item 7 (MD&A) and footnotes in Item 8 (Financial Statements) in 10-K reports.'
    },
    'risk_factors': {
        'keywords': ['risk', 'risk factors', 'uncertainties'],
        'text': 'For risk-related queries, focus on Item 1A (Risk Factors) and Item 7A (Quantitative and Qualitative Disclosures About Market Risk).'
    }
}

# Retrieve relevant preferences
preferences = retrieve_preferences(
    query="What are the EBITDA adjustments?",
    preference_db=preference_db
)
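Exact keyword matching is brittle: a rephrased query can miss every keyword. A similarity-based variant instead scores each preference against the query and keeps the best match above a threshold. The Jaccard token overlap below is only a crude stand-in for a real embedding similarity, and the threshold value is an assumption:

```python
import re
from typing import Optional

def _tokens(s: str) -> set:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9&]+", s.lower()))

def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for embedding cosine similarity."""
    sa, sb = _tokens(a), _tokens(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def retrieve_preferences_by_similarity(
    query: str, preference_db: dict, threshold: float = 0.05
) -> Optional[str]:
    """Return the preference text whose keywords best match the query, if any."""
    best_text, best_score = None, threshold
    for preference in preference_db.values():
        score = jaccard(query, ' '.join(preference['keywords']))
        if score > best_score:
            best_text, best_score = preference['text'], score
    return best_text

preference_db = {
    'ebitda': {
        'keywords': ['ebitda', 'adjustments', 'earnings'],
        'text': 'Prioritize Item 7 (MD&A) and Item 8 footnotes for EBITDA queries.'
    },
    'risk_factors': {
        'keywords': ['risk', 'risk factors', 'uncertainties'],
        'text': 'Focus on Item 1A (Risk Factors) for risk queries.'
    },
}
print(retrieve_preferences_by_similarity("What are the EBITDA adjustments?", preference_db))
```

Replacing `jaccard` with cosine similarity over real embeddings keeps the same structure while handling paraphrased queries.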
Step 2: Tree Search with Preference

Integrate the retrieved preference into the tree search prompt to guide the LLM’s node selection.
import json

def tree_search_with_preference(
    query: str,
    tree_structure: dict,
    preference: str,
) -> list[str]:
    prompt = f"""
    You are given a question and a tree structure of a document.
    You need to find all nodes that are likely to contain the answer.
    
    Query: {query}
    
    Document tree structure: {tree_structure}
    
    Expert Knowledge of relevant sections: {preference}
    
    Reply in the following JSON format:
    {{
      "thinking": <reasoning about which nodes are relevant>,
      "node_list": [node_id1, node_id2, ...]
    }}
    """
    
    response = llm.generate(prompt)
    result = json.loads(response)
    return result['node_list']

# Example usage
relevant_nodes = tree_search_with_preference(
    query="What are the EBITDA adjustments for 2023?",
    tree_structure=pageindex_tree,
    preference="If the query mentions EBITDA adjustments, prioritize Item 7 (MD&A) and footnotes in Item 8 (Financial Statements) in 10-K reports."
)

Example Expert Preference

If the query mentions EBITDA adjustments, prioritize Item 7 (MD&A) and footnotes in Item 8 (Financial Statements) in 10-K reports.
By integrating user or expert preferences, node search becomes more targeted and effective, leveraging both the document structure and domain-specific insights.

Complete Example

Here’s a complete example that combines tree search with preference integration:
import json
from typing import Optional

class TreeSearchWithPreference:
    def __init__(self, llm, pageindex_client, preference_db):
        self.llm = llm
        self.pageindex = pageindex_client
        self.preference_db = preference_db
    
    def retrieve_preferences(self, query: str) -> Optional[str]:
        """Retrieve relevant expert preferences based on the query"""
        preferences = []
        query_lower = query.lower()
        
        for topic, preference in self.preference_db.items():
            if any(keyword in query_lower for keyword in preference['keywords']):
                preferences.append(preference['text'])
        
        return '\n'.join(preferences) if preferences else None
    
    def search(self, query: str, doc_id: str) -> list[str]:
        """Perform tree search with optional preference integration"""
        # Get document tree structure
        tree_structure = self.pageindex.get_tree(doc_id)
        
        # Retrieve relevant preferences
        preference = self.retrieve_preferences(query)
        
        # Construct prompt
        if preference:
            prompt = f"""
            You are given a question and a tree structure of a document.
            You need to find all nodes that are likely to contain the answer.
            
            Query: {query}
            
            Document tree structure: {tree_structure}
            
            Expert Knowledge of relevant sections: {preference}
            
            Reply in the following JSON format:
            {{
              "thinking": <reasoning about which nodes are relevant>,
              "node_list": [node_id1, node_id2, ...]
            }}
            """
        else:
            prompt = f"""
            You are given a query and the tree structure of a document.
            You need to find all nodes that are likely to contain the answer.
            
            Query: {query}
            
            Document tree structure: {tree_structure}
            
            Reply in the following JSON format:
            {{
              "thinking": <your reasoning about which nodes are relevant>,
              "node_list": [node_id1, node_id2, ...]
            }}
            """
        
        # Get LLM response
        response = self.llm.generate(prompt)
        result = json.loads(response)
        
        return result['node_list']

# Example usage
preference_db = {
    'ebitda': {
        'keywords': ['ebitda', 'adjustments', 'earnings'],
        'text': 'If the query mentions EBITDA adjustments, prioritize Item 7 (MD&A) and footnotes in Item 8 (Financial Statements) in 10-K reports.'
    }
}

searcher = TreeSearchWithPreference(
    llm=llm_client,
    pageindex_client=pageindex,
    preference_db=preference_db
)

# Search with automatic preference integration
relevant_nodes = searcher.search(
    query="What are the EBITDA adjustments for Q4 2023?",
    doc_id="doc_abc123"
)

print(f"Found {len(relevant_nodes)} relevant nodes: {relevant_nodes}")
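To exercise this pipeline end to end without live services, both dependencies can be stubbed: an LLM client whose canned answer depends on whether expert knowledge appeared in its prompt, and a client with a `get_tree` method returning a toy tree. The stubs, node ids, and the condensed `search` function below are illustrative only; they mirror the class above rather than the real PageIndex API:

```python
import json
from typing import Optional

class StubLLM:
    """Canned answers that differ when expert knowledge is present in the prompt."""
    def generate(self, prompt: str) -> str:
        if "Expert Knowledge" in prompt:
            return json.dumps({"thinking": "Guided by the expert rule.",
                               "node_list": ["item7", "item8_footnotes"]})
        return json.dumps({"thinking": "No guidance available.",
                           "node_list": ["item7"]})

class StubPageIndex:
    """Minimal stand-in exposing the get_tree(doc_id) interface used above."""
    def get_tree(self, doc_id: str) -> dict:
        return {"node_id": "root", "title": "10-K", "nodes": [
            {"node_id": "item7", "title": "Item 7. MD&A"},
            {"node_id": "item8_footnotes", "title": "Item 8. Footnotes"},
        ]}

def search(llm, pageindex, query: str, doc_id: str,
           preference: Optional[str] = None) -> list[str]:
    """Condensed version of TreeSearchWithPreference.search, for the stubs above."""
    tree = pageindex.get_tree(doc_id)
    prompt = f"Query: {query}\nDocument tree structure: {tree}"
    if preference:
        prompt += f"\nExpert Knowledge of relevant sections: {preference}"
    return json.loads(llm.generate(prompt))["node_list"]

nodes = search(StubLLM(), StubPageIndex(),
               "What are the EBITDA adjustments?", "doc_abc123",
               preference="Prioritize Item 7 and Item 8 footnotes.")
print(nodes)
```

Running the same call without the `preference` argument returns only `item7`, which makes the effect of injecting expert knowledge directly observable.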

Benefits of Tree Search with Preferences

  • No Model Fine-tuning: Unlike vector-based RAG, preferences are integrated directly into prompts
  • Dynamic Updates: Expert knowledge can be updated without retraining models
  • Transparent Reasoning: LLM provides explicit reasoning for node selection
  • Domain Expertise: Leverages document-specific knowledge and user preferences
  • Flexible Integration: Supports multiple preference types (expert rules, user history, domain guidelines)

Need Help?

Contact us if you need any advice on conducting document searches for your use case.