Many small and medium-sized businesses (SMBs) send out invoices daily – but often lack a clear overview of their financial situation. Calculating key metrics like ARR, MRR, CLV or Churn is nearly impossible without structured data. Financial analysis tools are often expensive, complex, or simply not built for smaller businesses.

Let me show you how to solve this with a simple, smart AI agent. With a user-friendly chat interface, you can query your financial system – and get real insights in seconds.

Here you can see the flow of the agent that prepare the data:

And here you can see our final chatbot, that can answer financial questions:

🔧 Which technologies do we use in our AI Finance Agent?

  • n8n as the automation framework
  • JavaScript for lightweight formatting and data preparation
  • Google Drive to store incoming invoices
  • ChatGPT (LLM) for text recognition and data extraction
  • Pinecone as the vector database

Don’t worry – even if you’re not familiar with some of these tools: I’ll guide you step by step. I’ll also suggest possible alternatives along the way.

1. Automating Invoice Preprocessing

📁 Google Drive Trigger
New invoices are placed in a specific folder. A trigger in n8n monitors changes. Important: Enable the Google Drive API and authorize access via the Google Console.

⬇️ Download the file
The trigger only detects a change – the next node downloads the actual file.

📄 Text extraction
If it’s a real PDF (not just an image), we can extract text directly. OCR is only needed for image-based PDFs.

By the way, the test PDF I used was created using ChatGPT itself:

Text of the invoice is in german, but I hope you can understand it 😉

2. Structuring Data & Preparing for LLMs

We extract two things:

  • Metadata like customer name, amount, date → via ChatGPT
  • Semantically enriched long text → also via ChatGPT

Example prompt for metadata extraction:

Extract the following fields from this text:

  • typ: Always “rechnung”
  • kunde: Customer name
  • tag: 2-digit
  • monat: 2-digit (01 to 12)
  • jahr: 4-digit (e.g. 2024)
  • productname
  • betrag: Net amount in EUR as number (use total if multiple)
    TEXT:
    {{ $json[“text”] }}

Second prompt (for optimized vector storage):

You receive a text extracted from an invoice. This needs to be cleaned and rewritten so that it can be saved in a vector database and used for chatbot queries.

Please:

  1. Remove all irrelevant special characters.
  2. Replace /n with a space.
  3. Convert the text to the following structure:
    Invoice Number:
    Invoice Date:
    Customer Name:
    Product:
    Total Amount:

3. Merge & Save the Data

Using an n8n Merge node, we combine:

  • The JSON with metadata
  • The cleaned and rewritten text

We format the data for Pinecone. Here’s an example structure stored in the vector DB:

{
“id”: “unique-id”,
“text”: “Invoice of 1,200 € from Max Müller GmbH…”,
“metadata”: {
“kunde”: “Max Müller GmbH”,
“monat”: “05”,
“jahr”: “2024”
}
}

4. Using Pinecone as a Vector Database

  • Free plan available – ideal for trying RAG use cases without entering credit card info
  • Seamless integration with n8n
  • Well documented and reliable

We use OpenAI’s text-embedding-3-small model (1536 dimensions).

👉 Tip: Choose your text splitter wisely. I use a character-based splitter with . as separator, chunk size 500, overlap 100. Experiment to find the best fit for your AI agent.

Important: Metadata is not used for semantic search – but it helps you filter results, e.g. only invoices from 2024.

After a successful run, here’s what we see in n8n executions:

You’ll typically get 1 chunk per invoice in Pinecone. That makes the search precise and fast.

5. Chatbot with Real Invoice Access

We now create a chatbot flow in n8n – using Telegram, Webhook, or your own UI.

Here’s a system prompt for the finance AI agent:

You are an AI assistant for a CFO.

Your task is to provide financial insights based on invoice data. Use the vector store to find relevant context.

If no useful data is found, reply: “Insufficient information available.”

Answer precisely and briefly, in German.

For queries like “how many”, “count”, etc., explain that answers are based on semantic context, not exact sums.

We also add simple memory so the LLM remembers previous context.

We link this agent to our Pinecone vector store using the same model and index we used for ingestion.

🎉 Done! Your chatbot can now answer real financial questions using your invoices.

🔍 Example query:

“We want to calculate the average ARR for the customer Epsilon Systems. Analyze all invoices from this customer and calculate the average MRR. Multiply by 12 to get the ARR.”

🔎 Result:

“The estimated average annual recurring revenue (ARR) for Epsilon Systems is 3,779.22 EUR.”

We uploaded two invoices:

  • 489.90 EUR (due November 2024)
  • 139.97 EUR (due August 2024)

When we asked, “How did you calculate that?”, here’s what our finance AI replied:

🧮 Step-by-step Calculation:

  1. Invoice review:
    • Invoice 1: 139.97 EUR (Aug 2024)
    • Invoice 2: 489.90 EUR (Nov 2024)
  2. Monthly distribution:
    Treated as representative of separate months.
  3. Average MRR:
    (139.97 + 489.90) / 2 = 314.935 EUR
  4. ARR:
    314.935 * 12 = 3,779.22 EUR

Note: This is a rough estimate assuming monthly recurrence.

🚀 Ready to build your own AI finance agent?

  • It runs locally with n8n
  • OpenAI costs are minimal
  • Easy to build, even without deep tech knowledge

Here you can see how this flow of AI-Agent works:

🤝 Want to build your own? I’d love to help – feel free to reach out!


Leave a Reply

Your email address will not be published. Required fields are marked *