Local AI in the Enterprise

What "local" actually means

The situation

Local AI sounds good. Data stays within the company, no cloud risk, full control. That was the theory from article one of this series.

But what does “local” actually mean in practice? What runs, where does it run, and what do you really need?

These are the questions that stand between interest and decision. And as long as they go unanswered, local AI stays a vague concept rather than a real option.

The problem

The word “local” triggers the same image in most decision-makers: a data center, an IT department of ten people, a project that takes a year and needs a budget that has to be approved first.

That image is wrong. But without the right image, no decision gets made. The topic gets pushed aside, people wait, and companies continue relying on cloud solutions whose data protection risks are well known.

The root cause

Almost all available content on local AI is aimed at developers. Step-by-step guides, code examples, technical comparisons. Useful for people who are setting it up. Not helpful for decision-makers who first need to understand what they are even evaluating.

The underlying concept is not hard to understand. It just needs the right words.

What can we take away from this?

Local AI consists of three building blocks. Anyone who understands these three building blocks has the foundation for every decision that follows.

Building block 1: The model

The model is what thinks. It is the actual AI that understands language, answers questions, and generates text.

With ChatGPT, that model is GPT-4o, running on OpenAI servers in the United States. With local AI, it is an open-source model running on your own hardware. The critical difference: the model and all the data it processes never leave your own network.

Open-source models are freely available and developed by major technology companies. Llama comes from Meta, Mistral from a French company running EU infrastructure, Qwen from Alibaba, Phi from Microsoft. These models come in different sizes, measured in parameters. A 7-billion-parameter model runs on modest hardware and handles simpler tasks. A 70-billion-parameter model requires significantly more resources but delivers results comparable to commercial models.

Which model suits which use case is covered in article 5 of this series.

Building block 2: The runtime environment

A model is initially just a large file. It needs software to run it, manage it, and make it accessible through an interface.

Ollama is today’s de facto standard for this. It is an open-source application that runs models locally and provides an interface identical to the OpenAI API. This means that anyone who has connected n8n, LangChain, or other automation tools to ChatGPT can in many cases switch to Ollama with minimal adjustments. The logic stays the same. The data stays internal.

Ollama is just the core. A complete local AI stack for enterprise use consists of multiple components working together: the model itself, an interface for employees, a system for processing internal documents, and an automation layer that integrates AI into existing processes. How these components connect and what a RAG system contributes is the subject of article 4 of this series.

Building block 3: The hardware

This is where the biggest misconception lives. Local AI does not need a data center.

What it needs depends on what it is being used for and how many people work with it simultaneously. A single employee using local AI for simple tasks can get by with a standard office computer. A company of 20 employees running AI-powered workflows in production needs a more stable foundation.

The most important hardware decision comes down to the graphics card. Local AI models run significantly faster on graphics cards than on standard processors. A dedicated graphics card with sufficient video memory is not optional for productive use, it is a prerequisite. Which configurations are realistic for which company sizes and what they cost is covered in article 6 of this series.

Test versus production: an important distinction

There is one distinction that needs to be understood before any investment is made.

Local AI can be tested on a normal computer. This is sensible and worthwhile, to understand how the system works and to validate first use cases. But a test setup is not a production system.

Running local AI for an entire company, with multiple users, stable performance, and IT security, requires a different foundation. The technology is the same. The requirements around stability, availability, security, and maintenance are not.

Planning for this from the start avoids a common trap: experiencing local AI in a test environment, underestimating what the step to production actually involves, and then either failing or spending significantly more than planned.

Own hardware or VPS: the fundamental decision

For production use, there are two realistic paths. Both are legitimate, and the right choice depends on the company.

Own server: Hardware in your own server room or data center. Full control over the infrastructure, no ongoing rental costs, one-time investment. This requires IT competence for operation, updates, and security. The right choice for companies that already run their own IT infrastructure and can accurately assess the requirements.

VPS with a German provider: A virtual private server with a provider whose data centers are located in Germany. I run my own VPS with Hetzner, a German provider whose infrastructure is based exclusively in Germany and Finland. Data does not leave the country, and GDPR compliance is guaranteed at the infrastructure level. No hardware purchase, no server room, monthly costs instead of a one-time investment. The trade-off is giving up some direct control and sharing physical hardware with other users of the platform.

Both paths can be implemented in a GDPR-compliant way when configured correctly. What that requires in practice is covered in a later article of this series.

The decision between the two paths comes down to three factors: How much IT capacity is available internally? How permanent is the planned operation? And how important is full physical control over the infrastructure?

What local AI always is: an architecture of three building blocks that you deliberately assemble. Model, runtime environment, hardware. Anyone who understands these three building blocks can ask the right questions before investing.

The next article in this series covers costs: what local AI actually costs, with concrete example calculations for different company sizes.

Does this sound familiar?

In many companies, this is exactly where unnecessary time losses and structural problems arise. Often this goes unnoticed for a long time — until projects start to stall.