Local AI Technology at makandra

Learn about the AI technologies we use to develop local AI applications for our clients. From the runtime environment to language models to retrieval architecture: We’ll show you what we work with and why.

Our AI Technology Stack

Ollama

We rely on Ollama as the runtime environment for our language models. It has become the de facto standard for getting started quickly with on-premises AI, letting us run open-source models locally without setting up complex infrastructure. Models can be swapped out at any time if a newer one suits the use case better – without locking us into a specific provider.

For scenarios with very high workloads and thousands of concurrent users, we are currently evaluating vLLM as an alternative. For most of our projects – internal AI features with manageable workload profiles – Ollama is the right choice.
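To show how little application code depends on a particular model when Ollama is the runtime, here is a minimal sketch that calls Ollama's local HTTP API using only the Python standard library. It assumes an Ollama server on its default port 11434 with the requested model already pulled. The function names are illustrative, not part of any makandra codebase.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request to the running server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(resp.read())["response"]
```

A call such as `generate("qwen2.5:7b", "Summarize the attached memo.")` uses an example model tag. Because the model is only a string in the request body, switching to a Gemma or Ministral model changes that argument and nothing else in the application.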

Language Models

We currently use models from Alibaba (Qwen), Mistral AI (Ministral), and Google (Gemma). Depending on the task, we use different types of models: vision models for processing documents and images, embedding models for semantic search, and text models for generation and analysis – in a range of sizes, matched to performance requirements and available hardware.

Since the open-source model landscape is evolving rapidly, we continuously evaluate new releases. One practical advantage of on-premises AI is that the model in use can be swapped out at any time when a better one becomes available – without changing the rest of the application.

Document Processing

For projects that require automated document processing, we extend our tech stack with specialized tools. Tesseract handles optical character recognition (OCR) for scanned PDFs, and its orientation and script detection (OSD) ensures that rotated pages are read correctly. Sablon is used when existing Word templates need to be populated automatically without altering their layout.

Search & Retrieval

Depending on the requirements, we use different retrieval approaches. For pure vector search, we rely on pgvector, a PostgreSQL extension that enables semantic similarity search directly within the database. When traditional full-text search is also needed, we combine both approaches using OpenSearch. Our embedding model is mxbai-embed-large, which also runs locally via Ollama.
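The two retrieval ideas above can be sketched in plain Python: ranking chunks by cosine distance (the measure behind pgvector's `<=>` operator) and merging a keyword ranking with a vector ranking. The text does not say how results are combined in OpenSearch, whose hybrid search normalizes and combines scores in a search pipeline. This sketch instead uses reciprocal rank fusion (RRF), a common rank-based technique, as an assumed stand-in rather than a description of the actual setup. The table and column names in the SQL comment are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as computed by pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def vector_ranking(query, chunk_embeddings):
    """Rank chunk ids by cosine distance to the query embedding, closest first.

    Inside PostgreSQL, a pgvector query such as
        SELECT id FROM chunks ORDER BY embedding <=> $1 LIMIT 10;
    produces the same ordering (hypothetical table/column names).
    """
    return sorted(
        chunk_embeddings,
        key=lambda cid: cosine_distance(query, chunk_embeddings[cid]),
    )


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In a real system, the query embedding would come from mxbai-embed-large and the keyword ranking from full-text search. The three-dimensional toy vectors used to exercise this sketch are far smaller than real embeddings.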

Evaluation and Quality Assurance

We typically test multiple models and prompt variations against each other, run numerous test cycles with real data, and refine prompts and model choices iteratively. For us, this is a continuous process that spans the entire development cycle and is essential to ensuring an AI application works reliably in everyday use.
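Comparing models and prompt variants against each other can be organized as a small grid evaluation. The following is a minimal sketch that assumes a deliberately simple metric, namely whether the answer contains an expected string. Real evaluations usually need task-specific scoring. The `ask` callable is a hypothetical hook that could wrap a local model call or a stub in tests.

```python
from itertools import product


def evaluate(models, prompt_templates, cases, ask):
    """Return the hit rate for every (model, prompt template) combination.

    ask(model, prompt) -> str is injected, so the same harness can call a
    local model server or a stub. A case counts as a hit when its expected
    string appears in the answer (case-insensitive).
    """
    results = {}
    for model, template in product(models, prompt_templates):
        hits = sum(
            case["expected"].lower()
            in ask(model, template.format(question=case["question"])).lower()
            for case in cases
        )
        results[(model, template)] = hits / len(cases)
    return results
```

`max(results, key=results.get)` then picks the best-scoring pair. Rerunning the grid whenever a new model or prompt variant appears keeps the comparison repeatable across test cycles.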

RAG: The Architectural Pattern

Retrieval-augmented generation (RAG) is the architectural pattern we use in most of our projects. Rather than answering solely from its training data, the language model draws on documents retrieved from a knowledge base for each query. This lets us work with large, evolving document collections without retraining the model. We built our RAG implementation in-house so we could tailor it precisely to each project's requirements.
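The generation step of the RAG pattern comes down to placing retrieved passages into the prompt. The following is a minimal sketch of that prompt assembly. The instruction wording and the `max_chunks` cut-off are assumptions for illustration, not the in-house implementation described above.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Assemble a grounded prompt from the top retrieved chunks.

    The model is told to answer from this context rather than from its
    training data. Updating the knowledge base therefore changes what is
    retrieved, and with it the answers, without retraining the model.
    """
    context = "\n\n".join(
        f"[{i}] {chunk}"
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the numbered context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The chunks passed in would come from the retrieval layer (for example, a pgvector similarity query), and the finished prompt would go to a locally running model. Numbering the passages makes it easy to ask the model to cite which source an answer came from.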

In practice, we have used RAG for internal knowledge platforms, for answering support inquiries automatically, and for extracting structured information from supplier documents.


Why Open-Source Models?

Our AI applications are built on open-source language models and run entirely on private infrastructure – either on-site at the customer's premises or on dedicated servers in German data centers. Today's open-source models are powerful enough for the majority of enterprise applications.

Data never leaves that infrastructure, there is no dependency on external API providers, and costs stay predictable. When a better model becomes available, it can be deployed without changing the architecture – an advantage that carries real weight in a market moving at this pace.

Hosting Options

On-premises

The application runs entirely on your own infrastructure. Data never leaves your company. This is the right choice for companies with high security requirements – such as those handling sensitive personal data or operating in IP-critical areas. We’d be happy to assist you with the on-premises setup.

Regional Data Center

The AI application runs on dedicated servers managed by makandra in a German data center. The hosting setup can be tailored to your needs, and data stays within the region. This option suits companies that want to avoid international cloud providers but would rather not operate their own server infrastructure.

Microsoft Azure

Microsoft Azure lets companies run current OpenAI models in EU data centers. It offers scalable infrastructure and rapid project deployment without any investment in their own hardware. This option is a good fit for companies that are already deeply integrated into the Microsoft ecosystem.

AI use cases from our projects

The most common use cases for our AI applications revolve around four key areas: understanding, generating, analyzing, and rephrasing content. In our brochure, you'll find our latest AI projects across a range of industries.

Aerospace: Automatic detection, extraction, and structuring of relevant information from supplier PDFs.

Communications Consulting: A platform that generates initial drafts of press releases, internal memos, and investor updates based on case-specific information.

Biotechnology: A knowledge platform that analyzes past email inquiries and suggests draft responses to new ones based on verified solutions.


Talk to us

Do you have an idea for an application and need technical support? Tell us about your project, and we'll discuss your options.