Local AI Technology at makandra
Learn about the AI technologies we use to develop local AI applications for our clients. From the runtime environment to language models to retrieval architecture: We’ll show you what we work with and why.
Our AI Technology Stack
Ollama
We rely on Ollama as the runtime environment for our language models. It has become the de facto standard for getting started quickly with on-premises AI, letting us run open-source models locally without setting up complex infrastructure. Models can be swapped out at any time if a newer one suits the use case better – without locking us into a specific provider.
For scenarios with very high workloads and thousands of concurrent users, we are currently evaluating vLLM as an alternative. For most of our projects – internal AI features with manageable workload profiles – Ollama is the right choice.
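As a minimal sketch of why swapping models is cheap, the snippet below talks to Ollama's local HTTP endpoint (`/api/generate` on the default port 11434). The helper names `build_generate_request` and `generate` are illustrative assumptions, not code from our projects. The point is only that the model is a single string parameter in the request.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for one non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server answers with a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

With a server running and the models pulled, `generate("gemma3:4b", "...")` and `generate("qwen3:8b", "...")` would differ only in the model tag. Both tags are examples, not a statement of which models we deploy.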
Language Models
We currently rely on models from Qwen, Mistral (Ministral), and Google (Gemma). Depending on the task, we use different types of models: vision models for processing documents and images, embedding models for semantic search, and text models for generation and analysis – in a range of sizes, matched to performance requirements and available hardware.
Since the open-source model landscape is evolving rapidly, we continuously evaluate new releases. One practical advantage of on-premises AI is that the model in use can be swapped out at any time when a better one becomes available – without changing the rest of the application.
Document Processing
For projects that require automated document processing, we extend our tech stack with specialized tools. Tesseract handles optical character recognition (OCR) for scanned PDFs, and its orientation and script detection (OSD) mode identifies rotated pages so they can be corrected before recognition. Sablon is used when existing Word templates need to be populated automatically without altering their layout.
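The rotation handling can be sketched as follows. This example assumes the plain-text report that Tesseract prints in OSD mode (`--psm 0`), with fields such as `Rotate` and `Orientation confidence`. The sample report, the function names, and the confidence threshold are illustrative assumptions, not values from a real project.

```python
# Illustrative OSD report; real values depend on the scanned page.
SAMPLE_OSD_REPORT = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.44
Script: Latin
Script confidence: 1.11
"""


def parse_osd(osd_text: str) -> dict:
    """Split each 'Key: value' line of the OSD report into a dictionary."""
    fields = {}
    for line in osd_text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def rotation_needed(osd_text: str, min_confidence: float = 1.0) -> int:
    """Return the rotation in degrees to apply before OCR.

    Returns 0 when the orientation confidence is below min_confidence
    (an arbitrary threshold chosen for this sketch).
    """
    fields = parse_osd(osd_text)
    rotate = int(fields.get("Rotate", 0))
    confidence = float(fields.get("Orientation confidence", 0.0))
    return rotate if confidence >= min_confidence else 0
```

If the `pytesseract` wrapper is installed, the report string could come from `pytesseract.image_to_osd(image)`. The page would then be rotated by the returned angle before running the actual text recognition.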
Search & Retrieval
Depending on the requirements, we use different retrieval approaches. For pure vector search, we rely on pgvector, a PostgreSQL extension that enables semantic similarity search directly within the database. When traditional full-text search is also needed, we combine both approaches using OpenSearch. Our embedding model is mxbai-embed-large, which also runs locally via Ollama.
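To illustrate what semantic similarity search computes, here is a small self-contained sketch: documents and the query are represented as embedding vectors, and results are ranked by cosine similarity. The three-dimensional toy vectors stand in for real embeddings, and the table and column names in the SQL string (`documents`, `content`, `embedding`) are hypothetical. In pgvector, the `<=>` operator returns cosine distance, so ordering by it in ascending order produces the same ranking inside the database.

```python
import math

# Hypothetical schema: documents(id, content, embedding vector).
# pgvector's <=> operator is cosine distance (1 - cosine similarity),
# so the closest documents come first. How the query vector is bound
# as a parameter depends on the database driver in use.
PGVECTOR_QUERY = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %(query_embedding)s
LIMIT %(limit)s
"""


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Rank documents by similarity to the query and keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vector))
              for doc_id, vector in documents.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings; a real embedding model returns much longer vectors.
documents = {
    "invoice_policy": [0.9, 0.1, 0.0],
    "travel_expenses": [0.7, 0.6, 0.1],
    "office_party": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, documents)
```

Here `results` lists `invoice_policy` first and `travel_expenses` second, because their vectors point in nearly the same direction as the query.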
Evaluation and Quality Assurance
We typically test multiple models and prompt variations against each other, run numerous test cycles with real data, and refine prompts and model selection iteratively. For us, this is a continuous process that spans the entire development cycle and is essential to ensuring an AI application works reliably in everyday use.
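One way such a comparison could be organized is sketched below: every combination of model and prompt variant is run against the same test cases and ranked by average score. The keyword-based scoring, the function names, and the case format are simplifying assumptions for illustration. A real evaluation would use project-specific checks and real data.

```python
from typing import Callable, Dict, List, Tuple

# A generator takes (model, prompt) and returns the model's text output.
# It is injected so the harness can be run against any runtime or a stub.
Generator = Callable[[str, str], str]


def keyword_score(output: str, expected_keywords: List[str]) -> float:
    """Fraction of expected keywords found in the output (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


def evaluate(
    generate: Generator,
    models: List[str],
    prompt_templates: Dict[str, str],
    cases: List[dict],
) -> List[Tuple[str, str, float]]:
    """Score every (model, prompt variant) pair and return them best first.

    Each case is a dict with "input" and "expected_keywords"; each template
    contains an {input} placeholder.
    """
    if not cases:
        raise ValueError("at least one test case is required")
    results = []
    for model in models:
        for name, template in prompt_templates.items():
            scores = [
                keyword_score(
                    generate(model, template.format(input=case["input"])),
                    case["expected_keywords"],
                )
                for case in cases
            ]
            results.append((model, name, sum(scores) / len(scores)))
    results.sort(key=lambda result: result[2], reverse=True)
    return results
```

Rerunning the same harness after each prompt change or model release makes regressions visible before they reach users.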
Why Open-Source Models?
Our AI applications are built on open-source language models and run entirely on our own infrastructure – either on-site at the customer's premises or on dedicated servers in German data centers. Today's open-source models are powerful enough for the majority of enterprise applications.
Data never leaves the customer's own infrastructure, there is no dependency on external API providers, and costs stay predictable. When a better model becomes available, it can be deployed without changing the architecture – an advantage that carries real weight in a market moving at this pace.
AI Use Cases from Our Projects
The most common use cases for our AI applications revolve around four key areas: understanding, generating, analyzing, and rephrasing content. In our brochure, you'll find our latest AI projects across a range of industries.
Aerospace: Automatic detection, extraction, and structuring of relevant information from supplier PDFs.
Communications Consulting: A platform that generates initial drafts of press releases, internal memos, and investor updates based on case-specific information.
Biotechnology: A knowledge platform that analyzes past email inquiries and suggests draft responses to new ones based on verified solutions.
Talk to us
Do you have an idea for an application and need technical support? Tell us about your project, and we'll discuss your options.

