Self-hosted AI assistant powered by a fine-tuned Qwen3 8B model running on Swedish infrastructure via a RunPod GPU. Features bilingual Swedish/English support, rate limiting, GPU queue management, and a privacy-first architecture with no external data sharing.
Organizations and privacy-conscious users need access to powerful AI assistants without sending their data to external cloud providers. European data sovereignty requirements and GDPR compliance add complexity, while existing self-hosted solutions often lack user-friendly interfaces, robust production features, and customization for specific use cases like Swedish language and Nordic business contexts.
We built Arktis-1 as a privacy-first AI assistant running on Swedish infrastructure, fine-tuned specifically for Swedish language and Nordic business contexts. Using LoRA (Low-Rank Adaptation), we trained the model on curated Swedish conversational data and business terminology directly on our RunPod A40 GPU. The system serves the fine-tuned Qwen3 8B model via Ollama, connected through secure SSH tunneling to our Next.js frontend.
Arktis-1 operates on a distributed architecture with clear separation between the frontend, API gateway, and GPU compute layer. The system uses reverse SSH tunneling to securely connect a RunPod A40 GPU to our Next.js application server.
Polished chat interface with real-time streaming, bilingual support, and partner access controls
Next.js 15, TypeScript, Server-Sent Events, i18n
Handles authentication, rate limiting, input validation, and GPU queue management
Next.js API Routes, In-memory rate limiting, Request queuing
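To make the in-memory rate limiting concrete, here is a minimal sliding-window sketch of the idea. All names (`checkRateLimit`, `WINDOW_MS`, `MAX_REQUESTS`) and the specific limits are hypothetical illustrations, not taken from the Arktis-1 codebase.

```typescript
// Minimal in-memory sliding-window rate limiter (illustrative sketch).
// Suitable for a single-process Next.js API route; a multi-instance
// deployment would need a shared store instead.
const WINDOW_MS = 60_000; // 1-minute window (hypothetical value)
const MAX_REQUESTS = 20;  // per client per window (hypothetical value)

const hits = new Map<string, number[]>();

function checkRateLimit(clientId: string, now: number = Date.now()): boolean {
  // Keep only timestamps still inside the window
  const recent = (hits.get(clientId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(clientId, recent);
    return false; // over the limit; the route would respond 429
  }
  recent.push(now);
  hits.set(clientId, recent);
  return true;
}
```

An API route would call `checkRateLimit` with a client identifier (e.g. a session or API key) before forwarding the request to the GPU layer.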
Secure encrypted connection between cloud infrastructure and GPU compute
Reverse SSH tunnel, localhost:11434, Persistent connection
NVIDIA A40 GPU running Ollama with Qwen3:8B model for inference
RunPod A40, Ollama API, 6 concurrent requests, Swedish data center
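A sketch of how an API route might reach the Ollama instance through the tunnel on localhost:11434. The helper names and the default model tag are assumptions; the endpoint shape (POST `/api/generate` with `model`, `prompt`, `stream` fields) follows Ollama's documented API.

```typescript
// Call the tunneled Ollama instance from the gateway (illustrative).
const OLLAMA_URL = "http://localhost:11434/api/generate";

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// buildGenerateRequest is a hypothetical helper, not the actual codebase
function buildGenerateRequest(prompt: string, model = "qwen3:8b"): GenerateRequest {
  return { model, prompt, stream: true }; // stream tokens as they are produced
}

async function generate(prompt: string): Promise<Response> {
  // fetch is available natively in Node 18+ / Next.js route handlers
  return fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(prompt)),
  });
}
```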
Arktis-1 combines privacy-first architecture with custom model training. We fine-tuned the base Qwen3 8B model using LoRA on our RunPod A40 GPU, creating a specialized version optimized for Swedish language and Nordic business contexts while maintaining full data sovereignty.
Used Unsloth for 4-bit LoRA training on A40 (48GB VRAM). Trained for 3 epochs on 2,500 Swedish conversation pairs with rank=16, alpha=32. Total training time: ~4 hours with gradient checkpointing
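A back-of-envelope calculation shows why rank-16 LoRA keeps training cheap: adapting a d×k weight matrix with rank r adds only r·(d + k) trainable parameters (the B: d×r and A: r×k factors). The dimensions below are illustrative, not the actual Qwen3 8B layer shapes.

```typescript
// LoRA trainable-parameter count for one adapted d×k matrix at rank r
function loraParams(d: number, k: number, r: number): number {
  return r * (d + k);
}

// e.g. a hypothetical 4096×4096 attention projection at rank 16:
const perMatrix = loraParams(4096, 4096, 16); // 131,072 params
// versus the full matrix:
const fullMatrix = 4096 * 4096;               // 16,777,216 params
// -> under 1% of the weights are trained per adapted matrix
```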
Curated training data from Swedish news articles, business correspondence, legal documents, and conversational examples. Focus on proper Swedish grammar, formal/informal register switching, and Nordic cultural references
Exported LoRA adapters, merged with base model, and quantized to Q4_K_M GGUF format for efficient Ollama inference. Final model size: ~4.5GB with minimal quality loss
All training data processing and model weights remain on Swedish infrastructure. User messages never leave EU jurisdiction or touch external AI providers
Intelligent queue system limits concurrent GPU requests to 6 (optimal for an 8B model on the A40) with an overflow queue of 20 requests
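The queueing behaviour can be sketched as a counting semaphore with a bounded FIFO overflow. Names, the rejection behaviour, and the error mapping are hypothetical illustrations of the pattern, not the production code.

```typescript
// Illustrative GPU request queue: at most MAX_CONCURRENT jobs run at
// once; further jobs wait in a FIFO queue capped at MAX_QUEUED.
const MAX_CONCURRENT = 6;
const MAX_QUEUED = 20;

let running = 0;
const waiting: Array<() => void> = [];

async function withGpuSlot<T>(job: () => Promise<T>): Promise<T> {
  if (running >= MAX_CONCURRENT) {
    if (waiting.length >= MAX_QUEUED) {
      throw new Error("Queue full; try again later"); // would map to HTTP 503
    }
    // Park this request until a slot frees up
    await new Promise<void>((resolve) => waiting.push(resolve));
  }
  running++;
  try {
    return await job();
  } finally {
    running--;
    waiting.shift()?.(); // wake the next queued request, if any
  }
}
```

Each inference request would be wrapped in `withGpuSlot`, so the GPU never sees more than six simultaneous generations while short bursts are absorbed by the queue.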
Server-Sent Events enable real-time token streaming for a responsive, ChatGPT-like experience without waiting for the full generation
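The token streaming relies on the standard SSE wire format: each chunk becomes a `data:` frame terminated by a blank line, which `EventSource` clients parse natively. The function name below is a hypothetical helper; the framing itself is the SSE specification.

```typescript
// Format one generated token (or text chunk) as an SSE frame.
function toSseFrame(token: string): string {
  // Multi-line payloads need one `data:` line per line of content
  return token
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n") + "\n\n";
}
```

In a Next.js route handler, such frames would be enqueued into a `ReadableStream` returned with `Content-Type: text/event-stream`, so the browser renders tokens as they arrive from Ollama.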
Arktis-1 demonstrates that custom-trained, privacy-focused AI can be both powerful and user-friendly, providing GPT-class capabilities with Swedish language optimization while maintaining complete data sovereignty.