Self-hosted AI assistant powered by a fine-tuned Qwen3 8B model running on Swedish infrastructure via a RunPod GPU. Features bilingual Swedish/English support, rate limiting, GPU queue management, and a privacy-first architecture with no external data sharing.
Organizations and privacy-conscious users need access to powerful AI assistants without sending their data to external cloud providers. European data sovereignty requirements and GDPR compliance add complexity, while existing self-hosted solutions often lack user-friendly interfaces, robust production features, and customization for specific use cases like Swedish language and Nordic business contexts.
We built Arktis-1 as a privacy-first AI assistant running on Swedish infrastructure, fine-tuned specifically for Swedish language and Nordic business contexts. Using LoRA (Low-Rank Adaptation), we trained the model on curated Swedish conversational data and business terminology directly on our RunPod A40 GPU. The system serves the fine-tuned Qwen3 8B model via Ollama, connected through secure SSH tunneling to our Next.js frontend.
Arktis-1 operates on a distributed architecture with clear separation between the frontend, API gateway, and GPU compute layer. The system uses reverse SSH tunneling to securely connect a RunPod A40 GPU to our Next.js application server.
Polished chat interface with real-time streaming, bilingual support, and partner access controls
Next.js 15, TypeScript, Server-Sent Events, i18n
Handles authentication, rate limiting, input validation, and GPU queue management
Next.js API Routes, In-memory rate limiting, Request queuing
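To make the in-memory rate limiting concrete, here is a minimal sliding-window sketch of the idea. All names (`checkRateLimit`, `WINDOW_MS`, `MAX_REQUESTS`) and the specific limits are hypothetical illustrations, not taken from the Arktis-1 codebase.

```typescript
// Minimal in-memory sliding-window rate limiter (illustrative sketch).
// Suitable for a single-process Next.js API route; a multi-instance
// deployment would need a shared store instead.
const WINDOW_MS = 60_000; // 1-minute window (hypothetical value)
const MAX_REQUESTS = 20;  // per client per window (hypothetical value)

const hits = new Map<string, number[]>();

function checkRateLimit(clientId: string, now: number = Date.now()): boolean {
  // Keep only timestamps still inside the window
  const recent = (hits.get(clientId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_REQUESTS) {
    hits.set(clientId, recent);
    return false; // over the limit; the route would respond 429
  }
  recent.push(now);
  hits.set(clientId, recent);
  return true;
}
```

An API route would call `checkRateLimit` with a client identifier (e.g. a session or API key) before forwarding the request to the GPU layer.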
Secure encrypted connection between cloud infrastructure and GPU compute
Reverse SSH tunnel, localhost:11434, Persistent connection
NVIDIA A40 GPU running Ollama with Qwen3:8B model for inference
RunPod A40, Ollama API, 6 concurrent requests, Swedish data center
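A sketch of how an API route might reach the Ollama instance through the tunnel on localhost:11434. The helper names and the default model tag are assumptions; the endpoint shape (POST `/api/generate` with `model`, `prompt`, `stream` fields) follows Ollama's documented API.

```typescript
// Call the tunneled Ollama instance from the gateway (illustrative).
const OLLAMA_URL = "http://localhost:11434/api/generate";

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// buildGenerateRequest is a hypothetical helper, not the actual codebase
function buildGenerateRequest(prompt: string, model = "qwen3:8b"): GenerateRequest {
  return { model, prompt, stream: true }; // stream tokens as they are produced
}

async function generate(prompt: string): Promise<Response> {
  // fetch is available natively in Node 18+ / Next.js route handlers
  return fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(prompt)),
  });
}
```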
Arktis-1 combines privacy-first architecture with custom model training. We fine-tuned the base Qwen3 8B model using LoRA on our RunPod A40 GPU, creating a specialized version optimized for Swedish language and Nordic business contexts while maintaining full data sovereignty.
Used Unsloth for 4-bit LoRA training on A40 (48GB VRAM). Trained for 3 epochs on 2,500 Swedish conversation pairs with rank=16, alpha=32. Total training time: ~4 hours with gradient checkpointing
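A back-of-envelope calculation shows why rank-16 LoRA keeps training cheap: adapting a d×k weight matrix with rank r adds only r·(d + k) trainable parameters (the B: d×r and A: r×k factors). The dimensions below are illustrative, not the actual Qwen3 8B layer shapes.

```typescript
// LoRA trainable-parameter count for one adapted d×k matrix at rank r
function loraParams(d: number, k: number, r: number): number {
  return r * (d + k);
}

// e.g. a hypothetical 4096×4096 attention projection at rank 16:
const perMatrix = loraParams(4096, 4096, 16); // 131,072 params
// versus the full matrix:
const fullMatrix = 4096 * 4096;               // 16,777,216 params
// -> under 1% of the weights are trained per adapted matrix
```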
Curated training data from Swedish news articles, business correspondence, legal documents, and conversational examples. Focus on proper Swedish grammar, formal/informal register switching, and Nordic cultural references
Exported LoRA adapters, merged with base model, and quantized to Q4_K_M GGUF format for efficient Ollama inference. Final model size: ~4.5GB with minimal quality loss
All training data processing and model weights remain on Swedish infrastructure. User messages never leave EU jurisdiction or touch external AI providers
Intelligent queue system limits concurrent GPU requests to 6 (optimal for an 8B model on the A40) with an overflow queue of 20 requests
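The queueing behaviour can be sketched as a counting semaphore with a bounded FIFO overflow. Names, the rejection behaviour, and the error mapping are hypothetical illustrations of the pattern, not the production code.

```typescript
// Illustrative GPU request queue: at most MAX_CONCURRENT jobs run at
// once; further jobs wait in a FIFO queue capped at MAX_QUEUED.
const MAX_CONCURRENT = 6;
const MAX_QUEUED = 20;

let running = 0;
const waiting: Array<() => void> = [];

async function withGpuSlot<T>(job: () => Promise<T>): Promise<T> {
  if (running >= MAX_CONCURRENT) {
    if (waiting.length >= MAX_QUEUED) {
      throw new Error("Queue full; try again later"); // would map to HTTP 503
    }
    // Park this request until a slot frees up
    await new Promise<void>((resolve) => waiting.push(resolve));
  }
  running++;
  try {
    return await job();
  } finally {
    running--;
    waiting.shift()?.(); // wake the next queued request, if any
  }
}
```

Each inference request would be wrapped in `withGpuSlot`, so the GPU never sees more than six simultaneous generations while short bursts are absorbed by the queue.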
Server-Sent Events enable real-time token streaming for a responsive, ChatGPT-like experience without waiting for the full generation
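The token streaming relies on the standard SSE wire format: each chunk becomes a `data:` frame terminated by a blank line, which `EventSource` clients parse natively. The function name below is a hypothetical helper; the framing itself is the SSE specification.

```typescript
// Format one generated token (or text chunk) as an SSE frame.
function toSseFrame(token: string): string {
  // Multi-line payloads need one `data:` line per line of content
  return token
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n") + "\n\n";
}
```

In a Next.js route handler, such frames would be enqueued into a `ReadableStream` returned with `Content-Type: text/event-stream`, so the browser renders tokens as they arrive from Ollama.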
Arktis-1 demonstrates that custom-trained, privacy-focused AI can be both powerful and user-friendly, providing GPT-class capabilities with Swedish language optimization while maintaining complete data sovereignty.