Choosing a Local Inference Server

To run a local LLM with Elite Intel, an inference server is required. This is software that loads the AI model and serves it over a local API. It is the local equivalent of a cloud AI service, running entirely on your own hardware.

Elite Intel supports two inference servers: Ollama and LM Studio. Both are compatible and use the same models. The choice can be changed in settings at any time.

loca llm ui

GPU Requirements

Hardware requirements to run game and LLM on the same machine:

  • RTX 3090 24GB VRAM
  • AMD RX 7800 XT

If you do not have enough hardware, use Free Cloud service

A GPU reference table provided by Kevin Rank is available here: GPU Reference Guide


Install Guides

Inference Server
āœ… LM Studio - Linux Fast, more model flexibility - guide shows how to setup as a server
āœ… LM Studio - Windows Fast, more model flexibility - got GUI
Ollama - Linux Recommended if you have the hardware to run it
Ollama - Windows Recommended if you have the hardware to run it

Ollama vs. LM Studio at a Glance

Ollama LM Studio
Speed Slower Faster
Preferred model tulu-3.1-8b-supernova Q4_K_M tulu-3.1-8b-supernova Q4_K_M
Best for Simple setup, minimal maintenance More control over model loading
Install One script, done One script, done
Runs as System service (auto-starts on boot) Manual start, or opt-in auto-start
Model tuning Modelfile baked into the model Flags at load time
Windows auto-start āœ… Works out of the box Requires desktop app or Task Scheduler
Linux auto-start āœ… systemd service included Manual systemd setup
Model source Ollama library HuggingFace (GGUF)
API port 11434 1234
GUI None (CLI only) Optional desktop app

Selection Guide

Use Ollama when:

  • You want a simple install with minimal ongoing configuration
  • You are on Windows and prefer not to configure startup manually
  • You are new to local LLMs

Use LM Studio when:

  • You want a desktop GUI to browse, download, and manage models
  • You are already familiar with HuggingFace and GGUF model files
  • You want to experiment with different models without writing Modelfiles
  • You are running a dedicated inference machine and need a clean headless server

Either option works when:

  • You have an NVIDIA RTX 3090 24 GB equivalent or better. VRAM is the critical factor, not GPU speed. A GPU with only 12 GB VRAM is insufficient regardless of generation.
  • You are running Elite Dangerous and the LLM on the same machine
  • You want to point Elite Intel at a separate PC on your network

Developer Recommendation

The developer uses LM Studio with matrixportalx/Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF. This model provides fast inference. The same model on Ollama runs noticeably slower. The app is optimized for this model. Other models may work but are not guaranteed. Report compatibility findings on Matrix.

Why tulu3.1:8b Supernova specifically?

Elite Intel is a command parser and data analysis tool, not a conversational chatbot. This imposes specific model requirements. Generating natural-sounding banter is insufficient. The model must correctly infer actions from voice input and perform structured data analysis. It must return results in formatted JSON, not a markup essay or HTML. Not all models of this size perform this task reliably.

Tulu 3 (the base training recipe) is genuinely exceptional

Tulu-3.1-8B-SuperNova-Q4_K_M-GGUF

Most instructed models are trained with RLHF, which uses a learned reward model to judge outputs. That reward model is itself a neural network, so it inherits all the usual biases and inconsistencies. Tulu 3 replaced this with RLVR (Reinforcement Learning with Verifiable Rewards). Instead of a learned reward model, the training uses a deterministic scoring function: the answer is either correct or it is not. Binary, no bias. This is particularly impactful for instruction-following tasks, where the reward signal is objective.

The training pipeline is a four-stage approach: data curation targeting core skills, supervised fine-tuning, Direct Preference Optimization, and RLVR on top to sharpen verifiable task performance. Each stage builds on the last. This is why Tulu 3 on the 8B Llama base achieves results surpassing the instruct versions of Llama 3.1, Qwen 2.5, Mistral, and even closed models like GPT-4o-mini and Claude 3.5 Haiku.

For EliteIntel, the command classification stage is an instruction-following task with verifiable correct answers (JSON action X vs. Y). This is precisely the task type that RLVR optimizes. The model is trained specifically for deterministic structured output.

Why the "Supernova" Variant

The Supernova variant differs from standard Tulu 3. Tulu-3.1-8B-SuperNova is created via a linear merge of three models: Llama-3.1-MedIT-SUN-8B (medical/reasoning), Llama-3.1-Tulu-3-8B (instruction following), and Llama-3.1-SuperNova-Lite (Arcee's distilled model), each contributing equally at weight 1.0 using mergekit.

The SuperNova-Lite parent is a distilled model from a larger Arcee base, providing knowledge density beyond a standard 8B model. The linear merge averages weight tensors directly, combining knowledge without additional training compute. This achieves particularly strong results on instruction-following tasks, as demonstrated by its IFEval score.

Performance: The model uses an 8B Llama architecture. At Q4_K_M quantization on a 3090 24 GB, it fits in VRAM alongside the game with headroom. This avoids CPU offload and maintains maximum inference throughput. Comparable Qwen models use different attention head configurations (such as Qwen2.5's GQA ratio) that may run slower in llama.cpp's GGUF backend.

It also runs on a 12 GB VRAM card if no other VRAM-consuming workloads are present. This requires the game to run on a separate GPU or machine.

Can I use a different model?

Alternative models may be used but are unlikely to match the speed and accuracy of tulu3.1-supernova.

Common issues with alternative models include an incorrect response format. The most frequent error is the model returning a markup essay instead of a structured action or analysis result.


Community šŸ‘‰MatrixšŸ‘ˆ