Mirai Labs — Cluster Manager

Your GPU fleet. Fully under your control.

Provision, schedule, and manage GPU infrastructure for LLM workloads. From bare metal to cloud: fully sovereign, no lock-in.

Time-to-Deploy

From provisioning to production-ready LLM workflows in hours, not weeks.

GPU Utilization

Maximize usage with multi-tenancy, job-aware scheduling, and GPU fractioning.

Cost Efficiency

Per-second billing and auto-scaling with zero idle capacity.

Infra Sovereignty

Your cloud, your region, your bare metal. No lock-in.

Cluster Manager

Secure & seamless
AI workload orchestration.

Multi-tenancy, resource isolation, and dynamic scheduling: all sovereign, all yours.

Multi-Tenancy
Securely isolate workloads across tenants with network segmentation, multi-cluster boundaries, and namespace-level access control.
Tenant A (ns: tenant-a-prod): llm-serve (inference) · embed-v2 (pipeline) · rag-api (serving)
Tenant B (ns: tenant-b-prod): finetune-01 (training) · eval-run (benchmark) · data-prep (batch)
Tenants remain fully isolated from one another.
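
To make the isolation model concrete, here is a minimal sketch in Python of the kind of Kubernetes objects this maps to, assuming a Kubernetes-backed cluster; the manifests and namespace naming mirror the example tenants above and are illustrative, not Cluster Manager's actual API.

# Sketch: per-tenant isolation via a Namespace plus a default-deny
# NetworkPolicy. Manifests are plain dicts; apply them with kubectl
# or a Kubernetes client of your choice.
def tenant_isolation_manifests(tenant: str) -> list[dict]:
    namespace = f"{tenant}-prod"  # e.g. "tenant-a-prod", as in the example above
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"tenant": tenant}},
        },
        {
            # Network segmentation: pods in this namespace accept traffic
            # only from pods in the same namespace.
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "same-namespace-only", "namespace": namespace},
            "spec": {
                "podSelector": {},  # selects all pods in the namespace
                "policyTypes": ["Ingress"],
                "ingress": [{"from": [{"podSelector": {}}]}],
            },
        },
    ]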
GPU Fractioning
Run multiple lightweight workloads on a single GPU with vGPU slicing. Ideal for LLM inference, RAG pipelines, and microservices.
NVIDIA A100 · 80 GB · Single GPU
1/4 · Deployment · 20 GB · LLM serving
1/2 · Model Training · 40 GB · fine-tuning
1/4 · Finetuning · 20 GB · LoRA
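
As an illustration, the 1/4 + 1/2 + 1/4 split above maps naturally onto NVIDIA MIG profiles on an 80 GB A100 (2g.20gb and 3g.40gb). Below is a minimal sketch of a pod requesting one 20 GB slice, assuming the NVIDIA device plugin exposes MIG profiles as extended resources; the image and namespace are placeholders.

# Sketch: a pod requesting a 20 GB MIG slice, roughly the
# "1/4 GPU · 20 GB · LLM serving" split above.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-serve", "namespace": "tenant-a-prod"},
    "spec": {
        "containers": [{
            "name": "server",
            "image": "registry.example.com/llm-server:latest",  # placeholder image
            "resources": {"limits": {"nvidia.com/mig-2g.20gb": 1}},
        }]
    },
}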
Resource Allocation
Guarantee compute and storage with fine-grained quota policies. Enable predictable performance through dedicated GPU pools.
GPU Compute Pool: 14 / 20 GPUs allocated, split across quotas for Tenant A, Tenant B, Shared, and Reserved
GPU Compute: 78% · GPU Memory: 60% · CPU Cores: 45% · Storage: 33%
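
A minimal sketch of what such a quota policy could look like as a Kubernetes ResourceQuota, assuming GPU requests are tracked as an extended resource; the limits are illustrative, not defaults.

# Sketch: a fine-grained quota for one tenant's namespace, capping
# GPU, CPU, memory, and storage requests.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "tenant-a-quota", "namespace": "tenant-a-prod"},
    "spec": {
        "hard": {
            "requests.nvidia.com/gpu": "8",  # GPU cap, illustrative
            "requests.cpu": "64",
            "requests.memory": "512Gi",
            "requests.storage": "2Ti",
        }
    },
}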
Intelligent Scheduling
Maximize efficiency with bin packing, gang scheduling, and priority-based queueing, with jobs placed dynamically based on cluster state.
Priority · Job · Type · Allocation · Status
P1 · llm-serve · Inference, bin-packed · 2x GPU · running
P2 · finetune-01 · Training, gang-scheduled · 4x GPU · running
P3 · eval-run · Benchmark, priority-queued · 1x GPU · queued
P4 · data-prep · Batch, fractional · 1/2 GPU · pending
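
For intuition, bin packing here means fitting fractional GPU requests onto as few physical GPUs as possible. Here is a toy first-fit-decreasing sketch in Python; job names reuse the examples above.

# Sketch: first-fit-decreasing bin packing of fractional GPU requests
# onto whole GPUs, the idea behind "bin-packed" placement above.
def pack_jobs(requests: dict[str, float], gpu_capacity: float = 1.0) -> list[dict[str, float]]:
    gpus: list[dict[str, float]] = []  # each dict maps job -> fraction of one GPU
    # Place the largest requests first to reduce fragmentation.
    for job, frac in sorted(requests.items(), key=lambda kv: -kv[1]):
        for gpu in gpus:
            if sum(gpu.values()) + frac <= gpu_capacity:
                gpu[job] = frac
                break
        else:
            gpus.append({job: frac})  # no existing GPU fits: open a new one
    return gpus

print(pack_jobs({"llm-serve": 0.25, "finetune-01": 0.5, "eval-run": 0.25, "data-prep": 0.5}))
# -> two GPUs: [{'finetune-01': 0.5, 'data-prep': 0.5}, {'eval-run': 0.25, 'llm-serve': 0.25}]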

MLOps / LLMOps

The full ML lifecycle.
On your infrastructure.

From fine-tuning to production inference — every tool your team needs, running on sovereign compute.

LLM Fine-Tuning & Benchmarking
Select a base model, attach your dataset, configure hyperparameters, and launch a fine-tuning job from one interface.

Select a base model

Llama 3 · Mistral · Falcon · GPT-J · OPT

Select a dataset

Dataset 1
Dataset 2
Dataset 3

Hyperparameters

Learning rate
Gradient Accum.
Epochs
Batch size

Start fine-tuning job

Model: Llama 3 · Dataset: Dataset 2 · GPUs: 4x A100
Ready to launch
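
Programmatically, launching such a job could look like the sketch below; the endpoint, payload shape, and hyperparameter values are hypothetical placeholders, not a documented Cluster Manager API.

import requests  # pip install requests

# Sketch: submitting the fine-tuning job configured above.
job = {
    "base_model": "llama-3",
    "dataset": "dataset-2",
    "gpus": {"count": 4, "type": "A100"},
    "hyperparameters": {
        "learning_rate": 2e-5,        # illustrative values
        "gradient_accumulation": 8,
        "epochs": 3,
        "batch_size": 16,
    },
}
resp = requests.post("https://cluster.example.com/api/finetune", json=job, timeout=30)
resp.raise_for_status()
print(resp.json()["job_id"])  # hypothetical response field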
Notebooks Management
Spin up Jupyter environments with dedicated GPU allocations. Start, stop, and monitor notebook instances in real time.
research-llm-v2 · running · CPU: 8 cores · Memory: 64 GB · GPU: 1x A100 · started 12 min ago
finetune-mistral · stopped · CPU: 16 cores · Memory: 128 GB · GPU: 4x H100 · started 2 hrs ago
eval-bench-01 · running · CPU: 4 cores · Memory: 32 GB · GPU: 1x T4 · started 5 min ago
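
A sketch of starting such an instance from code, shaped after the research-llm-v2 card above; the endpoint and field names are hypothetical placeholders.

import requests  # pip install requests

# Sketch: request a notebook with a dedicated GPU allocation.
spec = {
    "name": "research-llm-v2",
    "cpu_cores": 8,
    "memory_gb": 64,
    "gpus": {"count": 1, "type": "A100"},
}
resp = requests.post("https://cluster.example.com/api/notebooks", json=spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. instance id and status ("running" / "stopped")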
Tracking & Experimentation
Log metrics, compare runs side by side, and visualize training curves across experiments to find what actually works.
Compare Runs: Run 1 vs. Run 2 (MSE vs. step)
Best Loss: 0.82 · Accuracy: 91.4% · Total Runs: 14
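
For illustration, this is the kind of per-step logging that drives an "MSE vs. step" curve, sketched with MLflow as a stand-in tracker; experiment and run names are placeholders.

import mlflow  # pip install mlflow

# Sketch: log hyperparameters and a per-step metric so runs can be
# compared side by side.
mlflow.set_experiment("finetune-llama-3")
with mlflow.start_run(run_name="run-2"):
    mlflow.log_param("learning_rate", 2e-5)
    for step, mse in enumerate([1.9, 1.4, 1.05, 0.82]):  # illustrative values
        mlflow.log_metric("mse", mse, step=step)  # drives the training curve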
Inferencing & Model Deployment
Deploy models to scalable endpoints with autoscaling. Debug prompts and preview model responses before going live.
Debug & Preview: #1 llama-3-8b · #2 mistral-7b
You: Summarize this contract in plain English.
AI: This establishes a 12-month SaaS agreement covering GPU cloud access at $0.80/hr with a 99.9% uptime SLA.
Latency: 38 ms · Req/sec: 1.2k · Replicas: auto
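
A sketch of the same prompt sent to a deployed endpoint from code, assuming an OpenAI-compatible chat API; the URL is a placeholder.

import requests  # pip install requests

# Sketch: preview a model response before going live.
resp = requests.post(
    "https://cluster.example.com/v1/chat/completions",
    json={
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "Summarize this contract in plain English."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])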
Distributed Training
Train across multiple nodes with model and data parallelism. Partition large models automatically across your GPU fleet.

Model parallelism: the model is partitioned into shards served by Worker A, Worker B, and Worker C, all backed by a shared data layer.
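
A minimal sketch of the data-parallel half of this picture using PyTorch DistributedDataParallel; the model here is a toy stand-in, and model-parallel partitioning is handled separately.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    dist.init_process_group("nccl")             # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])  # one process per GPU
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda())  # toy stand-in for a real model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # DDP all-reduces gradients across all workers here
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    train()  # launch: torchrun --nnodes=3 --nproc_per_node=4 train.py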

SLURM

Intelligent job scheduling
for AI workloads.

Maximize resource efficiency with dynamic SLURM orchestration across your sovereign GPU fleet.

GPU & Compute Resource Scheduling
Allocate nodes with precise GPU, CPU, memory, and storage configurations. Submit allocation requests and confirm provisioning in one flow.
Node Profile: 3 nodes · Hermes 1
Accelerator: 4x NVIDIA A100 (80 GB VRAM) per node
Memory: 512 GB · CPU: 64 cores · Storage: 2 TB NVMe
OS: Ubuntu 20.04 · Arch: AMD64 · Network: InfiniBand
Cluster utilization after allocation (+3 nodes): 62% used
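
Requesting a slice of this profile through SLURM could look like the sketch below; the partition name is a placeholder, while the #SBATCH directives are standard SLURM options.

import subprocess, textwrap

# Sketch: submit an allocation request matching the node profile above.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=llm-alloc
    #SBATCH --partition=hermes-1
    #SBATCH --nodes=3
    #SBATCH --gres=gpu:a100:4
    #SBATCH --cpus-per-task=64
    #SBATCH --mem=512G
    srun nvidia-smi -L   # confirm the GPUs visible to the job
""")
subprocess.run(["sbatch"], input=script, text=True, check=True)  # sbatch reads the script from stdin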
Dynamic Workload Management
Automatically distribute workloads across compute nodes. SLURM dispatches jobs to available GPUs based on priority and cluster state.
Active GPU Node: A10G · online
Running: Workspace "Cascade" · Workload ID WI-2d91e7 · Type: Benchmarking
Multi-Node Batch Processing
Parallelize training and inference jobs across multiple nodes. Distribute workloads with gang scheduling for maximum throughput.
Running: Workspace "Cascade" · Workload ID WI-2d91e7 · Type: Benchmarking · spanning Node 1, Node 2, Node 3
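
A sketch of the corresponding batch submission: srun starts one task per node at the same time, which is the gang-scheduling behavior described above. The training script path is a placeholder.

import subprocess, textwrap

# Sketch: a three-node batch job with one task per node.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=finetune-01
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:4
    srun python train.py   # 3 tasks, one per node, started together
""")
subprocess.run(["sbatch"], input=script, text=True, check=True)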
Resource Optimization & Scaling
Set up SLURM clusters in a guided wizard — select nodes, configure shared storage, and define access policies in three steps.
Cluster provisioning flow
01 · Select Nodes (done): choose GPU node count, type, and availability zone for your cluster.
02 · Configure Storage (active): set shared file system size and mount paths for inter-node data access.
03 · Set Up Access: define SSH keys, user permissions, and namespace-level access policies.
Live resource efficiency: GPU Utilization 87% · Job Queue Depth 12 jobs · Node Efficiency 94%

Monitoring & Logging

Real-time monitoring
& system health.

Full visibility into nodes, GPUs, and workload performance — with cost and usage insights across every tenant.

System Metrics
Track hardware performance, memory, and thermal stats across distributed training jobs in real time.
GPU Utilization (%): Run 1 vs. Run 2
GPU Temp: 72 °C (nominal) · Memory BW: 1.4 TB/s (peak) · Power Draw: 310 W per GPU · Clock Speed: 1.41 GHz (boost)
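
These per-GPU stats can be sampled directly from nvidia-smi, as in this short sketch; the query fields are standard nvidia-smi options.

import subprocess

# Sketch: poll utilization, temperature, power, and clock per GPU.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=utilization.gpu,temperature.gpu,power.draw,clocks.sm",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout
for i, line in enumerate(out.strip().splitlines()):
    util, temp, power, clock = (float(v) for v in line.split(", "))
    print(f"GPU{i}: {util:.0f}% util, {temp:.0f} C, {power:.0f} W, {clock:.0f} MHz")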
Detailed Logging & Event Analysis
Centralized logs and event tracking for debugging and optimization across all nodes and workloads.
Activity Log · 24-10-2024
16:16:04 · admin · assigned 5 nodes
16:14:51 · system · job finetune-01 started
16:12:30 · tenant-b · GPU quota exceeded
16:09:17 · admin · namespace isolation applied
16:05:02 · system · node health check passed
Filter by: Node · Job · Warn · System
AI FinOps Metrics
Real-time insights to optimize AI costs and resource consumption per node, job, and tenant.
Total Cost: $1,240 · GPU Hours: 2,000 · Cost / Hour: $0.62
ip-10-23-1-220.ec2 · 1x T4 · 3.92 cores · 14 GiB · $6.40 · 01/02/2025 11:57 to 01/02/2025 20:02 · 8h 5m
ip-10-23-1-118.ec2 · 4x A100 · 64 cores · 512 GiB · $192.00 · 01/02/2025 00:00 to 02/02/2025 00:00 · 24h 0m
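
For a sanity check on how these line items roll up, here is a short worked calculation using the figures above; the derived per-GPU-hour rates follow from the example, not from published pricing.

# Sketch: derive effective per-GPU-hour rates from the line items shown.
nodes = [
    # (name, gpu_count, hours, cost_usd) from the cards above
    ("ip-10-23-1-220.ec2", 1, 8 + 5 / 60, 6.40),
    ("ip-10-23-1-118.ec2", 4, 24.0, 192.00),
]
for name, gpus, hours, cost in nodes:
    gpu_hours = gpus * hours
    print(f"{name}: {gpu_hours:.1f} GPU-hours at ${cost / gpu_hours:.2f}/GPU-hour")
# Fleet-level: $1,240 total / 2,000 GPU-hours = $0.62 per GPU-hour, matching the card.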
Workload Usage
Monitor GPU hours, job durations, and resource utilization at tenant and cluster levels.
Tenant GPU Usage Distribution · Total GPU Hours: 2,000
Tenant 1: 245 hrs · Tenant 2: 47 hrs · Tenant 3: 124 hrs · Tenant 4: 879 hrs · Tenant 5: 475 hrs · Tenant 6: 230 hrs
GPU Utilization: 76% · High Usage Alerts: 8 tenants

FAQ

Common questions

Everything you need to know about deploying and running Cluster Manager in your environment.

Can multiple jobs share a single GPU?
Yes, Cluster Manager supports fractional GPU allocation, enabling you to divide a single GPU into smaller portions (e.g., 1/2, 1/4) and assign them to different jobs to optimize utilization.

Sovereign AI Infrastructure

Run AI on infrastructure you control, with tools to deploy, manage, and scale models, agents, and workloads.