Mirai Labs — Cluster Manager

Your GPU fleet. Fully under your control.

Provision, schedule, and manage GPU infrastructure for LLM workloads. From bare metal to cloud: fully sovereign, no lock-in.

Time-to-Deploy

From provisioning to production-ready LLM workflows in hours, not weeks.

GPU Utilization

Maximize usage with multi-tenancy, job-aware scheduling, and GPU fractioning.

Cost Efficiency

Per-second billing and auto-scaling with zero idle capacity.

Infra Sovereignty

Your cloud, your region, your bare metal. No lock-in.

Cluster Manager

Secure & seamless
AI workload orchestration.

Multi-tenancy, resource isolation, and dynamic scheduling: all sovereign, all yours.

Multi-Tenancy
Securely isolate workloads across tenants with network segmentation, multi-cluster boundaries, and namespace-level access control.
Tenant A (ns: tenant-a-prod): llm-serve (inference) · embed-v2 (pipeline) · rag-api (serving)
Tenant B (ns: tenant-b-prod): finetune-01 (training) · eval-run (benchmark) · data-prep (batch)
Tenants remain fully isolated from one another.
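
To make the isolation model concrete, here is a minimal sketch in Python of the kind of Kubernetes objects this maps to, assuming a Kubernetes-backed cluster; the manifests and namespace naming mirror the example tenants above and are illustrative, not Cluster Manager's actual API.

# Sketch: per-tenant isolation via a Namespace plus a default-deny
# NetworkPolicy. Manifests are plain dicts; apply them with kubectl
# or a Kubernetes client of your choice.
def tenant_isolation_manifests(tenant: str) -> list[dict]:
    namespace = f"{tenant}-prod"  # e.g. "tenant-a-prod", as in the example above
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"tenant": tenant}},
        },
        {
            # Network segmentation: pods in this namespace accept traffic
            # only from pods in the same namespace.
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": "same-namespace-only", "namespace": namespace},
            "spec": {
                "podSelector": {},  # selects all pods in the namespace
                "policyTypes": ["Ingress"],
                "ingress": [{"from": [{"podSelector": {}}]}],
            },
        },
    ]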
GPU Fractioning
Run multiple lightweight workloads on a single GPU with vGPU slicing. Ideal for LLM inference, RAG pipelines, and microservices.
NVIDIA A100 · 80 GB · Single GPU
1/4 · Deployment · 20 GB · LLM serving
1/2 · Model Training · 40 GB · fine-tuning
1/4 · Finetuning · 20 GB · LoRA
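
As an illustration, the 1/4 + 1/2 + 1/4 split above maps naturally onto NVIDIA MIG profiles on an 80 GB A100 (2g.20gb and 3g.40gb). Below is a minimal sketch of a pod requesting one 20 GB slice, assuming the NVIDIA device plugin exposes MIG profiles as extended resources; the image and namespace are placeholders.

# Sketch: a pod requesting a 20 GB MIG slice, roughly the
# "1/4 GPU · 20 GB · LLM serving" split above.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-serve", "namespace": "tenant-a-prod"},
    "spec": {
        "containers": [{
            "name": "server",
            "image": "registry.example.com/llm-server:latest",  # placeholder image
            "resources": {"limits": {"nvidia.com/mig-2g.20gb": 1}},
        }]
    },
}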
Resource Allocation
Guarantee compute and storage with fine-grained quota policies. Enable predictable performance through dedicated GPU pools.
GPU Compute Pool: 14 / 20 GPUs allocated, split across quotas for Tenant A, Tenant B, Shared, and Reserved
GPU Compute: 78% · GPU Memory: 60% · CPU Cores: 45% · Storage: 33%
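
A minimal sketch of what such a quota policy could look like as a Kubernetes ResourceQuota, assuming GPU requests are tracked as an extended resource; the limits are illustrative, not defaults.

# Sketch: a fine-grained quota for one tenant's namespace, capping
# GPU, CPU, memory, and storage requests.
quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "tenant-a-quota", "namespace": "tenant-a-prod"},
    "spec": {
        "hard": {
            "requests.nvidia.com/gpu": "8",  # GPU cap, illustrative
            "requests.cpu": "64",
            "requests.memory": "512Gi",
            "requests.storage": "2Ti",
        }
    },
}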
Intelligent Scheduling
Maximize efficiency with bin packing, gang scheduling, and priority-based queueing, with jobs placed dynamically based on cluster state.
Priority · Job · Type · Allocation · Status
P1 · llm-serve · Inference, bin-packed · 2x GPU · running
P2 · finetune-01 · Training, gang-scheduled · 4x GPU · running
P3 · eval-run · Benchmark, priority-queued · 1x GPU · queued
P4 · data-prep · Batch, fractional · 1/2 GPU · pending
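
For intuition, bin packing here means fitting fractional GPU requests onto as few physical GPUs as possible. Here is a toy first-fit-decreasing sketch in Python; job names reuse the examples above.

# Sketch: first-fit-decreasing bin packing of fractional GPU requests
# onto whole GPUs, the idea behind "bin-packed" placement above.
def pack_jobs(requests: dict[str, float], gpu_capacity: float = 1.0) -> list[dict[str, float]]:
    gpus: list[dict[str, float]] = []  # each dict maps job -> fraction of one GPU
    # Place the largest requests first to reduce fragmentation.
    for job, frac in sorted(requests.items(), key=lambda kv: -kv[1]):
        for gpu in gpus:
            if sum(gpu.values()) + frac <= gpu_capacity:
                gpu[job] = frac
                break
        else:
            gpus.append({job: frac})  # no existing GPU fits: open a new one
    return gpus

print(pack_jobs({"llm-serve": 0.25, "finetune-01": 0.5, "eval-run": 0.25, "data-prep": 0.5}))
# -> two GPUs: [{'finetune-01': 0.5, 'data-prep': 0.5}, {'eval-run': 0.25, 'llm-serve': 0.25}]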

MLOps / LLMOps

The full ML lifecycle.
On your infrastructure.

From fine-tuning to production inference — every tool your team needs, running on sovereign compute.

LLM Fine-Tuning & Benchmarking
Select a base model, attach your dataset, configure hyperparameters, and launch a fine-tuning job from one interface.

Select a base model

Llama 3 · Mistral · Falcon · GPT-J · OPT

Select a dataset

Dataset 1
Dataset 2
Dataset 3

Hyperparameters

Learning rate
Gradient Accum.
Epochs
Batch size

Start fine-tuning job

Model: Llama 3 · Dataset: Dataset 2 · GPUs: 4x A100
Ready to launch
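
Programmatically, launching such a job could look like the sketch below; the endpoint, payload shape, and hyperparameter values are hypothetical placeholders, not a documented Cluster Manager API.

import requests  # pip install requests

# Sketch: submitting the fine-tuning job configured above.
job = {
    "base_model": "llama-3",
    "dataset": "dataset-2",
    "gpus": {"count": 4, "type": "A100"},
    "hyperparameters": {
        "learning_rate": 2e-5,        # illustrative values
        "gradient_accumulation": 8,
        "epochs": 3,
        "batch_size": 16,
    },
}
resp = requests.post("https://cluster.example.com/api/finetune", json=job, timeout=30)
resp.raise_for_status()
print(resp.json()["job_id"])  # hypothetical response field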
Notebooks Management
Spin up Jupyter environments with dedicated GPU allocations. Start, stop, and monitor notebook instances in real time.
research-llm-v2 · running · CPU: 8 cores · Memory: 64 GB · GPU: 1x A100 · started 12 min ago
finetune-mistral · stopped · CPU: 16 cores · Memory: 128 GB · GPU: 4x H100 · started 2 hrs ago
eval-bench-01 · running · CPU: 4 cores · Memory: 32 GB · GPU: 1x T4 · started 5 min ago
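
A sketch of starting such an instance from code, shaped after the research-llm-v2 card above; the endpoint and field names are hypothetical placeholders.

import requests  # pip install requests

# Sketch: request a notebook with a dedicated GPU allocation.
spec = {
    "name": "research-llm-v2",
    "cpu_cores": 8,
    "memory_gb": 64,
    "gpus": {"count": 1, "type": "A100"},
}
resp = requests.post("https://cluster.example.com/api/notebooks", json=spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # e.g. instance id and status ("running" / "stopped")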
Tracking & Experimentation
Log metrics, compare runs side by side, and visualize training curves across experiments to find what actually works.
Compare Runs: Run 1 vs. Run 2 (MSE vs. step)
Best Loss: 0.82 · Accuracy: 91.4% · Total Runs: 14
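
For illustration, this is the kind of per-step logging that drives an "MSE vs. step" curve, sketched with MLflow as a stand-in tracker; experiment and run names are placeholders.

import mlflow  # pip install mlflow

# Sketch: log hyperparameters and a per-step metric so runs can be
# compared side by side.
mlflow.set_experiment("finetune-llama-3")
with mlflow.start_run(run_name="run-2"):
    mlflow.log_param("learning_rate", 2e-5)
    for step, mse in enumerate([1.9, 1.4, 1.05, 0.82]):  # illustrative values
        mlflow.log_metric("mse", mse, step=step)  # drives the training curve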
Inferencing & Model Deployment
Deploy models to scalable endpoints with autoscaling. Debug prompts and preview model responses before going live.
Debug & Preview: #1 llama-3-8b · #2 mistral-7b
You: Summarize this contract in plain English.
AI: This establishes a 12-month SaaS agreement covering GPU cloud access at $0.80/hr with a 99.9% uptime SLA.
Latency: 38 ms · Req/sec: 1.2k · Replicas: auto
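
A sketch of the same prompt sent to a deployed endpoint from code, assuming an OpenAI-compatible chat API; the URL is a placeholder.

import requests  # pip install requests

# Sketch: preview a model response before going live.
resp = requests.post(
    "https://cluster.example.com/v1/chat/completions",
    json={
        "model": "llama-3-8b",
        "messages": [{"role": "user", "content": "Summarize this contract in plain English."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])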
Distributed Training
Train across multiple nodes with model and data parallelism. Partition large models automatically across your GPU fleet.

Model parallelism: the model is partitioned into shards served by Worker A, Worker B, and Worker C, all backed by a shared data layer.
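
A minimal sketch of the data-parallel half of this picture using PyTorch DistributedDataParallel; the model here is a toy stand-in, and model-parallel partitioning is handled separately.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    dist.init_process_group("nccl")             # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])  # one process per GPU
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda())  # toy stand-in for a real model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()   # DDP all-reduces gradients across all workers here
        opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    train()  # launch: torchrun --nnodes=3 --nproc_per_node=4 train.py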

SLURM

Intelligent job scheduling
for AI workloads.

Maximize resource efficiency with dynamic SLURM orchestration across your sovereign GPU fleet.

GPU & Compute Resource Scheduling
Allocate nodes with precise GPU, CPU, memory, and storage configurations. Submit allocation requests and confirm provisioning in one flow.
Node Profile: 3 nodes · Hermes 1
Accelerator: 4x NVIDIA A100 (80 GB VRAM) per node
Memory: 512 GB · CPU: 64 cores · Storage: 2 TB NVMe
OS: Ubuntu 20.04 · Arch: AMD64 · Network: InfiniBand
Cluster utilization after allocation (+3 nodes): 62% used
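
Requesting a slice of this profile through SLURM could look like the sketch below; the partition name is a placeholder, while the #SBATCH directives are standard SLURM options.

import subprocess, textwrap

# Sketch: submit an allocation request matching the node profile above.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=llm-alloc
    #SBATCH --partition=hermes-1
    #SBATCH --nodes=3
    #SBATCH --gres=gpu:a100:4
    #SBATCH --cpus-per-task=64
    #SBATCH --mem=512G
    srun nvidia-smi -L   # confirm the GPUs visible to the job
""")
subprocess.run(["sbatch"], input=script, text=True, check=True)  # sbatch reads the script from stdin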
Dynamic Workload Management
Automatically distribute workloads across compute nodes. SLURM dispatches jobs to available GPUs based on priority and cluster state.
Active GPU Node: A10G · online
Running: Workspace "Cascade" · Workload ID WI-2d91e7 · Type: Benchmarking
Multi-Node Batch Processing
Parallelize training and inference jobs across multiple nodes. Distribute workloads with gang scheduling for maximum throughput.
Running: Workspace "Cascade" · Workload ID WI-2d91e7 · Type: Benchmarking · spanning Node 1, Node 2, Node 3
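
A sketch of the corresponding batch submission: srun starts one task per node at the same time, which is the gang-scheduling behavior described above. The training script path is a placeholder.

import subprocess, textwrap

# Sketch: a three-node batch job with one task per node.
script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=finetune-01
    #SBATCH --nodes=3
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:4
    srun python train.py   # 3 tasks, one per node, started together
""")
subprocess.run(["sbatch"], input=script, text=True, check=True)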
Resource Optimization & Scaling
Set up SLURM clusters in a guided wizard — select nodes, configure shared storage, and define access policies in three steps.
Cluster provisioning flow
01 · Select Nodes (done): choose GPU node count, type, and availability zone for your cluster.
02 · Configure Storage (active): set shared file system size and mount paths for inter-node data access.
03 · Set Up Access: define SSH keys, user permissions, and namespace-level access policies.
Live resource efficiency: GPU Utilization 87% · Job Queue Depth 12 jobs · Node Efficiency 94%

Monitoring & Logging

Real-time monitoring
& system health.

Full visibility into nodes, GPUs, and workload performance — with cost and usage insights across every tenant.

System Metrics
Track hardware performance, memory, and thermal stats across distributed training jobs in real time.
GPU Utilization (%): Run 1 vs. Run 2
GPU Temp: 72 °C (nominal) · Memory BW: 1.4 TB/s (peak) · Power Draw: 310 W per GPU · Clock Speed: 1.41 GHz (boost)
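
These per-GPU stats can be sampled directly from nvidia-smi, as in this short sketch; the query fields are standard nvidia-smi options.

import subprocess

# Sketch: poll utilization, temperature, power, and clock per GPU.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=utilization.gpu,temperature.gpu,power.draw,clocks.sm",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout
for i, line in enumerate(out.strip().splitlines()):
    util, temp, power, clock = (float(v) for v in line.split(", "))
    print(f"GPU{i}: {util:.0f}% util, {temp:.0f} C, {power:.0f} W, {clock:.0f} MHz")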
Detailed Logging & Event Analysis
Centralized logs and event tracking for debugging and optimization across all nodes and workloads.
Activity Log · 24-10-2024
16:16:04 · admin · assigned 5 nodes
16:14:51 · system · job finetune-01 started
16:12:30 · tenant-b · GPU quota exceeded
16:09:17 · admin · namespace isolation applied
16:05:02 · system · node health check passed
Filter by: Node · Job · Warn · System
AI FinOps Metrics
Real-time insights to optimize AI costs and resource consumption per node, job, and tenant.
Total Cost: $1,240 · GPU Hours: 2,000 · Cost / Hour: $0.62
ip-10-23-1-220.ec2 · 1x T4 · 3.92 cores · 14 GiB · $6.40 · 01/02/2025 11:57 to 01/02/2025 20:02 · 8h 5m
ip-10-23-1-118.ec2 · 4x A100 · 64 cores · 512 GiB · $192.00 · 01/02/2025 00:00 to 02/02/2025 00:00 · 24h 0m
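
For a sanity check on how these line items roll up, here is a short worked calculation using the figures above; the derived per-GPU-hour rates follow from the example, not from published pricing.

# Sketch: derive effective per-GPU-hour rates from the line items shown.
nodes = [
    # (name, gpu_count, hours, cost_usd) from the cards above
    ("ip-10-23-1-220.ec2", 1, 8 + 5 / 60, 6.40),
    ("ip-10-23-1-118.ec2", 4, 24.0, 192.00),
]
for name, gpus, hours, cost in nodes:
    gpu_hours = gpus * hours
    print(f"{name}: {gpu_hours:.1f} GPU-hours at ${cost / gpu_hours:.2f}/GPU-hour")
# Fleet-level: $1,240 total / 2,000 GPU-hours = $0.62 per GPU-hour, matching the card.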
Workload Usage
Monitor GPU hours, job durations, and resource utilization at tenant and cluster levels.
Tenant GPU Usage Distribution · Total GPU Hours: 2,000
Tenant 1: 245 hrs · Tenant 2: 47 hrs · Tenant 3: 124 hrs · Tenant 4: 879 hrs · Tenant 5: 475 hrs · Tenant 6: 230 hrs
GPU Utilization: 76% · High Usage Alerts: 8 tenants

FAQ

Common questions

Everything you need to know about deploying and running Cluster Manager in your environment.

Can multiple jobs share a single GPU?
Yes, Cluster Manager supports fractional GPU allocation, enabling you to divide a single GPU into smaller portions (e.g., 1/2, 1/4) and assign them to different jobs to optimize utilization.

Sovereign AI Infrastructure

Run AI on infrastructure you control, with tools to deploy, manage, and scale models, agents, and workloads.