Running Large Language Models on a VPS
This post documents how to run large language models on a 1-core, 1 GB RAM (1C1G) VPS using Ollama.
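As a taste of what the post covers, here is a minimal sketch of querying a locally running Ollama server through its REST API on the default port 11434. The model name "qwen2.5:0.5b" is an assumption, chosen only because a sub-1B model is about what fits in 1 GB of RAM; any small model you have pulled works the same way.

```python
# Minimal sketch: query a local Ollama server over its REST API.
# Assumes Ollama is installed and running on the default port 11434,
# and that a small model has already been pulled with `ollama pull`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5:0.5b"  # assumption: any model small enough for 1 GB RAM

def generate(prompt: str) -> str:
    payload = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(generate("Why is the sky blue? Answer in one sentence."))
```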
This article compares the throughput of three large language model inference engines, vLLM, SGLang, and LMDeploy, in a short-input, long-output scenario, measured in output tokens per second.
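For context, the sketch below shows one way such a measurement could be taken. It assumes the engine under test exposes an OpenAI-compatible server (all three engines can), and that it runs on localhost:8000; the endpoint, model name, and prompt are placeholders, not values from the article.

```python
# Minimal sketch of an output-tokens-per-second measurement against
# an OpenAI-compatible endpoint (assumption: vLLM, SGLang, or LMDeploy
# serving on localhost:8000). Model name and prompt are placeholders.
import json
import time
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumption
MODEL = "my-model"  # placeholder: whatever model the server loaded

def output_tokens_per_second(prompt: str, max_tokens: int = 1024) -> float:
    payload = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # long output, matching the scenario
    }).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    elapsed = time.perf_counter() - start
    # The OpenAI-style response reports how many tokens were generated.
    return body["usage"]["completion_tokens"] / elapsed

if __name__ == "__main__":
    print(f"{output_tokens_per_second('Write a long story.'):.1f} tok/s")
```

Note that a single non-streaming request folds time-to-first-token into the total; with short inputs and long outputs that overhead is small, which is why this scenario approximates pure decode throughput.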
This article explains how to use a one-click script to configure the UFW firewall so that it restricts network access to Docker container services, enhancing website security.
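The background here is that Docker publishes ports via its own iptables rules and so bypasses UFW's normal INPUT rules; the usual remedy (the ufw-docker approach) patches /etc/ufw/after.rules and then manages container access with UFW "route" rules, which govern forwarded traffic. The sketch below illustrates that second step only, assuming the after.rules patch is already in place; the port and the allowed network are placeholders.

```python
# Minimal sketch of the rule-management step, assuming the ufw-docker
# style setup: /etc/ufw/after.rules is already patched (not shown),
# so UFW "route" rules apply to Docker-forwarded traffic.
# CIDR and port below are placeholders. Requires root.
import subprocess

ALLOWED_CIDR = "203.0.113.0/24"  # placeholder: trusted admin network
CONTAINER_PORT = "8080"          # placeholder: published container port

def ufw(*args: str) -> None:
    """Run a ufw command, raising if it fails."""
    subprocess.run(["ufw", *args], check=True)

def restrict_container_port() -> None:
    # Allow the trusted network first; UFW matches rules in order.
    ufw("route", "allow", "proto", "tcp",
        "from", ALLOWED_CIDR, "to", "any", "port", CONTAINER_PORT)
    # Then deny all other access to the same forwarded port.
    ufw("route", "deny", "proto", "tcp",
        "from", "any", "to", "any", "port", CONTAINER_PORT)

if __name__ == "__main__":
    restrict_container_port()
```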