dev | Viet-Phi Huynh

Feb 1, 2024	Serve an ML model on a single machine with Flask + Gunicorn vs. FastAPI + Uvicorn
Jan 1, 2024	I find Ray a powerful framework for parallel computing
Oct 4, 2023	Different techniques for optimizing LLM inference
Jun 12, 2023	Experimental benchmarks on the GPU requirements for the training/fine-tuning of LLMs.
Nov 10, 2022	What I've learned from finding ways to accelerate the inference of a Transformer model.
Nov 1, 2022	How to develop an Asynchronous REST API with Python, Flask, Gunicorn and Celery