Feb 1, 2024: Serve a ML model on a single machine with Flask + Gunicorn vs. FastAPI + Uvicorn
Jan 1, 2024: I find Ray a powerful framework for parallel computing
Oct 4, 2023: Different techniques for optimizing LLM inference
Jun 12, 2023: Experimental benchmarks on the GPU requirements for the training/fine-tuning of LLMs.
Nov 10, 2022: What I've learned from finding ways to accelerate the inference of a Transformer model.
Nov 1, 2022: How to develop an Asynchronous REST API with Python, Flask, Gunicorn and Celery