blog | Viet-Phi Huynh

Serve an ML model on a single machine with Flask + Gunicorn vs. FastAPI + Uvicorn

8 min read · February 1, 2024

2024 · dev · flask, gunicorn, fastapi, uvicorn
I find Ray a powerful framework for parallel computing

10 min read · January 1, 2024

2024 · dev · multiprocessing, ray
Different techniques for optimizing LLM inference

4 min read · October 4, 2023

2023 · dev · NLP, LLM Inference
Cheat Sheet of NLP Practitioner

137 min read · September 8, 2023

2023 · research · NLP, AI
Recap of the Very Large Data Bases (VLDB) Conference 2023

14 min read · September 1, 2023

2023 · research · NLP, Database
Table Representation Learning with Transformer

10 min read · August 5, 2023

2023 · research · NLP, Table_Representation_Learning, AI
Experimental benchmarks on the GPU requirements for the training/fine-tuning of LLMs.

2 min read · June 12, 2023

2023 · dev · language_model, gpu
PoTM - Emergent World Representations - Exploring a Sequence Model Trained on a Synthetic Task

5 min read · May 1, 2023

2023 · research · paper_of_the_month, language_model, explainable_ai
PoTM - Understanding Dataset Difficulty with V-Usable Information

8 min read · December 6, 2022

2022 · research · paper_of_the_month, information_theory, machine_learning
What I've learned from finding ways to accelerate the inference of a Transformer model.

13 min read · November 10, 2022

2022 · dev · Optimization, ONNX, ONNX_Runtime, Huggingface_Optimum, Transformer