### I’m Haotian Zhang (pronounced how-ten). I make smart vision-language models.

<img src="resources/haotian2025-1.jpg" alt="Haotian Zhang" style="float:right;width:160px;margin-left:0rem;margin-right:1rem">I am a Researcher and Lead Machine Learning Engineer at Learnable, Inc., where I started as a founding ML engineer in 2019. My research sits at the intersection of computer vision, natural language, and program synthesis. I build large-scale vision-language models that can ground objects, reason step-by-step, and explain their answers; develop diffusion-based code generators that treat visual code as a continuous medium to enable global, controllable generation and editing; and train handwriting recognition models that outperform humans on noisy, real-world data.

I believe vision-language models are still in their infancy yet hold enormous promise. By more tightly integrating perceptual input/feedback (“see”), reasoning (“think”), and code/text/image generation (“write”), vision-language models could become substantially more capable.

My engineering career has evolved alongside the rise of modern MLOps. As one of the founding ML engineers at a pioneering machine learning startup in early 2019, I was part of the initial wave of engineers training product-ready deep learning models and developing scalable, efficient methods for deploying them in production. Over the past 7 years, my work has progressed from developing a handwriting OCR API by streamlining semantic segmentation and recurrent sequence-to-sequence models, to creating an early Hugging Face Transformers-style unified model registry and deployment pipeline capable of automatically routing and scaling hundreds of models across multiple teams, to taking a unified large vision-language model from concept through data curation, pre-training, instruction tuning, reinforcement learning, and large-scale deployment, where it replaced most of those models.
I believe there’s always a significant gap between state-of-the-art models and their effective integration into applications that genuinely improve user experiences and solve real-world problems, and my engineering work is dedicated to bridging that gap.

While most details of my current projects are confidential, below are some selected open projects and publications.

#### Selected Works

<img src="resources/rlrf-1.png" alt="Haotian Zhang" style="float:left;width:160px;margin:1rem 1rem 1rem 0rem">**[Rendering-Aware Reinforcement Learning for Vector Graphics Generation](https://arxiv.org/abs/2505.20793)**<br>
Juan A Rodriguez\*, Haotian Zhang\*, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, and 9 more authors<br>
*arXiv preprint arXiv:2505.20793*

<img src="resources/mmc-1.png" alt="Haotian Zhang" style="float:left;width:160px;margin:1rem 1rem 1rem 0rem">**[MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs](https://arxiv.org/abs/2503.04291)**<br>
Tianyang Zhang\*, Zhuoxuan Jiang\*, Haotian Zhang\*, Lin Lin, Shaohua Zhang<br>
*AAAI 2025 pp. 29730-29732*

<img src="resources/ri-2.png" alt="Haotian Zhang" style="float:left;width:160px;margin:0rem 1rem 0rem 0rem">**[Recurrent Inference in Text Editing](https://arxiv.org/abs/2009.12643)**<br>
Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong<br>
*EMNLP.findings.2020.159*

---

#### Additional information

Social: [github](https://github.com/htplex) [huggingface](https://huggingface.co/hz2475) [linkedin](https://www.linkedin.com/in/haotian01) [email](mailto:[email protected]) [google scholar](https://scholar.google.com/citations?user=WpWhRWwAAAAJ)

Collaborators: [Juan A. Rodriguez](https://joanrod.github.io/) [Tianyang Zhang](https://scholar.google.com/citations?user=M1lw2OMAAAAJ) [Zhuoxuan Jiang](https://scholar.google.com/citations?user=I8WBM-8AAAAJ)

I’m also passionate about *Photography* and *Cooking*. Shoot me an email any time if you want to chat!