### I’m Haotian Zhang
(pronounced how-ten). I make smart vision-language models.
<img src="resources/haotian2025-1.jpg" alt="Haotian Zhang" style="float:right;width:160px;margin-left:0rem;margin-right:1rem">I am a Researcher and Lead Machine Learning Engineer at Learnable, Inc., where I started as a founding ML engineer in 2019.
My research sits at the intersection of computer vision, natural language, and program synthesis. I build large-scale vision-language models that can ground objects, reason step-by-step, and explain their answers; develop diffusion-based code generators that treat visual code as a continuous medium to enable global, controllable generations and edits; and train handwriting recognition models that outperform humans on noisy, real-world data. I believe vision-language models are still in their infancy yet hold enormous promise. By more tightly integrating perceptual input/feedback (“see”), reasoning (“think”), and code/text/image generation (“write”), vision-language models could become substantially more capable.
My engineering career has evolved alongside the rise of modern MLOps. As one of the founding ML engineers at a pioneering machine learning startup in early 2019, I was part of the initial wave of training product-ready deep learning models and developing scalable, efficient methods for deploying them in production. Over the past 7 years, my work has progressed from developing a handwriting OCR API by streamlining semantic segmentation and recurrent sequence-to-sequence models, to creating an early Hugging Face Transformers-style unified model registry and deployment pipeline capable of automatically routing and scaling hundreds of models across multiple teams, to taking a unified large vision-language model that replaced most of those models from concept through data curation, pre-training, instruction tuning, reinforcement learning, and large-scale deployment. I believe there’s always a significant gap between state-of-the-art models and their effective integration into applications that genuinely improve user experiences and solve real-world problems, and my engineering work is dedicated to bridging that gap.
While most of my current project details are confidential, below are some selected open projects and publications.
#### Selected Works
<img src="resources/rlrf-1.png" alt="Haotian Zhang" style="float:left;width:160px;margin:1rem 1rem 1rem 0rem">**[Rendering-Aware Reinforcement Learning for Vector Graphics Generation](https://arxiv.org/abs/2505.20793)**
Juan A Rodriguez\*, Haotian Zhang\*, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, and 9 more authors
*arXiv preprint arXiv:2505.20793*
<img src="resources/mmc-1.png" alt="Haotian Zhang" style="float:left;width:160px;margin:1rem 1rem 1rem 0rem">**[MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs](https://arxiv.org/abs/2503.04291)**
Tianyang Zhang\*, Zhuoxuan Jiang\*, Haotian Zhang\*, Lin Lin, Shaohua Zhang
*AAAI 2025 pp. 29730-29732*
<img src="resources/ri-2.png" alt="Haotian Zhang" style="float:left;width:160px;margin:0rem 1rem 0rem 0rem">**[Recurrent Inference in Text Editing](https://arxiv.org/abs/2009.12643)**
Ning Shi, Ziheng Zeng, Haotian Zhang, Yichen Gong
*EMNLP.findings.2020.159*
---
#### Additional information
Social: [github](https://github.com/htplex) [huggingface](https://huggingface.co/hz2475) [linkedin](https://www.linkedin.com/in/haotian01) [email](mailto:[email protected]) [google scholar](https://scholar.google.com/citations?user=WpWhRWwAAAAJ)
Collaborators: [Juan A. Rodriguez](https://joanrod.github.io/) [Tianyang Zhang](https://scholar.google.com/citations?user=M1lw2OMAAAAJ) [Zhuoxuan Jiang](https://scholar.google.com/citations?user=I8WBM-8AAAAJ)
I’m also passionate about *Photography* and *Cooking*.
Shoot me an email anytime if you want to chat!