About Me
Hi, my friend, and welcome to my homepage!
I’m currently based in Seattle, WA, working on multimodal LLM research at Apple. If you’re interested in NLP or multimodal research, feel free to reach out; I’d love to connect! You can contact me at yq729@nyu.edu.
Before joining Apple, I earned my master’s degree in Computer Science from the Courant Institute of Mathematical Sciences at New York University in Jan 2023. Prior to that, I completed my bachelor’s degrees in Physics and Finance at Nanjing University in 2018.
I’m originally from Suzhou, Jiangsu, China. My MBTI type is ENFJ.
Photography
I use a Fujifilm X-T5, a Ricoh GR3, and a DJI Mavic Air 2 to take photos. Check out my photos here.
Reading and Writing
I enjoy reading literature and writing articles and fiction. Check out my book lists and movie lists on Douban.
Some articles I wrote in Chinese: 纽约四年 (Four Years in New York), 我的南大物理四年 (My Four Years of Physics at Nanjing University), 间隔年 (Gap Year)
Daily Life
I am active on Little Red Book. Check out my daily life here.
Cooking and Baking
I love sharing food with loved ones. Check out what I cooked and baked here.
Publications and Preprints
- Image Editing
- Yusu Qian, Eli Bocek-Rivele, Liangchen Song, Jialing Tong, Yinfei Yang, Jiasen Lu, Wenze Hu, Zhe Gan. Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing
- Yusu Qian, Jiasen Lu, Tsu-Jui Fu, Xinze Wang, Chen Chen, Yinfei Yang, Wenze Hu, Zhe Gan. GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing (GIE-Bench is released at link)
- Tsu-Jui Fu, Yusu Qian, Chen Chen, Wenze Hu, Zhe Gan, Yinfei Yang. UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing. ICCV 2025
- Benchmarks
- Yusu Qian, Cheng Wan, Chao Jia, Yinfei Yang, Qingyu Zhao, Zhe Gan. PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection (PRISM-Bench is released at link)
- Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier, Peter Grasch, Yinfei Yang, Zhe Gan. MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs. ICLR 2025 (MIA-Bench is released at link)
- Yusu Qian, Haotian Zhang, Yinfei Yang, Zhe Gan. How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts. SafeGenAi workshop at NeurIPS 2024 (MAD-Bench is released at link)
- Alignment
- Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch. Understanding Alignment in Multimodal LLMs: A Comprehensive Study
- NLP for Social Science
- Yusu Qian, Urwa Muaz, Ben Zhang, Jae Won Hyun. Reducing Gender Bias in Word-Level Language Models with a Gender-Equalizing Loss Function. ACL-SRW 2019
- Yusu Qian. Gender Stereotypes Differ between Male and Female Writings. ACL-SRW 2019
Education
- M.S. in Computer Science, M.S. in Informatics, New York University, Sep 2018 ~ Dec 2022
- B.S. in Physics, B.S. in Finance, Nanjing University, Sep 2014 ~ July 2018
- I also studied on campus at the University of California, Berkeley for a semester in 2017, and at Stanford University for a quarter in 2019
Work experience
- Apple: Machine Learning Engineer
- (Oct 2023 - present) Vision Foundation Model
- (July 2023 - Oct 2023) Machine Learning Research
- (Feb 2023 - July 2023) Siri Understanding
- I joined Apple as an AIML Rotation Engineer and rotated through the teams above before deciding to join the Vision Foundation Model team.
- Google: Software Engineer Intern
- (May 2022 - Aug 2022) ADS
