Songyang Zhang

Applied Scientist @ Amazon Artificial General Intelligence (AGI) team.

2795 Augustine Dr

Santa Clara, CA, USA

I am Songyang Zhang (张宋扬 in Chinese). I am an Applied Scientist on the Amazon Artificial General Intelligence (AGI) team, building video generative models. I received my Ph.D. from the University of Rochester, advised by Prof. Jiebo Luo. Before that, I received my master’s degree from Zhejiang University, advised by Prof. Jun Xiao, and my bachelor’s degree from Southeast University. I received the NAACL 2021 Best Long Paper Award. My research is in computer vision and natural language processing, especially the intersection of video and language.

[Resume]

News

Feb 3, 2024 One paper accepted at TMM.
Jul 31, 2023 I joined Amazon as an Applied Scientist.
Jun 29, 2023 I successfully defended my Ph.D. thesis.
Jan 20, 2023 One paper accepted at ICLR 2023.
Oct 14, 2022 Invited talk at UM-IoS Workshop @ EMNLP 2022.

Selected Publications

* denotes equal contribution. The full list is here.

  1. Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
    Jie An*, Songyang Zhang*, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, and Xi Yin
    In Submission, 2023
  2. Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos
    Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference on Empirical Methods in Natural Language Processing, 2022
  3. Make-A-Video: Text-to-video Generation without Text-Video Data
    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman
    In International Conference on Learning Representations, 2022
  4. MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Thomas Hayes*, Songyang Zhang*, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, and Devi Parikh
    In European Conference on Computer Vision, 2022
  5. Expanding Language-Image Pretrained Models for General Video Recognition
    Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, and Haibin Ling
    In European Conference on Computer Vision, 2022
  6. SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo
    In IEEE International Conference on Computer Vision, 2021
  7. Video-aided Unsupervised Grammar Induction
    Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference of the North American Chapter of the Association for Computational Linguistics, 2021
  8. Multi-Scale 2D Temporal Adjacency Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, and Jiebo Luo
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
  9. Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization
    Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, and Jiebo Luo
    arXiv preprint arXiv:1912.03612, 2019
  10. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, and Jiebo Luo
    In the AAAI Conference on Artificial Intelligence, 2020
  11. Exploiting Temporal Relationships in Video Moment Localization with Natural Language
    Songyang Zhang, Jinsong Su, and Jiebo Luo
    In ACM International Conference on Multimedia, 2019
  12. Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Yang Yang, Jun Xiao, Xiaoming Liu, Yi Yang, Di Xie, and Yueting Zhuang
    IEEE Transactions on Multimedia, 2018
  13. On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Xiaoming Liu, and Jun Xiao
    In IEEE Winter Conference on Applications of Computer Vision, 2017