Songyang Zhang

Applied Scientist @ Amazon Artificial General Intelligence (AGI) team.


2795 Augustine Dr

Santa Clara, CA, USA

I am Songyang Zhang (张宋扬 in Chinese). I am an Applied Scientist on the Amazon Artificial General Intelligence team, building video generative models. I received my Ph.D. from the University of Rochester, advised by Prof. Jiebo Luo. Before that, I received my master’s degree from Zhejiang University, advised by Prof. Jun Xiao, and my bachelor’s degree from Southeast University. I received the NAACL 2021 Best Long Paper Award. My research is on computer vision and natural language processing, especially the intersection of video and language.

[Resume]

selected publications

  1. The Amazon Nova family of models: Technical report and model card
    Amazon Artificial General Intelligence
    Amazon Technical Reports, 2024
  2. arXiv
    Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
    Jie An*, Songyang Zhang*, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, and Xi Yin
    arXiv preprint arXiv:2304.08477, 2023
  3. EMNLP
    Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos
    Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference on Empirical Methods in Natural Language Processing, 2022
  4. ICLR
    Make-A-Video: Text-to-video Generation without Text-Video Data
    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman
    In International Conference on Learning Representations, 2022
  5. ECCV
    MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Thomas Hayes*, Songyang Zhang*, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, and Devi Parikh
    In European Conference on Computer Vision, 2022
  6. ECCV
    Expanding Language-Image Pretrained Models for General Video Recognition
    Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, and Haibin Ling
    In European Conference on Computer Vision, 2022
  7. ICCV
    SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo
    In IEEE International Conference on Computer Vision, 2021
  8. NAACL
    Video-aided Unsupervised Grammar Induction
    Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference of the North American Chapter of the Association for Computational Linguistics, 2021
  9. TPAMI
    Multi-Scale 2D Temporal Adjacency Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, and Jiebo Luo
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
  10. arXiv
    Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization
    Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, and Jiebo Luo
    arXiv preprint arXiv:1912.03612, 2019
  11. AAAI
    Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, and Jiebo Luo
    In the AAAI Conference on Artificial Intelligence, 2020
  12. ACMMM
    Exploiting Temporal Relationships in Video Moment Localization with Natural Language
    Songyang Zhang, Jinsong Su, and Jiebo Luo
    In ACM International Conference on Multimedia, 2019
  13. TMM
    Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Yang Yang, Jun Xiao, Xiaoming Liu, Yi Yang, Di Xie, and Yueting Zhuang
    IEEE Transactions on Multimedia, 2018
  14. WACV
    On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Xiaoming Liu, and Jun Xiao
    In IEEE Winter Conference on Applications of Computer Vision, 2017