Publications

Publications by category in reverse chronological order. * denotes equal contribution.

2024

  1. The Amazon Nova family of models: Technical report and model card
    Amazon Artificial General Intelligence
    Amazon Technical Reports, 2024

2023

  1. Possible Worlds VQA: Cross Modality Bias Reduction in Visual Question Answering Systems from a Causal View
    Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, and Jiebo Luo
    IEEE Transactions on Multimedia, 2023
  2. arXiv
    Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
    Jie An*, Songyang Zhang*, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, and Xi Yin
    arXiv preprint arXiv:2304.08477, 2023

2022

  1. EMNLP
    Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos
    Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference on Empirical Methods in Natural Language Processing, 2022
  2. ICLR
    Make-A-Video: Text-to-video Generation without Text-Video Data
    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman
    In International Conference on Learning Representations, 2022
  3. BMVC
    Rethinking the Evaluation of Unbiased Scene Graph Generation
    Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, and Jun Xiao
    In British Machine Vision Conference, 2022
  4. ECCV
    MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Thomas Hayes*, Songyang Zhang*, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, and Devi Parikh
    In European Conference on Computer Vision, 2022
  5. ECCV
    Expanding Language-Image Pretrained Models for General Video Recognition
    Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, and Haibin Ling
    In European Conference on Computer Vision, 2022
  6. CVPR
    The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
    Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, and Jun Xiao
    In IEEE Conference on Computer Vision and Pattern Recognition, 2022

2021

  1. ACMMM
    Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation
    Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, and Jun Xiao
    In ACM International Conference on Multimedia, 2021
  2. ICCV
    SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo
    In IEEE International Conference on Computer Vision, 2021
  3. NAACL
    Video-aided Unsupervised Grammar Induction
    Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference of the North American Chapter of the Association for Computational Linguistics, 2021
  4. AAAI
    Boundary Proposal Network for Two-Stage Natural Language Video Localization
    Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, and Jun Xiao
    In the AAAI Conference on Artificial Intelligence, 2021
  5. BigData
    Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos
    Songyang Zhang, Tolga Aktas, and Jiebo Luo
    In IEEE Big Data, 2021
  6. TPAMI
    Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, and Jiebo Luo
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

2020

  1. BigData
    Content-based Analysis of the Cultural Differences between TikTok and Douyin
    Li Sun*, Haoqi Zhang*, Songyang Zhang, and Jiebo Luo
    In IEEE Big Data, 2020
  2. ICPR
    Global Image Sentiment Transfer
    Jie An, Tianlang Chen, Songyang Zhang, and Jiebo Luo
    In International Conference on Pattern Recognition, 2020
  3. AAAI
    Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, and Jiebo Luo
    In the AAAI Conference on Artificial Intelligence, 2020

2019

  1. arXiv
    Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization
    Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, and Jiebo Luo
    arXiv preprint arXiv:1912.03612, 2019
  2. ACMMM
    Exploiting Temporal Relationships in Video Moment Localization with Natural Language
    Songyang Zhang, Jinsong Su, and Jiebo Luo
    In ACM International Conference on Multimedia, 2019

2018

  1. TMM
    Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Yang Yang, Jun Xiao, Xiaoming Liu, Yi Yang, Di Xie, and Yueting Zhuang
    IEEE Transactions on Multimedia, 2018

2017

  1. WACV
    On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Xiaoming Liu, and Jun Xiao
    In IEEE Winter Conference on Applications of Computer Vision, 2017