Publications

Publications by categories in reverse chronological order. * denotes equal contribution.

2023

  1. Possible Worlds VQA: Cross Modality Bias Reduction in Visual Question Answering Systems from a Causal View
    Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, and Jiebo Luo
    IEEE Transactions on Multimedia, 2023
  2. Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
    Jie An*, Songyang Zhang*, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, and Xi Yin
    In Submission, 2023

2022

  1. Learning a Grammar Inducer by Watching Millions of Instructional YouTube Videos
    Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference on Empirical Methods in Natural Language Processing, 2022
  2. Make-A-Video: Text-to-video Generation without Text-Video Data
    Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta, and Yaniv Taigman
    In International Conference on Learning Representations, 2022
  3. Rethinking the Evaluation of Unbiased Scene Graph Generation
    Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, and Jun Xiao
    In British Machine Vision Conference, 2022
  4. MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration
    Thomas Hayes*, Songyang Zhang*, Xi Yin, Guan Pang, Sasha Sheng, Harry Yang, Songwei Ge, Qiyuan Hu, and Devi Parikh
    In European Conference on Computer Vision, 2022
  5. Expanding Language-Image Pretrained Models for General Video Recognition
    Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, and Haibin Ling
    In European Conference on Computer Vision, 2022
  6. The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
    Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, and Jun Xiao
    In IEEE Conference on Computer Vision and Pattern Recognition, 2022

2021

  1. Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation
    Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, and Jun Xiao
    In ACM International Conference on Multimedia, 2021
  2. SAT: 2D Semantics Assisted Training for 3D Visual Grounding
    Zhengyuan Yang, Songyang Zhang, Liwei Wang, and Jiebo Luo
    In IEEE International Conference on Computer Vision, 2021
  3. Video-aided Unsupervised Grammar Induction
    Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, and Jiebo Luo
    In Conference of the North American Chapter of the Association for Computational Linguistics, 2021
  4. Boundary Proposal Network for Two-Stage Natural Language Video Localization
    Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, and Jun Xiao
    In the AAAI Conference on Artificial Intelligence, 2021
  5. Mi YouTube es Su YouTube? Analyzing the Cultures using YouTube Thumbnails of Popular Videos
    Songyang Zhang, Tolga Aktas, and Jiebo Luo
    In IEEE Big Data, 2021
  6. Multi-Scale 2D Temporal Adjacency Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, Yijuan Lu, and Jiebo Luo
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

2020

  1. Content-based Analysis of the Cultural Differences between TikTok and Douyin
    Li Sun*, Haoqi Zhang*, Songyang Zhang, and Jiebo Luo
    In IEEE Big Data, 2020
  2. Global Image Sentiment Transfer
    Jie An, Tianlang Chen, Songyang Zhang, and Jiebo Luo
    In International Conference on Pattern Recognition, 2020
  3. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language
    Songyang Zhang, Houwen Peng, Jianlong Fu, and Jiebo Luo
    In the AAAI Conference on Artificial Intelligence, 2020

2019

  1. Learning Sparse 2D Temporal Adjacent Networks for Temporal Action Localization
    Songyang Zhang, Houwen Peng, Le Yang, Jianlong Fu, and Jiebo Luo
    arXiv preprint arXiv:1912.03612, 2019
  2. Exploiting Temporal Relationships in Video Moment Localization with Natural Language
    Songyang Zhang, Jinsong Su, and Jiebo Luo
    In ACM International Conference on Multimedia, 2019

2018

  1. Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Yang Yang, Jun Xiao, Xiaoming Liu, Yi Yang, Di Xie, and Yueting Zhuang
    IEEE Transactions on Multimedia, 2018

2017

  1. On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks
    Songyang Zhang, Xiaoming Liu, and Jun Xiao
    In IEEE Winter Conference on Applications of Computer Vision, 2017