ECCV2024 | AIGC-Related Papers Roundup (if you find this helpful, please upvote and bookmark)
Awesome-ECCV2024-AIGC
A Collection of Papers and Codes for ECCV2024 AIGC
Below is a roundup of AIGC-related papers and code from ECCV 2024.
Stars, forks, and PRs are welcome~
Updated first on GitHub: Awesome-ECCV2024-AIGC. Stars are welcome~
Zhihu: https://zhuanlan.zhihu.com/p/706699484
Please credit the source when citing or reposting.
ECCV 2024 website: https://eccv.ecva.net/
ECCV accepted paper list:
ECCV full paper archive:
Conference dates: September 29 to October 4, 2024
Acceptance announcement: 2024
【Contents】
1. Image Generation / Image Synthesis
2. Image Editing
3. Video Generation / Video Synthesis
4. Video Editing
5. 3D Generation / 3D Synthesis
6. 3D Editing
7. Multi-Modal Large Language Models
8. Others

1. Image Generation / Image Synthesis
Accelerating Diffusion Sampling with Optimized Time Steps
Paper: https://arxiv.org/abs/2402.17376 Code: https://github.com/scxue/DM-NonUniform
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Paper: https://arxiv.org/abs/2406.18958 Code: https://github.com/open-mmlab/AnyControl
A Watermark-Conditioned Diffusion Model for IP Protection
Paper: Code: https://github.com/rmin2000/WaDiff
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Paper: https://arxiv.org/abs/2404.04544 Code: https://github.com/gwang-kim/BeyondScene
ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image
Paper: https://arxiv.org/abs/2402.11849 Code:
Data Augmentation for Saliency Prediction via Latent Diffusion
Paper: Code: https://github.com/IVRL/AugSal
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics
Paper: https://arxiv.org/abs/2310.17316 Code: https://github.com/EnVision-Research/Defect_Spectrum
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Paper: Code: https://github.com/murphytju/DiffFAS
DiffiT: Diffusion Vision Transformers for Image Generation
Paper: https://arxiv.org/abs/2312.02139 Code: https://github.com/NVlabs/DiffiT
Large-scale Reinforcement Learning for Diffusion Models
Paper: https://arxiv.org/abs/2401.12244 Code: https://github.com/pinterest/atg-research/tree/main/joint-rl-diffusion
MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation
Paper: https://arxiv.org/abs/2405.05806 Code: https://github.com/csyxwei/MasterWeaver
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Paper: Code: https://github.com/ugonfor/TuneQDM
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
Paper: https://arxiv.org/abs/2403.10983 Code: https://github.com/kongzhecn/OMG
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
Paper: https://arxiv.org/abs/2403.09176 Code: https://github.com/byeongjun-park/Switch-DiT

2. Image Editing
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Paper: https://arxiv.org/abs/2312.03594 Code: https://github.com/open-mmlab/PowerPaint
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Paper: https://arxiv.org/abs/2403.06976 Code: https://github.com/TencentARC/BrushNet
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
Paper: Code: https://github.com/kookie12/FlexiEdit
StableDrag: Stable Dragging for Point-based Image Editing
Paper: https://arxiv.org/abs/2403.04437 Code:
TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Paper: https://arxiv.org/abs/2403.15033 Code: https://github.com/TinyBeauty/TinyBeauty

3. Video Generation / Video Synthesis
Audio-Synchronized Visual Animation
Paper: https://arxiv.org/abs/2403.05659 Code: https://github.com/lzhangbj/ASVA
Dyadic Interaction Modeling for Social Behavior Generation
Paper: https://arxiv.org/abs/2403.09069 Code: https://github.com/Boese0601/Dyadic-Interaction-Modeling
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Paper: https://arxiv.org/abs/2404.01647 Code: https://github.com/tanshuai0219/EDTalk
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Paper: https://arxiv.org/abs/2312.07537 Code: https://github.com/TianxingWu/FreeInit
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Paper: https://arxiv.org/abs/2405.20222 Code: https://github.com/MyNiuuu/MOFA-Video
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Paper: https://arxiv.org/abs/2310.01324 Code:

4. Video Editing
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Paper: https://arxiv.org/abs/2403.13745 Code: https://github.com/G-U-N/Be-Your-Outpainter
DragAnything: Motion Control for Anything using Entity Representation
Paper: https://arxiv.org/abs/2403.07420 Code: https://github.com/showlab/DragAnything

5. 3D Generation / 3D Synthesis
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Paper: https://arxiv.org/abs/2405.00915 Code: https://github.com/ymxlzgy/echoscene
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Paper: https://arxiv.org/abs/2305.16037 Code: https://github.com/ibrahimethemhamamci/GenerateCT
GVGEN: Text-to-3D Generation with Volumetric Representation
Paper: Code: https://github.com/SOTAMak1r/GVGEN
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
Paper: https://arxiv.org/abs/2403.07487 Code: https://github.com/steve-zeyu-zhang/MotionMamba
ParCo: Part-Coordinating Text-to-Motion Synthesis
Paper: https://arxiv.org/abs/2403.18512 Code: https://github.com/qrzou/ParCo
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
Paper: https://arxiv.org/abs/2311.17050 Code: https://github.com/Yzmblog/SurfD

6. 3D Editing
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Paper: https://arxiv.org/abs/2312.00732 Code: https://github.com/lkeab/gaussian-grouping
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Paper: https://arxiv.org/abs/2404.03736 Code: https://github.com/JarrentWu1031/SC4D
Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing
Paper: https://arxiv.org/abs/2403.10050 Code: https://github.com/slothfulxtx/Texture-GS

7. Multi-Modal Large Language Models
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Paper: https://arxiv.org/abs/2403.06764 Code: https://github.com/pkunlp-icler/FastV
ControlCap: Controllable Region-level Captioning
Paper: https://arxiv.org/abs/2401.17910 Code: https://github.com/callsys/ControlCap
DriveLM: Driving with Graph Visual Question Answering
Paper: https://arxiv.org/abs/2312.14150 Code: https://github.com/OpenDriveLab/DriveLM
Elysium: Exploring Object-level Perception in Videos via MLLM
Paper: https://arxiv.org/abs/2403.16558 Code: https://github.com/Hon-Wong/Elysium
Empowering Multimodal Large Language Model as a Powerful Data Generator
Paper: Code: https://github.com/zhaohengyuan1/Genixer
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper: https://arxiv.org/abs/2403.09394 Code: https://github.com/Haiyang-W/GiT
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Paper: https://arxiv.org/abs/2311.17600 Code: https://github.com/UCSC-VLAA/vllm-safety-benchmark
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Paper: https://arxiv.org/abs/2403.15378 Code: https://github.com/beichenzbc/Long-CLIP
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Paper: https://arxiv.org/abs/2403.14624 Code: https://github.com/ZrrSkywalker/MathVerse
Merlin: Empowering Multimodal LLMs with Foresight Minds
Paper: https://arxiv.org/abs/2312.00589 Code: https://github.com/Ahnsun/merlin
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Paper: https://arxiv.org/abs/2403.11755 Code: https://github.com/jmiemirza/Meta-Prompting
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Paper: https://arxiv.org/abs/2403.14624 Code: https://github.com/isXinLiu/MM-SafetyBench
PointLLM: Empowering Large Language Models to Understand Point Clouds
Paper: https://arxiv.org/abs/2308.16911 Code: https://github.com/OpenRobotLab/PointLLM
R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
Paper: https://arxiv.org/abs/2403.04924 Code: https://github.com/lxa9867/r2bench
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Paper: Code: https://github.com/AI-Application-and-Integration-Lab/SAM4MLLM
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper: https://arxiv.org/abs/2311.12793 Code: https://github.com/ShareGPT4Omni/ShareGPT4V
ST-LLM: Large Language Models Are Effective Temporal Learners
Paper: https://arxiv.org/abs/2404.00308 Code: https://github.com/TencentARC/ST-LLM
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Paper: https://arxiv.org/abs/2404.00384 Code: https://github.com/shjo-april/TTD
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Paper: https://arxiv.org/abs/2311.17136 Code: https://github.com/TIGER-AI-Lab/UniIR

8. Others
Continuously updated~

References

Related Collections
Awesome-CVPR2024-AIGC
Awesome-AIGC-Research-Groups
Awesome-Low-Level-Vision-Research-Groups
Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
Awesome-ECCV2020-Low-Level-Vision

Summary