Showing 1–50 of 333 results for author: Peng, B

Searching in archive cs.
  1. arXiv:2409.02387  [pdf, other]

    cs.AI cs.CL

    Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges

    Authors: Qian Niu, Junyu Liu, Ziqian Bi, Pohsun Feng, Benji Peng, Keyu Chen

    Abstract: This comprehensive review explores the intersection of Large Language Models (LLMs) and cognitive science, examining similarities and differences between LLMs and human cognitive processes. We analyze methods for evaluating LLMs' cognitive abilities and discuss their potential as cognitive models. The review covers applications of LLMs in various cognitive fields, highlighting insights gained for c…

    Submitted 5 September, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: 10 pages, 1 figure

  2. arXiv:2408.15565  [pdf, other]

    cs.CL

    SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

    Authors: Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Haitao Mi, Dong Yu

    Abstract: There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-aided mathematical reasoning. However, continually training these models on augment…

    Submitted 28 August, 2024; originally announced August 2024.

  3. arXiv:2408.15098  [pdf, other]

    cs.CV

    CLIP-AGIQA: Boosting the Performance of AI-Generated Image Quality Assessment with CLIP

    Authors: Zhenchen Tang, Zichuan Wang, Bo Peng, Jing Dong

    Abstract: With the rapid development of generative technologies, AI-Generated Images (AIGIs) have been widely applied in various aspects of daily life. However, due to the immaturity of the technology, the quality of the generated images varies, so it is important to develop quality assessment techniques for the generated images. Although some models have been proposed to assess the quality of generated ima…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: accepted by ICPR2024

  4. arXiv:2408.14400  [pdf, other]

    cs.CV cs.LG

    Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping

    Authors: Vishal Batchu, Alex Wilson, Betty Peng, Carl Elkin, Umangi Jain, Christopher Van Arsdale, Ross Goroshin, Varun Gulshan

    Abstract: The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a D…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 14 pages

  5. arXiv:2408.09347  [pdf, other]

    cs.CV

    S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis

    Authors: Dongze Li, Kang Zhao, Wei Wang, Yifeng Ma, Bo Peng, Yingya Zhang, Jing Dong

    Abstract: Talking head synthesis is a practical technique with wide applications. Current Neural Radiance Field (NeRF) based approaches have shown their superiority in driving one-shot talking heads with videos or signals regressed from audio. However, most of them fail to take audio directly as the driving signal, and so cannot benefit from the flexibility and availability of speech. Since mapping audio signals…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  6. arXiv:2408.08921  [pdf, other]

    cs.AI cs.CL cs.IR

    Graph Retrieval-Augmented Generation: A Survey

    Authors: Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, Siliang Tang

    Abstract: Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as "hallucination", lack of domain-specific knowledge, and outdated information. However, the complex structure of relati…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Ongoing work

  7. arXiv:2408.07759  [pdf, other]

    cs.IR

    SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis

    Authors: Shentao Yang, Haichuan Yang, Linna Du, Adithya Ganesh, Bo Peng, Boying Liu, Serena Li, Ji Liu

    Abstract: The significance of estimating video watch time has been highlighted by the rising importance of (short) video recommendation, which has become a core product of mainstream social media platforms. Modeling video watch time, however, has been challenged by the complexity of user-video interaction, such as different user behavior modes in watching the recommended videos and varying watching probabil…

    Submitted 14 August, 2024; originally announced August 2024.

  8. arXiv:2408.06070  [pdf, other]

    cs.CV

    ControlNeXt: Powerful and Efficient Control for Image and Video Generation

    Authors: Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia

    Abstract: Diffusion models have demonstrated remarkable and robust abilities in both image and video generation. To achieve greater control over generated results, researchers introduce additional architectures, such as ControlNet, Adapters and ReferenceNet, to integrate conditioning controls. However, current controllable generation methods often require substantial additional computational resources, espe…

    Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: controllable generation

  9. arXiv:2408.03302  [pdf, other]

    cs.CV

    TextIM: Part-aware Interactive Motion Synthesis from Text

    Authors: Siyuan Fan, Bo Du, Xiantao Cai, Bo Peng, Longling Sun

    Abstract: In this work, we propose TextIM, a novel framework for synthesizing TEXT-driven human Interactive Motions, with a focus on the precise alignment of part-level semantics. Existing methods often overlook the critical roles of interactive body parts and fail to adequately capture and align part-level semantics, resulting in inaccuracies and even erroneous movement outcomes. To address these issues, T…

    Submitted 6 August, 2024; originally announced August 2024.

  10. arXiv:2408.02006  [pdf, other]

    cs.CL

    LLaSA: Large Language and E-Commerce Shopping Assistant

    Authors: Shuo Zhang, Boci Peng, Xinping Zhao, Boren Hu, Yun Zhu, Yanjia Zeng, Xuming Hu

    Abstract: The e-commerce platform has evolved rapidly due to its widespread popularity and convenience. Developing an e-commerce shopping assistant for customers is crucial to aiding them in quickly finding desired products and recommending precisely what they need. However, most previous shopping assistants face two main problems: (1) task-specificity, which necessitates the development of different models…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024 Workshop (Oral)

  11. arXiv:2408.01983  [pdf, other]

    physics.plasm-ph cs.DC cs.PF

    Characterizing the Performance of the Implicit Massively Parallel Particle-in-Cell iPIC3D Code

    Authors: Jeremy J. Williams, Daniel Medeiros, Ivy B. Peng, Stefano Markidis

    Abstract: Optimizing iPIC3D, an implicit Particle-in-Cell (PIC) code, for large-scale 3D plasma simulations is crucial for space and astrophysical applications. This work focuses on characterizing iPIC3D's communication efficiency through strategic measures like optimal node placement, communication and computation overlap, and load balancing. Profiling and tracing tools are employed to analyze iPIC3D's com…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by SC Conference 2023 (SC23). Prepared in the standardized ACM format; consists of 2 pages, including the main text, references, and figures. See https://sc23.supercomputing.org/proceedings/tech_poster/tech_poster_pages/rpost102.html

  12. arXiv:2408.01622  [pdf, other]

    cs.RO cs.AI cs.LG

    Positive-Unlabeled Constraint Learning (PUCL) for Inferring Nonlinear Continuous Constraints Functions from Expert Demonstrations

    Authors: Baiyu Peng, Aude Billard

    Abstract: Planning for a wide range of real-world robotic tasks requires knowing and writing down all constraints. However, instances exist where these constraints are either unknown or challenging to specify accurately. A possible solution is to infer the unknown constraints from expert demonstration. This paper presents a novel Positive-Unlabeled Constraint Learning (PUCL) algorithm to infer a continuous arb…

    Submitted 2 August, 2024; originally announced August 2024.

  13. arXiv:2407.16485  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning General Continuous Constraint from Demonstrations via Positive-Unlabeled Learning

    Authors: Baiyu Peng, Aude Billard

    Abstract: Planning for a wide range of real-world tasks requires knowing and writing down all constraints. However, instances exist where these constraints are either unknown or challenging to specify accurately. A possible solution is to infer the unknown constraints from expert demonstration. The majority of prior works limit themselves to learning simple linear constraints, or require strong knowledge of th…

    Submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.15683  [pdf, other]

    cs.CV

    Enhancing Transferability of Targeted Adversarial Examples: A Self-Universal Perspective

    Authors: Bowen Peng, Li Liu, Tianpeng Liu, Zhen Liu, Yongxiang Liu

    Abstract: Transfer-based targeted adversarial attacks against black-box deep neural networks (DNNs) have been proven to be significantly more challenging than untargeted ones. The impressive transferability of current SOTA, the generative methods, comes at the cost of requiring massive amounts of additional data and time-consuming training for each targeted label. This results in limited efficiency and flex…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 8 pages and 9 figures

  15. arXiv:2407.10485  [pdf, other]

    cs.CV

    MM-Tracker: Motion Mamba with Margin Loss for UAV-platform Multiple Object Tracking

    Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

    Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces both local object motion and global camera motion. Motion blur also increases the difficulty of detecting large moving objects. Previous UAV motion modeling approaches either focus only on local motion or ignore motion blurring effects, thus limiting their t…

    Submitted 17 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.07207

  16. arXiv:2407.10481  [pdf, other]

    cs.LG cs.AI cs.CL cs.GR

    SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation

    Authors: Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng

    Abstract: Physically-simulated models for human motion can generate high-quality responsive character animations, often in real-time. Natural language serves as a flexible interface for controlling these models, allowing expert and non-expert users to quickly create and edit their animations. Many recent physics-based animation methods, including those that use text interfaces, train control policies using…

    Submitted 15 July, 2024; originally announced July 2024.

  17. arXiv:2407.06584  [pdf, other]

    cs.RO

    HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

    Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

    Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel des…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: IROS 2024

  18. arXiv:2407.05324  [pdf, other]

    cs.CV

    PICA: Physics-Integrated Clothed Avatar

    Authors: Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, Juyong Zhang

    Abstract: We introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to accurately represent complex garment…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Project page: https://ustc3dv.github.io/PICA/

  19. arXiv:2407.00617  [pdf, other]

    cs.LG cs.AI cs.CL cs.GT

    Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

    Authors: Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-th…

    Submitted 7 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  20. arXiv:2407.00320  [pdf, other]

    cs.CL cs.AI cs.LG

    LiteSearch: Efficacious Tree Search for LLM

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree s…

    Submitted 29 June, 2024; originally announced July 2024.

  21. arXiv:2406.19131  [pdf, other]

    cs.CV

    CELLO: Causal Evaluation of Large Vision-Language Models

    Authors: Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu

    Abstract: Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and…

    Submitted 27 June, 2024; originally announced June 2024.

  22. arXiv:2406.17338  [pdf, other]

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  23. arXiv:2406.11528  [pdf, other]

    econ.TH cs.GT

    Optimal Robust Contract Design

    Authors: Bo Peng, Zhihao Gavin Tang

    Abstract: We consider the robust contract design problem when the principal only has limited information about the actions the agent can take. The principal evaluates a contract according to its worst-case performance caused by the uncertain action space. Carroll (AER 2015) showed that a linear contract is optimal among deterministic contracts. Recently, Kambhampati (JET 2023) showed that the principal's pa…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Full version of EC 2024 paper

  24. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search

    Authors: Jiuqi Wei, Botao Peng, Xiaodong Lee, Themis Palpanas

    Abstract: Locality-sensitive hashing (LSH) is a well-known solution for approximate nearest neighbor (ANN) search in high-dimensional spaces due to its robust theoretical guarantee on query accuracy. Traditional LSH-based methods mainly focus on improving the efficiency and accuracy of the query phase by designing different query strategies, but pay little attention to improving the efficiency of the indexi…

    Submitted 16 June, 2024; originally announced June 2024.

    Journal ref: PVLDB, 17(9): 2241 - 2254, 2024

  25. arXiv:2406.09399  [pdf, other]

    cs.CV

    OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation

    Authors: Junke Wang, Yi Jiang, Zehuan Yuan, Binyue Peng, Zuxuan Wu, Yu-Gang Jiang

    Abstract: The tokenizer, serving as a translator that maps intricate visual data into a compact latent space, lies at the core of visual generative models. Based on the finding that existing tokenizers are tailored to image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization. OmniTokenizer is designed with a spatial-temporal decoupled archite…

    Submitted 13 June, 2024; originally announced June 2024.

  26. arXiv:2406.06615  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Language Guided Skill Discovery

    Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

    Abstract: Skill discovery methods enable agents to learn diverse emergent behaviors without explicit rewards. To make learned skills useful for unknown downstream tasks, obtaining a semantically diverse repertoire of skills is essential. While some approaches introduce a discriminator to distinguish skills and others aim to increase state coverage, no existing work directly addresses the "semantic diversity…

    Submitted 7 June, 2024; originally announced June 2024.

  27. arXiv:2406.06525  [pdf, other]

    cs.CV

    Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

    Authors: Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan

    Abstract: We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaled properly. We reexamine design spa…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Codes and models: https://github.com/FoundationVision/LlamaGen

  28. arXiv:2406.06326  [pdf, other]

    cs.CL

    Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

    Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, Helen Meng

    Abstract: Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training and the constantly evolving nature of the world. To keep LLMs current, existing approaches typically involve continued pre-training on new documents. However, they frequently face difficulties in extracting stored knowledge. Motivated by the remarkable success of the Feynman Technique in ef…

    Submitted 15 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages

  29. arXiv:2406.04316  [pdf, other]

    cs.CV

    Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

    Authors: Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong

    Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a…

    Submitted 6 June, 2024; originally announced June 2024.

  30. arXiv:2406.02357  [pdf, ps, other]

    cs.GT cs.AI cs.DS cs.LG

    The complexity of approximate (coarse) correlated equilibrium for incomplete information games

    Authors: Binghui Peng, Aviad Rubinstein

    Abstract: We study the iteration complexity of decentralized learning of approximate correlated equilibria in incomplete information games. On the negative side, we prove that in $\mathit{extensive}$-$\mathit{form}$ $\mathit{games}$, assuming $\mathsf{PPAD} \not\subset \mathsf{TIME}(n^{\mathsf{polylog}(n)})$, any polynomial-time learning algorithm must take at least $2^{\log_2^{1-o(1)}(|\mathcal{I}|)}$ i…

    Submitted 4 June, 2024; originally announced June 2024.

  31. arXiv:2406.01238  [pdf, other]

    cs.CL

    EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

    Authors: Zixuan Dong, Baoyun Peng, Yufei Wang, Jia Fu, Xiaodong Wang, Yongxue Shan, Xin Zhou

    Abstract: While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propos…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures, 3 tables

  32. arXiv:2405.11126  [pdf, other]

    cs.CV cs.GR cs.LG

    Flexible Motion In-betweening with Diffusion Models

    Authors: Setareh Cohan, Guy Tevet, Daniele Reda, Xue Bin Peng, Michiel van de Panne

    Abstract: Motion in-betweening, a fundamental task in character animation, consists of generating motion sequences that plausibly interpolate user-provided keyframe constraints. It has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models in generating diverse human motions guided by keyframes. Unlike previous inbetweening methods, we propose a s…

    Submitted 23 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024. For project page and code, see https://setarehc.github.io/CondMDI/

  33. arXiv:2405.00622  [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  34. arXiv:2404.19264  [pdf, other]

    cs.RO

    DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

    Authors: Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

    Abstract: This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation domains. However, scaling up learning for legged rob…

    Submitted 30 April, 2024; originally announced April 2024.

  35. arXiv:2404.18246  [pdf, other]

    cs.LG cs.CV

    AdaFSNet: Time Series Classification Based on Convolutional Network with a Adaptive and Effective Kernel Size Configuration

    Authors: Haoxiao Wang, Bo Peng, Jianhua Zhang, Xu Cheng

    Abstract: Time series classification is one of the most critical and challenging problems in data mining, existing widely in various fields and holding significant research importance. Despite extensive research and notable achievements with successful real-world applications, addressing the challenge of capturing the appropriate receptive field (RF) size from one-dimensional or multi-dimensional time serie…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCNN 2024

  36. arXiv:2404.16807  [pdf, other]

    cs.CL

    Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning

    Authors: Tianhui Zhang, Bei Peng, Danushka Bollegala

    Abstract: Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge, while generating coherent sentences. Although the quality of the generated sentences is crucial, the diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts. Large Language Models (LLMs) have shown proficienc…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 16 pages, 6 figures

  37. arXiv:2404.16522  [pdf, other]

    eess.IV cs.LG

    A Deep Learning-Driven Pipeline for Differentiating Hypertrophic Cardiomyopathy from Cardiac Amyloidosis Using 2D Multi-View Echocardiography

    Authors: Bo Peng, Xiaofeng Li, Xinyu Li, Zhenghan Wang, Hui Deng, Xiaoxian Luo, Lixue Yin, Hongmei Zhang

    Abstract: Hypertrophic cardiomyopathy (HCM) and cardiac amyloidosis (CA) are both heart conditions that can progress to heart failure if untreated. They exhibit similar echocardiographic characteristics, often leading to diagnostic challenges. This paper introduces a novel multi-view deep learning approach that utilizes 2D echocardiography for differentiating between HCM and CA. The method begins by classif…

    Submitted 25 April, 2024; originally announced April 2024.

  38. arXiv:2404.12253  [pdf, other]

    cs.CL cs.LG

    Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. I…

    Submitted 18 April, 2024; originally announced April 2024.

  39. arXiv:2404.11054  [pdf, other]

    cs.CV

    Mumpy: Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection

    Authors: Ying Zhang, Yuezun Li, Bo Peng, Jiaran Zhou, Huiyu Zhou, Junyu Dong

    Abstract: The task of video inpainting detection is to expose the pixel-level inpainted regions within a video sequence. Existing methods usually focus on leveraging spatial and temporal inconsistencies. However, these methods typically employ fixed operations to combine spatial and temporal clues, limiting their applicability in different scenarios. In this paper, we introduce a novel Multilateral Temporal…

    Submitted 29 August, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: BMVC 2024

  40. arXiv:2404.10685  [pdf, other]

    cs.CV cs.GR

    Generating Human Interaction Motions in Scenes with Text Control

    Authors: Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

    Abstract: We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model,…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/tesmo/

  41. arXiv:2404.10099  [pdf, other]

    math.OC cs.LG

    Feature selection in linear SVMs via hard cardinality constraint: a scalable SDP decomposition approach

    Authors: Immanuel Bomze, Federico D'Onofrio, Laura Palagi, Bo Peng

    Abstract: In this paper, we study the embedded feature selection problem in linear Support Vector Machines (SVMs), in which a cardinality constraint is employed, leading to a fully explainable selection model. The problem is NP-hard due to the presence of the cardinality constraint, even though the original linear SVM amounts to a problem solvable in polynomial time. To handle the hard problem, we first int…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Submitted to European Journal of Operational Research. arXiv admin note: text overlap with arXiv:1808.02435 by other authors

    MSC Class: 90C22; 90C11

    ACM Class: I.5.1; I.2.0

  42. arXiv:2404.09338  [pdf, other]

    cs.CL

    Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

    Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

    Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  43. arXiv:2404.08549  [pdf]

    eess.IV cs.CV physics.bio-ph

    Practical Guidelines for Cell Segmentation Models Under Optical Aberrations in Microscopy

    Authors: Boyuan Peng, Jiaju Chen, P. Bilha Githinji, Ijaz Gul, Qihui Ye, Minjiang Chen, Peiwu Qin, Xingru Huang, Chenggang Yan, Dongmei Yu, Jiansong Ji, Zhenglin Chen

    Abstract: Cell segmentation is essential in biomedical research for analyzing cellular morphology and behavior. Deep learning methods, particularly convolutional neural networks (CNNs), have revolutionized cell segmentation by extracting intricate features from images. However, the robustness of these methods under microscope optical aberrations remains a critical challenge. This study evaluates cell image…

    Submitted 25 August, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  44. arXiv:2404.08341  [pdf, other]

    cs.CV

    Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts

    Authors: Yang Li, Songlin Yang, Wei Wang, Ziwen He, Bo Peng, Jing Dong

    Abstract: Highly realistic AI generated face forgeries known as deepfakes have raised serious social concerns. Although DNN-based face forgery detection models have achieved good performance, they are vulnerable to latest generative methods that have less forgery traces and adversarial attacks. This limitation of generalization and robustness hinders the credibility of detection results and requires more ex…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to ICME2024

  45. arXiv:2404.07470  [pdf, other]

    cs.CL

    Scalable Language Model with Generalized Continual Learning

    Authors: Bohao Peng, Zhuotao Tian, Shu Liu, Mingchang Yang, Jiaya Jia

    Abstract: Continual learning has gained increasing importance as it facilitates the acquisition and refinement of scalable knowledge and skills in language models. However, existing methods typically encounter strict limitations and challenges in real-world scenarios, such as reliance on experience replay, optimization constraints, and inference task-ID. In this study, we introduce the Scalable Language Mod…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: The Twelfth International Conference on Learning Representations

  46. arXiv:2404.05892  [pdf, other]

    cs.CL cs.AI

    Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

    Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, et al. (3 additional authors not shown)

    Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokeni…

    Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  47. arXiv:2404.04875  [pdf, other]

    cs.CV

    NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization

    Authors: Peng Tu, Xun Zhou, Mingming Wang, Xiaojun Yang, Bo Peng, Ping Chen, Xiu Su, Yawen Huang, Yefeng Zheng, Chang Xu

    Abstract: Neural Radiance Fields (NeRF) have emerged as a paradigm-shifting methodology for the photorealistic rendering of objects and environments, enabling the synthesis of novel viewpoints with remarkable fidelity. This is accomplished through the strategic utilization of object-centric camera poses characterized by significant inter-frame overlap. This paper explores a compelling, alternative utility o…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 18 pages

  48. arXiv:2404.04062  [pdf, other]

    cs.LG math.OC

    Derivative-free tree optimization for complex systems

    Authors: Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung

    Abstract: A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or the derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 d…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 39 pages, 3 figures

  49. arXiv:2404.02905  [pdf, other]

    cs.CV cs.AI

    Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

    Authors: Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, Liwei Wang

    Abstract: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction". This simple, intuitive methodology allows autoregressive (AR) transformers to learn visual distributions fast and generalize well: V…

    Submitted 10 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: Demo website: https://var.vision/

  50. arXiv:2404.00230  [pdf, other]

    cs.CV

    Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space

    Authors: Zheling Meng, Bo Peng, Jing Dong

    Abstract: Watermarking is a tool for actively identifying and attributing the images generated by latent diffusion models. Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression, while watermarks with superior robustness usually significantly damage image qualit…

    Submitted 11 July, 2024; v1 submitted 29 March, 2024; originally announced April 2024.