
Showing 1–50 of 60 results for author: Mo, K

  1. arXiv:2409.02877

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2407.12887

    cs.RO

    Self-Adaptive Robust Motion Planning for High DoF Robot Manipulator using Deep MPC

    Authors: Ye Zhang, Kangtong Mo, Fangzhou Shen, Xuanzhen Xu, Xingyu Zhang, Jiayue Yu, Chang Yu

    Abstract: In contemporary control theory, self-adaptive methodologies are highly esteemed for their inherent flexibility and robustness in managing modeling uncertainties. Particularly, robust adaptive control stands out owing to its potent capability of leveraging robust optimization algorithms to approximate cost functions and relax the stringent constraints often associated with conventional self-adaptiv…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2406.18158

    cs.RO cs.CV

    3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

    Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

    Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage R…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.13626

    cs.CL cs.AI

    Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines

    Authors: Kangtong Mo, Wenyan Liu, Xuanzhen Xu, Chang Yu, Yuelin Zou, Fangqing Xia

    Abstract: In this study, we explore the application of sentiment analysis on financial news headlines to understand investor sentiment. By leveraging Natural Language Processing (NLP) and Large Language Models (LLM), we analyze sentiment from the perspective of retail investors. The FinancialPhraseBank dataset, which contains categorized sentiments of financial news headlines, serves as the basis for our an…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2405.11656

    cs.RO cs.AI

    URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

    Authors: Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta

    Abstract: Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by…

    Submitted 31 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted at RSS2024

  6. arXiv:2405.04710

    cs.LG math.OC

    Untangling Lariats: Subgradient Following of Variationally Penalized Objectives

    Authors: Kai-Chia Mo, Shai Shalev-Shwartz, Nisæl Shártov

    Abstract: We describe a novel subgradient following apparatus for calculating the optimum of convex problems with variational penalties. In this setting, we receive a sequence $y_1,\ldots,y_n$ and seek a smooth sequence $x_1,\ldots,x_n$. The smooth sequence attains the minimum Bregman divergence to an input sequence with additive variational penalties in the general form of $\sum_i g_i(x_{i+1}-x_i)$. We der…

    Submitted 7 May, 2024; originally announced May 2024.

  7. arXiv:2401.04143

    cs.CV

    RHOBIN Challenge: Reconstruction of Human Object Interaction

    Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

    Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is, however, a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 tables, 7 figures. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

  8. arXiv:2312.15610

    cs.CV

    Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks

    Authors: Yijia Weng, Kaichun Mo, Ruoxi Shi, Yanchao Yang, Leonidas J. Guibas

    Abstract: Some extremely low-dimensional yet crucial geometric eigen-lengths often determine the success of some geometric tasks. For example, the height of an object is important to measure to check if it can fit between the shelves of a cabinet, while the width of a couch is crucial when trying to move it through a doorway. Humans have materialized such crucial geometric eigen-lengths in common sense sinc…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: ICML 2023. Project page: https://yijiaweng.github.io/geo-eigen-length

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:36958-36977, 2023

  9. arXiv:2311.02337

    cs.RO cs.AI cs.CV

    STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots

    Authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox

    Abstract: Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objec…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: CoRL 2023, project page: https://sites.google.com/view/stow-corl23

  10. arXiv:2309.07473

    cs.RO cs.AI

    Where2Explore: Few-shot Affordance Learning for Unseen Novel Categories of Articulated Objects

    Authors: Chuanruo Ning, Ruihai Wu, Haoran Lu, Kaichun Mo, Hao Dong

    Abstract: Articulated object manipulation is a fundamental yet challenging task in robotics. Due to significant geometric and semantic variations across object categories, previous manipulation models struggle to generalize to novel categories. Few-shot learning is a promising solution for alleviating this issue by allowing robots to perform a few interactions with unseen objects. However, extant approaches…

    Submitted 15 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023

  11. arXiv:2308.01857

    cs.AR

    iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

    Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin, et al. (31 additional authors not shown)

    Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, which aims to build a basic infrastructure for EDA technology evolution and to close the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti…

    Submitted 3 August, 2023; originally announced August 2023.

  12. arXiv:2304.00341

    cs.CV

    JacobiNeRF: NeRF Shaping with Mutual Information Gradients

    Authors: Xiaomeng Xu, Yanchao Yang, Kaichun Mo, Boxiao Pan, Li Yi, Leonidas Guibas

    Abstract: We propose a method that trains a neural radiance field (NeRF) to encode not only the appearance of the scene but also semantic correlations between scene points, regions, or entities -- aiming to capture their mutual co-variation patterns. In contrast to the traditional first-order photometric reconstruction objective, our method explicitly regularizes the learning dynamics to align the Jacobians…

    Submitted 1 April, 2023; originally announced April 2023.

  13. arXiv:2303.06163

    cs.CV

    Category-Level Multi-Part Multi-Joint 3D Shape Assembly

    Authors: Yichen Li, Kaichun Mo, Yueqi Duan, He Wang, Jiequan Zhang, Lin Shao, Wojciech Matusik, Leonidas Guibas

    Abstract: Shape assembly composes complex shape geometries by arranging simple part geometries and has wide applications in autonomous robotic assembly and CAD modeling. Existing works focus on geometry reasoning and neglect the actual physical assembly process of matching and fitting joints, which are the contact surfaces connecting different parts. In this paper, we consider contacting joints for the tas…

    Submitted 10 March, 2023; originally announced March 2023.

  14. arXiv:2303.01310

    cs.RO

    Learning Language-Conditioned Deformable Object Manipulation with Graph Dynamics

    Authors: Yuhong Deng, Kai Mo, Chongkun Xia, Xueqian Wang

    Abstract: Multi-task learning of deformable object manipulation is a challenging problem in robot manipulation. Most previous works address this problem in a goal-conditioned way and adopt goal images to specify different tasks, which limits the multi-task learning performance and cannot generalize to new tasks. Thus, we adopt language instruction to specify deformable object manipulation tasks and propose…

    Submitted 29 January, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: has been accepted by ICRA 2024

  15. arXiv:2302.10237

    cs.GR cs.CV

    SceneHGN: Hierarchical Graph Networks for 3D Indoor Scene Generation with Fine-Grained Geometry

    Authors: Lin Gao, Jia-Mu Sun, Kaichun Mo, Yu-Kun Lai, Leonidas J. Guibas, Jie Yang

    Abstract: 3D indoor scenes are widely used in computer graphics, with applications ranging from interior design to gaming to virtual and augmented reality. They also contain rich information, including room layout, as well as furniture type, geometry, and placement. High-quality 3D indoor scenes are highly demanded while it requires expertise and is time-consuming to design high-quality 3D indoor scenes man…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: 21 pages, 21 figures, Project: http://geometrylearning.com/scenehgn/

  16. arXiv:2301.09209

    cs.CV cs.CL

    Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

    Authors: Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang

    Abstract: We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture. It exploits the representational power of language by summarizing the action context. TransFusion leverages pre-trained image captioning and vi…

    Submitted 10 March, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

  17. Foldsformer: Learning Sequential Multi-Step Cloth Manipulation With Space-Time Attention

    Authors: Kai Mo, Chongkun Xia, Xueqian Wang, Yuhong Deng, Xuehai Gao, Bin Liang

    Abstract: Sequential multi-step cloth manipulation is a challenging problem in robotic manipulation, requiring a robot to perceive the cloth state and plan a sequence of chained actions leading to the desired state. Most previous works address this problem in a goal-conditioned way, and goal observation must be given for each specific task and cloth configuration, which is neither practical nor efficient. Thus,…

    Submitted 8 January, 2023; originally announced January 2023.

    Comments: 8 pages, 6 figures, published to IEEE Robotics & Automation Letters (RA-L)

    Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 760-767, Feb. 2023

  18. arXiv:2211.00382

    cs.CV

    Seg&Struct: The Interplay Between Part Segmentation and Structure Inference for 3D Shape Parsing

    Authors: Jeonghyun Kim, Kaichun Mo, Minhyuk Sung, Woontack Woo

    Abstract: We propose Seg&Struct, a supervised learning framework leveraging the interplay between part segmentation and structure inference and demonstrating their synergy in an integrated framework. Both part segmentation and structure inference have been extensively studied in the recent deep learning literature, while the supervisions used for each task have not been fully exploited to assist the other t…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: WACV 2023 (Algorithm Track)

  19. arXiv:2210.01781

    cs.CV cs.RO

    COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos

    Authors: Boxiao Pan, Bokui Shen, Davis Rempe, Despoina Paschalidou, Kaichun Mo, Yanchao Yang, Leonidas J. Guibas

    Abstract: The ability to forecast human-environment collisions from egocentric observations is vital to enable collision avoidance in applications such as VR, AR, and wearable assistive robotics. In this work, we introduce the challenging problem of predicting collisions in diverse environments from multi-view egocentric videos captured from body-mounted cameras. Solving this problem requires a generalizabl…

    Submitted 26 March, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

  20. arXiv:2207.01971

    cs.CV cs.RO

    DualAfford: Learning Collaborative Visual Affordance for Dual-gripper Manipulation

    Authors: Yan Zhao, Ruihai Wu, Zhehuan Chen, Yourong Zhang, Qingnan Fan, Kaichun Mo, Hao Dong

    Abstract: It is essential yet challenging for future home-assistant robots to understand and manipulate diverse 3D objects in daily human environments. Towards building scalable systems that can perform diverse manipulation tasks over various 3D shapes, recent works have advocated and demonstrated promising results learning visual actionable affordance, which labels every point over the input 3D geometry wi…

    Submitted 27 March, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

  21. arXiv:2205.02834

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

    Authors: Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

    Abstract: This paper studies the problem of fixing malfunctional 3D objects. While previous works focus on building passive perception models to learn the functionality from static 3D objects, we argue that functionality is reckoned with respect to the physical interactions between the object and the user. Given a malfunctional object, humans can perform mental simulations to reason about its functionality…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: CVPR 2022. Project page: http://fixing-malfunctional.csail.mit.edu

  22. arXiv:2204.09443

    cs.CV

    GIMO: Gaze-Informed Human Motion Prediction in Context

    Authors: Yang Zheng, Yanchao Yang, Kaichun Mo, Jiaman Li, Tao Yu, Yebin Liu, C. Karen Liu, Leonidas J. Guibas

    Abstract: Predicting human motion is critical for assistive robots and AR/VR applications, where the interaction with humans needs to be safe and comfortable. Meanwhile, an accurate prediction depends on understanding both the scene context and human intentions. Even though many works study scene-aware human motion prediction, the latter is largely underexplored due to the lack of ego-centric views that dis…

    Submitted 19 July, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  23. arXiv:2112.10143

    cs.RO cs.AI cs.CV cs.LG cs.MA

    RoboAssembly: Learning Generalizable Furniture Assembly Policy in a Novel Multi-robot Contact-rich Simulation Environment

    Authors: Mingxin Yu, Lin Shao, Zhehuan Chen, Tianhao Wu, Qingnan Fan, Kaichun Mo, Hao Dong

    Abstract: Part assembly is a typical but challenging task in robotics, where robots assemble a set of individual parts into a complete shape. In this paper, we develop a robotic assembly simulation environment for furniture assembly. We formulate the part assembly task as a concrete reinforcement learning problem and propose a pipeline for robots to learn to assemble a diverse set of chairs. Experiments sho…

    Submitted 19 December, 2021; originally announced December 2021.

    Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2022

  24. arXiv:2112.07954

    cs.CV

    Object Pursuit: Building a Space of Objects via Discriminative Weight Generation

    Authors: Chuanyu Pan, Yanchao Yang, Kaichun Mo, Yueqi Duan, Leonidas Guibas

    Abstract: We propose a framework to continuously learn object-centric representations for visual learning and understanding. Existing object-centric representations either rely on supervisions that individualize objects in the scene, or perform unsupervised disentanglement that can hardly deal with complex scenes in the real world. To mitigate the annotation burden and relax the constraints on the statistic…

    Submitted 2 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: 24 pages. This paper has been accepted by ICLR 2022 (OpenReview: https://openreview.net/forum?id=lbauk6wK2-y)

  25. arXiv:2112.06253

    cs.LG cs.AI

    Up to 100$\times$ Faster Data-free Knowledge Distillation

    Authors: Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song

    Abstract: Data-free knowledge distillation (DFKD) has recently been attracting increasing attention from research communities, attributed to its capability to compress a model only using synthetic data. Despite the encouraging results achieved, state-of-the-art DFKD methods still suffer from the inefficiency of data synthesis, making the data-free training process extremely time-consuming and thus inapplica…

    Submitted 24 February, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

  26. arXiv:2112.05298

    cs.CV cs.AI cs.RO

    IFR-Explore: Learning Inter-object Functional Relationships in 3D Indoor Scenes

    Authors: Qi Li, Kaichun Mo, Yanchao Yang, Hang Zhao, Leonidas Guibas

    Abstract: Building embodied intelligent agents that can interact with 3D indoor environments has received increasing research attention in recent years. While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model -- inter-object functional relationships (e.g., a switch…

    Submitted 4 April, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  27. arXiv:2112.02608

    eess.IV cs.CV cs.LG cs.RO

    Real-time Virtual Intraoperative CT for Image Guided Surgery

    Authors: Yangming Li, Neeraja Konuthula, Ian M. Humphreys, Kris Moe, Blake Hannaford, Randall Bly

    Abstract: Purpose: This paper presents a scheme for generating virtual intraoperative CT scans in order to improve surgical completeness in Endoscopic Sinus Surgeries (ESS). Approach: The work presents three methods, the tip motion-based, the tip trajectory-based, and the instrument-based, along with non-parametric smoothing and Gaussian Process Regression, for virtual intraoperative CT generation…

    Submitted 5 December, 2021; originally announced December 2021.

  28. arXiv:2112.02598

    cs.LG

    Real-time Informative Surgical Skill Assessment with Gaussian Process Learning

    Authors: Yangming Li, Randall Bly, Sarah Akkina, Rajeev C. Saxena, Ian Humphreys, Mark Whipple, Kris Moe, Blake Hannaford

    Abstract: Endoscopic Sinus and Skull Base Surgeries (ESSBSs) are challenging and potentially dangerous surgical procedures, and objective skill assessment is a key component in improving the effectiveness of surgical training, re-validating surgeons' skills, and decreasing surgical trauma and the complication rate in operating rooms. Because of the complexity of surgical procedures, the variation of oper…

    Submitted 5 December, 2021; originally announced December 2021.

  29. arXiv:2112.00246

    cs.CV cs.RO

    AdaAfford: Learning to Adapt Manipulation Affordance for 3D Articulated Objects via Few-shot Interactions

    Authors: Yian Wang, Ruihai Wu, Kaichun Mo, Jiaqi Ke, Qingnan Fan, Leonidas Guibas, Hao Dong

    Abstract: Perceiving and interacting with 3D articulated objects, such as cabinets, doors, and faucets, pose particular challenges for future home-assistant robots performing daily tasks in human environments. Besides parsing the articulated parts and joint parameters, researchers recently advocate learning manipulation affordance over the input shape geometry which is more task-aware and geometrically fine…

    Submitted 4 May, 2023; v1 submitted 30 November, 2021; originally announced December 2021.

    Comments: ECCV 2022

  30. arXiv:2110.06962

    cs.CL cs.IR

    Open-Domain Question-Answering for COVID-19 and Other Emergent Domains

    Authors: Sharon Levy, Kevin Mo, Wenhan Xiong, William Yang Wang

    Abstract: Since late 2019, COVID-19 has quickly emerged as the newest biomedical domain, resulting in a surge of new information. As with other emergent domains, the discussion surrounding the topic has been rapidly changing, leading to the spread of misinformation. This has created the need for a public space for users to ask questions and receive credible, scientific answers. To fulfill this need, we turn…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021 Demo

  31. arXiv:2109.08817

    cs.RO cs.AI cs.CV cs.LG

    Learning to Regrasp by Learning to Place

    Authors: Shuo Cheng, Kaichun Mo, Lin Shao

    Abstract: In this paper, we explore whether a robot can learn to regrasp a diverse set of objects to achieve various desired grasp poses. Regrasping is needed whenever a robot's current grasp pose fails to perform desired manipulation tasks. Endowing robots with such an ability has applications in many domains such as manufacturing or domestic services. Yet, it is a challenging task due to the large diversi…

    Submitted 17 November, 2021; v1 submitted 17 September, 2021; originally announced September 2021.

    Comments: Accepted to Conference on Robot Learning (CoRL) 2021

  32. Reducing Annotating Load: Active Learning with Synthetic Images in Surgical Instrument Segmentation

    Authors: Haonan Peng, Shan Lin, Daniel King, Yun-Hsuan Su, Randall A. Bly, Kris S. Moe, Blake Hannaford

    Abstract: Accurate instrument segmentation in endoscopic vision of robot-assisted surgery is challenging due to reflection on the instruments and frequent contacts with tissue. Deep neural networks (DNN) show competitive performance and have been favored in recent years. However, the hunger of DNN for labeled data poses a huge workload of annotation. Motivated by alleviating this workload, we propose a general e…

    Submitted 7 August, 2021; originally announced August 2021.

  33. arXiv:2106.15087

    cs.CV cs.RO

    O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning

    Authors: Kaichun Mo, Yuzhe Qin, Fanbo Xiang, Hao Su, Leonidas Guibas

    Abstract: Contrary to the vast literature in modeling, perceiving, and understanding agent-object (e.g., human-object, hand-object, robot-object) interaction in computer vision and robotics, very few past works have studied the task of object-object interaction, which also plays an important role in robotic manipulation and planning tasks. There is a rich space of object-object interaction scenarios in our…

    Submitted 25 October, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: to appear in CoRL 2021

  34. arXiv:2106.14440

    cs.CV cs.RO

    VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects

    Authors: Ruihai Wu, Yan Zhao, Kaichun Mo, Zizheng Guo, Yian Wang, Tianhao Wu, Qingnan Fan, Xuelin Chen, Leonidas Guibas, Hao Dong

    Abstract: Perceiving and manipulating 3D articulated objects (e.g., cabinets, doors) in human environments is an important yet challenging task for future home-assistant robots. The space of 3D articulated objects is exceptionally rich in their myriad semantic categories, diverse shape geometry, and complicated part functionality. Previous works mostly abstract kinematic structure with estimated joint param…

    Submitted 1 April, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: ICLR 2022

  35. arXiv:2102.06554

    cs.LG cs.AI

    Exploiting Spline Models for the Training of Fully Connected Layers in Neural Network

    Authors: Kanya Mo, Shen Zheng, Xiwei Wang, Jinghua Wang, Klaus-Dieter Schewe

    Abstract: The fully connected (FC) layer, one of the most fundamental modules in artificial neural networks (ANN), is often considered difficult and inefficient to train due to issues including the risk of overfitting caused by its large amount of parameters. Based on previous work studying ANN from linear spline perspectives, we propose a spline-based approach that eases the difficulty of training FC layer…

    Submitted 12 February, 2021; originally announced February 2021.

  36. arXiv:2101.02692

    cs.CV cs.RO

    Where2Act: From Pixels to Actions for Articulated 3D Objects

    Authors: Kaichun Mo, Leonidas Guibas, Mustafa Mukadam, Abhinav Gupta, Shubham Tulsiani

    Abstract: One of the fundamental goals of visual perception is to allow agents to meaningfully interact with their environment. In this paper, we take a step towards that long-term goal -- we extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. For example, given a drawer, our network predicts that applying a pul…

    Submitted 10 August, 2021; v1 submitted 7 January, 2021; originally announced January 2021.

    Comments: accepted to ICCV 2021

  37. arXiv:2012.02493

    cs.CV

    Compositionally Generalizable 3D Structure Prediction

    Authors: Songfang Han, Jiayuan Gu, Kaichun Mo, Li Yi, Siyu Hu, Xuejin Chen, Hao Su

    Abstract: Single-image 3D shape reconstruction is an important and long-standing problem in computer vision. A plethora of existing works is constantly pushing the state-of-the-art performance in the deep learning era. However, there remains a much more difficult and under-explored issue on how to generalize the learned skills over unseen object categories that have very different shape geometry distributio…

    Submitted 21 April, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

  38. Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video

    Authors: Shan Lin, Fangbo Qin, Haonan Peng, Randall A. Bly, Kris S. Moe, Blake Hannaford

    Abstract: Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the…

    Submitted 25 July, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Published in IEEE Robotics and Automation Letters (Early Access)

  39. arXiv:2008.05440

    cs.GR cs.CV

    DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape Generation

    Authors: Jie Yang, Kaichun Mo, Yu-Kun Lai, Leonidas J. Guibas, Lin Gao

    Abstract: 3D shape generation is a fundamental operation in computer graphics. While significant progress has been made, especially with recent deep generative models, it remains a challenge to synthesize high-quality shapes with rich geometric details and complex structure, in a controllable manner. To tackle this, we introduce DSG-Net, a deep neural network that learns a disentangled structured and geometr…

    Submitted 28 May, 2022; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: Accepted to ACM Transactions on Graphics 2022, 26 pages

  40. arXiv:2006.07793

    cs.CV

    Generative 3D Part Assembly via Dynamic Graph Learning

    Authors: Jialei Huang, Guanqi Zhan, Qingnan Fan, Kaichun Mo, Lin Shao, Baoquan Chen, Leonidas Guibas, Hao Dong

    Abstract: Autonomous part assembly is a challenging yet crucial task in 3D computer vision and robotics. Analogous to buying a piece of IKEA furniture, given a set of 3D parts that can assemble a single shape, an intelligent agent needs to perceive the 3D part geometry, reason to propose pose estimations for the input parts, and finally call robotic planning and control routines for actuation. In this paper, we foc…

    Submitted 23 December, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  41. arXiv:2006.07029

    cs.CV cs.LG eess.IV

    Rethinking Sampling in 3D Point Cloud Generative Adversarial Networks

    Authors: He Wang, Zetian Jiang, Li Yi, Kaichun Mo, Hao Su, Leonidas J. Guibas

    Abstract: In this paper, we examine the long-neglected yet important effects of point sampling patterns in point cloud GANs. Through extensive experiments, we show that sampling-insensitive discriminators (e.g., PointNet-Max) produce shape point clouds with point clustering artifacts while sampling-oversensitive discriminators (e.g., PointNet++, DGCNN) fail to guide valid shape generation. We propose the concep…

    Submitted 12 June, 2020; originally announced June 2020.

  42. arXiv:2004.08731  [pdf]

    cs.CL cs.IR cs.LG

    Enhancing Pharmacovigilance with Drug Reviews and Social Media

    Authors: Brent Biseda, Katie Mo

    Abstract: This paper explores whether the use of drug reviews and social media could be leveraged as potential alternative sources for pharmacovigilance of adverse drug reactions (ADRs). We examined the performance of BERT alongside two variants that are trained on biomedical papers, BioBERT, and clinical notes, Clinical BERT. A variety of 8 different BERT models were fine-tuned and compared across three…

    Submitted 18 April, 2020; originally announced April 2020.

  43. arXiv:2003.09754  [pdf, other]

    cs.CV cs.RO

    Learning 3D Part Assembly from a Single Image

    Authors: Yichen Li, Kaichun Mo, Lin Shao, Minhyuk Sung, Leonidas Guibas

    Abstract: Autonomous assembly is a crucial capability for robots in many applications. For this task, several problems such as obstacle avoidance, motion planning, and actuator control have been extensively studied in robotics. However, when it comes to task specification, the space of possibilities remains underexplored. Towards this end, we introduce a novel problem, single-image-guided 3D part assembly,…

    Submitted 24 March, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

  44. arXiv:2003.08624  [pdf, other]

    cs.CV cs.CG cs.GR

    PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions

    Authors: Kaichun Mo, He Wang, Xinchen Yan, Leonidas J. Guibas

    Abstract: 3D generative shape modeling is a fundamental research area in computer vision and interactive computer graphics, with many real-world applications. This paper investigates the novel problem of generating 3D shape point cloud geometry from a symbolic part tree representation. In order to learn such a conditional shape generation procedure in an end-to-end fashion, we propose a conditional GAN "par…

    Submitted 15 July, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  45. arXiv:2003.08515  [pdf, other]

    cs.CV cs.RO

    SAPIEN: A SimulAted Part-based Interactive ENvironment

    Authors: Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel X. Chang, Leonidas J. Guibas, Hao Su

    Abstract: Building home assistant robots has long been a pursuit for vision and robotics researchers. To achieve this task, a simulated environment with physically realistic simulation, sufficient articulated objects, and transferability to the real robot is indispensable. Existing environments achieve these requirements for robotics simulation with different levels of simplification and focus. We take one…

    Submitted 18 March, 2020; originally announced March 2020.

  46. arXiv:2003.04949  [pdf, other]

    eess.IV cs.CV

    LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images

    Authors: Shan Lin, Fangbo Qin, Yangming Li, Randall A. Bly, Kris S. Moe, Blake Hannaford

    Abstract: Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing an image-to-image translation model live-cadaver GAN (LC-GAN)…

    Submitted 13 August, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: Accepted by 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  47. Towards Better Surgical Instrument Segmentation in Endoscopic Vision: Multi-Angle Feature Aggregation and Contour Supervision

    Authors: Fangbo Qin, Shan Lin, Yangming Li, Randall A. Bly, Kris S. Moe, Blake Hannaford

    Abstract: Accurate and real-time surgical instrument segmentation is important in the endoscopic vision of robot-assisted surgery, and significant challenges are posed by frequent instrument-tissue contacts and continuous change of observation perspective. For these challenging tasks, more and more deep neural network (DNN) models have been designed in recent years. We are motivated to propose a general embeddabl…

    Submitted 10 August, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted by IEEE Robotics and Automation Letters

  48. arXiv:2002.06478  [pdf, other]

    cs.CV cs.LG

    Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories

    Authors: Tiange Luo, Kaichun Mo, Zhiao Huang, Jiarui Xu, Siyu Hu, Liwei Wang, Hao Su

    Abstract: We address the problem of discovering 3D parts for objects in unseen categories. Being able to learn the geometry prior of parts and transfer this prior to unseen categories poses fundamental challenges for data-driven shape segmentation approaches. Formulated as a contextual bandit problem, we propose a learning-based agglomerative clustering framework which learns a grouping policy to progressivel…

    Submitted 17 September, 2021; v1 submitted 15 February, 2020; originally announced February 2020.

    Comments: ICLR 2020

  49. arXiv:1911.11098  [pdf, other]

    cs.CV cs.CG cs.GR

    StructEdit: Learning Structural Shape Variations

    Authors: Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

    Abstract: Learning to encode differences in the geometry and (topological) structure of the shapes of ordinary objects is key to generating semantically plausible variations of a given shape, transferring edits from one shape to another, and many other applications in 3D content creation. The common approach of encoding shapes as points in a high-dimensional latent feature space suggests treating shape diff…

    Submitted 25 November, 2019; originally announced November 2019.

  50. arXiv:1908.00575  [pdf, other]

    cs.GR cs.CG cs.CV

    StructureNet: Hierarchical Graph Networks for 3D Shape Generation

    Authors: Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas J. Guibas

    Abstract: The ability to generate novel, diverse, and realistic 3D shapes along with associated part semantics and structure is central to many applications requiring high-quality 3D assets or large volumes of realistic training data. A key challenge towards this goal is how to accommodate diverse shape variations, including both continuous deformations of parts as well as structural or discrete alterations…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: Conditionally Accepted to Siggraph Asia 2019