
Showing 1–43 of 43 results for author: Kuo, Y

Searching in archive cs.
  1. arXiv:2408.12574  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    MuMA-ToM: Multi-modal Multi-Agent Theory of Mind

    Authors: Haojun Shi, Suyu Ye, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu

    Abstract: Understanding people's social interactions in complex real-world scenarios often relies on intricate mental reasoning. To truly understand how and why people interact with one another, we must infer the underlying mental states that give rise to the social interactions, i.e., Theory of Mind reasoning in multi-agent interactions. Additionally, social interactions are often multi-modal -- we can wat…

    Submitted 25 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Project website: https://scai.cs.jhu.edu/projects/MuMA-ToM/ Code: https://github.com/SCAI-JHU/MuMA-ToM

  2. arXiv:2408.08992  [pdf, other]

    cs.HC

    SpreadLine: Visualizing Egocentric Dynamic Influence

    Authors: Yun-Hsin Kuo, Dongyu Liu, Kwan-Liu Ma

    Abstract: Egocentric networks, often visualized as node-link diagrams, portray the complex relationship (link) dynamics between an entity (node) and others. However, common analytics tasks are multifaceted, encompassing interactions among four key aspects: strength, function, structure, and content. Current node-link visualization designs may fall short, focusing narrowly on certain aspects and neglecting t…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: To appear in VIS 2024 and IEEE Transactions on Visualization and Computer Graphics

  3. arXiv:2407.13729  [pdf, other]

    cs.CL

    Baba Is AI: Break the Rules to Beat the Benchmark

    Authors: Nathan Cloos, Meagan Jens, Michelangelo Naim, Yen-Ling Kuo, Ignacio Cases, Andrei Barbu, Christopher J. Cueva

    Abstract: Humans solve problems by following existing rules and procedures, and also by leaps of creativity to redefine those rules and objectives. To probe these abilities, we developed a new benchmark based on the game Baba Is You where an agent manipulates both objects in the environment and rules, represented by movable tiles with words written on them, to reach a specified goal and win the game. We tes…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 8 figures

  4. arXiv:2406.18089  [pdf, other]

    cs.SD cs.MM eess.AS

    A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons

    Authors: Tzu-Yun Hung, Jui-Te Wu, Yu-Chia Kuo, Yo-Wei Hsiao, Ting-Wei Lin, Li Su

    Abstract: Expressive music synthesis (EMS) for violin performance is a challenging task due to the disagreement among music performers in the interpretation of expressive musical terms (EMTs), scarcity of labeled recordings, and limited generalization ability of the synthesis model. These challenges create trade-offs between model effectiveness, diversity of generated results, and controllability of the syn…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 15 pages, 2 figures, 3 tables

  5. arXiv:2406.06375  [pdf, other]

    cs.SD cs.AI eess.AS

    MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

    Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-Ping Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

    Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music m…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

  6. arXiv:2404.07351  [pdf, other]

    cs.CV cs.HC cs.LG

    A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

    Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci

    Abstract: Eye-tracking applications that utilize the human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we i…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 2024 Symposium on Eye Tracking Research and Applications (ETRA24), Glasgow, United Kingdom

  7. arXiv:2404.07347  [pdf, other]

    cs.CV cs.HC cs.LG

    Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention

    Authors: Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci

    Abstract: Humans utilize their gaze to concentrate on essential information while perceiving and interpreting intentions in videos. Incorporating human gaze into computational algorithms can significantly enhance model performance in video understanding tasks. In this work, we address a challenging and innovative task in video understanding: predicting the actions of an agent in a video based on a partial v…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 2024 Symposium on Eye Tracking Research and Applications (ETRA24), Glasgow, United Kingdom

  8. arXiv:2401.13280  [pdf, other]

    cs.CV cs.CE

    DDI-CoCo: A Dataset For Understanding The Effect Of Color Contrast In Machine-Assisted Skin Disease Detection

    Authors: Ming-Chang Chiu, Yingfei Wang, Yen-Ju Kuo, Pin-Yu Chen

    Abstract: Skin tone as a demographic bias and inconsistent human labeling poses challenges in dermatology AI. We take another angle to investigate color contrast's impact, beyond skin tones, on malignancy detection in skin disease datasets: We hypothesize that in addition to skin tones, the color difference between the lesion area and skin also plays a role in malignancy detection performance of dermatology…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 2 tables, Accepted to ICASSP 2024

  9. arXiv:2401.08743  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    MMToM-QA: Multimodal Theory of Mind Question Answering

    Authors: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than v…

    Submitted 15 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: ACL 2024. 26 pages, 11 figures, 7 tables

  10. arXiv:2309.10858  [pdf, other]

    cs.CV

    On-device Real-time Custom Hand Gesture Recognition

    Authors: Esha Uboweja, David Tian, Qifei Wang, Yi-Chun Kuo, Joe Zou, Lu Wang, George Sung, Matthias Grundmann

    Abstract: Most existing hand gesture recognition (HGR) systems are limited to a predefined set of gestures. However, users and developers often want to recognize new, unseen gestures. This is challenging due to the vast diversity of all plausible hand shapes, e.g. it is impossible for developers to include all hand gestures in a predefined list. In this paper, we present a user-friendly framework that lets…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 5 pages, 6 figures; Accepted to ICCV Workshop on Computer Vision for Metaverse, Paris, France, 2023

  11. arXiv:2309.05739  [pdf, other]

    cs.HC cs.GR

    VisActs: Describing Intent in Communicative Visualization

    Authors: Keshav Dasu, Yun-Hsin Kuo, Kwan-Liu Ma

    Abstract: Data visualization can be defined as the visual communication of information. One important barometer for the success of a visualization is whether the intents of the communicator(s) are faithfully conveyed. The processes of constructing and displaying visualizations have been widely studied by our community. However, due to the lack of consistency in this literature, there is a growing acknowledg…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: Currently pending review

  12. arXiv:2308.11071  [pdf, other]

    cs.AI cs.LG cs.MA cs.RO

    Neural Amortized Inference for Nested Multi-agent Reasoning

    Authors: Kunal Jha, Tuan Anh Le, Chuanyang Jin, Yen-Ling Kuo, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Multi-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself. Such intricate reasoning can be effectively modeled through nested multi-agent reasoning. Nonetheless, the computational complexity escalates exponentially with each level of reasoning, posing a significant challenge. However, humans ef…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: 8 pages, 10 figures

  13. arXiv:2308.07557  [pdf, other]

    cs.HC

    Character-Oriented Design for Visual Data Storytelling

    Authors: Keshav Dasu, Yun-Hsin Kuo, Kwan-Liu Ma

    Abstract: When telling a data story, an author has an intention they seek to convey to an audience. This intention can be of many forms such as to persuade, to educate, to inform, or even to entertain. In addition to expressing their intention, the story plot must balance being consumable and enjoyable while preserving scientific integrity. In data stories, numerous methods have been identified for construc…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: Accepted to TVCG & VIS 2023 Pre-Print. Storytelling, Data Stories, Explanatory, Narrative visualization, Visual metaphor

  14. arXiv:2308.00278  [pdf, other]

    cs.LG

    Classes are not Clusters: Improving Label-based Evaluation of Dimensionality Reduction

    Authors: Hyeon Jeon, Yun-Hsin Kuo, Michaël Aupetit, Kwan-Liu Ma, Jinwook Seo

    Abstract: A common way to evaluate the reliability of dimensionality reduction (DR) embeddings is to quantify how well labeled classes form compact, mutually separated clusters in the embeddings. This approach is based on the assumption that the classes stay as clear clusters in the original high-dimensional space. However, in reality, this assumption can be violated; a single class can be fragmented into m…

    Submitted 11 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: IEEE Transactions on Visualization and Computer Graphics (TVCG) (Proc. IEEE VIS 2023)

  15. arXiv:2305.17600  [pdf, other]

    cs.LG cs.CV cs.GT cs.RO math.OC

    NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction

    Authors: Justin Lidard, Oswin So, Yanxia Zhang, Jonathan DeCastro, Xiongyi Cui, Xin Huang, Yen-Ling Kuo, John Leonard, Avinash Balachandran, Naomi Leonard, Guy Rosman

    Abstract: Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-…

    Submitted 11 November, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: 8 pages, 6 figures

  16. arXiv:2305.06447  [pdf, other]

    cs.LG cs.IR cs.SI

    Dynamic Graph Representation Learning for Depression Screening with Transformer

    Authors: Ai-Te Kuo, Haiquan Chen, Yu-Hsuan Kuo, Wei-Shinn Ku

    Abstract: Early detection of mental disorder is crucial as it enables prompt intervention and treatment, which can greatly improve outcomes for individuals suffering from debilitating mental affliction. The recent proliferation of mental health discussions on social media platforms presents research opportunities to investigate mental health and potentially detect instances of mental illness. However, exist…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 10 pages, 4 figures, 8 tables

  17. arXiv:2301.09209  [pdf, other]

    cs.CV cs.CL

    Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

    Authors: Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang

    Abstract: We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on objects, coined action context. We propose TransFusion, a multimodal transformer-based architecture. It exploits the representational power of language by summarizing the action context. TransFusion leverages pre-trained image captioning and vi…

    Submitted 10 March, 2024; v1 submitted 22 January, 2023; originally announced January 2023.

  18. arXiv:2210.01306  [pdf, other]

    quant-ph cs.AR cs.ET

    Robust Qubit Mapping Algorithm via Double-Source Optimal Routing on Large Quantum Circuits

    Authors: Chin-Yi Cheng, Chien-Yi Yang, Yi-Hsiang Kuo, Ren-Chu Wang, Hao-Chung Cheng, Chung-Yang Ric Huang

    Abstract: Qubit Mapping is a critical aspect of implementing quantum circuits on real hardware devices. Currently, the existing algorithms for qubit mapping encounter difficulties when dealing with larger circuit sizes involving hundreds of qubits. In this paper, we introduce an innovative qubit mapping algorithm, Duostra, tailored to address the challenge of implementing large-scale quantum circuits on rea…

    Submitted 3 August, 2024; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by ACM Transactions on Quantum Computing

    Journal ref: ACM Transactions on Quantum Computing (August 2024)

  19. arXiv:2209.02485  [pdf, other]

    cs.CV cs.CL

    Reconstructing Action-Conditioned Human-Object Interactions Using Commonsense Knowledge Priors

    Authors: Xi Wang, Gen Li, Yen-Ling Kuo, Muhammed Kocabas, Emre Aksan, Otmar Hilliges

    Abstract: We present a method for inferring diverse 3D models of human-object interactions from images. Reasoning about how humans interact with objects in complex scenes from a single 2D image is a challenging task given ambiguities arising from the loss of information through projection. In addition, modeling 3D interactions requires the generalization ability towards diverse object categories and interac…

    Submitted 6 September, 2022; originally announced September 2022.

  20. arXiv:2208.08878  [pdf, other]

    cs.LG cs.AI

    Towards Learning in Grey Spatiotemporal Systems: A Prophet to Non-consecutive Spatiotemporal Dynamics

    Authors: Zhengyang Zhou, Yang Kuo, Wei Sun, Binwu Wang, Min Zhou, Yunan Zong, Yang Wang

    Abstract: Spatiotemporal forecasting is an imperative topic in data science due to its diverse and critical applications in smart cities. Existing works mostly perform consecutive predictions of following steps with observations completely and continuously obtained, where nearest observations can be exploited as key knowledge for instantaneous status estimation. However, the practical issues of early activi…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 13 pages, 6 figures and 4 tables

  21. arXiv:2206.13891  [pdf, other]

    cs.LG stat.ML

    Feature Learning for Nonlinear Dimensionality Reduction toward Maximal Extraction of Hidden Patterns

    Authors: Takanori Fujiwara, Yun-Hsin Kuo, Anders Ynnerman, Kwan-Liu Ma

    Abstract: Dimensionality reduction (DR) plays a vital role in the visual analysis of high-dimensional data. One main aim of DR is to reveal hidden patterns that lie on intrinsic low-dimensional manifolds. However, DR often overlooks important patterns when the manifolds are distorted or masked by certain influential data attributes. This paper presents a feature learning framework, FEALM, designed to genera…

    Submitted 24 February, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted by PacificVis 2023. The previous preprint version was titled "Feature Learning for Dimensionality Reduction toward Maximal Extraction of Hidden Patterns" (arxiv:2206.13891v2)

  22. arXiv:2205.11748  [pdf, other]

    cs.SD cs.LG eess.AS

    Deep Learning-based automated classification of Chinese Speech Sound Disorders

    Authors: Yao-Ming Kuo, Shanq-Jang Ruan, Yu-Chin Chen, Ya-Wen Tu

    Abstract: This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrica…

    Submitted 6 July, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: Children 2022

    Journal ref: Children 2022, 9, 996

  23. arXiv:2202.05413  [pdf, other]

    cs.LG

    A Machine-Learning-Aided Visual Analysis Workflow for Investigating Air Pollution Data

    Authors: Yun-Hsin Kuo, Takanori Fujiwara, Charles C. -K. Chou, Chun-houh Chen, Kwan-Liu Ma

    Abstract: Analyzing air pollution data is challenging as there are various analysis focuses from different aspects: feature (what), space (where), and time (when). As in most geospatial analysis problems, besides high-dimensional features, the temporal and spatial dependencies of air pollution induce the complexity of performing analysis. Machine learning methods, such as dimensionality reduction, can extra…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: To appear in the Proceedings of IEEE PacificVis 2022

  24. arXiv:2110.11864  [pdf]

    cs.CL cs.CV

    Deep learning-based NLP Data Pipeline for EHR Scanned Document Information Extraction

    Authors: Enshuo Hsu, Ioannis Malagaris, Yong-Fang Kuo, Rizwana Sultana, Kirk Roberts

    Abstract: Scanned documents in electronic health records (EHR) have been a challenge for decades, and are expected to stay in the foreseeable future. Current approaches for processing often include image preprocessing, optical character recognition (OCR), and text mining. However, there is limited work that evaluates the choice of image preprocessing methods, the selection of NLP models, and the role of doc…

    Submitted 13 September, 2021; originally announced October 2021.

    Comments: 6 tables, 7 figures

  25. arXiv:2110.10298  [pdf, other]

    cs.RO

    Incorporating Rich Social Interactions Into MDPs

    Authors: Ravi Tejwani, Yen-Ling Kuo, Tianmin Shu, Bennett Stankovits, Dan Gutfreund, Joshua B. Tenenbaum, Boris Katz, Andrei Barbu

    Abstract: Much of what we do as humans is engage socially with other agents, a skill that robots must also eventually possess. We demonstrate that a rich theory of social interactions originating from microsociology and economics can be formalized by extending a nested MDP where agents reason about arbitrary functions of each other's hidden rewards. This extended Social MDP allows us to encode the five basi…

    Submitted 7 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted to the 39th International Conference on Robotics and Automation (ICRA 2022)

  26. arXiv:2110.09741  [pdf, other]

    cs.RO cs.AI cs.CL cs.LG

    Trajectory Prediction with Linguistic Representations

    Authors: Yen-Ling Kuo, Xin Huang, Andrei Barbu, Stephen G. McGill, Boris Katz, John J. Leonard, Guy Rosman

    Abstract: Language allows humans to build mental models that interpret what is happening around them resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without dir…

    Submitted 9 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted in ICRA 2022

  27. Secure Links: Secure-by-Design Communications in IEC 61499 Industrial Control Applications

    Authors: Awais Tanveer, Roopak Sinha, Matthew M. Y. Kuo

    Abstract: Increasing automation and external connectivity in industrial control systems (ICS) demand a greater emphasis on software-level communication security. In this article, we propose a secure-by-design development method for building ICS applications, where requirements from security standards like ISA/IEC 62443 are fulfilled by design-time abstractions called secure links. Proposed as an extension t…

    Submitted 24 July, 2021; originally announced July 2021.

    Comments: Journal paper, 11 pages, 10 figures, 3 tables

    Journal ref: IEEE Transactions on Industrial Informatics 17(6)(2021), pp.3992-4002

  28. arXiv:2105.14322  [pdf, other]

    cs.CV

    RPG: Learning Recursive Point Cloud Generation

    Authors: Wei-Jan Ko, Hui-Yu Huang, Yu-Liang Kuo, Chen-Yi Chiu, Li-Heng Wang, Wei-Chen Chiu

    Abstract: In this paper we propose a novel point cloud generator that is able to reconstruct and generate 3D point clouds composed of semantic parts. Given a latent representation of the target 3D model, the generation starts from a single point and gets expanded recursively to produce the high-resolution point cloud via a sequence of point expansion stages. During the recursive procedure of generation, we…

    Submitted 29 May, 2021; originally announced May 2021.

  29. arXiv:2012.12453  [pdf, other]

    cs.CV

    CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80

    Authors: W. -Y. Hong, C. -L. Kao, Y. -H. Kuo, J. -R. Wang, W. -L. Chang, C. -S. Shih

    Abstract: Computer-assisted surgery has been developed to enhance surgery correctness and safety. However, researchers and engineers suffer from limited annotated data to develop and train better algorithms. Consequently, the development of fundamental algorithms such as Simultaneous Localization and Mapping (SLAM) is limited. This article elaborates on the efforts of preparing the dataset for semantic segm…

    Submitted 22 December, 2020; originally announced December 2020.

    Comments: 6 pages

  30. arXiv:2011.05451  [pdf, other]

    math.NA cs.CE hep-lat

    Lattice meets lattice: Application of lattice cubature to models in lattice gauge theory

    Authors: Tobias Hartung, Karl Jansen, Frances Y. Kuo, Hernan Leövey, Dirk Nuyens, Ian H. Sloan

    Abstract: High dimensional integrals are abundant in many fields of research including quantum physics. The aim of this paper is to develop efficient recursive strategies to tackle a class of high dimensional integrals having a special product structure with low order couplings, motivated by models in lattice gauge theory from quantum field theory. A novel element of this work is the potential benefit in us…

    Submitted 29 June, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

    MSC Class: 65D30; 65D32; 65T50; 65Z05; 81T80

    Journal ref: Journal of Computational Physics Volume 443, 15 October 2021, 110527

  31. arXiv:2008.03277  [pdf, other]

    cs.CL

    Learning a natural-language to LTL executable semantic parser for grounded robotics

    Authors: Christopher Wang, Candace Ross, Yen-Ling Kuo, Boris Katz, Andrei Barbu

    Abstract: Children acquire their native language with apparent ease by observing how language is used in context and attempting to use it themselves. They do so without laborious annotations, negative examples, or even direct corrections. We take a step toward robots that can do the same by training a grounded semantic parser, which discovers latent linguistic representations that can be used for the execut…

    Submitted 16 March, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

    Comments: 10 pages, 2 figures, Accepted in Conference on Robot Learning (CoRL) 2020

    ACM Class: I.2.7

  32. arXiv:2008.02742  [pdf, other]

    cs.CL cs.AI cs.RO

    Compositional Networks Enable Systematic Generalization for Grounded Language Understanding

    Authors: Yen-Ling Kuo, Boris Katz, Andrei Barbu

    Abstract: Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before. Recent work has shown that while deep networks can mimic some human language abilities when presented with novel sentences, systematic variation uncovers the limitations in the language-understanding abilities of networks. We demonstrate that these limitations c…

    Submitted 19 October, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted in Findings of EMNLP 2021

  33. arXiv:2006.01110  [pdf, other]

    cs.RO cs.CL

    Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas

    Authors: Yen-Ling Kuo, Boris Katz, Andrei Barbu

    Abstract: We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and ge…

    Submitted 6 August, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: Accepted in IROS 2020

  34. arXiv:2003.03716  [pdf, other]

    cs.CL

    Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach

    Authors: Yu-Siang Wang, Yen-Ling Kuo, Boris Katz

    Abstract: We demonstrate how we can practically incorporate multi-step future information into a decoder of maximum likelihood sequence models. We propose a "k-step look-ahead" module to consider the likelihood information of a rollout up to k steps. Unlike other approaches that need to train another value network to evaluate the rollouts, we can directly apply this look-ahead module to improve the decoding…

    Submitted 7 March, 2020; originally announced March 2020.

    Comments: 7 pages, 5 figures

  35. arXiv:2002.05201  [pdf, other]

    cs.RO cs.CL

    Deep compositional robotic planners that follow natural language commands

    Authors: Yen-Ling Kuo, Boris Katz, Andrei Barbu

    Abstract: We demonstrate how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands in a continuous configuration space to move and manipulate objects. Our approach combines a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes, with a sampling-based planner, RRT. A recur…

    Submitted 19 February, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

    Comments: Accepted in ICRA 2020

  36. arXiv:1810.00804  [pdf, other]

    cs.RO

    Deep sequential models for sampling-based planning

    Authors: Yen-Ling Kuo, Andrei Barbu, Boris Katz

    Abstract: We demonstrate how a sequence model and a sampling-based planner can influence each other to produce efficient plans and how such a model can automatically learn to take advantage of observations of the environment. Sampling-based planners such as RRT generally know nothing of their environments even if they have traversed similar spaces many times. A sequence model, such as an HMM or LSTM, guides…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: Published in IROS 2018

  37. arXiv:1809.08753  [pdf, other]

    cs.MM

    An Iterative Refinement Approach for Social Media Headline Prediction

    Authors: Chih-Chung Hsu, Chia-Yen Lee, Ting-Xuan Liao, Jun-Yi Lee, Tsai-Yne Hou, Ying-Chu Kuo, Jing-Wen Lin, Ching-Yi Hsueh, Zhong-Xuan Zhan, Hsiang-Chin Chien

    Abstract: In this study, we propose a novel iterative refinement approach to predict the popularity score of the social media meta-data effectively. With the rapid growth of the social media on the Internet, how to adequately forecast the view count or popularity becomes more important. Conventionally, the ensemble approach such as random forest regression achieves high and stable performance on various pre…

    Submitted 24 September, 2018; originally announced September 2018.

    Comments: 5 pages, ACM Multimedia Conference 2018

  38. Detecting Outliers in Data with Correlated Measures

    Authors: Yu-Hsuan Kuo, Zhenhui Li, Daniel Kifer

    Abstract: Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant amount of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by outliers.…

    Submitted 26 August, 2018; originally announced August 2018.

    Comments: 10 pages

  39. arXiv:1804.00370  [pdf, other]

    cs.DB

    Differentially Private Hierarchical Count-of-Counts Histograms

    Authors: Yu-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay, Ashwin Machanavajjhala

    Abstract: We consider the problem of privately releasing a class of queries that we call hierarchical count-of-counts histograms. Count-of-counts histograms partition the rows of an input table into groups (e.g., group of people in the same household), and for every integer j report the number of groups of size j. Hierarchical count-of-counts queries report count-of-counts histograms at different granularit…

    Submitted 13 September, 2018; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: 13 pages

  40. arXiv:1706.09541  [pdf, ps, other]

    cs.NI

    Information-Centric Wireless Networks with Mobile Edge Computing

    Authors: Yuchen Zhou, F. Richard Yu, Jian Chen, Yonghong Kuo

    Abstract: In order to better accommodate the dramatically increasing demand for data caching and computing services, storage and computation capabilities should be endowed to some of the intermediate nodes within the network. In this paper, we design a novel virtualized heterogeneous networks framework aiming at enabling content caching and computing. With the virtualization of the whole system, the communi…

    Submitted 28 June, 2017; originally announced June 2017.

  41. arXiv:1608.05339  [pdf, other]

    cs.CV

    Photo Filter Recommendation by Category-Aware Aesthetic Learning

    Authors: Wei-Tse Sun, Ting-Hsuan Chao, Yin-Hsi Kuo, Winston H. Hsu

    Abstract: Nowadays, social media has become a popular platform for the public to share photos. To make photos more visually appealing, users usually apply filters on their photos without domain knowledge. However, due to the growing number of filter types, it becomes a major issue for users to choose the best filter type. For this purpose, filter recommendation for photo aesthetics takes an important role i…

    Submitted 27 March, 2017; v1 submitted 18 August, 2016; originally announced August 2016.

    Comments: 11 pages, 7 figures

  42. arXiv:1606.08999  [pdf, ps, other]

    cs.MM cs.CV

    De-Hashing: Server-Side Context-Aware Feature Reconstruction for Mobile Visual Search

    Authors: Yin-Hsi Kuo, Winston H. Hsu

    Abstract: Due to the prevalence of mobile devices, mobile search becomes a more convenient way than desktop search. Different from the traditional desktop search, mobile visual search needs more consideration for the limited resources on mobile devices (e.g., bandwidth, computing power, and memory consumption). The state-of-the-art approaches show that bag-of-words (BoW) model is robust for image and video…

    Submitted 29 June, 2016; originally announced June 2016.

    Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

  43. arXiv:1512.08580  [pdf, other]

    cs.LG cs.CY

    A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data

    Authors: Hongjian Wang, Zhenhui Li, Yu-Hsuan Kuo, Dan Kifer

    Abstract: The increased availability of large-scale trajectory data around the world provides rich information for the study of urban dynamics. For example, New York City Taxi Limousine Commission regularly releases source-destination information about trips in the taxis they regulate. Taxi data provide information about traffic patterns, and thus enable the study of urban flow -- what will traffic between…

    Submitted 28 December, 2015; originally announced December 2015.

    Comments: 12 pages

    ACM Class: H.2.8; I.2.6