Mohamad Ivan Fanany

University of Indonesia, Computer Science, Faculty Member

Dr. Fanany is a researcher and lecturer at the Faculty of Computer Science and the Graduate School of Biomedical Engineering, University of Indonesia. His research interests include machine learning, data mining, and combining vision and graphics for applications of advanced machine learning, especially in the remote sensing, climate modeling, biomedical, automobile, broadcasting, and robotics industries. Currently, he also works on hyperspectral image understanding and climate modeling in cooperation with the Agency for the Assessment and Application of Technology. One of his current scientific activities is serving as a reviewer for the Elsevier journal Computer Methods and Programs in Biomedicine. Before joining the faculty, he worked at the Future Project Division of Toyota Motor Corp., Japan, as a member of the middleware development and recognition team.
Address: https://www.researchgate.net/profile/Mohamad_Ivan_Fanany

Papers by Mohamad Ivan Fanany

Bi-directional Long Short-Term Memory using Quantized data of Deep Belief Networks for Sleep Stage Classification

by Mohamad Ivan Fanany, Intan Nurma Yulita, and Aniati M. Arymurthy

This study examines applying quantization to Bi-directional Long Short-Term Memory (Bi-LSTM); the combination is called qBi-LSTM. The quantization comes from Deep Belief Networks (DBN). DBN was selected for its strength as a generative deep learning model in producing optimal artificial features. qBi-LSTM is expected to improve on the performance of Bi-LSTM while also being time-efficient. qBi-LSTM is tested on sleep stage classification using the St. Vincent's University Hospital / University College Dublin Sleep Apnea Database. The results show that qBi-LSTM achieves the highest performance compared with Bi-LSTM and DBN, with precision, recall, and F-measure values of 86.00%, 72.10%, and 75.27%, respectively. qBi-LSTM performs best at classifying Stage 2 but still fails to classify the REM (Rapid Eye Movement) stage.

Residual Convolutional Neural Network for Diabetic Retinopathy

by Mohamad Ivan Fanany and Syahidah Izza

This research proposes a method to detect diabetic retinopathy automatically based on the evaluation of fundus photographs. An automatic method will speed up diabetic retinopathy detection, especially in Indonesia, which lacks ophthalmologists. In addition, differences in doctors' ability and experience may produce inconsistent results. With this method, we hope that diabetic retinopathy can be detected faster and with consistent results, so that blindness caused by diabetic retinopathy can be prevented as early as possible. The Convolutional Neural Network (CNN) is a neural network variant that detects patterns in images very well. The Residual CNN is a CNN variant that can prevent accuracy degradation in deep neural networks. This inspired us to apply a Residual CNN to diabetic retinopathy. The Residual Network detects diabetic retinopathy with a kappa score of 0.51049.
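
For readers unfamiliar with the kappa score, the following is a minimal sketch of unweighted Cohen's kappa in Python. It is an illustration only: the abstract does not say whether the reported score is weighted or unweighted, and the function name `cohen_kappa` and the toy labels are assumptions, not taken from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of items on which both labelings agree.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both labelings use one identical label
    return (p_o - p_e) / (1.0 - p_e)


# Made-up severity grades, used only to exercise the function.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
kappa = cohen_kappa(y_true, y_pred)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be much lower than accuracy when one class dominates.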

Recommender System Improvement Cases Through Implicit Feedbacks from Social Network

by Mohamad Ivan Fanany and Ibrahim Malik Khasbulloh

The performance of recommender systems (RS) largely depends on diverse types of input that characterize users' preferences in the form of both explicit and implicit feedback. Explicit feedback is stated directly by users as input regarding their interest in particular services or products. Such feedback, however, is not always available. Implicit feedback, which reflects users' opinions indirectly through their behavior, is far more abundant. In this paper, we elaborate on several ways to improve the RS for three real-world datasets (an online travel service, an online transportation service, and a telecommunication service provider) through implicit feedback. In the first case, we analyze the effect of simple feedback from users' input during registration, without using any social network analysis (SNA). In the second case, we analyze the effect of the community structure extracted from SNA as additional attributes. In the third case, we analyze the effect of further feedback attributes (modularity, PageRank, eigenvector centrality, clustering coefficient, weighted in-degree, weighted out-degree, and weighted degree), which are also obtained from the SNA of the corresponding dataset. Given the right hyperparameter settings, we observed RS improvement in terms of RMSE (root mean square error) in all three cases. Three RS models are used: SVD, SVD++, and difference SVD. Besides the RS performance, we also discuss the computational cost incurred by incorporating the implicit feedback.
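
As a reminder of how the RMSE metric mentioned above is computed, here is a minimal Python sketch. The ratings and the two prediction lists are made up for illustration and do not come from the paper's datasets; the variable names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings and predictions from two hypothetical models.
ratings = [4.0, 3.0, 5.0, 1.0]
baseline = [3.0, 3.0, 4.0, 2.0]        # e.g. a model without extra attributes
with_feedback = [4.0, 3.0, 4.0, 1.0]   # e.g. a model with extra attributes

print(rmse(ratings, baseline))
print(rmse(ratings, with_feedback))
```

A lower RMSE means the predicted ratings are closer to the observed ones, so an improvement shows up as a decrease in this value.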

Fuzzy Latent-Dynamic Conditional Neural Fields for Gesture Recognition in Video

The explosion of data on the internet has brought about the big data era, which requires data processing to extract useful information. One of the challenges is gesture recognition in video processing. This study therefore proposes Latent-Dynamic Conditional Neural Fields and compares them with other members of the Conditional Random Fields family. To improve accuracy, these methods are combined with fuzzy clustering. The results show that Fuzzy Latent-Dynamic Conditional Neural Fields achieve the highest performance. In addition, combining the basic classifiers with Fuzzy C-Means clustering yields higher performance than the original classifiers. The evaluation is carried out on a temporal dataset for gesture phase segmentation.

Faster R-CNN with Structured Sparsity Learning and Ristretto for Mobile Environment

Deep learning is an area of machine learning that has proven able to solve many real-world problems such as object recognition and detection. One popular deep learning method is the Faster Region-Based Convolutional Neural Network (Faster R-CNN), which integrates a CNN and a region proposal network to detect multiple objects in a single image. Although deep learning is powerful for object recognition and detection, running both learning and inference on mobile devices remains a problem because of the large memory and computation requirements. In this paper, we first propose reducing the number of filters and nodes in the convolutional and fully connected layers to 50% to make implementation in a mobile environment feasible, and we compare the result with the original model. Second, we use Structured Sparsity Learning (SSL) in the convolutional layer to regularize the Deep Neural Network (DNN) structure with group lasso. Third, we use the Ristretto framework to convert floating-point values to 8-bit and 16-bit fixed point to represent the weights and outputs of the fully connected layer. Our results show that reducing the number of filters and nodes lowers memory storage by up to 4.16x, and that the model was successfully trained on an NVIDIA Jetson Tegra TX1 Development Kit used as a mobile environment emulator. Ristretto successfully condenses a model to 16 or 8 bits with an error tolerance of ~1%, while accuracy improves from 0.85 to 0.87 at k = 5 for the original model and from 0.84 to 0.85 at k = 10 for the 50% model on the CCTV UI dataset. SSL works well on the 50% model, improving accuracy from 0.83 to 0.84 at k = 5 and from 0.84 to 0.86 at k = 10, and it runs 2.72x faster than the original convolution layer without SSL.
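
To illustrate what converting a floating-point value to 8-bit or 16-bit fixed point involves, here is a minimal Python sketch of signed fixed-point rounding with saturation. This is not Ristretto's API and not necessarily the exact scheme the paper uses; the function name `to_fixed_point` and the choice of fractional bits are assumptions made for the example.

```python
def to_fixed_point(x, total_bits=8, frac_bits=4):
    """Quantize x to signed fixed point; return (integer code, dequantized value).

    total_bits and frac_bits are illustrative choices, not values from the paper.
    """
    scale = 1 << frac_bits                  # 2 ** frac_bits steps per unit
    lo = -(1 << (total_bits - 1))           # smallest representable integer code
    hi = (1 << (total_bits - 1)) - 1        # largest representable integer code
    code = round(x * scale)                 # nearest code (Python rounds ties to even)
    code = max(lo, min(hi, code))           # saturate instead of overflowing
    return code, code / scale


weight = 0.3
code8, value8 = to_fixed_point(weight, total_bits=8, frac_bits=4)
code16, value16 = to_fixed_point(weight, total_bits=16, frac_bits=12)
print(code8, value8)     # coarser grid, larger rounding error
print(code16, value16)   # finer grid, smaller rounding error
```

With more fractional bits the representable grid is finer, so the rounding error shrinks, at the cost of more storage per value.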

Learning Explicit and Implicit Knowledge with Differentiable Neural Computer

by Mohamad Ivan Fanany and Adnan Ardhian

Neural networks can perform a variety of tasks well after training, but they are still limited in their ability to remember because their memory is very limited. The Differentiable Neural Computer (DNC) has been shown to address this problem. A DNC consists of a neural network coupled with an external memory module that works like the tape of a Turing machine. A DNC can solve simple problems that require memory, such as copy, graph, and question answering tasks. It learns the algorithm for accomplishing a task from inputs and outputs. In this research, a DNC with a Multi-Layer Perceptron (MLP) as the controller is compared with an MLP alone. The aim is to test the ability of the neural network to learn explicit and implicit knowledge at once. The tasks are sequence classification and sequence addition of MNIST handwritten digits. The results show that an MLP with external memory processes sequence data much better than one without. The results also show that the DNC, as a fully differentiable system, can solve problems that require learning explicit and implicit knowledge at once.

Sentence-level Indonesian Lip Reading with Spatiotemporal CNN and Gated RNN

It is widely known that visual cues play an important role in speech, especially in disambiguating confusable phonemes or as a means of "hearing" visually. Interpreting speech through the visual signal alone is called lip reading. Lip reading has several potential applications, either as a complementary modality to speech recognition or as purely visual speech recognition, which gives rise to silent speech interfaces that themselves have numerous practical applications. Despite the great potential of such systems, research on lip reading for the Indonesian language has been extremely limited, with settings still very distant from the real world. This research is an attempt to build a lip reading model that has the potential to be applicable in the real world, specifically one that supports variable-length sentences as input. We build the model using deep learning, specifically a spatiotemporal Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU), which form the spatiotemporal feature extractor and the character-level sentence decoder, respectively. In the process, we also investigate whether knowledge of lip reading in one language affects the acquisition of a different language. To the best of our knowledge, our model is the first sentence-level Indonesian lip reading model that supports variable-length input. Our model achieved superhuman performance on all metrics, with almost 2× better word accuracy.

Visual-Only Word Boundary Detection

by Mohamad Ivan Fanany and Muhammad Aulia

Word boundary detection is one of the primary components of a speech recognition system. It can be learned jointly as part of the speech model or independently as an extra preprocessing step, reducing the problem to conditionally independent word prediction. It can also be used to separate out-of-vocabulary (OOV) words in a sentence, thereby avoiding unnecessary computation. On its own, word boundary detection is essential in multimodal corpus collection, where it allows automated and detailed labeling of the dataset at the sentence or word level. In this research, we propose a novel approach to word boundary detection that uses only visual information, based on a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Bidirectional Gated Recurrent Unit (Bi-GRU). This research is important in paving the way for better lip reading systems, as well as multimodal speech recognition, because it allows easier creation of new datasets and enables conventional word-level visual or multimodal speech recognition systems to work on continuous speech. Training was done on the GRID video corpus for 118 epochs. The proposed model performed well compared with the baseline method, with a considerably lower error rate.

Indonesian Audio-Visual Speech Corpus for Multimodal Automatic Speech Recognition

The advancement of Automatic Speech Recognition (ASR) relies heavily on the availability of data, even more so for deep learning ASR systems, which are at the forefront of ASR research. A multitude of corpora have been built to meet this need, ranging from single-modality corpora, which mostly serve acoustic speech recognition (with several exceptions for visual speech decoding), to multimodal corpora, which serve both modalities. Multimodal corpora are significant for the development of ASR because speech is inherently multimodal. Despite this importance, none of these corpora was built for the Indonesian language, resulting in little to no development of visual-only or multimodal ASR systems. This research is an attempt to solve that problem by constructing AVID, an Indonesian audio-visual speech corpus for multimodal ASR. The corpus consists of 10 speakers speaking 1,040 sentences with a simple structure, resulting in 10,400 videos of spoken sentences. To the best of our knowledge, AVID is the first audio-visual speech corpus for the Indonesian language designed for multimodal ASR. AVID was heavily tested and shows low overall errors in tests of both modalities, which indicates the high quality and suitability of the corpus for building multimodal ASR systems.

Deep Learning for Text Processing with Focus on Word Embedding: Concept and Applications

Big Data Concepts

SCIENCE & TECHNOLOGY Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

Cheating Video Description Based on Sequences of Gestures

Cheating during exams is a problem in education because it undermines efforts to evaluate students' proficiency and growth. We propose a real-time cheating detection system that uses a video feed to monitor students during written exams for illegal behaviors and gestures, such as giving codes, looking at friends, using a cheat sheet, talking, and exchanging papers. Gestures are recognized at runtime from the sequences of actions performed by the subjects and are then used to generate textual descriptions of the detected cheating gestures. These descriptions help document the activities that took place during the exam for later use. The proposed system comprises two primary subsystems: a gesture recognition model based on 3DCNN and XGBoost, and a language generation model based on an LSTM network. The gesture recognition model recognizes the cheating gestures with 81.11% accuracy and a Kappa statistic of 0.760. The language generation model achieves 95.3% word accuracy and an average edit distance of 1.076 on single-subject description sentences, and 96.6% word accuracy and an average edit distance of 3.305 on interaction description sentences. The system runs at 32.54 fps on a mid-range laptop.
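
The edit distance used to score generated descriptions can be sketched as a standard Levenshtein distance. The Python example below is a minimal illustration; the abstract does not state whether the distance is computed over characters or words, so the function accepts any sequence, and the example sentences are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty sequence
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Character-level comparison of two strings.
print(edit_distance("kitten", "sitting"))

# Word-level comparison of two invented description sentences.
reference = "the student looks at a friend".split()
generated = "the student looks at the paper".split()
print(edit_distance(reference, generated))
```

Averaging this distance over all generated sentences gives a single score in which lower values mean the generated text is closer to the reference.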

Handwriting Recognition on Form Document Using Convolutional Neural Network and Support Vector Machines (CNN-SVM)

In this paper, we propose a workflow and a machine learning model for recognizing handwritten characters on form documents. The learning model uses a Convolutional Neural Network (CNN) as a powerful feature extractor and Support Vector Machines (SVM) as a high-end classifier. The proposed method is more efficient than modifying the CNN with a more complex architecture. We evaluated several SVMs and found that the linear SVM with an L1 loss function and L2 regularization gives the best performance in both accuracy and computation time. In experiments using data from NIST SD 19 2nd edition for both training and testing, the proposed method, which combines a CNN with a linear SVM using an L1 loss function and L2 regularization, achieved a better recognition rate than the CNN alone. The recognition rates achieved by the proposed method are 98.85% on numeral characters, 93.05% on uppercase characters, 86.21% on lowercase characters, and 91.37% on the combined numeral and uppercase characters. The original CNN, by comparison, achieves accuracy rates of 98.30% on numeral characters, 92.33% on uppercase characters, 83.54% on lowercase characters, and 88.32% on the combined numeral and uppercase characters. The proposed method was also validated using ten-fold cross-validation, which shows that it still improves the accuracy rate. The learning model was then used to construct a handwriting recognition system that automatically recognizes more challenging data on form documents. Pre-processing, segmentation, and character recognition are integrated into one system, and the output is converted into editable text. The system achieves an accuracy rate of 83.37% on ten different test form documents.

Man Woman Detection in Surveillance Images

Human gender detection from body profile is an important task for surveillance. Most surveillance cameras are placed at such a distance that people's faces cannot be seen clearly. In this paper, we report a comparison between fast feature pyramids and a deep region-based convolutional neural network (RCNN) for detecting a person in surveillance images. Since the RCNN performs better at detecting a person, it is trained further to detect men and women. A transfer learning strategy is used because of the small number of training images. The results show that the trained RCNN can detect men and women with promising results.

EEG Channels Reduction using PCA to Increase XGBoost's Accuracy for Stroke Detection

by Mohamad Ivan Fanany and Nilam Fitriah

In Indonesia, according to the Basic Health Research 2013 results, the number of stroke patients increased from 8.3‰ (2007) to 12.1‰ (2013). Some researchers now use electroencephalography (EEG) as another option for detecting stroke, alongside CT scan images as the gold standard. A previous study on data from stroke patients and healthy subjects at the National Brain Center Hospital (RS PON) used the Brain Symmetry Index (BSI), Delta-Alpha Ratio (DAR), and Delta-Theta-Alpha-Beta Ratio (DTABR) as features for classification by an Extreme Learning Machine (ELM). That study achieved 85% accuracy with sensitivity above 86% for acute ischemic stroke detection. Using EEG data means dealing with many data dimensions, which can reduce classifier accuracy (the curse of dimensionality). Principal Component Analysis (PCA) can reduce dimensionality and computation cost without decreasing classification accuracy. XGBoost, a scalable tree boosting classifier, can solve real-world scale problems (the Higgs Boson and Allstate datasets) using a minimal amount of resources. This paper reuses the same data from RS PON and the features from the previous research, preprocessed with PCA and classified with XGBoost, to increase accuracy with fewer electrodes. The specific reduced set of electrodes improved the accuracy of stroke detection. Our future work will examine algorithms other than PCA to obtain higher accuracy with fewer channels.
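
The two ratio features named above can be illustrated with a short Python sketch. The abstract does not define them, so the formulas below follow the definitions commonly used in the quantitative EEG literature (DAR as delta power over alpha power, DTABR as delta plus theta over alpha plus beta) and should be read as an assumption; the band power values are invented.

```python
def delta_alpha_ratio(delta, alpha):
    """DAR: delta band power divided by alpha band power (assumed definition)."""
    return delta / alpha


def delta_theta_alpha_beta_ratio(delta, theta, alpha, beta):
    """DTABR: slow-band power over fast-band power (assumed definition)."""
    return (delta + theta) / (alpha + beta)


# Invented band powers for a single EEG channel, in arbitrary units.
band_power = {"delta": 8.0, "theta": 4.0, "alpha": 2.0, "beta": 2.0}

dar = delta_alpha_ratio(band_power["delta"], band_power["alpha"])
dtabr = delta_theta_alpha_beta_ratio(
    band_power["delta"], band_power["theta"],
    band_power["alpha"], band_power["beta"],
)
print(dar, dtabr)
```

Computing such ratios per channel yields one feature vector per recording, which is the kind of input a dimensionality reduction step and a classifier would then operate on.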

Gesture Recognition using Latent-Dynamic based Conditional Random Fields and Scalar Features

by Mohamad Ivan Fanany and Intan Nurma Yulita

The need to segment and label sequence data arises in several fields. Conditional models such as Conditional Random Fields are widely used to solve this problem. In pattern recognition, Conditional Random Fields specify the probabilities of a label sequence. The method models the full label sequence as a probabilistic graphical model conditioned on the observations. However, Conditional Random Fields cannot capture internal structure, so Latent-Dynamic Conditional Random Fields were developed to capture it without giving up the external dynamics between labels. This study proposes the use of Latent-Dynamic Conditional Random Fields for gesture recognition and compares the two methods. In addition, this study proposes the use of scalar features for gesture recognition. The results show that the performance of Latent-Dynamic Conditional Random Fields is not better than that of Conditional Random Fields, and that scalar features are effective for both methods in gesture recognition. We therefore recommend using Conditional Random Fields with scalar features in gesture recognition for better performance.

A Heuristic Hidden Markov Model to Recognize Inflectional Words in Sign System for Indonesian Language known as SIBI (Sistem Isyarat Bahasa Indonesia)

by Mohamad Ivan Fanany and Erdefi Rakun

SIBI (Sistem Isyarat Bahasa Indonesia) is the commonly used sign language in Indonesia. SIBI, which follows the grammatical structure of the Indonesian language, is a complex and unique sign language. A method that recognizes SIBI gestures rapidly, precisely, and efficiently needs to be developed for a SIBI machine translation system. The ultimate goal is a feature extraction method with a space-efficient feature set that retains the capability to recognize the different types of SIBI gestures. There are four types of SIBI gestures: root, affix, inflectional, and function word gestures. This paper proposes using a heuristic Hidden Markov Model and a feature extraction system to separate an inflectional gesture into its constituents: prefix, suffix, and root. The separation reduces the number of feature sets, which would otherwise be as large as the product of the prefix, suffix, and root word feature sets of the inflectional word gestures.

Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

by Mohamad Ivan Fanany and Afif A Iskandar

Aside from the proper use of grammar, diction, and punctuation, a good essay must have cohesion and coherence. In a persuasive essay, argumentative discourse is important as a measure of the cohesion and coherence among the arguments. An argument is characterized by one's stance (claim), which is strengthened with facts (premises) to establish the validity of the stance. Ideally, claims are followed by premises that either support or attack them. In this paper, we try to identify four kinds of argument components (major claim, claim, premise, and non-argumentative) using some predefined features, and we measure how well word vector representations perform in identifying argument components. We also present the results of our initial experiment using deep learning to classify the argument components.

Multimodal Decomposable Models by Superpixel Segmentation and Point-in-Time Cheating Detection

This research aims to classify cheating activity during exams from video observation. The method uses a Conditional Random Field (CRF) to classify and detect several classes of cheating activity. The locations of the body joints are detected using a Multimodal Decomposable Model (MODEC) with superpixel segmentation. The joints used are the head, shoulders, elbows, and wrists. The superpixel method is Simple Linear Iterative Clustering (SLIC). A comparison between MODEC and MODEC + SLIC as the feature detector for the CRF showed that MODEC + SLIC provides better activity classification. In our experiments, cheating activities could be detected at a rate of up to 83.9% on average. Moving beyond detecting only the class of motion segments, we also devised a point-in-time event detection system, again using a CRF. The times of occurrence of three consecutive cheating activities are determined from a sequence of video frames.

Bi-directional Long Short-Term Memory using Quantized data of Deep Belief Networks for Sleep Stage Classification

by Mohamad Ivan Fanany, Intan Nurma Yulita, and Aniati M. Arymurthy

The study examines the use of quantization to be applied to Bi-directional Long Short-Term Memory... more The study examines the use of quantization to be applied to Bi-directional Long Short-Term Memory (Bi-LSTM), a combination of the two called qBi-LSTM. Quantization used comes from Deep Belief Networks (DBN). It selected DBN for its superiority as a generative model of Deep Learning in producing an optimal artificial feature. Development of qBi-LSTM is expected to improve the performance of Bi-LSTM and also provide efficient time. The qBi-LSTM test is applied for sleep stage classification on St. Vincent's University Hospital / University College Dublin's Sleep Apnea Database. The result shows that qBi-LSTM has the highest performance compared to Bi-LSTM and DBN with precision, recall and F-measure values of 86.00%, 72.10%, and 75.27%. The best qBi-LSTM performance is to classify Stage 2 but still fails to classify the stage of REM (Rapid Eye Movement).

Residual Convolutional Neural Network for Diabetic Retinopathy

by Mohamad Ivan Fanany and syahidah izza

—This research proposes a method to detect diabetic retinopathy automatically based on fundus p... more —This research proposes a method to detect diabetic retinopathy automatically based on fundus photography evaluation. This automatic method will speed up diabetic retinopathy detection process especially in Indonesia which lack of ophthalmologist. Besides, the difference of doctor ability and experience may produce an inconsistent result. Thus, with this method, we hope automatic detection of diabetic retinopathy will speed up with a consistent result so blindness effect from diabetic retinopathy can be prevented as early as possible. Convolutional Neural Network (CNN) is one of neural network variant which can detect the pattern on an image very well. Residual CNN is one of CNN variant which can prevent accuracy degradation for a deep neural network. Therefore this inspire us to apply Residual CNN on diabetic retinopathy. This Residual Network can detect diabetic retinopathy with kappa score 0.51049.

Recommender System Improvement Cases Through Implicit Feedbacks from Social Network

by Mohamad Ivan Fanany and Ibrahim Malik Khasbulloh

— Recommender systems (RS) performance largely depends on diverse types of input that characteriz... more — Recommender systems (RS) performance largely depends on diverse types of input that characterize users' preference in the form of both explicit and implicit feedbacks. An explicit feedback is stated directly by an explicit input from users regarding their interest in some options of services or products. Such feedback, however, is not always available. On the other hand, an implicit feedback, which reflects users' opinion indirectly through user behavior is far more abundant. In this paper, we elaborate several ways to improve the RS of three real cases dataset (online travel service, online transportation, and telecommunication service provider) through implicit feedbacks. In the first case, we analyze the effect of a simple feedback from users' input during registration without using any social network analysis (SNA). In the second case, we analyze the effect of community structure extracted from its SNA as its additional attributes. In the third case, we analyze the effect of more additional feedback attributes (modularity, PageRank, eigenvector centrality, clustering coefficient, weighted in degree, weighted outdegree, weighted degree) which also obtained from the SNA of the corresponding dataset. Given the right hyperparameter settings, we observed RS improvement in term of RMSE (root mean square error) in the three cases. In this paper, three RS models: SVD, SVD++, and difference SVD are used. Besides discussing the RS performance, we also discuss the computational cost incurred from incorporating those implicit feedbacks.

Fuzzy Latent-Dynamic Conditional Neural Fields for Gesture Recognition in Video

With the explosion of data on the internet led to the presence of the big data era, so it require... more With the explosion of data on the internet led to the presence of the big data era, so it requires data processing to get the useful information. One of the challenges is the gesture recognition the video processing. Therefore, the study proposes Latent-Dynamic Conditional Neural Fields and compares with the other family members of Conditional Random Fields. For improving the accuracy, these methods are combined by using Fuzzy Clustering. From the results, it can be concluded that the performance of Fuzzy Latent-Dynamic Conditional Neural Fields are the highest. Also, the combination of the basic classifiers and Fuzzy C-Means Clustering has the higher than the original ones. The evaluation is tested on a temporal dataset of gesture phase segmentation.

Faster R-CNN with Structured Sparsity Learning and Ristretto for Mobile Environment

— Deep learning is a part of machine learning area that has proven to solve many problems in the ... more — Deep learning is a part of machine learning area that has proven to solve many problems in the real world such as object recognition and detection. One of popular deep learning methods is Faster Region-Based Convolutional Neural Network (Faster R-CNN). Faster R-CNN proposed an integrated structure of CNN and region proposal network to detect multiple objects in a single image. Even though deep learning is powerful for object recognition or detection, it would still be a problem for implementing both the learning and the inference on mobile devices due to the need for a large memory and computation. In this paper, we propose to reduce the number of filters and nodes in the convolutional and fully connected layer to 50% to make it feasible for implementation in a mobile environment and compared it with the original model. Second, we use Structured Sparsity Learning (SSL) in the convolutional layer to regularize Deep Neural Network (DNN) structure with group lasso. Third, we use Ristretto framework to convert floating point to 8 and 16 bits fixed point to represent weights and outputs of the fully connected layer. Our result shows that filter and node number reduction lowering memory storage down to 4.16x and successfully trained on NVIDIA Jetson Tegra TX1 Development Kit as mobile environment emulator. Ristretto successfully condense a model to 16 or 8 bits with error tolerance ~1% but has better accuracy from 0.85 to 0.87 at k = 5 for the original model, and 0.84 to 0.85 at k = 10 for 50% model on CCTV UI dataset. SSL works well on 50% model that obtain better accuracy from 0.83 to 0.84 in k=5 and from 0.84 to 0.86 in k=10 and accelerates computation time 2.72x faster than the original convolution layer without SSL.

Learning Explicit and Implicit Knowledge with Differentiable Neural Computer

by Mohamad Ivan Fanany and Adnan Ardhian

Neural networks can perform a variety of tasks well after learning, but they still have limitations in remembering because their memory is very limited. The Differentiable Neural Computer (DNC) has been shown to address this problem. A DNC consists of a neural network coupled to an external memory module that works like an accessible tape on a Turing machine. A DNC can solve simple problems that require memory, such as copy, graph, and question answering tasks, and it learns the algorithm needed to accomplish a task from inputs and outputs. In this research, a DNC with a Multi-Layer Perceptron (MLP) as the controller is compared with an MLP alone. The aim of this investigation is to test the ability of the neural network to learn explicit and implicit knowledge at once. The tasks are sequence classification and sequence addition of MNIST handwritten digits. The results show that an MLP with external memory processes sequence data much better than one without. The results also show that the DNC, as a fully differentiable system, can solve problems that require learning explicit and implicit knowledge at once.

Sentence-level Indonesian Lip Reading with Spatiotemporal CNN and Gated RNN

It is widely known that visual cues play an important role in speech, especially in disambiguating confusable phonemes or as a means of "hearing" visually. Interpreting speech through the visual signal alone is called lip reading. Lip reading has several potential applications, either as a complementary modality to speech recognition or as purely visual speech recognition, which gives rise to silent speech interfaces, which themselves have numerous practical applications. Despite the overwhelming potential of such systems, research on lip reading for the Indonesian language has been extremely limited, with settings still very distant from the real world. This research is an attempt to build a lip reading model that has the potential to be applicable in the real world, specifically one that supports variable-length sentences as input. We build the model using deep learning, specifically a spatiotemporal Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU), which form the spatiotemporal feature extractor and the character-level sentence decoder, respectively. During the process, we also investigate whether knowledge of lip reading in another language affects the acquisition of a different language. To the best of our knowledge, our model is the first sentence-level Indonesian lip reading model that supports variable-length input. Our model achieved superhuman performance on all metrics, with almost 2× better word accuracy.

Visual-Only Word Boundary Detection

by Mohamad Ivan Fanany and Muhammad Aulia

Word boundary detection is one of the primary components of a speech recognition system. It can be learned jointly as part of the speech model or independently as an extra preprocessing step, reducing the problem to conditionally independent word prediction. It can also be used to separate Out of Vocabulary (OOV) words in a sentence, thereby avoiding unnecessary computation. By itself, word boundary detection is essential in multimodal corpus collection, where it allows automated and detailed labeling of the dataset at either the sentence or the word level. In this research, we propose a novel approach to word boundary detection that uses only visual information, based on a 3-Dimensional Convolutional Neural Network (3D-CNN) and a Bidirectional Gated Recurrent Unit (Bi-GRU). This research is important in paving the way for better lip reading systems and multimodal speech recognition, as it allows easier creation of novel datasets and enables conventional word-level visual or multimodal speech recognition systems to work on continuous speech. Training was done on the GRID video corpus for 118 epochs. The proposed model performed well compared to the baseline method, with a considerably lower error rate.

Indonesian Audio-Visual Speech Corpus for Multimodal Automatic Speech Recognition

Advancement of Automatic Speech Recognition (ASR) relies heavily on the availability of data, even more so for deep learning ASR systems, which are at the forefront of ASR research. A multitude of corpora have been built to meet this need, ranging from single-modality corpora, which mostly serve acoustic speech recognition with a few exceptions for visual speech decoding, to multimodal corpora, which serve both modalities. Multimodal corpora have been significant in the development of ASR because speech is inherently multimodal. Despite this importance, none of these corpora were built for the Indonesian language, resulting in little to no development of visual-only or multimodal ASR systems. This research is an attempt to solve that problem by constructing AVID, an Indonesian audiovisual speech corpus for multimodal ASR. The corpus consists of 10 speakers speaking 1,040 sentences with a simple structure, resulting in 10,400 videos of spoken sentences. To the best of our knowledge, AVID is the first audiovisual speech corpus for the Indonesian language designed for multimodal ASR. AVID was heavily tested and shows low overall errors in tests of both modalities, which indicates the high quality and suitability of the corpus for building multimodal ASR systems.

Deep Learning for Text Processing with Focus on Word Embedding: Concept and Applications

Big Data Concepts

SCIENCE & TECHNOLOGY Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

Cheating Video Description Based on Sequences of Gestures

Cheating during exams is a problem in the field of education. It undermines efforts to evaluate students' proficiency and growth. We propose a real-time cheating detection system that uses a video feed to monitor students during written exams for illegal behaviors and gestures, such as giving codes, looking at friends, using a cheat sheet, talking, and exchanging papers between students. The gestures are recognized at runtime from sequences of actions performed by the subjects and are then used to generate textual descriptions of the detected cheating gestures. These textual descriptions help document the activities that transpired during the exam for later use. Our proposed system comprises two primary subsystems: a gesture recognition model based on 3DCNN and XGBoost, and a language generation model based on an LSTM network. The gesture recognition model recognizes the cheating gestures with 81.11% accuracy and a Kappa statistic of 0.760. The language generation model achieves 95.3% word accuracy and an average edit distance of 1.076 on single-subject description sentences, and 96.6% word accuracy and an average edit distance of 3.305 on interaction description sentences. The system runs at 32.54 fps on a mid-range laptop.

Handwriting Recognition on Form Document Using Convolutional Neural Network and Support Vector Machines (CNN-SVM)

In this paper, we propose a workflow and a machine learning model for recognizing handwritten characters on form documents. The learning model is based on a Convolutional Neural Network (CNN) as a powerful feature extractor and Support Vector Machines (SVM) as a high-end classifier. The proposed method is more efficient than modifying the CNN with a complex architecture. We evaluated several SVMs and found that the linear SVM using the L1 loss function and L2 regularization gives the best performance in both accuracy and computation time. In experiments using data from NIST SD 19 2nd edition for both training and testing, the proposed method, which combines the CNN and the linear SVM with L1 loss and L2 regularization, achieved a better recognition rate than the CNN alone. The recognition rates achieved by the proposed method are 98.85% on numeral characters, 93.05% on uppercase characters, 86.21% on lowercase characters, and 91.37% on the merger of numeral and uppercase characters. The original CNN achieves accuracy rates of 98.30% on numeral characters, 92.33% on uppercase characters, 83.54% on lowercase characters, and 88.32% on the merger of numeral and uppercase characters. The proposed method was also validated using ten-fold cross-validation, which shows that it still improves the accuracy rate. The learning model was used to construct a handwriting recognition system that automatically recognizes more challenging data on form documents. Pre-processing, segmentation, and character recognition are integrated into one system, and the output of the system is converted into editable text. The system achieves an accuracy rate of 83.37% on ten different test form documents.

Man Woman Detection in Surveillance Images

Human gender detection from body profile is an important task for surveillance. Most surveillance cameras are placed at such a distance that it is not possible to see people's faces clearly. In this paper, we report a comparison between fast feature pyramids and a deep region-based convolutional neural network (RCNN) for detecting a person in surveillance images. Since the RCNN performs better at detecting a person, it is trained further to detect men and women. A transfer learning strategy is used because of the small number of training images. The results show that the trained RCNN can detect men and women with promising results.

EEG Channels Reduction using PCA to Increase XGBoost's Accuracy for Stroke Detection

by Mohamad Ivan Fanany and Nilam Fitriah

In Indonesia, according to the results of Basic Health Research 2013, the number of stroke patients increased from 8.3‰ (2007) to 12.1‰ (2013). Some researchers now use electroencephalography (EEG) results as another option for detecting stroke, besides CT scan images as the gold standard. A previous study on data from stroke and healthy patients at the National Brain Center Hospital (RS PON) used the Brain Symmetry Index (BSI), Delta-Alpha Ratio (DAR), and Delta-Theta-Alpha-Beta Ratio (DTABR) as features for classification by an Extreme Learning Machine (ELM). That study obtained 85% accuracy with sensitivity above 86% for acute ischemic stroke detection. Using EEG data means dealing with many data dimensions, which can reduce classifier accuracy (the curse of dimensionality). Principal Component Analysis (PCA) can reduce dimensionality and computation cost without decreasing classification accuracy. XGBoost, a scalable tree boosting classifier, can solve real-world scale problems (the Higgs Boson and Allstate datasets) using a minimal amount of resources. This paper reuses the same data from RS PON and the features from the previous research, preprocessed with PCA and classified with XGBoost, to increase accuracy with fewer electrodes. The specific reduced set of electrodes improved the accuracy of stroke detection. Our future work will examine algorithms other than PCA to obtain higher accuracy with fewer channels.

Gesture Recognition using Latent-Dynamic based Conditional Random Fields and Scalar Features

by Mohamad Ivan Fanany and Intan Nurma Yulita

The need for segmentation and labeling of sequence data appears in several fields. Conditional models such as Conditional Random Fields are widely used to solve this problem. In pattern recognition, Conditional Random Fields specify the probabilities of a label sequence. The method models the full label sequence as a probabilistic graphical model conditioned on the observations. However, Conditional Random Fields cannot capture internal structure, so Latent-Dynamic Conditional Random Fields were developed to capture it without abandoning the external dynamics between labels. This study proposes the use of Latent-Dynamic Conditional Random Fields for gesture recognition and compares the two methods. In addition, this study proposes the use of scalar features for gesture recognition. The results show that the performance of Latent-Dynamic Conditional Random Fields is not better than that of Conditional Random Fields, and that scalar features are effective for both methods in gesture recognition. Therefore, the study recommends using Conditional Random Fields and scalar features in gesture recognition for better performance.

A Heuristic Hidden Markov Model to Recognize Inflectional Words in Sign System for Indonesian Language known as SIBI (Sistem Isyarat Bahasa Indonesia)

by Mohamad Ivan Fanany and Erdefi Rakun

SIBI (Sistem Isyarat Bahasa Indonesia) is the commonly used sign language in Indonesia. SIBI, which follows the grammatical structure of the Indonesian language, is a complex and unique sign language. A method to recognize SIBI gestures rapidly, precisely, and efficiently needs to be developed for the SIBI machine translation system. The ultimate goal is a feature extraction method with a space-efficient feature set that retains its capability to recognize different types of SIBI gestures. There are four types of SIBI gestures: root, affix, inflectional, and function word gestures. This paper proposes using a heuristic Hidden Markov Model and a feature extraction system to separate an inflectional gesture into its constituents: prefix, suffix, and root. The separation reduces the number of feature sets, which would otherwise be as large as the product of the prefix, suffix, and root word feature sets of the inflectional word gestures.

Utilizing Word Vector Representation for Classifying Argument Components in Persuasive Essays

by Mohamad Ivan Fanany and Afif A Iskandar

Aside from the proper usage of grammar, diction, and punctuation, a good essay must have cohesion and coherence. In a persuasive essay, argumentative discourse is important as a parameter for assessing the cohesion and coherence among the arguments. An argument is characterized by one's stance (claim), which is strengthened with facts (premises) to complete the validity of the stance. Ideally, claims are followed by premises that either support or attack them. In this paper, we try to identify 4 kinds of argument components (major claim, claim, premise, and non-argumentative) using some predefined features, and we measure the performance of word vector representations in identifying argument components. We also present the results of our initial experiment using deep learning to classify the argument components.

Multimodal Decomposable Models by Superpixel Segmentation and Point-in-Time Cheating Detection

This research aims to classify cheating activity during exams from video observation. The method uses a Conditional Random Field (CRF) to classify and detect several classes of cheating activities. The locations of the body joints are detected with a Multimodal Decomposable Model (MODEC) with superpixel segmentation. The joints used are the head, shoulders, elbows, and wrists. The superpixel method is Simple Linear Iterative Clustering (SLIC). A comparison between MODEC and MODEC + SLIC as the feature detector for the CRF showed that MODEC + SLIC is capable of providing better activity classification. In our experiments, the cheating activities can be detected at a rate of up to 83.9% on average. Moving beyond detecting only the class of motion segments, we also devised a point-in-time event detection system, also using a CRF. The times of occurrence of three consecutive cheating activities are determined from a sequence of video frames.

Big Data Concepts II

Membuat Proposal Penelitian yang Baik

Agents Based Modeling and Simulation of Indonesian Rice Price From Decentralized Bilateral Exchange

In this study, we build an agent-based model based on decentralized bilateral exchange to understand the impact of import decisions on Indonesian rice prices. We model four main agents or players in the Indonesian rice market: farmers, small traders, big traders, and BULOG (Indonesian Logistic Body). Some factual data on Indonesian rice price dynamics are addressed.

Understanding The Impact of Import Decisions on Indonesian Livestock Prices Through Decentralized Bilateral Exchange and Walrasian Economy

In this study, we build an agent-based model based on decentralized bilateral exchange to understand the impact of import decisions on Indonesian livestock prices. We model three main agents or players in the Indonesian livestock market: cattle raisers, the cattle feedlot industry, and beef product importers. Some factual data on Indonesian livestock dynamics are addressed.

Comprehensive Hyperspectral Analysis for Indonesian Rice Agricultural Needs involving Climate and Social Dynamics

Paddy field monitoring in Indonesia still relies on direct human observation and statistical calculation. This often causes irregularities, since the results tend to be either overestimated or underestimated. To support our national food security program, a more comprehensive prediction system for paddy fields is needed. Remote sensing technology is applied while also considering future climate and social dynamics.

IEEE Transactions and Journals List, Review Speed, Impact Factors, and Open Access Fee

The following is a list of some IEEE computer science Transactions and Journals that are relevant to our current work at the Faculty of Computer Science, Universitas Indonesia.
All links to the papers are given in the Journal Names. I also computed the average of each column.
I randomly selected 5 of the most popular and recent papers from each Journal to compute Review Time, Revision Time, and Publication Time in days.
The Detailed Review Time Analysis can be found at the following link.
Open access fees for IEEE journals vary: $1750 for hybrid journals, and from $1350 for fully open access journals.
I computed Total Impacts per Total Review Time in months. I hope this is helpful.

IEEE Transactions and Journals Review Time Analysis

Below I list some IEEE computer science Journals that are relevant to our current work at the Faculty of Computer Science, Universitas Indonesia.
All links to the papers are given in the Journal Names. I also computed the average of each column.
I randomly selected 5 of the most popular and recent "Regular" papers from each Journal to compute Review Time, Revision Time, and Publication Time.
Sometimes the Revised date or the Publication date was not given in the paper, only the Accepted date (so some adaptation was necessary).

Hindawi Computer Science Journals List, Review Speed, Impact Factors, and Open Access Fee

Below I list some Hindawi computer science Journals that are relevant to our current work at the Faculty of Computer Science, Universitas Indonesia.
All links to the papers are given in the Journal Names. I also computed the average of each column. Open access fees for Hindawi journals vary, and some journals are even free.
I computed Average Impacts per Total Review Speed in months. I hope this is helpful. Unfortunately, unlike Elsevier and Springer journals, not all Hindawi journals are indexed by Scopus (which is a requirement of our university and of the government in Indonesia).

Elsevier Computer Science Journals List, Review Speed, Impact Factors, Open Access Fee, and Acceptance Rate

Below I list some Elsevier computer science Journals that are relevant to our current work at the Faculty of Computer Science, Universitas Indonesia. All links to the papers are given in the Journal Names. I also computed the average of each column, and I computed Average Impacts per Total Review Time in months. There is a correlation between the Average Impact Factors and the Acceptance Rate. In my opinion, as writers we generally prefer good Impact Factors and a high acceptance rate. Thus I measure the Acceptance Rate (in %) times the Average Impact Factors. I hope this is helpful.
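The two derived columns described above amount to simple arithmetic, sketched below. The function names and all numbers are hypothetical and are not taken from the author's tables.

```python
def impact_per_review_month(avg_impact_factor, total_review_months):
    """Average impact factor earned per month of total review time."""
    return avg_impact_factor / total_review_months


def acceptance_weighted_impact(acceptance_rate_percent, avg_impact_factor):
    """Acceptance rate (in %) multiplied by the average impact factor."""
    return acceptance_rate_percent * avg_impact_factor


# Hypothetical journal: impact factor 3.0, 6 months of review, 25% acceptance.
speed_score = impact_per_review_month(3.0, 6.0)
preference_score = acceptance_weighted_impact(25.0, 3.0)
```

Both scores grow with the impact factor; the first rewards fast review, while the second rewards a high chance of acceptance.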

Review on The Deepest of All Deep Learning

This writing summarizes and reviews a paper that puts the deepest deep learning, i.e., Recurrent Neural Networks, in a seamless historical context of deep learning: Deep Learning in Neural Networks: An Overview. The author also gives a video lecture. This summary is especially interested in the recurring themes of deep learning.

Review on A Comparative Study of Collaborative Filtering Algorithms

This writing summarizes and reviews a paper on A Comparative Study of Collaborative Filtering Alg... more

A Review on a Deep Learning that Reveals the Importance of Big Data

This writing summarizes and reviews on the paper that reveals the importance of Big Data for Deep... more

Review on Deep Learning that Predict How We Pose using Motion Features

This writing summarizes and reviews a deep learning that predict how we pose using motion feature... more

Review on A Paper that Combines Gabor Filter and Convolutional Neural Networks for Face Detection

This writing summarizes and reviews a paper that combines Gabor filters and convolutional neural ... more

Review on Deep Learning for Information Processing

This writing summarizes and reviews Deep Learning and Its Applications to Signal and Information ... more

Review on A Deep Learning for Sleep Analysis

This writing summarizes and reviews on a deep learning for sleep analysis: Sleep Stage Classifica... more

Review on A Deep Learning for Sentiment Analysis from Twitter

This writing summarizes and reviews a deep learning for sentiment analysis from twitter: Coooolll... more

Review on A Deep Learning for Sentiment Analysis

This writing summarizes and reviews a deep learning for large-scale sentiment classification (or ... more

Review on Deep Learning for Big Data: Challenges and Perspectives

This writing summarizes and reviews a paper on deep learning for big data: Big Data Deep Learning... more

Review on a Deep Learning that Makes Image Speaks Naturally

This writing summarizes and reviews a deep learning which make image speaks naturally: Deep Visua... more

Review on The First Deep Learning that Surpasses Human-Level Performance

This writing summarizes and reviews on the first reported paper on ImageNet classification using ... more

Review on The First Paper on Rectified Linear Units (The Building Block for Current State-of-the-art Deep Convolutional NN)

This writing summarizes and reviews the first paper on rectified linear units (the building block... more

Review on The Paper which Reveals The Power of Convolutional Neural Net

This writing summarizes and reviews on a paper that try to confirm and understand why large convo... more

Review on The Most Intriguing Paper on Deep Learning

This writing summarizes and reviews the most intriguing paper on deep learning: Intriguing properties of neural networks.

Motivations:

Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks.
Their expressiveness is the reason they succeed but also causes them to learn uninterpretable solutions that could have counter-intuitive properties.

Review on The First Deep Learning for Churn Prediction

This writing summarizes and reviews the first reported work on deep learning for churn (the loss ... more

Introduction to Recommender System

A recommender system (RS) is a software system that determines which services or products should be offered to a particular visitor. An RS may serve two different purposes: to stimulate users into doing something, such as buying or watching; and as a tool for dealing with information overload, selecting the most interesting items from a larger set.

A TUTORIAL ON COLLABORATIVE FILTERING

Collaborative Filtering
Basic Idea:
If users shared the same interests in the past (for instance, if they viewed or bought the same books), they will also have similar tastes in the future.
Selecting hopefully interesting books involves filtering the most promising ones from a large set, and because the users implicitly collaborate with one another, the approach is called collaborative filtering (CF).
Pure CF approaches do not exploit or require any knowledge about the books themselves.
Advantage: data about the books do not have to be entered into the system.
Shortcoming: using such characteristics to propose books that the user likes might be more effective.
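The basic idea can be sketched as user-based collaborative filtering in a few lines of Python. This is only one common variant, not necessarily the one the tutorial develops; the users, books, ratings, and the choice of cosine similarity over co-rated books are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; no information about the books is used.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 4, "book_b": 3, "book_c": 5, "book_d": 5},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the books both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(all_ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = 0.0
    denominator = 0.0
    for other, their_ratings in all_ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(all_ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


prediction = predict(ratings, "alice", "book_d")
```

Because Alice's past ratings resemble Bob's more than Carol's, Bob's rating of `book_d` carries more weight in the prediction for Alice.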

A Worksheet Tutorial on Conditional Random Fields

This is a worksheet tutorial for understanding the basic concept
and process of Conditional Random Fields (CRF). This worksheet is based on an
excellent tutorial on CRF by [Edwin Chen](http://blog.echen.me/2012/01/03/introduction-to-conditional-random-fields/).
I hope this worksheet will clarify the tutorial. I add two other feature functions
to make the example look more realistic. I hope to extend this worksheet later
to the dynamic programming part, HCRF, and deep structured CRF.
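As a rough illustration of the process such a worksheet walks through, the sketch below scores label sequences with weighted feature functions and normalizes the scores by brute force. The two feature functions, their weights, and the label set are hypothetical and are not the ones used in the worksheet; a real implementation would replace the enumeration with dynamic programming.

```python
from itertools import product
from math import exp

LABELS = ("NOUN", "VERB", "ADVERB")


def f_ly_adverb(sentence, i, label, prev_label):
    """Fires when a word ending in 'ly' is labeled ADVERB."""
    return 1.0 if label == "ADVERB" and sentence[i].endswith("ly") else 0.0


def f_noun_then_verb(sentence, i, label, prev_label):
    """Fires when a VERB directly follows a NOUN."""
    return 1.0 if prev_label == "NOUN" and label == "VERB" else 0.0


# (feature function, weight) pairs; the weights are made up for illustration.
FEATURES = [(f_ly_adverb, 2.0), (f_noun_then_verb, 1.5)]


def score(sentence, labels):
    """Sum of weighted feature values over every position in the sentence."""
    total = 0.0
    for i, label in enumerate(labels):
        prev_label = labels[i - 1] if i > 0 else None
        for feature, weight in FEATURES:
            total += weight * feature(sentence, i, label, prev_label)
    return total


def probability(sentence, labels):
    """exp(score) normalized over all possible label sequences (brute force)."""
    z = sum(exp(score(sentence, candidate))
            for candidate in product(LABELS, repeat=len(sentence)))
    return exp(score(sentence, labels)) / z


sentence = ["dogs", "run", "quickly"]
best = max(product(LABELS, repeat=len(sentence)),
           key=lambda candidate: probability(sentence, candidate))
```

Enumerating every label sequence is exponential in the sentence length, which is why the dynamic programming part matters for anything beyond toy examples.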

An SVM tutorial compiled in mind map for easier understanding and more concise summary.

RVM tutorial in mind map for easier understanding and more concise summary.

RVM Explained in a mind map for easier understanding and concise summary.