Papers by Deepu Vijayasenan
arXiv (Cornell University), Mar 1, 2023
Computers in Biology and Medicine
2022 National Conference on Communications (NCC)
2016 22nd Annual International Conference on Advanced Computing and Communication (ADCOM), 2016
In this work, we study the effect of a wireless channel on physical parameter prediction from speech data. Speech data from 207 speakers, along with each speaker's height and weight, is collected. A three-path Rayleigh fading channel with typical values of Doppler shift, path gain and path delay is used to create the mobile-channel output audio. A Bag of Words (BoW) representation based on the log magnitude spectrum is used as the feature. Support Vector Regression (SVR) predicts the speaker's physical parameters from the BoW representation. For clean speech, the proposed system achieves a Root Mean Square Error (RMSE) of 6.6 cm for height estimation and 8.9 kg for weight estimation. The Rayleigh channel increases the RMSE values to 8.17 cm and 11.84 kg for height and weight, respectively.
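A three-path Rayleigh channel of the kind described above can be sketched as a tapped delay line whose taps are complex Gaussian gains that vary with the Doppler shift. The sketch below is an illustration, not the authors' simulation setup: it assumes a sum-of-sinusoids (Clarke-style) model for each tap, rounds path delays to whole samples, and returns complex baseband output. The function names and all parameter values are hypothetical.

```python
import cmath
import math
import random


def rayleigh_tap(doppler_hz, t, angles, phases):
    """Complex gain of one unit-power Rayleigh path at time t (seconds).

    Sum-of-sinusoids approximation: each sinusoid arrives from a random
    angle, so its Doppler shift is doppler_hz * cos(angle).
    """
    total = sum(
        cmath.exp(1j * (2 * math.pi * doppler_hz * math.cos(a) * t + p))
        for a, p in zip(angles, phases)
    )
    return total / math.sqrt(len(angles))


def rayleigh_channel(x, fs, delays_s, gains_db, doppler_hz, n_sinusoids=16, seed=0):
    """Pass samples x through a tapped-delay-line Rayleigh fading channel.

    delays_s: per-path delays in seconds (rounded to whole samples here).
    gains_db: per-path average power gains in dB.
    Returns the complex baseband output, same length as x.
    """
    rng = random.Random(seed)  # fixed seed makes the fading realisation repeatable
    paths = []
    for delay, gain_db in zip(delays_s, gains_db):
        lag = int(round(delay * fs))
        amp = 10 ** (gain_db / 20)  # power gain in dB -> amplitude scale
        angles = [rng.uniform(0, 2 * math.pi) for _ in range(n_sinusoids)]
        phases = [rng.uniform(0, 2 * math.pi) for _ in range(n_sinusoids)]
        paths.append((lag, amp, angles, phases))

    y = [0j] * len(x)
    for k in range(len(x)):
        t = k / fs
        for lag, amp, angles, phases in paths:
            if k >= lag:
                y[k] += amp * rayleigh_tap(doppler_hz, t, angles, phases) * x[k - lag]
    return y
```

For example, `rayleigh_channel(speech, 8000, [0.0, 0.0005, 0.001], [0.0, -3.0, -6.0], 50.0)` would apply three hypothetical paths to 8 kHz samples; at that rate the 0.5 ms and 1 ms delays become 4 and 8 samples. Taking the real part of the output is one (assumed) way to obtain a real audio signal again.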
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), 2019
Ki-67 labeling index is a widely used biomarker for the diagnosis and monitoring of cancer. Many automated techniques have been proposed for evaluating the Ki-67 index. In this paper, we introduce an integrated deep-learning-based approach. We use a MobileUnet model for segmentation and classification, and a connected-component-based algorithm for estimating the Ki-67 index in bladder cancer cases. The average F1 score is 0.92 and the Dice score is 0.96. The mean absolute error in the evaluated Ki-67 index is 2.1. We also explore pre-processing steps that generalize the segmentation model to at least one other type of cancer. Histogram matching and resizing improve performance on breast cancer data by 12% in F1 score and 8% in Dice score.
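The connected-component step can be illustrated on a segmentation mask. The sketch below assumes, hypothetically, that the network outputs a 2-D mask with code 1 for Ki-67-positive nuclei, 2 for negative nuclei and 0 for background. It counts 4-connected blobs of each class and reports the index as the percentage of positive nuclei. It is not the paper's post-processing, which may also split touching nuclei or filter small blobs.

```python
from collections import deque

POSITIVE, NEGATIVE = 1, 2  # hypothetical label codes in the segmentation mask


def count_components(mask, label):
    """Count 4-connected blobs of `label` in a 2-D list-of-lists mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != label or seen[r][c]:
                continue
            count += 1  # new nucleus found; flood-fill it with BFS
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and mask[ni][nj] == label):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
    return count


def ki67_index(mask):
    """Percentage of tumour nuclei that are Ki-67 positive."""
    pos = count_components(mask, POSITIVE)
    neg = count_components(mask, NEGATIVE)
    total = pos + neg
    return 100.0 * pos / total if total else 0.0
```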
Recent advances in sensor technology and mobile computing now enable practical non-intrusive approaches to measuring vital signs and other biological signals. Remote measurement of physiological signals can provide comfortable health assessment without any electrodes or devices on the body. Our goal is to extract cardiac pulse rate and blood pressure from colour video recordings of the human face. Our method is based on the Eulerian Video Magnification (EVM) framework, which takes a standard video sequence as input and applies spatial decomposition followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information. EVM is a generic algorithm developed to reveal minute changes in the real world. In our case, it is used to visualize the flow of blood as it fills the face, and to amplify and reveal small motions that cannot be observed with the naked eye. EVM typically magnifies colour variations to visualize f...
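The temporal part of EVM can be shown on a single signal. Full EVM filters every level of a spatial pyramid; the sketch below simplifies this, as an assumption, to one spatially averaged colour trace (for example, the mean green value of a face region per frame). It applies an ideal DFT band-pass, adds the amplified band back to the original, and reads a pulse rate from the strongest in-band frequency. The 0.7 to 4 Hz band, the gain `alpha=50` and all function names are hypothetical choices, not values from the paper.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ideal_bandpass(x, fps, lo_hz, hi_hz):
    """Zero every DFT bin whose frequency lies outside [lo_hz, hi_hz]."""
    X = dft(x)
    n = len(x)
    for k in range(n):
        f = min(k, n - k) * fps / n  # bin frequency; n - k covers the mirrored half
        if not lo_hz <= f <= hi_hz:
            X[k] = 0
    return [v.real for v in idft(X)]


def magnify(x, fps, lo_hz=0.7, hi_hz=4.0, alpha=50.0):
    """Eulerian-style magnification: original plus alpha times the filtered band."""
    band = ideal_bandpass(x, fps, lo_hz, hi_hz)
    return [a + alpha * b for a, b in zip(x, band)], band


def pulse_bpm(x, fps, lo_hz=0.7, hi_hz=4.0):
    """Pulse rate (beats per minute) from the strongest in-band DFT bin."""
    X = dft(x)
    n = len(x)
    best_k = max(
        (k for k in range(1, n // 2 + 1) if lo_hz <= k * fps / n <= hi_hz),
        key=lambda k: abs(X[k]),
    )
    return 60.0 * best_k * fps / n
```

The plain O(n²) DFT keeps the sketch dependency-free; a real pipeline would use an FFT.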
TENCON 2017 - 2017 IEEE Region 10 Conference
Estimating a speaker's physical parameters, such as height, weight and shoulder size, can assist voice forensics by providing additional knowledge about the speaker. In this work, statistics of the components of a background GMM are employed as features for estimating the physical parameters. These features improved the performance of height and shoulder-size estimation compared with our earlier attempt based on a Bag of Words representation. The robustness of the features is validated using two different training subsets containing different languages.
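The abstract does not say which component statistics are used. A common choice, assumed here, is the zeroth-order (soft occupancy) and first-order (posterior-weighted mean) statistics of each component of a diagonal-covariance background GMM. The sketch computes both for one utterance; the feature layout and function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def ubm_statistics(frames, weights, means, variances):
    """Occupancies and posterior-weighted means of each background-GMM component.

    Returns [n_1/T, ..., n_C/T, mu_1..., mu_C...] where n_c is the zeroth-order
    (soft count) statistic and mu_c the first-order statistic divided by n_c.
    """
    n_comp, dim = len(weights), len(means[0])
    occ = [0.0] * n_comp
    first = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)  # subtract the max before exp for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        norm = sum(post)
        for c in range(n_comp):
            gamma = post[c] / norm  # responsibility of component c for frame x
            occ[c] += gamma
            for d in range(dim):
                first[c][d] += gamma * x[d]
    feats = [o / len(frames) for o in occ]
    for c in range(n_comp):
        # fall back to the prior mean when a component is never occupied
        feats.extend(first[c][d] / occ[c] if occ[c] > 0 else means[c][d]
                     for d in range(dim))
    return feats
```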
2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
Ki-67 labelling index is a biomarker used across the world to predict the aggressiveness of cancer. To compute the Ki-67 index, pathologists normally count the tumour nuclei in the slide images manually; hence, the process is time-consuming and subject to inter-pathologist variability. With the development of image processing and machine learning, many methods have been introduced for automatic Ki-67 estimation, but most of them require manual annotations and are restricted to one type of cancer. In this work, we propose a pooled Otsu's method to generate labels and train a semantic segmentation deep neural network (DNN). The output is post-processed to find the Ki-67 index. Evaluation on two different types of cancer (bladder and breast cancer) results in a mean absolute error of 3.52%. The DNN trained with automatic labels outperforms the DNN trained with ground truth by an absolute value of 1.25%.
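Otsu's method picks the threshold that maximizes the between-class variance of an intensity histogram. The abstract does not define "pooled". The sketch below assumes it means computing one threshold from the pixels of all images together, rather than one per image. Images are assumed to be flattened lists of integer intensities in [0, 255] from some stain-sensitive channel. The function names are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Otsu's threshold for integer intensities in [0, levels)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]  # number of pixels with intensity <= t
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance, up to 1/total^2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def pooled_otsu_labels(images):
    """One Otsu threshold from all images' pixels, then a binary label per pixel."""
    pooled = [v for img in images for v in img]
    t = otsu_threshold(pooled)
    return t, [[1 if v > t else 0 for v in img] for img in images]
```

Pooling lets images with few stained nuclei borrow the intensity statistics of the whole set, which a per-image threshold cannot do.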
2021 IEEE 18th India Council International Conference (INDICON), 2021
In the field of neuro-oncology, there is a need for improved diagnosis and prognosis of brain tumors. Brain tumor segmentation is important for treatment planning and for assessing treatment outcomes. Manual segmentation of brain tumors is tedious, time-consuming, and subjective. In this work, efficient encoder-decoder-based architectures were implemented for automatic segmentation of brain tumors from low-resolution 2D images. An ensemble of multiple architectures (EMMA) improves the performance of brain tumor segmentation. Furthermore, the computational requirements of the proposed models are lower than those of BraTS-challenge methods. The average F1-scores on the BraTS-challenge validation dataset for Tumor Core (TC), Whole Tumor (WT), and Enhancing Tumor (ET) are 0.82, 0.87, and 0.78, respectively. The average F1-scores on the KMC-Manipal dataset for TC, WT, and ET are 0.74, 0.82, and 0.68, respectively.
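One common way to combine several segmentation networks is to average their per-pixel class probabilities and take the argmax. The sketch below assumes that combination rule, which the abstract does not specify, and scores the result with the F1 score, which equals the Dice score for a single binary region. Real BraTS regions (TC, WT, ET) are unions of label classes; this sketch scores one class label at a time. The function names are hypothetical.

```python
def ensemble_labels(prob_maps):
    """Average per-pixel class probabilities over models, then take the argmax.

    prob_maps[m][p][c]: probability of class c at pixel p from model m.
    """
    n_models = len(prob_maps)
    n_pix, n_cls = len(prob_maps[0]), len(prob_maps[0][0])
    labels = []
    for p in range(n_pix):
        avg = [sum(model[p][c] for model in prob_maps) / n_models for c in range(n_cls)]
        labels.append(max(range(n_cls), key=lambda c: avg[c]))
    return labels


def f1_score(pred, truth, cls):
    """F1 (equivalently Dice) score for one class label."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```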
ArXiv, 2020
In this paper, we investigate whether speech data carries cues about the COVID-19 disease. We use an approach similar to speaker recognition. Each sentence is represented as a supervector built from short-term Mel filter bank features for each phoneme. These features are used to train a two-class classifier that separates COVID-19 speech from normal speech. Experiments on a small dataset collected from YouTube videos show that an SVM classifier achieves an accuracy of 88.6% and an F1-score of 92.7% on this dataset. Further investigation reveals that some phone classes, such as nasals, stops, and mid vowels, distinguish the two classes better than others.
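A per-phoneme supervector can be built by pooling the frame-level features that belong to each phoneme and concatenating the results in a fixed phoneme order. The sketch assumes mean pooling, frame-level phoneme labels (for example, from a forced alignment), and zeros for phonemes absent from a sentence. None of these choices is stated in the abstract, and the function name is hypothetical. The resulting fixed-length vectors are what a two-class SVM would then be trained on.

```python
def phoneme_supervector(frames, phone_labels, phone_set):
    """Concatenate per-phoneme mean feature vectors in a fixed phone order.

    frames: list of equal-length feature vectors (e.g. log Mel filter bank energies).
    phone_labels: the phoneme label of each frame.
    phone_set: fixed ordering of phonemes; absent phonemes contribute zeros.
    """
    dim = len(frames[0])
    sums = {p: [0.0] * dim for p in phone_set}
    counts = {p: 0 for p in phone_set}
    for vec, phone in zip(frames, phone_labels):
        if phone not in sums:
            continue  # ignore labels outside the chosen phone set
        counts[phone] += 1
        for d, v in enumerate(vec):
            sums[phone][d] += v
    supervector = []
    for p in phone_set:
        n = counts[p]
        supervector.extend(s / n if n else 0.0 for s in sums[p])
    return supervector
```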
Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., 2004
Principal Component Analysis for Online Handwritten Character Recognition. Deepu V., Hewlett-Packard Labs, Bangalore 560 030, India. ... In this paper, Principal Component Analysis (PCA) is applied to the problem of online handwritten character recognition in the Tamil script. ...
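PCA finds the orthogonal directions of largest variance in a set of feature vectors. The sketch below computes the top-k principal components with power iteration and deflation, and projects a sample onto them. It assumes each character is already represented as a fixed-length vector (for example, resampled pen coordinates, flattened). How the paper preprocesses strokes or classifies in the PCA subspace is not reproduced here, and the function names are hypothetical.

```python
import math


def pca(samples, k, iters=500):
    """Top-k principal components via power iteration with deflation.

    samples: list of equal-length feature vectors. Returns (mean, components).
    """
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    centred = [[s[d] - mean[d] for d in range(dim)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]

    components = []
    for _ in range(k):
        start = [1.0 / (i + 1) for i in range(dim)]  # arbitrary non-zero start vector
        s_norm = math.sqrt(sum(x * x for x in start))
        v = [x / s_norm for x in start]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining variance is (numerically) zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim))
        components.append(v)
        # deflate: remove the found direction from the covariance matrix
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return mean, components


def project(sample, mean, components):
    """Coordinates of a sample in the principal subspace."""
    return [sum((x - m) * c for x, m, c in zip(sample, mean, comp)) for comp in components]
```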
Eighth International Conference on Document Analysis and Recognition (ICDAR'05), 2005
This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of meetings. TDOAs are obtained by considering all possible microphone pairs, and this approach is compared with conventional TDOA features extracted with respect to a reference channel. The study is carried out using two systems, the first based on Gaussian Mixture Modeling and the second based on the Information Bottleneck (IB) approach. Results on the NIST RT06/RT07/RT09 evaluation datasets show a large speaker error reduction of 30% relative, going from 14.3% to 10.8% for the first system and from 12.3% to 8.2% for the second, whenever the feature weighting is properly handled. Furthermore, results reveal that the IB system is more robust to different numbers of microphones even when large all-pairs TDOA vectors are used, thus outperforming the HMM/GMM system by 25% relative (8.2% error compared with 10.8%). Index Terms: Speaker diarization, Time Delay Of Arrival features, Meetings Recordin...
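The difference between the two feature sets is easiest to see in code. With M microphones, the all-pairs approach yields M(M-1)/2 TDOA values per frame, while the reference-channel approach yields only M-1. The sketch estimates each TDOA as the lag that maximizes a plain cross-correlation. This is a simplification: diarization front ends typically use a weighted variant such as GCC-PHAT. The function names are hypothetical.

```python
from itertools import combinations


def tdoa_samples(a, b, max_lag):
    """Lag (in samples) maximising the cross-correlation sum a[t] * b[t + lag].

    A positive result means channel b receives the signal later than channel a.
    """
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[t] * b[t + lag] for t in range(len(a)) if 0 <= t + lag < len(b))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def all_pairs_tdoa(channels, max_lag):
    """One TDOA per microphone pair: M*(M-1)/2 values for M channels."""
    return [tdoa_samples(channels[i], channels[j], max_lag)
            for i, j in combinations(range(len(channels)), 2)]


def reference_tdoa(channels, max_lag, ref=0):
    """Conventional features: TDOA of every channel w.r.t. one reference, M-1 values."""
    return [tdoa_samples(channels[ref], channels[j], max_lag)
            for j in range(len(channels)) if j != ref]
```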
Speaker diarization of meeting recordings is generally based on acoustic information, ignoring that meetings are instances of conversations. Several recent works have shown that the sequence of speakers in a conversation and their roles are related and statistically predictable. This paper proposes using a speaker-role n-gram model to capture the probability of conversation patterns, and investigates its use as prior information in a state-of-the-art diarization system. Experiments are run on the AMI corpus annotated in terms of roles. The proposed technique reduces the diarization speaker error by 19% when the roles are known and by 17% when they are estimated. Furthermore, the paper investigates how the n-gram models generalize to different settings, such as those of the Rich Transcription campaigns. Experiments on 17 meetings reveal that the speaker error can be reduced by 12% in this case as well; thus, the n-gram model can generalize across corpora. Index Terms: Speaker diarization, m...
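A speaker-role n-gram model assigns a probability to the next speaker's role given the preceding roles. The sketch below is a bigram (n = 2) case with add-k smoothing, trained on role sequences. The smoothing scheme, the `<s>` start symbol and the role labels in the usage example are illustrative assumptions. How the paper injects this prior into the diarization decoder is not shown.

```python
import math
from collections import Counter, defaultdict


class RoleBigram:
    """Bigram model over speaker-role sequences with add-k smoothing."""

    def __init__(self, roles, k=1.0):
        self.roles = list(roles)
        self.k = k
        self.counts = defaultdict(Counter)  # counts[previous_role][next_role]

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(["<s>"] + list(seq), seq):
                self.counts[prev][cur] += 1
        return self

    def prob(self, prev, cur):
        """Smoothed P(cur | prev); sums to 1 over self.roles for any prev."""
        c = self.counts[prev]
        return (c[cur] + self.k) / (sum(c.values()) + self.k * len(self.roles))

    def log_prob(self, seq):
        return sum(math.log(self.prob(p, c)) for p, c in zip(["<s>"] + list(seq), seq))
```

For example, `RoleBigram(["PM", "ME", "UI", "ID"]).fit([["PM", "ME", "PM", "UI"]])` learns from one short turn sequence; with add-one smoothing, `prob("PM", "ME")` is (1 + 1) / (2 + 4) = 1/3.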
Abstract: A speaker diarization system based on an information-theoretic framework is described. The problem is formulated according to the Information Bottleneck (IB) principle. Unlike other approaches, where the distance between speaker segments is introduced arbitrarily, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant to the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from optimizing the IB objective function. We discuss issues related to speaker diarization using this information-theoretic framework, such as the criteria for inferring the number of speakers, the trade-off between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art...
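In agglomerative IB clustering, merging clusters c1 and c2 reduces the relevant information I(C; Y) by (p(c1) + p(c2)) times the Jensen-Shannon divergence between p(y | c1) and p(y | c2). The divergence is weighted by p(c1) and p(c2), normalised to sum to 1. The sketch computes that information-loss term only. The full IB objective also includes a compression term weighted by 1/β, which is omitted here. The function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w1=0.5, w2=0.5):
    """Weighted Jensen-Shannon divergence with weights w1 + w2 = 1."""
    m = [w1 * pi + w2 * qi for pi, qi in zip(p, q)]
    return w1 * kl(p, m) + w2 * kl(q, m)


def merge_info_loss(prior1, cond1, prior2, cond2):
    """Loss in I(C; Y) when clusters 1 and 2 are merged in agglomerative IB.

    prior*: cluster probabilities p(c); cond*: relevance distributions p(y | c).
    """
    total = prior1 + prior2
    return total * js_divergence(cond1, cond2, prior1 / total, prior2 / total)
```

A greedy diarizer would merge, at each step, the pair of clusters with the smallest such cost.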
This paper describes Lipi Toolkit (LipiTk), a generic toolkit whose aim is to facilitate the development of online handwriting recognition engines for new scripts and to simplify integration of the resulting engines into real-world application contexts. The toolkit provides robust implementations of the tools, algorithms, scripts and sample code needed to support handwriting data collection and annotation, training and evaluation of recognizers, packaging of engines, and their integration into pen-based applications. The toolkit is designed to be extended with new tools and algorithms to meet the requirements of specific scripts and applications. It attempts to satisfy the requirements of a diverse set of users, such as researchers, commercial technology providers, do-it-yourself enthusiasts and application developers. In this paper, we describe the first version of the toolkit, which focuses on isolated online handwritten shape and character recognition.
2016 Computing in Cardiology Conference (CinC)