A review and evaluation of elastic distance functions for time series clustering
Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time series specific distance measure and those that derive ...
Visualizations for universal deep-feature representations: survey and taxonomy
In data science and content-based retrieval, we find many domain-specific techniques that employ a data processing pipeline with two fundamental steps. First, data entities are represented by some visualizations, while in the second step, the ...
Multilabel classification using crowdsourcing under budget constraints
Multilabel classification has excelled in several distinct fields during the past few decades but still has significant limitations. One of the critical concerns is the lack or insufficient availability of label instances, and data labelling also ...
Mixed membership distribution-free model
We consider the problem of community detection in overlapping weighted networks, where nodes can belong to multiple communities and edge weights can be finite real numbers. To model such complex networks, we propose a general framework—the mixed ...
Big data analytics enabled deep convolutional neural network for the diagnosis of cancer
Artificial intelligence (AI) has been shown to be a formidable instrument in managing Big Healthcare Data, and it has seen considerable success in bioinformatics. The advancement of big data in biological sciences has given rise to big data ...
Binarized spiking neural networks optimized with Nomadic People Optimization-based sentiment analysis for social product recommendation
Big data analytics is essential for many industries that use computing applications, like real-time purchasing and e-commerce. Big data is used to promote products and improve the communication among retailers and shoppers. At present, individuals ...
A new ontology-based similarity approach for measuring caching coverages provided by mediation systems
Most mediation systems use a caching policy in order to overcome their performance challenges. One of the most widely adopted strategies is known as semantic caching. Semantic caches are called so because they store the descriptions of all ...
Directed dynamic attribute graph anomaly detection based on evolved graph attention for blockchain
Blockchain is gradually becoming an important data storage platform for Internet digital copyright confirmation, electronic deposit, and data sharing. Anomaly detection on the blockchain has received extensive attention as the foundation for ...
Identifying influential nodes based on new layer metrics and layer weighting in multiplex networks
Identifying influential nodes in multiplex complex networks is critically important for viral marketing and other real-world information diffusion applications. However, selecting suitable influential spreaders in multiplex networks ...
SS-WDRN: sparrow search optimization-based weighted dual recurrent network for software fault prediction
Predicting software faults at the primary stage is a challenging task for software engineers and tech industries. During the development of software projects, it is necessary to predict the number of probable faults to have occurred on software ...
Tuning parameters of Apache Spark with Gauss–Pareto-based multi-objective optimization
When there is a need to make an ultimate decision about the unique features of big data platforms, one should note that they have configurable parameters. Apache Spark is an open-source big data processing platform that can process real-time data, ...
Online influence maximization under continuous independent cascade model with node-edge-level feedback
We study the online influence maximization problem in social networks. We concentrate on solving two challenges in this paper. First, we work with the continuous independent cascade model instead of the independent cascade model. In the independent cascade ...
SR-HetGNN: session-based recommendation with heterogeneous graph neural network
The session-based recommendation system aims to predict the user’s next click based on their previous session sequence. The current studies generally learn user preferences according to the transitions of items in the user’s session sequence. ...
Enumerating all multi-constrained s-t paths on temporal graph
In this paper, we propose the problem of multi-constrained s-t simple path enumeration on a temporal graph, which aims to list all temporal paths with the minimum timestamp in the given time interval. To solve this problem, a two-stage algorithm ...
Few-shot partial multi-label learning with synthetic features network
In partial multi-label learning (PML) problems, each training sample is partially annotated with a candidate label set, among which only a subset of labels are valid. The major hardship for PML is that its training procedure is prone to be misled ...
Generalized few-shot node classification: toward an uncertainty-based solution
For real-world graph data, the node class distribution is inherently imbalanced and long-tailed, which naturally leads to a few-shot learning scenario with limited nodes labeled for newly emerging classes. There are many carefully designed ...
Online log parsing using evolving research tree
Logs are a reliable source of information for development and maintenance purposes. They record information at runtime regarding the state of a system and are commonly used to analyze its behavior. Parsing operations on logs structure the ...
Cyclic Action Graphs for goal recognition problems with inaccurately initialised fluents
Goal recognisers attempt to infer an agent’s intentions from a sequence of observed actions. This is an important component of intelligent systems that aim to assist or thwart actors; however, there are many challenges to overcome. For example, ...
Adversarial enhanced attributed network embedding
Attributed network embedding aims to extract latent features of complex networks from structural topology and node attributes. Existing embedding models either use two separate learning processes to capture the complementarity of network topology ...
Next-generation antivirus for JavaScript malware detection based on dynamic features
- Sidney M. L. de Lima,
- Danilo M. Souza,
- Ricardo P. Pinheiro,
- Sthéfano H. M. T. Silva,
- Petrônio G. Lopes,
- Rafael D. T. de Lima,
- Jemerson R. de Oliveira,
- Thyago de A. Monteiro,
- Sérgio M. M. Fernandes,
- Edison de Q. Albuquerque,
- Washington W. A. da Silva,
- Wellington P. dos Santos
There are many kinds of Exploit Kits, each one being built with several vulnerabilities, but almost all of them are written in JavaScript. So, we created an antivirus, endowed with machine learning, expert in detecting JavaScript malware based on ...
Ontology-based soft computing and machine learning model for efficient retrieval
Unstructured and unorganized data always degrade the performance of search techniques and produce irrelevant results in response to the query as well as decrease the speed of retrieval results. Ontology in semantic web (SW) provides an adequate ...
Microaneurysms detection in fundus images using local Fourier transform and neighbourhood analysis
Microaneurysms, tiny, circular red dots that occur in retinal fundus images, are one of the earliest symptoms of diabetic retinopathy. Because microaneurysms are small and delicate, detecting them can be difficult. Their small size and cunning ...
Top-k approximate selection for typicality query results over spatio-textual data
Spatial keyword query is a classical query processing mode for spatio-textual data, which aims to provide users the spatio-textual objects with the highest spatial proximity and textual similarity to the given query. However, the top-k result ...
A semantics-enabled approach for personalised Data Lake exploration
The increasing availability of Big Data is changing the way data exploration for Business Intelligence is performed, due to the volume, velocity and uncontrolled variety of data on which exploration relies. In particular, data exploration is ...
Text-based paper-level classification procedure for non-traditional sciences using a machine learning approach
Science as a whole is organized into broad fields, and as a consequence, research, resources, students, etc., are also classified, assigned, or invited following a similar structure. Some fields have been established for centuries, and some others ...