Mining big data: current status, and forecast to the future
Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large ...
Scaling big data mining infrastructure: the Twitter experience
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this paper, we discuss the evolution of our infrastructure and the development of ...
Mining heterogeneous information networks: a structural analysis approach
Most objects and data in the real world are of multiple types, interconnected, forming complex, heterogeneous but often semi-structured information networks. However, most network science researchers are focused on homogeneous networks, without ...
Big graph mining: algorithms and discoveries
How do we find patterns and anomalies in very large graphs with billions of nodes and edges? How to mine such big graphs efficiently? Big graphs are everywhere, ranging from social networks and mobile call networks to biological networks and the World ...
Mining large streams of user data for personalized recommendations
The Netflix Prize put the spotlight on the use of data mining and machine learning methods for predicting user preferences. Many lessons came out of the competition. But since then, Recommender Systems have evolved. This evolution has been driven by the ...
Outlier ensembles: position paper
Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to the clustering and ...
Studying the source code of scientific research
Just as inspecting the source code of programs tells us a lot about the process of programming, inspecting the "source code" of scientific papers informs on the process of scientific writing. We report on our study of the source of tens of thousands of ...
Discovering interesting information with advances in web technology
The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery. In this article, we shed light on some interesting ...