- research-article, January 2023
A hierarchical outlier detection method for spare parts transaction
International Journal of Space-Based and Situated Computing (IJSSC), Volume 9, Issue 3, Pages 173–181, https://doi.org/10.1504/ijssc.2023.133245
In nuclear power production, seamless equipment maintenance is integral, significantly achieved through spare parts transactions. Yet, abnormal transaction data can obstruct operational efficiency. Traditional anomaly detection methods, often subjective ...
- research-article, January 2019
A Review on Data Cleansing Methods for Big Data
Procedia Computer Science (PROCS), Volume 161, Issue C, Pages 731–738, https://doi.org/10.1016/j.procs.2019.11.177
Massive amounts of data are available for the organization which will influence their business decision. Data collected from the various resources are dirty and this will affect the accuracy of prediction result. Data cleansing offers a better ...
- research-article, May 2018
DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, Pages 1793–1796, https://doi.org/10.1145/3183713.3193544
We showcase the first semantic data profiler, DataProf. For the constraint class of interest, current profilers compute all constraints that hold on the given data set. DataProf also computes perfect sample records that together satisfy the same ...
- research-article, December 2017
Detection of Relation Assertion Errors in Knowledge Graphs
K-CAP '17: Proceedings of the 9th Knowledge Capture Conference, Article No.: 22, Pages 1–8, https://doi.org/10.1145/3148011.3148033
Although the link prediction problem, where missing relation assertions are predicted, has been widely researched, error detection did not receive as much attention. In this paper, we investigate the problem of error detection in relation assertions of ...
- short-paper, November 2017
Learning Biological Sequence Types Using the Literature
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Pages 1991–1994, https://doi.org/10.1145/3132847.3133051
We explore in this paper automatic biological sequence type classification for records in biological sequence databases. The sequence type attribute provides important information about the nature of a sequence represented in a record, and is often used ...
- research-article, August 2017
Toward Mining Stop-by Behaviors in Indoor Space
ACM Transactions on Spatial Algorithms and Systems (TSAS), Volume 3, Issue 2, Article No.: 7, Pages 1–38, https://doi.org/10.1145/3106736
In this article, we explore a new mining paradigm, called Indoor Stop-by Patterns (ISP), to discover user stop-by behavior in mall-like indoor environments. The discovery of ISPs enables new marketing collaborations, such as a joint coupon promotion, ...
- research-article, July 2017
Genetic improvement of computational biology software
GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference Companion, Pages 1657–1660, https://doi.org/10.1145/3067695.3082540
There is a cultural divide between computer scientists and biologists that needs to be addressed. The two disciplines used to be quite unrelated but many new research areas have arisen from their synergy. We selectively review two multi-disciplinary ...
- short-paper, October 2016
Cleansing indoor RFID data using regular expressions
SIGSPATIAL '16: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Article No.: 77, Pages 1–4, https://doi.org/10.1145/2996913.2996979
RFID (Radio Frequency Identification)-based object tracking is increasingly deployed and used in indoor environments such as airports, shopping malls, etc. However, the inherent noise in the raw RFID data makes it difficult to support queries and ...
- research-article, June 2016
Learning-Based Cleansing for Indoor RFID Data
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data, Pages 925–936, https://doi.org/10.1145/2882903.2882907
RFID is widely used for object tracking in indoor environments, e.g., airport baggage tracking. Analyzing RFID data offers insight into the underlying tracking systems as well as the associated business processes. However, the inherent uncertainty in ...
- research-article, August 2015
Detecting near-duplicate text documents with a hybrid approach
Journal of Information Science (JIPP), Volume 41, Issue 4, Pages 405–414, https://doi.org/10.1177/0165551515577912
Near duplicate data not only increase the cost of information processing in big data, but also increase decision time. Therefore, detecting and eliminating nearly identical information is vital to enhance overall business decisions. To identify near-...
- research-article, September 2014
TimeCleanser: a visual analytics approach for data cleansing of time-oriented data
Theresia Gschwandtner, Wolfgang Aigner, Silvia Miksch, Johannes Gärtner, Simone Kriglstein, Margit Pohl, Nik Suchy
i-KNOW '14: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven Business, Article No.: 18, Pages 1–8, https://doi.org/10.1145/2637748.2638423
Poor data quality leads to unreliable results of any kind of data processing and has profound economic impact. Although there are tools to help users with the task of data cleansing, support for dealing with the specifics of time-oriented data is rather ...
- Article, June 2014
Planning meets data cleansing
One of the motivations for research in data quality is to automatically identify cleansing activities, namely a sequence of actions able to cleanse a dirty dataset, which today are often developed manually by domain-experts. Here we explore the idea ...
- Article, December 2013
kDMI: A Novel Method for Missing Values Imputation Using Two Levels of Horizontal Partitioning in a Data set
ADMA 2013: Part II of the Proceedings of the 9th International Conference on Advanced Data Mining and Applications - Volume 8347, Pages 250–263, https://doi.org/10.1007/978-3-642-53917-6_23
Imputation of missing values is an important data mining task for improving the quality of data mining results. The imputation based on similar records is generally more accurate than the imputation based on all records of a data set. Therefore, in this ...
- research-article, November 2013
A graph model for false negative handling in indoor RFID tracking data
SIGSPATIAL'13: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Pages 464–467, https://doi.org/10.1145/2525314.2525461
The Radio Frequency Identification (RFID) emerges to be one of the key technologies to modernize object tracking and monitoring systems in indoor environments, e.g., airport baggage tracking. Although RFID has advantages over alternative identification ...
- Article, October 2013
Using Domain Knowledge in Initial Stages of Knowledge Discovery in Databases
Proceedings of the 8th International Conference on Rough Sets and Knowledge Technology - Volume 8171, Pages 1–6, https://doi.org/10.1007/978-3-642-41299-8_1
In this tutorial the topic of data preparation for Knowledge Discovery in Databases KDD is discussed on rather general level, with just few detailed descriptions of particular data processing steps. The general ideas are illustrated with application ...
- research-article, September 2012
DBpedia ontology enrichment for inconsistency detection
I-SEMANTICS '12: Proceedings of the 8th International Conference on Semantic Systems, Pages 33–40, https://doi.org/10.1145/2362499.2362505
In recent years the Web of Data experiences an extraordinary development: an increasing amount of Linked Data is available on the World Wide Web (WWW) and new use cases are emerging continually. However, the provided data is only valuable if it is ...
- poster, May 2012
Redeeming pedigree data with an interactive error cleaning visualisation
AVI '12: Proceedings of the International Working Conference on Advanced Visual Interfaces, Pages 741–744, https://doi.org/10.1145/2254556.2254698
We describe a visual data cleansing application for pedigree genotype data, which is used to redeem otherwise unusable pedigree data sets. Biologists and bioinformaticians dynamically and iteratively mask pieces of information from a dirty data set and ...
- Article, April 2012
In silico infection of the human genome
EvoBIO'12: Proceedings of the 10th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Pages 245–249, https://doi.org/10.1007/978-3-642-29066-4_22
The human genetic sequence database contains DNA sequences very like those of mycoplasma bacteria. It appears such bacteria infect not only molecular Biology laboratories but their genes were picked up from contaminated samples and inserted into GenBank ...
- research-article, December 2011
A decision tree-based missing value imputation technique for data pre-processing
Data pre-processing plays a vital role in data mining for ensuring good quality of data. In general data pre-processing tasks include imputation of missing values, identification of outliers, smoothening out of noisy data and correction of inconsistent ...