Authors: Tome Eftimov (1) and Barbara Koroušić Seljak (2)
Affiliations: (1) Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Slovenia; (2) Jožef Stefan Institute, Slovenia
Keyword(s): Part of Speech Tagging, Probability Model, Information Retrieval, Food Composition Databases, Ingredient Matching.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining; Symbolic Systems
Abstract:
In this paper, we present a new method for matching recipe ingredients extracted from the
Internet to nutritional data from food composition databases (FCDBs). The method uses part-of-speech
(POS) tagging to capture the information in the names of the ingredients and in the names of the food
analyses from FCDBs. A probability-weighted model then uses the information from POS tagging to assign
a weight to each candidate match; the match with the highest weight is taken as the most relevant one
and can be used for further analyses. We evaluated our method on a collection of 721 lunch recipes,
from which we extracted 1,615 different ingredients; the results show that our method can match
91.82% of the ingredients with the FCDB.
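
The general idea of POS-based weighted matching can be illustrated with a minimal Python sketch. It is not the model from the paper, whose weighting is not given in the abstract. The toy lexicon `TOY_POS`, the weights in `POS_WEIGHT`, and the per-tag Jaccard overlap are all assumptions made for illustration; a real pipeline would use a trained POS tagger and an actual FCDB.

```python
# Hypothetical toy POS lexicon (assumption); a real system would use a trained tagger.
TOY_POS = {
    "milk": "NOUN", "chicken": "NOUN", "breast": "NOUN", "butter": "NOUN",
    "skimmed": "ADJ", "whole": "ADJ", "raw": "ADJ", "salted": "ADJ",
}

# Assumed relative importance of each POS class (illustrative values only).
POS_WEIGHT = {"NOUN": 0.7, "ADJ": 0.3}


def tag(name):
    """Lowercase, split on commas and whitespace, and look up a POS tag per token."""
    tokens = name.lower().replace(",", " ").split()
    return [(token, TOY_POS.get(token, "OTHER")) for token in tokens]


def words_by_pos(name):
    """Group the tokens of a name into one set per weighted POS class."""
    groups = {pos: set() for pos in POS_WEIGHT}
    for token, pos in tag(name):
        if pos in groups:
            groups[pos].add(token)
    return groups


def match_weight(ingredient, food):
    """Weighted sum of per-POS Jaccard overlaps between two names."""
    a, b = words_by_pos(ingredient), words_by_pos(food)
    score = 0.0
    for pos, weight in POS_WEIGHT.items():
        union = a[pos] | b[pos]
        if union:  # skip POS classes absent from both names
            score += weight * len(a[pos] & b[pos]) / len(union)
    return score


def best_match(ingredient, foods):
    """Return the food name with the highest match weight."""
    return max(foods, key=lambda food: match_weight(ingredient, food))


# Hypothetical FCDB entries, for illustration only.
foods = ["Milk, whole", "Milk, skimmed", "Chicken, breast, raw"]
print(best_match("skimmed milk", foods))
```

Under these assumptions, a shared noun contributes more to the weight than a shared adjective, so "skimmed milk" prefers a milk entry over any other food and then the entry whose adjective also agrees.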