Academia.eduAcademia.edu
A preliminary report on the MathBrush pen-math system George Labahn, Scott MacLean, Mirette Marzouk, Ian Rutherford, David Tausky David R. Cheriton School of Computer Science University of Waterloo, Waterloo, Ontario, Canada e–mail: {glabahn,smaclean,msmarzouk,ijrutherford,datausky}@scg.uwaterloo.ca Abstract In this paper we give a preliminary description of an experimental system, currently named MathBrush, for working with mathematics using pen-based devices. The system allows a user to enter mathematical expressions with a pen and to then do mathematical computation using a computer algebra system. The system provides a simple and easy way for users to verify the correctness of their handwritten expressions and, if needed, to correct any errors in recognition. Choosing mathematical operations is done making use of context menus, both with input and output expressions. Key words: PC-tablets, Pen-based input, Maple 1 Introduction The fact that entering mathematics on a computer is problematic has long been known. To the mathematically trained person, writing the expression Z 3 x2 + 2 sin x3 + 2 x − 1 dx cos (x3 + 2 x − 1)   is much more natural than typing the latex form \int {\frac { \left( 3\,{x}^{2}+2 \right) \sin \left( {x}^{3}+2\,x-1 \right) } { \cos \left( {x}^{3}+2\,x-1 \right)}} ~ dx or the maple form Int((3*x^2+2)*sin(x^3+2*x-1)/cos(x^3+2*x-1),x); of the corresponding expression. Input for other Computer Algebra Systems (CAS) or for text processing systems provide other examples which use alternate formats and syntax to formulate a mathematical expression. While inputting mathematics is best done using classical handwriting it is also the case that handwriting is not well suited to actually doing the mathematics. For example, the Maple command representation for the above integral also allows for a simple mechanism for actually computing the integral itself using the value command : value( % ) = − ln(cos(x3 + 2 x − 1)). Here % is the Maple syntax for the previous expression. Output is also a concern since handwritten experimentation with pen and paper rarely encounters problems such as massive output expressions, a common occurrence when using CAS. Our project focuses on two components of pen-based mathematics. The first is to investigate the use of pen-based devices for mathematical computation and exploration while the second is to 1 study the key issues when combining pen-based interfaces with CAS. In order to address issues such as input and output of mathematical expressions, the editing and manipulation of such expressions and how to interact with CAS taking the advantage of digital ink and gesture and the power and features in available CAS we are building a pen-based math system MathBrush. The intend of MathBrush is not to have a commercial product. Rather the intend is to provide an environment for experimenting with the various components needed for doing mathematics with pen-based devices. These components include mathematical handwriting recognizers, computer algebra systems, editing and manipulation of expressions both at input and output levels and finally mathematical computation itself using pen-based devices. The state of the art for pen-math based systems currently focuses on mathematical handwriting recognition. Examples include work by Chan and Yeung [2], the systems Infty [12], Freehand Formula Entry System (FFES) [11, 15], and MathJournal from xThink Inc (www.xthink.com). MathJournal appears to be the first commercial system for inputting mathematics on a Tablet PC but is somewhat limited, particularly when it comes to mathematical capabilities. Indeed for the most part the above systems have very basic mathematical functionality. A different approach is that of MathPad 2 [3] from Brown University which attempts to convert diagrams into mathematical formulas. We remark that for any pen-based system for doing mathematics there are a number of challenges. On the level of input recognition, one faces the problem that the recognition of mathematical symbols is considerably more difficult than recognizing text input. Existing text recognizers can not be used for many reasons. For example they only work with a limited character set (alpha-numeric characters) while mathematical symbols have significantly more variation. In addition, text recognizers gain much of their strengths by depending on language specific dictionaries in order to validate combinations of input characters for generation of meaningful words, something not yet setup for mathematical expressions. A significant challenge comes at the level of construction of a valid mathematical expression. Text is by nature a one dimensional input, whereas mathematical expressions are by nature two-dimensional input. This requires a determination of proper baselines and subscript/superscript regions. In addition, even when a valid mathematical expression is successfully recognized, there is still the problem of ambiguous text (for example |2| looks very similar to 121 in handwritten text) and ambiguous expressions (for example u(x + y) has multiple mathematical meanings). Finally there are significant challenges at the level of display/rendering and editing. In this case problems include the need for proper line-breaking, along with additional needs for interactivity for both editing and manipulation of output expressions with a pen. The remainder of this paper is organized as follows. In Section 2 we give the main system characteristics used for the MathBrush system, while Sections 3 to 6 give descriptions of the five main system components for MathBrush. The Section 7 gives some discussion of future work with the final section giving our conclusions. 2 Main System Characteristics In this section we list a number of characteristics that are central for the design of the MathBrush system. This includes modular components, a standard interaction mechanism, ease of use, context menus and finally a logging mechanism. Central to the experimental aspect of our system are pluggable components. MathBrush has been designed so that it allows for replacing of current components with those more advanced (when available), comparing different versions of a given component, and isolating component problems. Our design carefully separates features and functionalities of different components 2 taking in mind the standard design and functionalities of similar modules. For example, in our system we have built a character recognizer to determine the symbols in an input math expression. This character recognizer is constructed to have a similar interface functions to those of the Microsoft Recognizer. The standard design guidelines [5] suggested are also taken into account. From our interface we give the user the option to select from a list of different available recognizers for comparison. System components are described in details in Section 5.1. The use of separate, independent modules mandates the use of a standard communication mechanism. We use MathML [1] as our standard for representation of mathematical expressions and for communication with the CAS. MathML is used because it is now becoming a mathematics standard and is supported by different CAS. In addition it has support in different web browsers. One central concern for pen-based math systems in general is ensuring ease of use for both entering and manipulating input and output of mathematical expressions. In our case, this means that the handwritten mathematics is easy to enter and requires the minimum interaction from the user to ensure that the recognition is correct. Gestures are used in a limited way and chosen to be similar to those that the user naturally uses when working with pen and paper. These gestures also need to be consistent with those from other well designed pen-based applications. These gestures include, for example, the scratch out to delete input characters and a right click to display context menus. Working in a pen-based environment leaves only a limited opportunity to operate with commands. This leaves very little chance to do any mathematics once a handwritten mathematical expression has actually been input into the system. We have chosen to make use of dynamic context menus as done in Maple (c.f. [6]) to operate with both input and output expressions. Context menus are easy to use with a pen (considered as a mouse) and without any need for a keyboard. Items in these context menus are generated to contain operations that are most likely to be useful on the given expression. The use of context menus allows a user to choose from a list of limited operations that apply to the current expression rather than choosing from the whole set of commands. The use of context menus is also helpful in the case of operating on multiple expressions or on part of an expression. Of course these dynamic context menus need to be user extensible. An important feature we are including in our system and which is not available using pen and paper is a mechanism for logging our mathematical manipulations. By this we mean keeping track of all the user’s actions while they work on a math sheet (the session that was started and is currently being worked on). Manipulations on mathematical expressions are done in-place. Users working on a math sheet (even on a piece of paper) often change expressions in certain places but do not necessarily reflect these changes on all dependent expressions. As such one can end up with an inconsistent document or a document that cannot be tracked in any way. We believe that giving the user the ability of replaying their actions is very important. At some point of time a user often asks the question : how did one end up having a particular math sheet. The logging mechanism is available to answer such a question. 3 System Components The five main system modules that make up MathBrush consist of a user interface, a character recognizer, a structural analyzer, a CAS interface tool and finally a mathematics rendering tool. These five system modules and their interdependencies are depicted in Figure (1). A general architecture for pen-based math systems can be found in [10]. The MathBrush user interface module receives the ink from the user, collects the user’s 3 Figure 1: System Components interactions and commands (via context menus), and ultimately renders the results back to the user. The interface module sends the collected ink to the character recognizer. The character recognizer detects different characters and generates a set of bounding boxes. For each box it generates a set of candidate characters and the recognition confidence associated with each candidate. It passes this information back to the interface module. The interface module displays the recognition results to the user and allows for correction of the results (in our case, by choosing from a drop down menu containing the alternative candidates). The interface module then passes the bounding boxes, their character candidates -after applying user’s correctionsto the structural analyzer. The structural analyzer processes the information from the character recognizer and constructs a well formed mathematical expression. Presentation MathML corresponding to this expression is then generated by the analyzer and is passed back to the interface module. The interface module sends the resulting MathML representation together with whatever operation specified by the user to the CAS interface tool. This tool is used to interact with the target CAS system and returns back the computed results generated from the CAS represented as a presentation MathML expression. The presentation MathML and the format defined by the user are sent to a MathML rendering tool, which generates a set of boxes and characters for the interface module to display. These system components are described in detail in the following sections. 4 Top Level User Interface The top level of MathBrush is the user interface where the user inputs handwritten mathematical expressions, corrects any mistakes, interacts with the CAS and plays with the resulting computation or expression. For input the user can write multiple expressions in the math sheet and has the ability to operate on one or more of such input expressions. After an ink expression has been entered the user can use context menus to recognize the expression. Optionally it is also possible that such recognition can be performed automatically after a pause or even after each stroke input. The user interface displays the recognition results in a separate, nearby window preserving the original relative location of the input ink strokes. This is used to make it easy for a user to decide on the characters that need to be corrected. Corrections, when needed, are accomplished by using a drop down menu of assorted alternate candidates. Figure (2) shows an example of the use of drop down menus to correct an error in character recognition. Once the user has the intended valid mathematical expression a context menu can be used to render the expression. Once an expression is rendered a context menu for output can be used 4 Figure 2: User Interface - Character Recognition Results Figure 3: User Interface - Expressions Context Menu to choose mathematical operations and have the chosen CAS perform the intended operations. Figure (3) shows an example of the use of context menus for operating on output. Results coming back from a CAS are rendered by the top level user interface. As is the case with context menus in Maple, operations available in the context menu depend on the expression itself. For example the context menu generated for a polynomial contains such operations as factoring while the context menu for an equation contains operations such as solving or isolation of terms. Again as in the case of context menus in Maple, operations are extensible so that users can add convenient operations for specific types of output expressions. Of course, the user can also copy any ink or rendered expression from one place in the math sheet and paste it elsewhere. The recognition corrections and context menu changes will be preserved with such operations. Finally, the interface also gives the user the ability to tailor their environment to their own preferences. Such preferences include specification of CAS, recognition strategy (after each stroke, after a pause or when user requests), various formats for the input ink and of the rendered results, and so on. 5 Figure 4: Character Recognizer 5 Character Recognizer and Structural Analyzer of Input The mechanism for recognizing handwritten mathematics and its conversion to a valid mathematical expression is handled through the character recognizer and then the structural analyzer. In this section we give a brief description of these two components as currently used in the MathBrush system. 5.1 Character Recognizer The character recognizer [4] used in MathBrush is an implementation of existing methods found in the literature. We chose to create such a recognizer for practical purposes, since this allowed us to both experiment and to have full control of this process. On the other hand there was no obvious candidate that was available and which could full fill our requirements. The character recognizer module receives the ink from the interface module and generates, as most character recognizers do, a set of bounding boxes, with each box including a set of candidate characters along with their recognition confidence values. As shown in Figure (4), the character recognizer uses a symbols database that contains samples for each symbol. The character recognizer involves three phases: stoke preprocessing, segmentation and finally matching. The preprocessing of strokes include (a) Stroke joining - where broken strokes due to hardware or user’s hesitation are joined (b) Re-sampling - where input points are resampled in a way that preserves end- and cusp-points (c) Trimming - where end points of a stroke are trimmed if they exhibit high curvature (d) Smoothing - to prevent jitter caused by hardware or user hesitation and finally (e) Normalization - where we normalize input in order to preserve aspect ratio. The second phase is segmentation where the input is broken into distinct stroke groups. Segmentation is done by first estimating the likely number of strokes which make up the input symbol. This is then followed by a process of feature extraction. The estimation of stoke numbers includes determining proximity of strokes, that is how many strokes in a row are intersecting or are close enough together that they are likely part of the same symbol, and stacking of strokes, which determines where groups of strokes that appear to be stacked up on each other vertically (as in +/- or ”equivalent to”). Feature extraction looks at information such as : width, height, angle between end points and width to height ratio which are all extracted from the input strokes. Every group of strokes is weighed by comparing its features to the features from the database. Both processes together generate a ranking of candidates. A confusion matrix is used to eliminate the possibility of early reporting of symbols with few strokes included in other symbols. For example, this helps to prevent the reporting of F,- or L,= instead of a correct E. 6 Once the stroke preprocessing and segmentation phases have been completed the characters are ready to be compared to a database of symbols. The recognition phase combines a number of matching procedures to obtain a final score or confidence value. These procedures include basic elastic matching, deformable template matching and finally structural chain code matching. Elastic matching involves minimizing a distance measure between two stroke groups (we follow a basic algorithm presented in [13]) with the distance measure including information about both point-to-point and tangent vector comparisons. Following [9], a weighted measure is also included where points which lend a symbol its characteristic shape are weighted higher than points which may be present in any symbol. The final steps in the recognition phase involves deformable template matching [7] and structural chain code matching [2]. Deformable template matching converts a model to match the input with costs associated with each deformation operation. Model points are assigned a circular Gaussian distribution from which the probability of the model matching the input can be determined. Costs themselves are assigned for moving model points, model points lying on white space, and how well the model matches the input. Structural chain code matching is a process of breaking an input into intervals with an assignment of a numeric code to each interval based on the stroke direction in that interval. The sequence of these codes extracted from the input is called a ”chain code”. A process similar to elastic matching is then used to determine which model’s chain code most closely matches that of the input. 5.2 Structural Analyzer A structural analysis of an input expression involves the process of converting a set of symbols into a mathematical expression. The mathematical expression can then be passed to a CAS for evaluation or computation. The structural analyzer used in MathBrush is described in [8]. In this subsection we give a flavour of some of the techniques used in the analyzer. Input for the analyzer is a set of bounding boxes along with a set of candidates for each box ranked by their recognition confidence values. The analyzer makes use of several additional pieces of information about these character candidates. This additional information is stored in a symbol database and includes, for each character candidate, a unique structural type which defines the symbol’s expected positioning on a baseline and one or more possible semantic types, which define grammar rules to be applied to a symbol during expression parsing. There are 14 different structural types. These types are used to determine how well a character candidate fits into a potential baseline. Semantic types are used to determine the grammatical structure of an expression. There are 17 different semantic types, arranged into 4 groups: brackets, operators, self-contained operators, and operands. As shown in Figure (5) the structural analyzer uses the character data and the symbols information in the database to generate a valid math expression. The resulting expression is then converted to presentation MathML and sent back to the interface module for rendering. Our structural analysis framework [8] consists of four consecutive phases: determination of layout, pre-parsing, structural grouping and final parsing. The phase that determines the layout uses only the bounding box information to determine the relative symbols locations in order to construct an initial baseline tree. See Figure (6). Pre-parsing makes use of expected mathematical content to refine and correct some character candidates. Examples of some of these refinements include the matching of brackets, integral matching with differential symbols, and determination of function names and numbers (c.f. Figure (7)). At present the pre-parsing step is only used when the user wishes to view the expression without first verifying the correctness of individual characters. Structural grouping finds baselines in an expression and refines character candidates by 7 Figure 5: Structural Analyzer Figure 6: Baseline Tree fitting characters onto these baselines. This is the main phase in the structural analysis. The grouping process itself works by first estimating a baseline position based on the candidates and using information from the symbol database. This in turn generates a structural confidence for each candidate which, when combined with character recognizer confidence values is then used to update the candidates list and their probabilities. This process is repeated until the candidate list stabilizes. The final parsing phase involves a final set of refinements making use of a database of likely expressions in order to select the best result. Examples are in shown Figure (8). 6 CAS Interface and Rendering Tools The final two components of MathBrush involve the interaction with a CAS and a final rendering of the final mathematical expressions. The CAS interface tool depicted in Figure (9) receives the presentation MathML of a math expression from the interface module together with the operation the user wants to perform on that expression and the CAS system the user wants to use. This module forms the appropriate command and sends it to the CAS. It passes the presentation MathML coming back from the CAS to the interface module for rendering. Currently the CAS interface tool supports interaction with Maple and can be easily extended to work with any CAS that supports MathML. The Math rendering tool shown in Figure (10) takes as its input a presentation MathML expression passed from the interface module together with any user requested formats and provides instructions on rendering the expression in the user interface. The tool itself generates a set of boxes with each of the characters to be displayed in that box for the interface to display. A database that stores the operator and external entities dictionary as defined by the W3C 8 Figure 7: Pre-Parsing Examples Figure 8: Parsing Examples consortium [1] is accessed during rendering. The input MathML is parsed to build an expression tree which is then processed to generate the boxes. A line breaking algorithm has been implement with the strategy of trying to keep a sub-tree together. For example, it tries to fit a fraction in a line (preferably the current line) and if not then it displays it as nominator/dominator). The rendering algorithm also takes care of stretching certain operators (for example, brackets, square roots, integral and summation operators) in order to fit multi-lined operands if needed. The bounding boxes generated use the font and output format provided by the user. 7 Extensions and Future Work One of the goals of the MathBrush system is to allow use with a wide variety of mathematical computation systems. This design results in the replacement of the interfaces of such systems. A significant disadvantage of such an approach is that one cannot take advantage of the features that are present in todays CAS interfaces. Instead important features need to be implemented separately, resulting in situations where existing work is repeated. This is the case for our use of context menus, for example, which have been well implemented in Maple for a number of versions now. An alternative to the need for replacing existing interfaces is given in [10] where pen-based windows are proposed which fit into the interfaces of CAS and even text processing systems such as Microsoft Word. However, as mentioned previously, in our case we are less interested in creating a complete interface with multiple features, but rather in the creation of a simple system that allows us to investigate the effectiveness of various techniques for mathematical explorations with pen-based systems and CAS. 9 Figure 9: The CAS Interface Tool Figure 10: The Rendering Tool Work is underway to further improve the MathBrush system. For example, there are still many classes of expressions not yet recognized by the system, including for example limits and matrices. The latter in particular offers a significant set of challenges, both in recognition of such objects and in manipulating the elements inside them. We also plan to investigate the use of guided input such as ruled lines in order to help the character recognizer to distinguish between such things as lower and upper case letters and super- and sub-scripts. Such guided input will also help to establish first approximations of baselines for improved performance for the structural analyzer. Additional improvements include a personalized symbol database and the training of our structural recognizer. Tablet PCs tend to have a single primary user and as such character recognition will be significantly improved by personalizing the character recognizer database. Currently the symbols database used by the character recognizer contains different samples for each character to be used by default. We expect significant improvement when individual writing characteristics are taken into consideration for individual characters. In addition, the design of the structural analyzer contains many parameters. We expect to do further experiments with MathBrush on the training of these parameters in order to obtain optimal values. The use of the system has shown that the user most likely corrects the recognition results before rendering. This makes the job of the structural analyzer easier and it makes it better to focus on correcting other structural errors. In parallel to the above work, it is also our plan to improve recognition accuracy by using a different approach to the recognition problem. We plan to investigate the replacement of our recognizer/analyzer components with a single entity which is based on the use of graphical probabilistic models, in particular Bayesian networks or their extensions. These allow the recognizer to reason naturally like a human, providing the most reasonable guesses within any context. As these models are intuitive, they allow us to improve them or to extend them easily. These models can also be trained easily, which allows them to adapt to individual users. Currently our focus is also on investigation of editing and manipulation of output expressions making use of a pen. Such actions are typically done with the use of pen and paper and 10 are natural for systems such as MathBrush. We expect to continue with doing such operations in-place. We also expect that editing and manipulation requirements will require alternate representations for our rendered expressions. Having full control over how our output expression is rendered gives an additional reason why we have preferred to work with an interface which is independent of current in CAS interfaces. 8 Conclusions MathBrush is a system for allowing users to experiment with mathematical computation combining pen-based systems along with CAS. It is designed for experimentation, allowing for replacement of important components such as the character recognizer or structural analyzer (or both in case of recognition/analyzers that involve feedback loops), interchange one CAS with a different CAS, and so on. It has been constructed in order to allow actual users to make use of such a pen-based system. The availability of a system such as MathBrush allows for investigation of many issues related to the user interface of pen-math systems. Such issues include how (and when) to provide recognition results; how to display results in a mathsheet; methods for allowing users to choose mathematical operations for expressions; relevant ways, along with associated gestures, for editing and manipulation; other functionalities that the user might need but which is not available with a simple use of pen and paper. All these issues will need to be considered for any pen-math interface which hope to gain wide acceptance with users. References [1] D. Carlisle, P. Ion, N. Poppelier, R. Miner (editors), R. Ausbrooks, S. Buswell, S. Dalmas, S. Devitt, A. Diaz, R. Hunter, B. Smith, N. Soiffer, R. Sutor, S. Watt. Mathematical Markup Language (MathML) Version 2.0, W3C Recommendation (2001), http://www.w3.org/TR/2001/REC-MATHML2-20010221. [2] K-F. Chain and D-Y Yeung, Recognizing on-line handwritten alphanumeric characters through flexible structural matching Pattern Recognition, 32(7), pp. 1099-1114 (1999) [3] J.J. LaViola Jr and R.C. Zeleznik, MathPad 2 : A system for the creation and exploration of Mathematical sketches. ACM Transactions on Graphics. Special Issue: Proceedings of 2004 SIGGRAPH 432-440 (2004) [4] S. MacLean, The MathBrush character recognizer, Internal report for Symbolic Computation Group, 20 pages, (2005) [5] Microsoft Recognizer Guidelines, http://msdn.microsoft.com/library/default.asp?url=/library/enus/tpcsdk10/lonestar/appendix/tbconcustomrecognizer.asp [6] M. Monagan, K.O. Geddes, K.M. Heal, G. Labahn, S.M. Vorkoetter, J. McCarron and P. DeMarco, Maple Advanced Programming Guide (2005) [7] M. Revow, C. Williams and G. Hinton, Using generative models for handwritten digit recognition, IEEE Transactions Pattern Analysis and Machine Intelligence 18(6), pp. 592-606 (1996) [8] I. Rutherford, Structural Analysis for Pen-Based Math Input Systems, MMath Thesis, School of Computer Science, University of Waterloo, Waterloo, Canada. (2005) [9] P. Scattolin, Recognition of Handwritten Numerals Using Elastic Matching. Master’s thesis, Computer Science Department, Concordia University Montreal (1993) [10] E. Smirnova and S.M. Watt, A Context for Pen-Based Mathematical Computing, Proceedings of the 2005 Maple Summer Workshop, pp. 409-422. (2005) 11 [11] S. Smithies, Freehand Formula Entry System, Master’s thesis, University of Otago, Dunedin, New Zealand (1999). [12] M. Suzuki, F. Tamari, R. Fukuda, S. Uchida, T. Kanahori, Infty- an integrated OCR system for mathematical documents, Proceedings of ACM Symposium on Document Engineering 2003, Grenoble, Ed. C.Vanoirbeek, C.Roisin, E. Munson, pp.95-104 (2003) [13] C.C. Tappert, Cursive Script Recognition by Elastic Matching, IBM Journal of Research and development 26(6), pp. 765-771 (1982) [14] Bo Wan and S.M. Watt, An Interactive Mathematical Handwriting Recognizer for the Pocket PC, Proc. International Conf. on MathML and Math on the Web (MathML 2002), June 28-30 2002, Chicago USA. [15] R. Zanibbi, D. Blostein and J.R. Cordy, Aiding manipulation of handwriting mathematical expressions through style preserving morphs, Graphics Interface 2001, pp 127-134 (2001) 12