<HTML> <HEAD> <TITLE>LREC 2000 - Paper 105 summary</title> <SCRIPT LANGUAGE="JavaScript" TYPE="text/javascript"> <!-- // preload images: if(document.images) { hom_d= new Image(100,20); hom_d.src="../eikones/hom_d.gif"; pap_g=new Image(100,20); pap_g.src="../eikones/pap_g.gif"; pap_d=new Image(100,20); pap_d.src="../eikones/pap_d.gif"; pap_l=new Image(100,20); pap_l.src="../eikones/pap_l.gif"; hom_l=new Image(100,20); hom_l.src="../eikones/hom_l.gif"; aut_d=new Image(100,20); aut_d.src="../eikones/aut_d.gif"; aut_l=new Image(100,20); aut_l.src="../eikones/aut_l.gif"; Key_d=new Image(100,20); Key_d.src="../eikones/Key_d.gif"; Key_l=new Image(100,20); Key_l.src="../eikones/Key_l.gif"; ses_d=new Image(100,20); ses_d.src="../eikones/ses_d.gif"; ses_l=new Image(100,20); ses_l.src="../eikones/ses_l.gif"; abs_l=new Image(100,20); abs_l.src="../eikones/abs_l.gif"; abs_d=new Image(100,20); abs_d.src="../eikones/abs_d.gif"; aut_l=new Image(100,20); aut_l.src="../eikones/aut_l.gif"; } function changimg(imgName,imgObjName) { if (document.images) { document.images[imgName].src=eval(imgObjName+".src"); } } //--> </SCRIPT> </HEAD> <BODY marginwidth="0" marginheight="0" leftmargin="0" topmargin="0" rightmargin="0" background="../eikones/fonto.jpg"> <TABLE align="center" border="0" width="100%" cellspacing="0" cellpadding="0" > <TR> <TD height="50" valign="center" colspan="7" bgcolor="#003163"><font face="Arial" size="4" color="#ffffff"><b>LREC 2000</b> 2<sup>nd</sup> International Conference on Language Resources &amp; Evaluation</font></TD> </TR> <tr bgcolor="#003162"> <td width="100" valign="center"><A href="../../default.htm" onmouseout="changimg('home','hom_d')" onmouseover="changimg('home','hom_l')"><IMG border="0" height="20" name="home" src="../eikones/hom_d.gif" width="100"></A></td> <TD width="100"><A href="../session.htm" onmouseout="changimg('sessions','ses_d')" onmouseover="changimg('sessions','ses_l')"><IMG border="0" height="20" name="sessions" 
src="../eikones/ses_d.gif" width="100"></A></TD> <TD width="100"><A href="../paper.htm" onmouseout="changimg('papers','pap_d')" onmouseover="changimg('papers','pap_l')"><IMG border="0" height="20" name="papers" src="../eikones/pap_d.gif" width="100"></a></TD> <TD width="100"><A href="../abstract.htm" onmouseout="changimg('abstracts','abs_d')" onmouseover="changimg('abstracts','abs_l')"><IMG border="0" height="20" name="abstracts" src="../eikones/abs_d.gif" width="100"></A></TD> <TD width="100"><A href="../author.htm" onmouseout="changimg('authors','aut_d')" onmouseover="changimg('authors','aut_l')"><IMG border="0" height="20" name="authors" src="../eikones/aut_d.gif" width="100"></a></TD> <TD width="100"><A href="../keyword.htm" onmouseout="changimg('keywords','Key_d')" onmouseover="changimg('keywords','Key_l')"><IMG border="0" height="20" name="keywords" src="../eikones/Key_d.gif" width="100"></A></TD> <td width="1000">&nbsp;</td> </tr> </TABLE> <BLOCKQUOTE style="MARGIN-RIGHT: 0px"> <P><A href="104.htm">Previous Paper</A>&nbsp;&nbsp; <A href="106.htm">Next Paper</A></P></BLOCKQUOTE> <center> <TABLE width="95%" Align="center" Border="1" bordercolor="#669999" cellspacing="1"> <tr> <td width="15%" height="40"><b>Title</b></td> <td width="85%" height="40"><font color="#990033" size="4">A Web-based Text Corpora Development System</font></td> </tr> <tr> <td height="40"><b>Authors</b></td> <td height="40"><font color="#006600">Bohu&#351; Dan</font> (Politehnica University of Timisoara, Vasile Parvan 2, 1900 Timisoara, Romania, bd1206@cs.utt.ro)<br><font color="#006600">Boldea Marian</font> (Politehnica University of Timisoara, Vasile Parvan 2, 1900 Timisoara, Romania, boldea@cs.utt.ro)</td> </tr> <tr> <td height="40"><b>Keywords</b></td> <td height="40">Diacritic Characters Restoration, HTML-to-Text Conversion, Morpho-Syntactic Annotation, Part-of-Speech Tagging, Text Corpora</td> </tr> <tr> <td height="40"><b>Session</b></td> <td height="40">Session WP7 - Corpus 
Projects</td> </tr> <tr> <td height="40"><b>Full Paper</b></td> <td height="40"><a href="../../ps/105.ps" target="newps" type="application/postscript">105.ps</a>, <a href="../../pdf/105.pdf" target="newpdf" type="application/pdf">105.pdf</a></td> </tr> <tr> <td height="40"><b>Abstract</b></td> <td height="40">One of the most important starting points for any NLP endeavor is the construction of text corpora of appropriate size and quality. This paper presents a web-based text corpora development system that addresses both the size and the quality of these corpora. The size problem is addressed by using the Internet as a practically limitless source of texts. To ensure quality and make the texts fit for further use, we enrich them with relevant information by treating morpho-syntactic annotation, lexical ambiguity resolution, and diacritic character restoration in an integrated manner. Although the system currently targets texts in Romanian, it can be adapted to other languages, provided that appropriate auxiliary resources are available.</td> </tr> </table><br> </center> </BODY> </html>