Tomislav Stojanov
Marie Skłodowska-Curie Research Fellow at the University of Nottingham.
PhD (2015) at the Faculty of Humanities and Social Sciences, University of Zagreb.
Guest researcher at the Centre for Corpus Linguistics, University of Birmingham (UK), in 2004; at the Fryske Akademy and the Instituut voor Nederlandse Lexicologie (the Netherlands) in 2015; and at the Institute of German Language in Mannheim in 2016.
Dean's Award of the University of Zagreb (1995) and Ivan Filipović Award of the Ministry of Science, Education and Sport (2013).
PhD Thesis
Owing to the lack of similar analyses of Croatian, the dissertation includes two targeted surveys: one of average orthographic literacy (526 Croatian polytechnic students) and one of opinions on Croatian orthographic standardization and language policy (2000 members of the Croatian academic community).
The contemporary history of the Croatian orthographic standard and the current language policy situation in Croatia are described. Twenty-five instances of orthographic misunderstandings, prejudice, and misconceptions about literacy in Croatian are listed and analyzed. A bibliometric investigation of Croatian orthography manuals from 1639 to 2014 is provided, and the phases of orthographic development are established.
Linguistic e-literacy and the broader context of language and computers are described. The traditional linguistic division into langue and speech is replaced by the triad of speech, writing and meaning, according to which the linguographic description of Croatian is developed.
By investigating the context of the Croatian language with regard to computing, linguistic e-literacy, and the extent to which language is adapting to the information society, the dissertation highlights the need for a green book on the development of literacy, which would give rise to a development strategy for standard Croatian with a defined language policy and measurable success criteria.
Language standardization has to take contemporary writing and literacy into account and incorporate computational linguistics and digital media aspects into language policy.
Popular Science papers
My response is 9000 characters long and was published a week later in Jutarnji list, a daily newspaper with national coverage in Croatia.
Jozić's reaction is available here: https://www.jutarnji.hr/vijesti/hrvatska/problemi-s-anketnim-ispitivanjem-dr-stojanova-sto-zna-500-ispitanika-a-ne-zna-4-milijuna-gradana-15408346
The online version is behind a paywall and carries a different title: "In Croatia, 12% of respondents believe that Croats and Serbs speak the same language, and the percentage in Serbia will shock you"
A newspaper article/essay published in the daily newspaper "Jutarnji list" on July 9, 2022. It contains some critical considerations of the draft of the Higher Education and Scientific Activity Act in the Republic of Croatia.
Research Papers
It resulted in eleven different quotation mark pairs, six of which are hapax legomena, while the remaining five are present in modern Croatian orthographic handbooks. Although many consider „quotation marks” traditional Croatian quotation mark forms, they are only present after Boranić (1930), who ended 150 years of the continuous use of „quotation marks“ in Croatian orthographic books. Whereas the first quotation marks appeared in Šilobod's Aritmetika (1758), single quotation marks came much later, with Kušar (1889). Eight single quotation mark pairs were found, two of which are hapax legomena, with six meanings in total.
Twenty-one meanings of quotation marks are described and categorized, of which eighteen are used in Croatian orthographic books from Kratki navuk and Uputjenje (both from 1779) to the Institute of Croatian Language and Linguistics' 2013 Hrvatski pravopis. Croatian orthographic books describe rules for eleven of them in a number of meanings ranging from four (Tutavac and Anić-Silić) to ten (Cipra-Klaić).
Quotation marks are examined from three research perspectives: the orthographic and sociolinguistic perspective, the linguographic and computational perspective, and the terminological perspective.
Of the thirty characters in five punctuation subcategories that carry the quotation mark feature in the Unicode system, fifteen are Latin (8 quotation marks and 7 single quotation marks). Croatian orthographic books use six of the eight quotation marks („ “ » « ” " plus two graphemes that do not exist in Unicode) and all seven single quotation marks (‚ ‛ ’ ‘ ' › ‹ plus one other non-standardized grapheme).
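As a rough illustration of the Unicode side of this description, the following Python sketch looks up the general category and official name of the thirteen quotation marks quoted above. It is only an assumption-laden aid for the reader, not the procedure used in the paper: the constants DOUBLE and SINGLE and the helper names describe and categories are hypothetical, and the code relies solely on Python's standard unicodedata module.

```python
import unicodedata

# Quotation marks quoted in the abstract, written as escapes so that the
# code points are unambiguous. The constant names are illustrative only.
DOUBLE = "\u201E\u201C\u00BB\u00AB\u201D\u0022"        # six double marks
SINGLE = "\u201A\u201B\u2019\u2018\u0027\u203A\u2039"  # seven single marks


def describe(chars):
    """Return (code point, general category, Unicode name) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c))
        for c in chars
    ]


def categories(chars):
    """Return the sorted set of general categories found among the characters."""
    return sorted({unicodedata.category(c) for c in chars})


if __name__ == "__main__":
    for code_point, category, name in describe(DOUBLE + SINGLE):
        print(code_point, category, name, sep="\t")
    print("Categories in use:", categories(DOUBLE + SINGLE))
```

Running the script prints one row per character. Among these thirteen marks only four punctuation subcategories appear (Pi, Pf, Ps and Po); the fifth, Pe, occurs only among quotation marks outside this set.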
Two models of nomenclature for the terminological norming of all existing quotation marks are suggested (not only for signs that have been used or are still used in the Croatian language): one that is founded in a graphic, graphemic description, and one that is founded in terminological transparency.
In place of a discussion on the choice of graphemes in the Croatian linguistic norm, all relevant quotation marks and single quotation marks are evaluated by seven criteria (orthographic tradition and continuity, frequency, transparency, legibility, typographic aesthetics, computational acceptance, and distinctiveness), and three normative models are suggested for the Croatian graphemic standard for quotation marks.
Although a very broad categorization can be established for European orthographic methodologies, six methodological perspectives of orthographic standardization stand out for the present status of the Croatian language with regard to standardization (initiative, authority, acceptance, engagement scope, establishing standard model and authorship), which are insufficiently discussed in domestic literature.
It is stressed that the regulation of orthographic policy by means of laws has a positive impact on the stability of the orthographic standard and that it is not possible to implement high-quality orthographic standardization without a language authority in the community.
It is concluded that the establishment of a regulatory center and the creation of fundamental documents on orthographic planning (the green and the white book, a development strategy, orthography dispute resolution, and others) could have a crucial impact on the further successful development of Croatian orthographic standardization methodology.
CroMo morphological analyzer for Croatian. It was designed as a monolithic finite-state morphological segmentation and annotation tool that emits feature bundles for all recognized morphemes and sub-morphemes and generates lemmata for the lexical root and complex compounds in a single pass. Its linguistic base uses morphological and morphotactic regularities only and is easily extensible. The lexical base was developed, and can be expanded, in a short amount of time, generating platform-independent and extremely efficient code and binaries based exclusively on open-source tools. We approach the (linguistic) interoperability problem by using a GOLD-based (ontology-oriented) annotation schema for uniquely mappable linguistic terminology in the annotation output. CroMo was developed to provide an initial morphological and morphosyntactic annotation and lemmatization for the Croatian Language Corpus, but it can be applied to other similar languages.
The terminology of scientific and technical chemistry is taken as a sample and prototype because of its relevance, complexity and specificity. A semi-automatic method of term extraction from the digitized university textbook Osnove analitičke kemije (Skoog 1999) was applied, together with filtration, normalization, and tokenization. The advantages and disadvantages of this method are shown. Answers to linguistic issues in chemical terminology, reached by consensus among chemists and linguists, are suggested.
The distinction between complementation and adjunctivization relations, or rather between exocentrism and endocentrism, is emphasized in the process of defining the dependency principles.
The tagmeme is defined as the basic syntactic unit for the government and agreement principle, while the relationship between constructor and functor is established on the syntactic level of description.
Syntactic redundancy explains the relationship between complement and adjunct as an important aspect of syntagmeme relations.
The operating function principle establishes tagmemes with a specific syntactic function in the formation of question, negation, affirmation, intensity, motivation, and the imperative verbal mood.
The analysis of the verbal phrase explains the relationship between the different positions of verbs and their morphosyntactic and syntactic forms.
Aside from the standardization, in the meantime, of two new Unicode dash characters (the two-em dash and three-em dash, Unicode 6.1, January 2012), a different methodology and a comparison of the linguistic-historical and computational linguistic aspects have broadened the knowledge of dash characters in the Croatian language as described in Portada-Stojanov (2009). A categorization is presented that is sensitive to the dichotomy of graphic representation and meaning and that divides all dash characters into five hierarchical levels. Among the 44 Unicode horizontal and unbroken dash characters, a division by type, time, functionality, direction, and line height yields 11 contemporary Latin-alphabet horizontal central characters, from which each language written in the Latin alphabet chooses its own. The semantic value and usage of all Unicode dash graphemes are described.
On the other hand, the paper also described dash characters from the perspective of Croatian historical linguistics and orthography. In comparison to the rich repository of standardized Unicode dash characters, it has been shown that orthographic standards are significantly reductive. Orthographic norming of dash characters is divided into two periods and three groups, depending on their graphemic form (the first and second generation of orthography manuals) and terminology (the pre-standard phase and the two standard norming schools, depending on the acceptance of the terminological pairs “spojnica – crtica” and “crtica – crta”).
The historical linguistic and computational linguistic comparative research and the contrastive analysis of the Unicode standardization of dash characters with traditional orthographic descriptions of dash characters were intended to highlight (i) the need for a broader, interdisciplinary approach to describing written linguistic practice, (ii) the insufficiency of descriptions in primary and secondary school orthography manuals for modern writing, and (iii) the insufficiency of the existing Croatian codification of both terminological schools. It is claimed that, for orthography manuals to be called scholarly, computer writing should be better described and a differentiation between characters and graphemes should be introduced on the level of punctuation. One of the areas in which orthography manuals could bring themselves technologically up to date is the writing of compound words at the beginning of a broken line, and the paper provides eight reasons to abandon the current tradition.
Analysis has shown that it would be justified to base dash codification on three or four characters, which reduces the 11 Latin Unicode characters to basic groups of dashes – the short, medium, long, and very long dashes, referred to as c1, c2, c3 and c4.
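The reduction to a few basic groups can be pictured with a small Python sketch. Note that the abstract names the groups c1 to c4 but does not say which characters belong to which group, so the assignment below is purely a hypothetical illustration, covering only six well-known dash characters rather than the paper's full set of 11; the names DASH_GROUPS, GROUP_OF, dash_group and describe_dashes are likewise invented for this example.

```python
import unicodedata

# ASSUMPTION: the source does not list the members of c1-c4. This mapping is
# an illustrative guess based only on relative dash length.
DASH_GROUPS = {
    "c1": ["\u002D", "\u2010"],  # HYPHEN-MINUS, HYPHEN (short)
    "c2": ["\u2013"],            # EN DASH (medium)
    "c3": ["\u2014"],            # EM DASH (long)
    "c4": ["\u2E3A", "\u2E3B"],  # TWO-EM DASH, THREE-EM DASH (very long)
}

# Reverse lookup table: character -> assumed group label.
GROUP_OF = {ch: group for group, chars in DASH_GROUPS.items() for ch in chars}


def dash_group(ch):
    """Return the assumed group label for a dash character, or None."""
    return GROUP_OF.get(ch)


def describe_dashes():
    """Return (group, code point, Unicode name, general category) rows."""
    return [
        (group, f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for group, chars in DASH_GROUPS.items()
        for ch in chars
    ]


if __name__ == "__main__":
    for row in describe_dashes():
        print(*row, sep="\t")
```

All six characters used here belong to the Unicode general category Pd (dash punctuation), which the script prints alongside each official character name.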
The computational procedures of information retrieval and n-gram SQL/regex queries are presented in order to extract token co-frequencies and reveal phrases, collocations and more stable syntagmemes. The JavaScript wiring library WireIt is used to visualize token frequencies in the browser.
We compared the output with Google search results, on the basis of which we point out seven shortcomings of Google search for linguistic investigations and conclude that our approach could produce unique results in linguistic research.
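The idea of counting token co-frequencies with n-grams can be sketched in a few lines of Python. This is a simplified stand-in that uses a regular expression and an in-memory counter instead of the SQL queries mentioned above; the function names tokens and ngram_counts and the sample sentence are assumptions made for illustration and do not come from the paper.

```python
import re
from collections import Counter

# A deliberately simple tokenizer: runs of word characters, Unicode-aware,
# so Croatian letters such as č, š and ž stay inside their tokens.
TOKEN_RE = re.compile(r"\w+")


def tokens(text):
    """Split text into lower-cased word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def ngram_counts(text, n=2):
    """Count how often each sequence of n adjacent tokens occurs."""
    toks = tokens(text)
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


if __name__ == "__main__":
    sample = "Hrvatski jezik i hrvatski pravopis: hrvatski jezik danas."
    for ngram, freq in ngram_counts(sample, n=2).most_common(3):
        print(" ".join(ngram), freq)
```

Frequent n-grams are only candidates for collocations; a real study would normalize the counts against corpus size and filter them further.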
The methodological possibilities for studying anaphoric references through a semantic-discourse and a syntactic-clause framework are emphasized, with reference to the distinction between anaphor and anaphora (as in Crystal [1997a]).
By accepting the division of pronouns and adjectives according to morphosyntactic criteria and by observing the prenominal (before nouns, Cro. prednominalni) and pronominal (instead of nouns, Cro. podnominalni) positions, a scheme of features is established which considers the relation of nouns to pronouns and pronominal adjectives (Cro. zamjenični pridjevi) within the categories of reflexiveness/non-reflexiveness and possessiveness/non-possessiveness.
The distinction between (syntactic-clause) anaphors and determiners (Cro. determinanti), and between subject and object sentence elements (i.e. sentence function), results in a syntactic analysis of reflexiveness for which the author establishes the fundamental rules of well-formedness. The issue of the possessive pronominal adjective is analysed in particular in Croatian as compared with English and German.
The first part provides a conclusion that de Saussure defines the syntagm through syntagmatic relationships, and that in turn they result from linear delimitation. Defined in such a way, the syntagm is not a syntactic unit, but a sequence (string) which can also represent some other units of different grammatical levels.
The second part gives an outline of Croatian grammars and syntaxes according to their familiarity with phrase-structure syntax. It also provides four reasons and eight general conclusions about the methodology used in Croatian grammars.
The last section considers the relationship between the methodology and the theory through a grammatological aspect. It offers a relationship grid that covers minimal, primitive, basic and derived syntactic units with regard to syntactic entities of relation, syntactic categories, and the proposed terms of tagmeme and syntagmeme.
and practical use of horizontal dashes. The new Croatian Orthography, recently published by Matrix Croatica and written by Badurina, Marković and Mićanović, contributed even more to the confusion by prescribing solutions that deviate significantly from orthographic tradition and typographic practice. Practical, orthographic, and computational linguistic arguments against these solutions are stated and elaborated. The authors propose the terms spojnica, en-crtica and em-crtica for the characters -, – and —. Two possible directions in the development of orthographic rules and usage are pointed out. The authors also draw attention to some other inconsistencies in orthography which should be systematized and standardized.
Talks
Today, however, Croatian lexicography faces numerous challenges. Leaving aside sources such as the online services Google Books and Archive.org, the number of searchable lexicographic digitization projects for Croatian is extremely small and covers only a very limited part of the older Croatian lexicographic material. Unlike in other languages, the Academy's Croatian dictionary is not yet searchable online. As for contemporary lexicography, Croatian has a multitude of dictionaries of the standard language, but only one lexicographic source is freely searchable at no cost. According to a study by Gerbrich de Jong (2014), based on an analysis of 125 academic dictionaries of 83 European languages, open online access is the standard in European lexicography.
The digitization of dictionary material should be carried out through an open, free and advanced search engine, because only in that way can scholarly research be supported in the long term. Individual institutional or personal efforts to establish publicly available lexicographic content are not sufficient, and cooperation on digitization projects is necessary. Much like the Croatian Web Archive, which gathers online texts, the National and University Library could initiate the creation of a major resource for numerous scholars in the form of a search engine and a supercorpus of Croatian, composed of all printed publications whose digital content it could obtain.
In mid-2016, participants in COST Action IS1305 are still building the European Dictionary Portal, which is expected to become the central European lexicographic metasearch engine in the foreseeable future. In keeping with its rich dictionary tradition, a (more pronounced) presence of Croatian on the portal should be an important goal of domestic lexicography, regardless of whether the material is digitized or consists of dictionaries of the standard language.
Conference Presentations
Orthographic literacy was tested using 41 monitored examples with 650 students of technological studies over five consecutive academic years. A categorization of orthographic disputability into six classes was established and used as a methodological guideline for establishing the orthographic standard of Hrvatski pravopis (Croatian orthography) of the Institute of Croatian Language and Linguistics.