Greville G. Corbett

University of Surrey, Surrey Morphology Group, Faculty Member

Please see my home page at: www.smg.surrey.ac.uk/corbett

Papers

The Agreement Hierarchy revisited: The typology of controllers

Word Structure, 2023

The Agreement Hierarchy consists of four principal target positions: attributive, predicate, relative pronoun and anaphoric personal pronoun. It constrains the distribution of alternative agreements, in that the likelihood of agreement with greater semantic justification increases monotonically as we move rightwards along the hierarchy. The Agreement Hierarchy covers a wide range of disparate data, and continues to figure regularly in work on theoretical syntax. Since the hierarchy was first proposed, typology has moved on. This means that to remain fit for the purposes for which it is currently used, the hierarchy needs an overhaul. The typology of agreement controllers is the area where the need is most urgent; this is therefore our focus. The canonical typology of controllers is shown to have two dimensions: lexeme to phrase, and local to extraneous (the latter involving honorific agreement, associative agreement, back agreement and "pancake sentences"). These two dimensions are amply illustrated. Finally, interactions between the different types of agreement controller are investigated, since these prove revealing for the typology. Besides making progress on the typology of agreement, the paper contributes to typology more generally, in incorporating insights from other typological disciplines.

The Agreement Hierarchy and (generalized) semantic agreement

Glossa: a journal of general linguistics, 2023

Agreement systems often allow alternatives: This family has / have lost everything. Therefore typology requires a means for generalizing over them. Instances like plural have are frequently termed "semantic agreement" (vs. "syntactic agreement" for singular has), but this notion has proved difficult. The challenge is to encompass the full typological range of alternative agreements. These include the core instances: (i) hybrid nouns like family; and (ii) constructional mismatches, such as conjoined nominal phrases, but also less obvious phenomena: (iii) split hybrids where neither alternative is straightforwardly semantic, both appear related to form, and (iv) examples like Scandinavian "pancake sentences", which stretch semantic agreement towards pragmatics. These different types are comparable in that (i) the alternatives are realized by the normal agreement forms; and (ii) they are subject to the Agreement Hierarchy. Hence they demand a common treatment. To achieve this, I first unpack the Agreement Hierarchy constraint into the agreement target positions and the directionality implied by "semantic agreement". I show how the latter arises from mismatches between the agreement information available from different sources. Typically, in the core instances, the information from one source is more evidently semantic than from the second. But in other instances, this is less clear. I argue that it is more parsimonious to treat these less obvious phenomena as falling under the constraint of the Agreement Hierarchy. They are seen as part of the pattern of a Hierarchy of Agreement Sources, which gives different degrees of "generalized semantic agreement".
This reworking offers a more robust underpinning to the Agreement Hierarchy, and fits into a current trend: a typology that works is no longer sufficient, rather we examine and justify the defining criteria, and relate them to the underlying attributes of the domain.

Is morphosyntactic agreement reflected in acoustic detail? The s duration of English regular plural nouns

English Language and Linguistics

Studies have challenged the assumption that different types of word-final s in English are homophonous. On the one hand, affixal (e.g. laps) and non-affixal s (e.g. lapse) differ in their duration; on the other hand, variation exists across several types of affixal s (e.g. between the plural (cars) and genitive plural (cars’)). This line of research was recently expanded in a study in which an interesting side effect appeared: the s was longer if followed by a past tense verb (e.g. The pods/odds eventually dropped), in comparison to a following present tense verb (e.g. The old screens/jeans obviously need replacing.). Put differently, the s became longer in the absence of overt morphosyntactic agreement, where it was mostly the sole plurality marker in the sentence. The objective of the present article is to examine whether this effect can be replicated in a more controlled setting. Having considered a large number of potential confounding variables in a reading experiment, we found...

Surrey Syncretisms Database

The Surrey Syncretisms Database encodes information on inflectional syncretism in 30 genetically and geographically diverse languages, representing such morphosyntactic features as case, person, number and gender, in all the inflectional classes where they are relevant. The database was created for the project 'Where word forms collide: A typology of syncretism', funded by the Economic and Social Research Council under grant number R000237939.

Plural number words in the Alor-Pantar languages

Studies in Diversity Linguistics, Sep 6, 2014

The Alor-Pantar family constitutes the westernmost outlier group of Papuan (Non-Austronesian) languages. Its twenty or so languages are spoken on the islands of Alor and Pantar, located just north of Timor, in eastern Indonesia. Together with the Papuan languages of Timor, they make up the Timor-Alor-Pantar family. The languages average 5,000 speakers and are under pressure from the local Malay variety as well as the national language, Indonesian. This volume studies the internal and external linguistic history of this interesting group, and showcases some of its unique typological features, such as the preference to index the transitive patient-like argument on the verb but not the agent-like one; the extreme variety in morphological alignment patterns; the use of plural number words; the existence of quinary numeral systems; the elaborate spatial deictic systems involving an elevation component; and the great variation exhibited in their kinship systems. Unlike many other Papuan lan...

Understanding Intra-System Dependencies: Classifiers In Lao

Lexicon Schemas and Related Data Models: when Standards Meet Users

Lexicon schemas and their use are discussed in this paper from the perspective of lexicographers and field linguists. A variety of lexicon schemas have been developed, with goals ranging from computational lexicography (DATR) through archiving (LIFT, TEI) to standardization (LMF, FSR). A number of requirements for lexicon schemas are given. The lexicon schemas are introduced and compared to each other in terms of conversion and usability for this particular user group, using a common lexicon entry and providing examples for each schema under consideration. The formats are assessed and the final recommendation is given for the potential users, namely to request standard compliance from the developers of the tools used. This paper should foster a discussion between authors of standards, lexicographers and field linguists.

Periphrasis: The Role of Syntax and Morphology in Paradigms

Periphrasis straddles the border between two major linguistic components, morphology and syntax. It describes a situation where a grammatical meaning, such as a tense, which could be expected to be expressed morphologically within a word, is instead expressed by a syntactic phrase. Inclusion of syntactic phrases in morphological paradigms creates analytical and theoretical problems that have yet to be resolved by linguists, who have been hampered by the rather narrow range of data available for consideration and by a lack of adequate theoretical devices. This book addresses the challenge by broadening the range of phenomena under discussion and presenting new theoretical approaches to the problem of periphrasis. Part I takes four key languages from diverse families - Nakh-Daghestanian, Gunwinyguan (Australian), Uralic and Indo-European - as examples of languages in which periphrasis poses particular problems for current linguistic theories. Part II views periphrasis in different con...

Linguistic Typology

Linguistics, and typology in particular, can have a bright future. We justify this optimism by discussing comparability from two angles. First, we take the opportunity presented by this special issue of Linguistic Typology to pause for a moment and make explicit some of the logical underpinnings of typological sciences, linguistics included, which we believe are worth reminding ourselves of. Second, we give a brief illustration of comparison, and particularly measurement, within modern typology.

Pluralia tantum nouns and the theory of features: a typology of nouns with non-canonical number properties

Morphology, 2018

The theory of feature systems: One feature versus two for Kayardild tense-aspect-mood

Morphology, 2016

Russian colour term salience

Russian Linguistics, 1989

The penumbra of morphosyntactic feature systems

Morphology, 2010

Colour terms in Russian: reflections of typological constraints in a single language

Journal of Linguistics, 1988

One of the milestones in typological studies is Berlin & Kay's (1969) account of basic colour terms, which has produced a steady stream of research of various types. Berlin & Kay summarized their work as follows: "In sum, our two major findings indicate that the referents for the basic color terms of all languages appear to be drawn from a set of eleven universal perceptual categories, and these categories become encoded in the history of a given language in a partially fixed order" (1969: 4–5).

Definiteness, Gender, and Hybrids: Evidence from Norwegian Dialects

Journal of Germanic Linguistics, 2012

In some Norwegian dialects, such as older Oslo dialect, the noun mamma ‘mother’ unexpectedly appears to be masculine. The Nordreisa dialect (Northern Norwegian) goes one step further. The word looks like it is masculine, but only in the definite form. This is an unusual “split” because gender mixture is normally based on number, not definiteness (but we find some few corroborative examples in other Norwegian dialects and different, but converging evidence on the Web). The Nordreisa example of mamma is unusual also because agreement targets are affected differently. The preference is for masculine agreement within the noun phrase, but for feminine agreement outside it. This is, therefore, an intriguing example since it combines a split based on definiteness with different gender requirements according to different agreement targets. On careful analysis, and given strict adherence to the classical, agreement-based definition of gender, the unusual behavior of mamma turns out to confo...

Resources for suppletion: a typological database and a bibliography

Resources for suppletion: a typological database and a bibliography, Feb 11, 2016

The phenomenon of suppletion, as found in English go~went where different inflectional forms of the same lexical item are not related phonologically, has a special place in morphology. Part of its importance is that it sets one of the outer bounds for the notion ‘possible word’ in a human language. It provokes questions about how such forms are to be treated in our theories, and how they are stored (Carstairs-McCarthy 1994). There has been considerable work on suppletion, particularly from Osthoff (1899) onwards. Current interest in the topic is shown by the recent appearance of two dissertations (Veselinova 2003 and Veselinovič 2003). While the body of research is extensive, the range of languages investigated is rather restricted in many publications. In order to stimulate further progress, we have constructed and made available a database (Brown, Chumakina, Corbett and Hippisley 2004). We hope this will help to put future research on a broader empirical base. An annotated bibliography is now available (Chumakina 2004); it contains over seventy entries on works written in five different languages (English, French, German, Italian and Russian) and this will give the reader a view of the literature.

Corbett 2023 external splits

Language, 2023

The lexicon divides into parts of speech (or lexical categories), and there are cross-cutting regularities (features). These two dimensions of analysis take us a long way, but several phenomena elude us. For these the term 'split' is used extensively ('case split', 'split agreement', and more), but in confusingly different ways. Yet there is a unifying notion here. I show that a split is an additional partition, whether in the part-of-speech inventory or in the feature system. On this base an elegant typology can be constructed, using minimal machinery. The typology starts from four external relations (government, agreement, selection, and anti-government), and it specifies four types of split within each (sixteen possibilities in all). This typology (i) highlights less familiar splits, from diverse languages, and fits them into the larger picture; (ii) introduces a new relation, anti-government, and documents it; (iii) elucidates the complexities of multiple splits; and (iv) clarifies what exactly is split, which leads to a sharpening of our analyses and applies across different traditions.

Corbett 2023 external splits Supplementary material

Language, 2023

Additional material for the 2023 publication "The typology of external splits"

Marcel Schlechtweg & Greville G. Corbett: Is morphosyntactic agreement reflected in acoustic detail? The s duration of English regular plural nouns

by Greville G. Corbett and Marcel Schlechtweg

English Language and Linguistics, 2023

Studies have challenged the assumption that different types of word-final s in English are homophonous. On the one hand, affixal (e.g. laps) and non-affixal s (e.g. lapse) differ in their duration; on the other hand, variation exists across several types of affixal s (e.g. between the plural (cars) and genitive plural (cars')). This line of research was recently expanded in a study in which an interesting side effect appeared: the s was longer if followed by a past tense verb (e.g. The pods/odds eventually dropped), in comparison to a following present tense verb (e.g. The old screens/jeans obviously need replacing.). Put differently, the s became longer in the absence of overt morphosyntactic agreement, where it was mostly the sole plurality marker in the sentence. The objective of the present article is to examine whether this effect can be replicated in a more controlled setting. Having considered a large number of potential confounding variables in a reading experiment, we found an effect in the expected direction, one that is compatible with the literature on the impact that predictability has on duration. We interpret this finding against the background of the role of fine acoustic detail in language.

Default Genders

Fedden, Sebastian, Jenny Audring & Greville G. Corbett (eds.) 2018. Non-canonical gender systems. OUP

by Sebastian Fedden and Greville G. Corbett

https://global.oup.com/academic/product/non-canonical-gender-systems-9780198795438?cc=ch&lang=en&, 2018

Grammatical gender is famously the most puzzling of the grammatical categories. Despite our solid knowledge about the typology of gender systems, exciting and unexpected patterns keep turning up which defy easy classification and straightforward analysis. Some of these question, stretch or even threaten to cross the outer boundaries of the category. These outer boundaries are a largely unexplored territory; yet they are essential for our understanding of gender, besides being interesting in their own right. The purpose of this book is to explore the external borders of the category of gender and discuss their theoretical significance. The ideal framework for this endeavour is provided by Canonical Typology, a cutting-edge approach already successfully applied to a range of linguistic phenomena. In the Canonical approach a linguistic phenomenon, for example a morphosyntactic feature like gender, is established in terms of a canonical ideal: the clearest instance of the phenomenon. The canonical ideal is a clustering of properties that serves as a baseline from which we measure the actual examples that we find. This approach allows us to analyse any gender system and determine for each of its component properties whether it is more or less canonical. The languages discussed in this volume all diverge from the canonical ideal in interesting ways. For each language, we have lined up international experts, all of whom approach their work from a typological perspective. We explore a wide range of typologically different languages drawn from all over the world, from South America to Melanesia, from an Italo-Romance dialect of Central Italy to Mawng of Northern Australia.

ARCHI: COMPLEXITIES OF AGREEMENT IN CROSS-THEORETICAL PERSPECTIVE

by Greville G. Corbett and Dunstan Brown

This book presents a controlled evaluation of three widely practised syntactic theories on the basis of the extremely complex agreement system of Archi, an endangered Nakh-Daghestanian language. Even straightforward agreement examples are puzzling for syntacticians because agreement involves both redundancy and arbitrariness. Agreement is a significant source of syntactic complexity, exacerbated by the great diversity of its morphological expression. Imagine how the discipline of linguistics would be if expert practitioners of different theories met in a collaborative setting to tackle such challenging agreement data, to test the limits of their models and examine how the predictions of their theories differ given the same linguistic facts. Following an overview of the essentials of Archi grammar and an introduction to the remarkable agreement phenomena found in this language, three distinct accounts of the Archi data examine the tractability and predictive power of major syntactic theories: Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Minimalism. The final chapter compares the problems encountered and the solutions proposed in the different syntactic analyses and outlines the implications of the challenges that the Archi agreement system poses for linguistic theory.

The syntax-morphology interface: a study of syncretism

by Matthew Baerman, Greville G. Corbett, and Dunstan Brown

Understanding and Measuring Morphological Complexity

by Greville G. Corbett, Dunstan Brown, and Matthew Baerman

This book aims to assess the nature of morphological complexity, and the properties that distinguish it from the complexity manifested in other components of language. Of the many ways languages have of being complex, perhaps none is as daunting as what can be achieved by inflectional morphology: this volume examines languages such as Archi, which has a 1,000,000-form verb paradigm, and Chinantec, which has over 100 inflection classes. Alongside this complexity, inflection is notable for its variety across languages: one can take two unrelated languages and discover that they share similar syntax or phonology, but one would be hard pressed to find two unrelated languages with the same inflectional systems.

In this volume, senior scholars and junior researchers highlight novel perspectives on conceptualizing morphological complexity, and offer concrete means for measuring, quantifying and analysing it. Examples are drawn from a wide range of languages, including those of North America, New Guinea, Australia, and Asia, alongside a number of European languages. The book will be a valuable resource for all those studying complexity phenomena in morphology, and for theoretical linguists more generally, from graduate level upwards.

The expression of gender

for the contributors and the contents please see the downloadable file

Features, Oct 11, 2012

Features are a central concept in linguistic analysis. They are the basic building blocks of linguistic units, such as words. For many linguists they offer the most revealing way to explore the nature of language. Familiar features are Number (singular, plural, dual, …), Person (1st, 2nd, 3rd) and Tense (present, past, …). Features have a major role in contemporary linguistics, from the most abstract theorizing to the most applied computational applications, yet little is firmly established about their status. They are used, but are little discussed and poorly understood. In this unique work, Corbett brings together two lines of research: how features vary between languages and how they work. As a result, the book is of great value to the broad range of perspectives of those who are interested in language.

Gender is a fascinating category, central and pervasive in some languages and totally absent in others. In this new, comprehensive account of gender systems, over 200 languages are discussed, from English and Russian to Archi and Chichewa. Detailed analysis of individual languages provides clear illustrations of specific types of system. Gender distinction is often based on sex; sometimes this is only one criterion and the gender of nouns depends on other factors (thus 'house' is masculine in Russian, feminine in French and neuter in Tamil). Some languages have comparable distinctions such as human/non-human, animate/inanimate, where sex is irrelevant.

Hierarchies, targets and controllers: Agreement patterns in Slavic

Predicate Agreement in Russian

Canonical morphology and syntax

Canonical Morphology and Syntax, Nov 2012

Periphrasis: The role of syntax and morphology in paradigms

Periphrasis straddles the border between two major linguistic components, morphology and syntax. It describes a situation where a grammatical meaning, such as a tense, which could be expected to be expressed morphologically within a word, is instead expressed by a syntactic phrase. Inclusion of syntactic phrases in morphological paradigms creates analytical and theoretical problems that have yet to be resolved by linguists, who have been hampered by the rather narrow range of data available for consideration and by a lack of adequate theoretical devices. This book addresses the challenge by broadening the range of phenomena under discussion and presenting new theoretical approaches to the problem of periphrasis.

Part I takes four key languages from diverse families - Nakh-Daghestanian, Gunwinyguan (Australian), Uralic and Indo-European - as examples of languages in which periphrasis poses particular problems for current linguistic theories. Part II views periphrasis in different contexts, determining its place within the morphological and syntactic systems of the languages it is found in, its relations to other linguistic phenomena, and the typological variation represented by periphrastic constructions. Treating periphrasis as a morphological and syntactic phenomenon at the same time and applying the criteria worked out within the Canonical Typology approach allows linguists to view periphrasis as a family of phenomena within a typological space of syntactic constructions used to fulfil grammatical functions.

Deponency and Morphological Mismatches

Deponency is a mismatch between form and function in language that was first described for Latin, where there is a group of verbs (the deponents) that are morphologically passive but syntactically active. This is evidence of a larger problem involving the interface between syntax and morphology: inflectional morphology is supposed to specify syntactic function, but sometimes it sends out the wrong signal. Although the problem is as old as the Western linguistic tradition, no generally accepted account of it has yet been given, and it is safe to say that all current theories of language have been constructed as if deponency did not exist. In recent years, however, linguists have begun to confront its theoretical implications, albeit largely in isolation from each other. There is as yet no definitive statement of the problem, nor any generally accepted definition of its nature and scope. This volume brings together the findings of scholars working in the area of morphological mismatches, and represents a typological and theoretical treatment of the topic.

Slovar´ arčinskogo jazyka (arčinsko-russko-anglijskij) [A dictionary of Archi: Archi-Russian-English].

by Greville G. Corbett, Marina Chumakina, and Dunstan Brown

Defective paradigms: missing forms and what they tell us

An important design feature of language is the use of productive patterns in inflection. In English, we have pairs such as 'enjoy' ~ 'enjoyed', 'agree' ~ 'agreed', and many others. On the basis of this productive pattern, if we meet a new verb 'transduce' we know that there will be the form 'transduced'. Even if the pattern is not fully regular, there will be a form available, as in 'understand' ~ 'understood'. Surprisingly, this principle is sometimes violated, a phenomenon known as defectiveness, which means there is a gap in a word's set of forms: for example, given the verb 'forego', many if not most people are unwilling to produce a past tense.

Although such gaps have been known to us since the days of Classical grammarians, they remain poorly understood. Defectiveness contradicts basic assumptions about the way inflectional rules operate, because it seems to require that speakers know that for certain words, not only should one not employ the expected rule, one should not employ any rule at all. This is a serious problem, since it is probably safe to say that all reigning models of grammar were designed as if defectiveness did not exist, and would lose a considerable amount of their elegance if it were properly factored in.

This volume addresses these issues from a number of analytical approaches (historical, statistical and theoretical) and by using studies from a range of languages.

Features : Perspectives on a Key Notion in Linguistics

This book presents a critical overview of current work on linguistic features and establishes new bases for their use in the study and understanding of language. Features are fundamental components of linguistic description: they include gender (feminine, masculine, neuter); number (singular, plural, dual); person (1st, 2nd, 3rd); tense (present, past, future); and case (nominative, accusative, genitive, ergative). Despite their ubiquity and centrality in linguistic description, much remains to be discovered about them: there is, for example, no readily available inventory showing which features are found in which of the world's languages; there is no consensus about how they operate across different components of language; and there is no certainty about how they interact. This book seeks both to highlight and to tackle these problems. It brings together perspectives from phonology to formal syntax and semantics, expounding the use of linguistic features in typology, computer applications, and logic. Linguists representing different standpoints spell out clearly the assumptions they bring to different kinds of features and describe how they use them. Their contrasting contributions highlight the areas of difference and the common ground between their perspectives. The book brings together original work by leading international scholars. It will appeal to linguists of all theoretical persuasions.

Case and grammatical relations: Studies in honor of Bernard Comrie

The syntax-morphology interface: A study of syncretism

by Greville G. Corbett and Dunstan Brown

Syncretism – where a single form serves two or more morphosyntactic functions – is a persistent problem at the syntax–morphology interface. It results from a ‘mismatch’ whereby the syntax of a language makes a particular distinction, but the morphology does not. This pioneering book provides the first full-length study of inflectional syncretism, presenting a typology of its occurrence across a wide range of languages. The implications of syncretism for the syntax–morphology interface have long been recognized: it argues either for an enriched model of feature structure (thereby preserving a direct link between function and form), or for the independence of morphological structure from syntactic structure. This book presents a compelling argument for the autonomy of morphology, and the resulting analysis is illustrated in a series of formal case studies within Network Morphology. It will be welcomed by all linguists interested in the relation between words and the larger units of which they are a part.

Agreement: A Typological Perspective

The Surrey Deponency Databases

by Andrew Hippisley, Matthew Baerman, Dunstan Brown, and Greville G. Corbett

The databases record instances of deponency, which is the term we have adopted to describe mismatches between morphology and morphosyntax. The prototypical examples are the deponent verbs of Latin, which involve a mismatch between passive form and active meaning. That is, a normal Latin verb had active forms such as amō 'I love' and amāvī 'I have loved', which contrasted with the passive forms amor 'I am loved' and amātus sum 'I have been loved' (in this case, with a masculine subject). A deponent verb, on the other hand, looks like the passive but functions like the active, as in mīror 'I admire', mīrātus sum 'I have admired'. In the databases we construe deponency in an extended fashion, covering any mismatch between the apparent morphosyntactic value of a morphological form and its actual value in a given syntactic context.

Mian and Kilivila Collection

by Greville G. Corbett, Matthew Baerman, Dunstan Brown, Timothy Feist, and Sebastian Fedden

Categorization is ubiquitous in human thought. The ability to process the continuous stream of information we are confronted with and turn it into manageable units is crucial for dealing both with the world around us and with our fellow human beings. We do this when we think, and we do this when we communicate. And the way we do this reveals interesting differences between different people, languages and cultures, in that the same real-world entities may be treated very differently. For example, the English speaker differentiates between fingers and toes, while for the Spanish speaker they are all referred to by the same word, dedo.

The grammar of a language can also force us to classify. When we use a pronoun in English we have to choose between ‘he’ for males, ‘she’ for females and ‘it’ for inanimates. This type of categorization runs along the lines of biological sex. In a language with a gender system all nouns are treated as either masculine or feminine — even those nouns whose meanings have nothing to do with biological sex.

Quite a different approach is taken by languages with a classifier system. Here categorization is based on fine-grained meaning, involving shape, function, arrangement, place or time interval. One such language is Kilivila (an Oceanic language spoken on the Trobriand Islands in Papua New Guinea), which has at least 177 distinct classifiers.

Mostly a language will have only one system or the other, gender or classifiers, but in a few interesting cases we find both systems together. A key language for this project is Mian, a Papuan language spoken by 1,700 people in Papua New Guinea. Mian has both a gender system and a system of classifiers in the form of prefixes on verbs of object handling or movement (e.g. give, take, put, lift, throw, fall).

Surrey Morphological Complexity Database

by Greville G. Corbett and Matthew Baerman

This database records examples of what we have identified as morphological complexity, which we define as the morphologically-conditioned deviation between inflectional forms and the inflectional features they realize. This is manifested both within the paradigm (e.g. as syncretism or patterns of stem alternation) and across sets of lexemes (as inflection classes and lexically-conditioned allomorphy).

Surrey Periphrasis Database

by Greville G. Corbett and Dunstan Brown

Periphrasis reveals how the construction of meaning in language is apportioned between morphology ('bright' and 'brighter') and syntax ('intelligent' and 'more intelligent'). The Surrey Periphrasis database systematically catalogues data from a sample of 19 languages in a fully structured way to help explore the role of periphrasis in inflectional paradigms.

Surrey Defectiveness Database

by Greville G. Corbett, Matthew Baerman, and Dunstan Brown

The term 'defectiveness' refers to gaps in inflectional paradigms: specifically, gaps which do not appear to follow from natural restrictions imposed by meaning or function. The Typological Database on Defectiveness illustrates different types of defective paradigm according to various morphological and morphosyntactic parameters. The Cross-linguistic Database on Defectiveness looks at the prevalence of inflectional defectiveness in a controlled sample of genetically and geographically diverse languages.

Surrey Short Term Morphosyntactic Change Databases

by Greville G. Corbett, Alexander Krasovitsky, Dunstan Brown, and Matthew Baerman

The notion of 'short term morphosyntactic change' can be used to characterise changes in the use of forms in a short period of time even when the forms themselves have changed relatively little. The Short Term Morphosyntactic Change (STMC) Databases explore change in six different morphosyntactic phenomena in Russian over a 200-year period from 1801 to 2000.

Surrey Turning Owners into Actors Database

A fundamental communicative task for all languages is to show which participant in a sentence is the subject. Languages have various ways of identifying the subject, including word-order, agreement, and case-marking. However, there is another unique and strange method, almost entirely unknown until now, found only in Northwest-Solomonic (NWS), a group of Oceanic languages of the Solomon Islands and Bougainville. In some constructions, these languages indicate subject using word-forms normally indicating possessors of nouns. This use of possessive morphology to mark subjects is theoretically highly significant. To define language fully we must understand the limits on subject-marking. This almost unresearched phenomenon is crucial to our understanding of the fundamental issue of how subjects can be marked.

This database documents this almost unresearched phenomenon: how it works, how it varies, what it does, and where it comes from. Since the key languages investigated are highly endangered, the project was timely, and as a by-product, resulted in some partial primary documentation work. In most Oceanic languages of the North West Solomonic subgroup (spoken in Bougainville and the western Solomon Islands), some use is made of apparent possessive morphology to index subject and encode aspect on verbs.

Archi: A dictionary of the languages of the Archi villages, south Daghestan

by Greville G. Corbett and Dunstan Brown

Archi is a Lezgic language spoken by about 1200 people in the highlands of Daghestan. The online version of the Archi-Russian-English Dictionary contains sound files, digital pictures of culturally significant objects, idioms and example sentences with interlinear glossing. It can be searched in English, Russian and Archi (using Cyrillic or IPA).

Surrey Deponency Databases (Cross-linguistic database and typological database).

by Greville G. Corbett, Matthew Baerman, and Dunstan Brown

Deponency describes mismatches between morphology and morphosyntax. A mismatch occurs where the word form is used in some function incompatible with its normal function. The Typological Database on Deponency records the logical space of deponency: What features may be affected, and what are the characteristics of the resulting paradigm? The Cross-linguistic Database on Deponency looks at the presence of morphological mismatches in a controlled sample of genetically and geographically diverse languages.

Resources for suppletion: A typological database and a bibliography

On-line Proceedings of the Fourth Mediterranean Morphology Meeting (MMM4), Catania, Sicilia, 21-23 September 2003, 2005

Surrey Suppletion Database

Suppletion is a morphological phenomenon where different inflectional forms of the same sign are maximally regular in their semantics, yet maximally irregular in form. For a sample of 34 languages, the Surrey Suppletion Database encodes phonologically distinct stems that belong to the same paradigm and defines the categories along which the suppletion happens.

Surrey Database of Agreement

by Greville G. Corbett and Dunstan Brown

Agreement is the expression of grammatical information in the ‘wrong place’: a relation that can be described in terms of controllers, targets, domains, categories and conditions. The Surrey Database of Agreement encodes information on agreement in fifteen genetically diverse languages and contains reports for the sample languages, providing pointers to examples illustrating different instances of the phenomenon.

Agreement: A bibliography

This bibliography was produced by Carole Tiberius, Greville Corbett and Julia Barron as part of an ESRC project 'Agreement: An investigation into the distribution of information' (Grant number: R000238228). This support is gratefully acknowledged.

The bibliography comprises collections and special issues devoted to agreement (section A), monograph-length studies of agreement, mainly studies of agreement in particular languages (section B), and articles and book chapters devoted to agreement (section C). There is a good deal of material on agreement in the Slavonic languages, which is given separately (section D).

This bibliography does not in general include works which may refer to agreement morphology in connection with language acquisition, language reconstruction or sign language.

Inflectional Syncretism and Corpora

by Greville G. Corbett and Dunstan Brown

Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora (LINC-04), 2004

This paper describes a novel undertaking: comparing the relationship between grammatical ambiguity (syncretism) in nouns, as represented in a default inheritance hierarchy, with textual frequency distributions. In order to do this we consider a language with a reasonable number of grammatical distinctions and where syncretism occurs in different morphological classes. We investigated this relationship for Russian nouns. Our results suggest that there is an intricate relationship between textual frequency and inflectional syncretism.

Surrey Syncretisms Database

by Greville G. Corbett, Matthew Baerman, and Dunstan Brown

The term 'syncretism' refers to the phenomenon whereby a single form fulfils two or more different functions within the inflectional morphology of a language. The Surrey Syncretism Database encodes information on inflectional syncretism in 30 genetically and geographically diverse languages, across morphosyntactic features such as case, person, number and gender.

Russian Lemmatisation with DATR

In this paper, we describe an approach to lemmatisation for Russian nouns, which makes use of a large-scale inheritance lexicon implemented in the lexical representation language DATR (Evans and Gazdar 1996). The lexicon was compiled semi-automatically from Zaliznjak's morphological dictionary (Zaliznjak 1977, Ilola and Mustajoki 1989) and automatically generates fully inflected forms together with their associated morphosyntax for around 40,000 Russian nouns. From this resource, we have automatically extracted wordform recognition rules and compiled them into a lemmatiser which hypothesises possible citation form and morphosyntactic features for nominal wordforms. We describe the construction of the lemmatiser and the results of our initial evaluation of its accuracy.

A DATR theory of Russian morphology: dataset

The dog didn’t bark, the noun didn’t inflect: a typology of significant absences

Workshop on uninflectedness (DGfS Cologne), 2023

Uninflectedness is nameworthy because the phenomenon is unexpected and significant. We should ask, then, why we expect inflectedness, and why its lack is significant. This leads us to distinguish it from related phenomena, including syncretism and defectiveness. And while full uninflectedness has a history of discussion, we should not treat it as an absolute: rather, there is an interesting canonical scale from fully inflected to uninflected. Items may be uninflected for a part of their paradigm (thus Polish muzeum ‘museum’ is uninflected in the singular only, an unusual type of heteroclisis), while some Macedonian adjectives show a featural split, being uninflected for gender though inflected for number. And when items move towards being inflected, the change may affect particular specific uninflected cells of the paradigm.
Since uninflectedness is an unexpected phenomenon within inflectional morphology, we might assume it would have no consequences outside inflection. And indeed, derivation may remain unaffected. Thus Upper Sorbian abbé ‘priest’ does not inflect, but it derives the possessive abbéowy ‘priest’s’. In syntax, however, while uninflected items often fit smoothly into their expected syntactic slot, this is not always the case. Wechsler & Zlatić (2013: 115-169) argue that uninflected nouns in Serbo-Croat are restricted in the contexts in which they can occur. They cannot occur in a nominal phrase assigned dative or instrumental, unless the case value is morphologically realized by some other element in the phrase.
Thus uninflectedness varies along a range of criteria, to be carefully defined. These criteria will be exemplified from two main sources. First, Slavonic languages, since these show dramatic variation, and have attracted considerable interest. The second main source will be Dagestanian languages, since these can have substantial numbers (even majorities) of uninflecting items, within parts of speech which can inflect. Given this, we need to refine our definitions.
Uninflected items are indeed surprising. They are also more varied than most accounts allow for, and it is only when we map out the typological possibilities that we can appreciate their significance.
References: Wechsler, Stephen & Larisa Zlatić. 2003. The Many Faces of Agreement. Stanford: CSLI. Wechsler & Zlatić. 2013.

Prominent possessors

by Irina Nikolaeva and Greville G. Corbett

Philological Society meeting

Variation in agreement: Formal and functional explanations, and the importance of measuring.

Invited presentation at plenary round table “Formal and functional approaches to grammatical variation”, 55th Annual meeting of the Societas Linguistica Europaea (SLE 2022), Bucharest, 24–27 August 2022.

I offer two general suggestions: we should be more ambitious in what we try to explain; and we should devote more effort to careful measuring of the linguistic data, when we evaluate claimed explanations. Moreover, while explaining variation is a valid goal, I note that the shape of variation in one phenomenon, P, can often be used to test proposed explanations for another phenomenon, Q. Illustrating these points is made difficult by the fact that formalists and functionalists tend to tackle different issues. One area where interests overlap is agreement. Hence I consider three types of variation in agreement: (i) the dramatic differences between languages according to how pervasive agreement is; (ii) the extreme variation of agreement targets (probes), including within single parts of speech (syntactic categories); (iii) fine-grained variation among alternative agreements within languages and within constructions.
Given (i), the variation in how pervasive agreement is (phenomenon P), we seek to explain why languages can have agreement at all (phenomenon Q). A suggested explanation for Q is that agreement facilitates reference tracking; to evaluate this explanation, the variation within agreement targets (ii), that is, phenomenon P, proves a useful way to measure and thus test the proposed explanation for Q. The suggested explanation is found wanting (Nichols 2018, Fedden in press, Feist 2020).
Turning to explanations of variation itself, fine-grained variation in agreement (iii) has been tackled by formalists from differing perspectives (some relying on different syntactic structures, e.g. Polinsky 2016, others giving morphosyntactic features an important role, e.g. Borsley 2016, Landau 2016, Sadler 2016). Much of this variation involves Agreement Hierarchy effects (Corbett in press); here Wechsler & Zlatić (2003, and subsequent papers) make a noteworthy contribution, and see more recently Shen (2019), Smith (2021) among many others. We need explanations for (i) why alternatives are possible, and (ii) the factors which favour one or other outcome. This is an area where there are rich data to measure, and canonical approaches play an important role in specifying valid baselines and criteria (Round & Corbett 2020).
Ambitious explanation should be based on solid evidence, the result of careful measuring. The existence of agreement, and the variation we find, are deeply puzzling. But this is an environment where linguists of different persuasions can contribute, where careful measurement is being undertaken, and where variation can be used as a tool of explanation as well as a problem which itself is to be explained.

References
Bond, Oliver, Greville G. Corbett, Marina Chumakina & Dunstan Brown (2016), Archi: Complexities of agreement in cross-theoretical perspective. Oxford: Oxford University Press.
Borsley, Robert D. (2016), HPSG and the nature of agreement in Archi. In Bond et al., 118-149.
Corbett, Greville G. (In press), The Agreement Hierarchy revisited: the typology of controllers. To appear in Word Structure (special issue The many facets of agreement, ed. by Tania Paciaroni, Alice Idone and Michele Loporcaro).
Fedden, Sebastian (In press), Agreement and argument realization in Mian discourse. To appear in Word Structure (special issue The many facets of agreement, ed. by Tania Paciaroni, Alice Idone and Michele Loporcaro).
Feist, Timothy (2020), Nominal classification: Does it play a role in referent disambiguation? Studies in Language 44.199–239.
Landau, Idan (2016), DP-internal semantic agreement: A configurational analysis. Natural Language and Linguistic Theory 34. 975-1020. doi:10.1007/s11049-015-9319-3.
Nichols, Johanna (2018), Agreement with overt and null arguments in Ingush. Linguistics 56.845-863.
Polinsky, Maria (2016), Agreement in Archi from a minimalist perspective. In Bond et al., 184-232.
Round, Erich R. & Greville G. Corbett (2020), Comparability and measurement in typological science: the bright future for linguistics. Linguistic Typology 24.489-525.
Sadler, Louisa (2016), Agreement in Archi: An LFG perspective. In Bond et al., 150-183.
Shen, Zheng (2019), The multi-valuation Agreement Hierarchy. Glossa: A Journal of General Linguistics 4(1), 46. doi: 10.5334/gjgl.585.
Smith, Peter W. (2021), Morphology-semantics mismatches and the nature of grammatical features. Berlin: De Gruyter Mouton. doi: 10.1515/9781501511127
Wechsler, Stephen & Larisa Zlatić (2003), The many faces of agreement. Stanford: CSLI.

SLIDES Fedden&Guzman Naranjo&Corbett SLE2021 Typology meets data mining German gender

by Sebastian Fedden and Greville G. Corbett

Typology meets data-mining: the German gender system, 2021

PLEASE CITE AS: Fedden, Sebastian, Matías Guzmán Naranjo & Greville G. Corbett. 2021. Typology me... more

LINK to video of: Sebastian Fedden, Matías Guzmán Naranjo and Greville Corbett. Typology meets data-mining: the German gender system. Video of plenary talk at Societas Linguistica Europaea, Athens (online

Typology meets data-mining: the German gender system, 2021

Typology meets data-mining: the German gender system
Sebastian Fedden, Matías Guzmán Naranjo & Greville G. Corbett

Keywords: assignment rules, data mining, gender, German, morphosyntax, typology

In recent years linguistic typology has increasingly profited from computational methods; the hope is to discover patterns in large data sets more quickly and more accurately than would be possible for a human researcher. This is commonly known as ‘data mining’. A linguistic system which could benefit from such an approach is German gender.

The German gender system is a gem among the assignment systems found in the world, for the complexity of its interacting semantic, morphological and phonological assignment principles. As fast as it offers partial results it raises new questions. This is the more remarkable since there are just three gender values: masculine, feminine, and neuter. Furthermore, the basic semantic assignment rules are relatively straightforward. Much more challenging are (i) phonological assignment (investigated by Köpcke 1982, Köpcke & Zubin 1983, among others), and (ii) the relation between gender and inflection class (see Pavlov 1995, Bittner 1999, and Kürschner & Nübling 2011). And yet, despite the progress which has been made, and the great typological interest of German gender, no attempt has been made to analyse the system as a whole.

In a system as complex as German there are at least three pitfalls:
1. cherry picking: observations of alleged regularity are sometimes based on few examples and the overall applicability of these regularities is left unexplored;
2. generalizations without a baseline: thus a prediction of a particular gender value for, say, 35% of the nouns is hardly remarkable if 35% of the nouns overall are of that gender;
3. not allowing for overlapping factors: given that phonological, morphological and semantic properties may make the same gender value more probable, making a claim for a particular generalization (e.g. phonological) requires us also to eliminate the possible effects of morphology and semantics.
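
The second pitfall can be made concrete with a minimal sketch. This is not the authors' method: the toy sample, the function names and the single final -e rule are all hypothetical and are chosen only to show why a rule's accuracy has to be read against the share of the most frequent gender value.

```python
from collections import Counter


def majority_baseline(genders):
    """Share of nouns carrying the most frequent gender value."""
    counts = Counter(genders)
    return counts.most_common(1)[0][1] / len(genders)


def rule_accuracy(nouns, rule):
    """Share of nouns whose gender the rule predicts correctly."""
    hits = sum(1 for form, gender in nouns if rule(form) == gender)
    return hits / len(nouns)


# Hypothetical toy sample of (form, gender) pairs; NOT drawn from the
# authors' database, which holds more than 30,000 nouns.
NOUNS = [
    ("Blume", "f"), ("Lampe", "f"), ("Tasche", "f"), ("Name", "m"),
    ("Tisch", "m"), ("Stuhl", "m"), ("Haus", "n"), ("Buch", "n"),
]


def final_e_rule(form):
    """Illustrative rule only: final -e predicts feminine, else masculine."""
    return "f" if form.endswith("e") else "m"


baseline = majority_baseline([gender for _, gender in NOUNS])
accuracy = rule_accuracy(NOUNS, final_e_rule)
gain = accuracy - baseline  # what the rule adds beyond always guessing the top value
```

On this toy sample the quantity of interest is `gain` rather than `accuracy` alone; a rule whose accuracy merely equals the baseline would show nothing about assignment.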

To avoid these pitfalls and make progress towards a holistic analysis of the German gender system, we mine a database of more than 30,000 German nouns from WebCELEX (Baayen et al. 1995), coded for gender, frequency, phonological shape, inflection class, and derived/compounded status, which we have cleaned and to which we added semantic information (human, animal, object, abstract, mass) and frequency (based on the COW corpus, Schäfer 2015). We then built a series of analogical models using machine learning algorithms (similar to Guzmán Naranjo 2020), including different combinations of predictors (morphology, semantics, phonology, inflection class).
The overall accuracy results show clearly that the system is anything but arbitrary. The combined factors reach a predictive success of over 96%. Individual factors are also strong predictors, most notably phonological shape and inflection class. The German gender assignment system – while complex and unusual – represents a typologically well-known type: a combination of semantic and formal (morphological/phonological) assignment principles (Corbett 1991). Our conclusions relate to German gender, but we also make a larger point by showing how typologists can benefit from data mining. And we hope to reduce the ill-informed comments still made about German gender, sometimes even by linguists.

Acknowledgments: This work was partly funded by the grant “Optimal categorisation: the origin and nature of gender from a psycholinguistic perspective” (ESRC UK Grant RN0362A) and the public grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” program (reference: ANR10LABX 0083).

References
Baayen, R. Harald, Richard Piepenbrock & Leon Gulikers (1995), The CELEX Lexical Database (CD-ROM), Linguistic Data Consortium, University of Pennsylvania, Philadelphia.
Bittner, Dagmar (1999), Gender classification and the inflectional system of German nouns, in Barbara Unterbeck (ed), (1999), Gender in Grammar and Cognition, Part 1: Approaches to Gender, Berlin: Mouton de Gruyter, 1–23.
Corbett, Greville G. (1991), Gender, Cambridge: Cambridge University Press.
Guzmán Naranjo, Matías (2020), Analogy, complexity and predictability in the Russian nominal inflection system, Morphology 30(3), 219–262.
Köpcke, Klaus-Michael (1982), Untersuchungen zum Genussystem der deutschen Gegenwartssprache, Tübingen: Niemeyer.
Köpcke, Klaus-Michael & David A. Zubin (1983), Die kognitive Organisation der Genuszuweisung zu den einsilbigen Nomen der deutschen Gegenwartssprache, Zeitschrift für germanistische Linguistik 11, 166–182.
Kürschner, Sebastian & Damaris Nübling (2011), The interaction of gender and declension in Germanic languages, Folia Linguistica 45(2), 355–388.
Pavlov, Vladimir (1995), Die Deklination der deutschen Substantive. Synchronie und Diachronie, Frankfurt a. M.: Peter Lang.
Schäfer, Roland (2015), Processing and Querying Large Web Corpora with the COW14 Architecture, in Proceedings of Challenges in the Management of Large Corpora (CMLC-3), 28–34.

Gender: new horizons [Greville G. Corbett, Sebastian Fedden, Michael Franjieh, Alexandra Grandison & Erich Round]

by Greville G. Corbett, Alexandra Grandison, and Erich Round

Invited talk in the series Abralin ao Vivo – Linguists Online, 2020

Gender systems are endlessly fascinating, from those where meaning determines gender (Bagvalal), those where it is dominant but leaves intriguing loopholes (Mian) to those where form has an important role (Russian). Now it is time to integrate these systems into a fuller typology of nominal classification, taking in classifier systems as well as gender. Rethinking in this way leads us to take apart characteristics usually lumped together as defining gender, and those defining traditional classifiers. We then see that these characteristics combine in many ways. A canonical perspective proves helpful: we define the notion of canonical gender, and use this idealization as a baseline from which to calibrate the rich variety we find. It is then possible to approach the origin and nature of gender. Here Oceanic languages can provide a unique insight. Typically, a noun can occur with different possessive classifiers, depending on how the possessed item is used by the possessor: ‘my water (to drink)’ vs ‘my water (for something else)’. But in marked contrast, languages like North Ambrym (Vanuatu) typically have particular nouns occurring with one given classifier: water is just drinkable (Franjieh 2016). North Ambrym’s innovative system resembles a gender system: a noun takes a particular classifier regardless. We want to establish empirically whether gender systems can indeed emerge from possessive classifiers in this way. We must also uncover how and why languages would relinquish a useful, meaningful classificatory system, and move to a more rigid gender system. To this end we are running novel experiments to compare possessive classifier systems in six Oceanic languages of Vanuatu and New Caledonia, each with a different inventory size of classifiers, from two to twenty-three.
Combining typology with psycholinguistics in this way promises to shed new light on how systems of nominal classification develop and function. The Oceanic data suggest that, in this instance, we find an interesting parallelism: diachronic change is running in the direction of canonicity.

Comparability in typology

Fahrenheit was the first to produce two thermometers which, under identical conditions, produced identical readings. In linguistics, given the same phenomenon, we need to produce the same readings more often. The prerequisite for comparison is measurement, and if we take measurement more seriously, we can hope for the virtuous circle of more accurate measurement, which leads to more insightful accounts, which require more accurate measurement, which leads … Of course, a thermometer targets temperature only. If we refine our linguistic scales (criteria), each targeting one phenomenon, we can tease apart combinations of factors (see, for instance, Nikolaeva 2013 on finiteness). This matters since the clustering we observe may be accidental or significant, a point stressed in both Canonical Typology and Multivariate Typology. Given a criterion, we push it to the extreme (as Kelvin did for temperature) and ask whether instances of the extreme actually occur or not (either result is of interest).
As an example, if we pull apart the possible criteria for reported speech, we find a wide range of possibilities, as Evans (2013) shows. Interestingly, natural languages vary considerably, but each avails itself of few of the possibilities. A large part of the theoretical space is not occupied. Conversely, the extreme instance of inflection classes, as defined by the combination of the extreme values of the canonical criteria, would appear to be highly unlikely on functional grounds. And yet, it is indeed found, in Burmeso (Donohue 2001, discussed in Corbett 2009).
Once we measure carefully, we are no longer confined to labelling items as just “hot” or “cold” (as if the world were that simple); rather we can explore their finer-grained nature. Variability and empirical uncertainty become easier to characterize, and can now be incorporated into our analyses, rather than being factored out of them (see Round forthcoming for helpful discussion). As we measure more carefully, additional tools become available, and we can enter mainstream (social) science (Bickel 2015). We do not need exceptional devices for linguistics. Rather we measure the variability of the length of vowels, or the range of the genitive case value, using carefully defined criteria (Corbett 2012); we do so similarly when comparing the idiolects of two Russian speakers, when comparing older/younger speakers, Russian speakers with Polish speakers, and Polish speakers with Archi or Tamil speakers.
It is hard to imagine science without consistent measurement. We should follow Fahrenheit’s lead.

References
Bickel, Balthasar. 2015. Distributional typology: statistical inquiries into the dynamics of linguistic diversity. In: Bernd Heine & Heiko Narrog (eds) The Oxford Handbook of Linguistic Analysis, 2nd edition, 901-923. Oxford: Oxford University Press.
Corbett, Greville G. 2009. Canonical inflectional classes. In: Fabio Montermini, Gilles Boyé and Jesse Tseng (eds) Selected Proceedings of the 6th Décembrettes: Morphology in Bordeaux, 1-11. Somerville, MA: Cascadilla Proceedings Project. Available at: http://www.lingref.com/cpp/decemb/6/abstract2231.html.
Corbett, Greville G. 2012. Features. Cambridge: Cambridge University Press.
Donohue, Mark. 2001. Animacy, class and gender in Burmeso. In: Andrew Pawley, Malcolm Ross & Darrell Tryon (eds) The Boy from Bundaberg: Studies in Melanesian Linguistics in Honour of Tom Dutton (Pacific Linguistics 514), 97–115. Canberra: Pacific Linguistics.
Evans, Nicholas. 2013. Some problems in the typology of quotation: a canonical approach. In: Dunstan Brown, Marina Chumakina & Greville G. Corbett (eds) Canonical Morphology and Syntax, 66-98. Oxford: Oxford University Press.
Nikolaeva, Irina. 2013. Unpacking finiteness. In: Dunstan Brown, Marina Chumakina & Greville G. Corbett (eds) Canonical Morphology and Syntax, 99-122. Oxford: Oxford University Press.
Round, Erich R. forthcoming. Review of Matthew K. Gordon, Phonological Typology, Oxford University Press 2016. To appear in Folia Linguistica.

Canonical Typology: a prerequisite to substantive interoperability

by Greville G. Corbett and Dunstan Brown

A new approach to gender and classifier systems: Evidence from Austronesian and Papuan languages

Lexical splits: their surprising typology

LEXICAL SPLITS: THEIR SURPRISING TYPOLOGY

Greville G. Corbett
Surrey Morphology Group
University of Surrey

In trying to understand natural language, we need to consider what is a ‘possible word’ (lexeme). We find simple lexemes that are internally homogeneous and externally consistent. On the other hand, there are others with splits in their internal structure and inconsistencies in their external behaviour. I first explore the characteristics of the most straightforward lexemes, in order to establish a point in the theoretical space from which we can calibrate the real examples we find. I then schematize the interesting phenomena which deviate from this idealization: these deviations include suppletion, syncretism, deponency and defectiveness. Next I analyse the different ways in which lexemes are split into two or more segments by such phenomena. I set out a typology of possible splits, along four dimensions: (i) splits based on the composition/feature signature of the paradigm versus those based solely on morphological form; (ii) motivated (following a boundary motivated from outside the paradigm) versus purely morphology-internal (‘morphomic’); (iii) regular (extending across the lexicon) versus irregular (lexically specified); (iv) externally relevant versus irrelevant. I identify instances of these four dimensions separately: they are orthogonal to each other. Their interaction gives a substantial typology, and it proves to be surprisingly complete: the possibilities specified are all attested. The typology also allows for the unexpected patterns of behaviour to overlap in particular lexemes, producing some remarkable examples. Such examples show that the notion ‘possible word’ is more challenging than has generally been realized.

Two systems or one? A canonical typology approach

by Greville G. Corbett and Sebastian Fedden

Shape conditions in Turgenev

Zero morphology

Paradigm conventions

Until recently the glossing of examples even in the top journals could be characterized, politely, as chaotic. That situation is being improved, by linguists’ collective conscience and the availability of the Leipzig Glossing Rules (adopted by SLE). We can now consider similarly how we represent the forms of lexemes. For some linguists, this reflects key issues in inflectional morphology; others treat paradigms as epiphenomena, but it is still important to know what can and cannot be inferred from their choices of representation. The need for greater clarity arises because others, such as psycholinguists, are increasingly interested in paradigms, and we risk misleading them by our unstated conventions. And within morphology, recent entropy based and principal part based approaches start from paradigms, implicitly or explicitly, and evaluating their conclusions depends on our understanding the starting point.
The following conventions, beginning with the more superficial and progressing to those with greater analytical significance, all deserve discussion:
• we conventionally represent different features by different dimensions (as in a case × number layout rather than a simple list of forms); this is difficult when we need more than two dimensions
• portrait view is favoured over landscape view, leading to specific choices of paradigm layout: for instance, person values in rows and number values in columns
• we follow traditional ordering of feature values (absolutive before elative)
• we split or combine cells according to unspoken conventions about majority distributions within and across lexemes; for instance, Russian nouns are represented with six case values, though an additional four values occur in different combinations on subsets of nouns
• we “know” that some conditions on paradigms belong in the representation while others are textual notes; for example, mass nouns have no separate paradigm, rather we state somewhere that the plural is available only for nouns of particular semantic types
• we appreciate elegance (witness the original minimal-augmented analyses of number systems)
• we (at least some of us) believe that syntax is morphology-free; hence we include non-autonomous values (such as the Romanian neuter gender) in paradigms. This approach avoids invoking strange rules of agreement or government, but requires additional paradigm cells which are systematically syncretic
• we represent morphosyntactic patterns rather than morphomic patterns
All of these conventions individually have merit. Good practice requires us to be fully explicit about our use of them and our departures from them, particularly where they come into conflict.
Conclusions:
• the substance matters more than the representation; conventions should help make clear what the analyst intends, so that the reader is able to agree or disagree with the actual intention
• the representation has enormous potential: it can clarify our understanding of our material (so that claims are made with full awareness) or it can mislead the unwary
• we are cleaning up our act with regard to morphosyntactic glossing. It is time to begin being more explicit about how we represent the forms of lexemes. Our largely unspoken conventions form a good basis.

Morphomic splits

Lexemes may have an internally consistent paradigm, or the paradigm may be split into segments. Splits may be ‘motivated’, that is they may correspond to morphosemantic, morphosyntactic, or phonological specifications. Alternatively the split may lack such motivation, in which case we have a morphomic split, one which arguably increases the complexity of the system with no obvious corresponding return. We shall focus on the difference between these two types, so that we can recognise morphomic splits. There are some properties which the two types of split share: for instance, both motivated and morphomic splits can be viewed in terms of Wurzel’s Paradigm Structure Conditions, that is, there can be predictive relations within the segments; and both types can persist over long periods of time. But they are also interestingly different, which makes drawing the distinction valuable. It bears on the important notion that syntax is morphology-free. Our main question, then, is ‘How do morphomic splits differ from motivated splits?’

Uncovering variation in classifier assignment in Oceanic

by Michael Franjieh and Greville G. Corbett

ExLing 2021: Proceedings of 12th International Conference of Experimental Linguistics, 2021

We discuss the results of a video vignettes experiment that uncovers the variation of noun-classifier assignment in the possessive classifier system of six Oceanic languages. The results show that languages vary in their noun-classifier assignment, with some languages displaying relatively fixed assignment, similar to a grammatical gender system.

OPTIMAL CATEGORISATION: THE NATURE OF NOMINAL CLASSIFICATION SYSTEMS

by Michael Franjieh and Greville G. Corbett

Cadernos de Linguística, 2021

The debate as to whether language influences cognition has been long standing but has yielded conflicting findings across domains such as colour and kinship categories. Fewer studies have investigated systems such as nominal classification (gender, classifiers) across different languages to examine the effects of linguistic categorisation on cognition. Effective categorisation needs to be informative to maximise communicative efficiency but also simple to minimise cognitive load. It therefore seems plausible to suggest that different systems of nominal classification have implications for the way speakers conceptualise relevant entities. A suite of seven experiments was designed to test this; here we focus on our card sorting experiment, which contains two subtasks: a free sort and a structured sort. Participants were 119 adults across six Oceanic languages from Vanuatu and New Caledonia, with classifier inventories ranging from two to 23. The results of the card sorting experiment reveal that classifiers appear to provide structure for cognition in tasks where they are explicit and salient. The free sort task did not incite categorisation through classifiers, arguably as it required subjective judgement, rather than explicit instruction. This was evident from our quantitative and qualitative analyses. Furthermore, the languages employing more extreme categorisation systems displayed smaller variation in comparison to more moderate systems. Thus, systems that are more informative or more rigid appear to be more efficient. The study implies that the influence of language on cognition may vary across languages, and that not all nominal classification systems employ this optimal trade-off between simplicity and informativeness. These novel data provide a new perspective on the origin and nature of nominal classification.