A phraseme, also called a set phrase, fixed expression, idiomatic phrase, multiword expression (in computational linguistics), or idiom,[1][2][3][citation needed] is a multi-word or multi-morphemic utterance at least one of whose components is selectionally constrained[clarification needed] or restricted by linguistic convention, so that it is not freely chosen.[4] In the most extreme cases, there are expressions such as X kicks the bucket ≈ ‘person X dies of natural causes, the speaker being flippant about X’s demise’, where the unit is selected as a whole to express a meaning that bears little or no relation to the meanings of its parts. All of the words in this expression are chosen restrictedly, as part of a chunk. At the other extreme, there are collocations such as stark naked, hearty laugh, or infinite patience, where one of the words is chosen freely (naked, laugh, and patience, respectively) based on the meaning the speaker wishes to express, while the choice of the other (intensifying) word (stark, hearty, infinite) is constrained by the conventions of the English language (hence *hearty naked, *infinite laugh, *stark patience). Both kinds of expression are phrasemes, and can be contrasted with free phrases, expressions where all of the members (barring grammatical elements whose choice is forced by the morphosyntax of the language) are chosen freely, based exclusively on their meaning and the message that the speaker wishes to communicate.
Phrasemes can be broken down into groups based on their compositionality (whether or not the meaning they express is the sum of the meanings of their parts) and the type of selectional restrictions placed on their non-freely chosen members.[5][page needed] Non-compositional phrasemes are what are commonly known as idioms, while compositional phrasemes can be further divided into collocations, clichés, and pragmatemes.
A phraseme is an idiom if its meaning is not the predictable sum of the meanings of its components, that is, if it is non-compositional. Generally speaking, idioms are not intelligible to people hearing them for the first time without having learned them. Consider the following examples (an idiom is indicated by elevated half-brackets: ˹ … ˺):
In none of these cases are the meanings of any of the component parts of the idiom included in the meaning of the expression as a whole.
An idiom can be further characterized by its transparency, the degree to which its meaning includes the meanings of its components. Three types of idioms can be distinguished in this way—full idioms, semi-idioms, and quasi-idioms. [6]
An idiom AB (that is, composed of the elements A ‘A’ and B ‘B’) is a full idiom if its meaning does not include the meaning of any of its lexical components: ‘AB’ ⊅ ‘A’ and ‘AB’ ⊅ ‘B’.
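An idiom AB is a semi-idiom if its meaning includes the meaning of one of its lexical components, but not as its semantic pivot, and does not include the meaning of the other component: ‘AB’ ⊃ ‘A’ and ‘AB’ ⊅ ‘B’, with ‘A’ not the semantic pivot of ‘AB’. Examples: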
- ˹private eye (I)˺: ‘private investigator’
- ˹sea anemone˺: ‘predatory polyp dwelling in the sea’
- Rus. ˹mozolit´ glaza˺ (lit. ‘make corns on Y’s eyes’): ‘be in Y’s sight too often or for too long’
The semantic pivot of an idiom is, roughly speaking, the part of the meaning that defines what sort of referent the idiom has (person, place, thing, event, etc.): for example, ‘investigator’ in ˹private eye˺ and ‘polyp’ in ˹sea anemone˺. More precisely, the semantic pivot is defined, for an expression AB meaning ‘S’, as that part ‘S₁’ of AB’s meaning ‘S’ such that ‘S’ [= ‘S₁’ ⊕ ‘S₂’] can be represented as a predicate ‘S₂’ bearing on ‘S₁’; that is, ‘S’ = ‘S₂’(‘S₁’) (Mel’čuk 2006: 277).[7]
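An idiom AB is a quasi-idiom, or weak idiom, if its meaning includes the meanings of both of its lexical components, but neither of them as its semantic pivot, plus an additional meaning ‘C’ that is the semantic pivot: ‘AB’ ⊃ ‘A’ and ‘AB’ ⊃ ‘B’, with neither ‘A’ nor ‘B’ the semantic pivot of ‘AB’. Examples: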
- Fr. ˹donner le sein à Y˺ (lit. ‘give the breast to Y’): ‘feed the baby Y by putting one teat into the mouth of Y’
- ˹start a family˺: ‘conceive a first child with one’s spouse, starting a family’
- ˹barbed wire˺: ‘[artifact designed to make obstacles with and constituted by] wire with barbs [fixed on it at small regular intervals]’
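The three-way distinction can be made concrete with a toy model. This is only an illustrative sketch under strong simplifying assumptions: a meaning is modelled as a set of named semantic components with one component marked as the semantic pivot, the decompositions of the example meanings are rough and hypothetical, and the class `IdiomMeaning` and function `classify_idiom` are invented for this example. It encodes the conditions as read off the definitions and examples above: a full idiom includes neither ‘A’ nor ‘B’, a semi-idiom includes exactly one of them (not as its pivot), and a quasi-idiom includes both (neither as its pivot).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomMeaning:
    """Hypothetical toy record for a two-component expression AB."""
    a: str            # free meaning 'A' of the first lexical component
    b: str            # free meaning 'B' of the second lexical component
    parts: frozenset  # semantic components making up the meaning 'AB'
    pivot: str        # the component of 'AB' that serves as its semantic pivot


def classify_idiom(m: IdiomMeaning) -> str:
    """Classify AB by which component meanings 'AB' includes and whether either is the pivot."""
    has_a = m.a in m.parts
    has_b = m.b in m.parts
    pivot_is_component = m.pivot in (m.a, m.b)
    if not has_a and not has_b:
        return "full idiom"
    if pivot_is_component:
        # 'A' or 'B' is itself the pivot: not an idiom under these definitions.
        return "not an idiom"
    if has_a and has_b:
        return "quasi-idiom"
    return "semi-idiom"  # exactly one of 'A', 'B' included, not as pivot


# Rough, hypothetical decompositions of three examples from the text
# (the article "the" in "kick the bucket" is treated as a grammatical element).
kick_the_bucket = IdiomMeaning("kick", "bucket",
                               frozenset({"die", "natural causes", "flippant"}), "die")
sea_anemone = IdiomMeaning("sea", "anemone",
                           frozenset({"polyp", "predatory", "dwelling in", "sea"}), "polyp")
barbed_wire = IdiomMeaning("with barbs", "wire",
                           frozenset({"artifact", "obstacle", "wire", "with barbs"}), "artifact")

for name, m in [("kick the bucket", kick_the_bucket),
                ("sea anemone", sea_anemone),
                ("barbed wire", barbed_wire)]:
    print(f"{name}: {classify_idiom(m)}")
```

Run as written, this labels the three examples as a full idiom, a semi-idiom, and a quasi-idiom respectively. A real description would of course need a principled semantic decomposition rather than hand-picked labels.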
A phraseme AB is said to be compositional if its meaning ‘AB’ = ‘A’ ⊕ ‘B’ and its form /AB/ = /A/ ⊕ /B/ (“⊕” here means ‘combined in accordance with the rules of the language’). Compositional phrasemes are generally broken down into two groups: collocations and clichés.
A collocation is generally said to consist of a base, a lexical unit chosen freely by the speaker, and a collocate, a lexical unit chosen as a function of the base.[8][9][10]
In American English, you make a decision, and in British English, you can also take it. For the same thing, French says prendre [= ‘take’] une décision, German eine Entscheidung treffen/fällen [= ‘meet/fell’], Russian prinjat´ [= ‘accept’] rešenie, Turkish karar vermek [= ‘give’], Polish podjąć [= ‘take up’] decyzję, Serbian doneti [= ‘bring’] odluku, Korean gyeoljeongeul hada 〈naerida〉 [= ‘do 〈take/put down〉’], and Swedish fatta [= ‘grab’] ett beslut. This shows that the verbs are selected as a function of the noun meaning ‘decision’. If instead of DÉCISION a French speaker uses CHOIX ‘choice’ (Jean a pris la décision de rester ‘Jean has taken the decision to stay’ ≅ Jean a … le choix de rester ‘Jean has … the choice to stay’), the speaker has to use FAIRE ‘make’ rather than PRENDRE ‘take’: Jean a fait 〈*a pris〉 le choix de rester ‘Jean has made the choice to stay’.
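The claim that the collocate is a function of the base can be sketched as a lookup keyed by language and base noun. This is a minimal illustration, not a lexicographic resource: the table `DECISION_VERBS` and the helpers `collocates` and `is_conventional` are hypothetical names, and the entries are copied from the examples above.

```python
# Hypothetical toy table: (language, base noun) -> verbs that conventionally collocate with it.
# The base is chosen freely; the verb is then looked up as a function of that base.
DECISION_VERBS = {
    ("en", "decision"): ("make", "take"),   # 'take' in British English
    ("fr", "décision"): ("prendre",),
    ("fr", "choix"): ("faire",),
    ("de", "Entscheidung"): ("treffen", "fällen"),
    ("tr", "karar"): ("vermek",),
}


def collocates(language: str, base: str) -> tuple:
    """Return the conventional verbs for a base, or an empty tuple if the base is not listed."""
    return DECISION_VERBS.get((language, base), ())


def is_conventional(language: str, base: str, verb: str) -> bool:
    """True if the verb is a listed collocate of the base in that language."""
    return verb in collocates(language, base)


print(collocates("fr", "choix"))                  # ('faire',)
print(is_conventional("fr", "choix", "prendre"))  # False: *prendre le choix
```

Keying on the base rather than on the meaning ‘decision’ is the design point: swapping DÉCISION for CHOIX changes the verb even though the intended meaning barely changes.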
A collocation is semantically compositional since its meaning is divisible into two parts such that the first one corresponds to the base and the second to the collocate. This is not to say that a collocate, when used outside the collocation, must have the meaning it expresses within the collocation. For instance, in the collocation sit for an exam ‘undergo an exam’, the verb SIT expresses the meaning ‘undergo’; but in an English dictionary, the verb SIT does not appear with this meaning: ‘undergo’ is not its inherent meaning, but rather is a context-imposed meaning.
Generally, a cliché is said to be a phraseme none of whose components is selected freely and whose usage restrictions are imposed by conventional linguistic usage, as in the following examples:
Clichés are compositional in the sense that their meaning is more or less the sum of the meanings of their parts (though not exactly so in, for example, no matter what), and clichés, unlike idioms, would be completely intelligible to someone hearing them for the first time without having learned the expression beforehand. They are not completely free expressions, however, because they are the conventionalized means of expressing the desired meanings in the language.
For example, in English one asks What is your name? and answers My name is [N] or I am [N], but to do the same in Spanish one asks ¿Cómo se llama? (lit. ‘How are you called?’) and one answers Me llamo [N] (‘I am called [N]’). The literal renderings of the English expressions are ¿Cómo es su nombre? (lit. ‘What is your name?’) and Soy [N] (‘I am [N]’), and while they are fully understandable and grammatical they are not standard; equally, the literal translations of the Spanish expressions would sound odd in English, as the question ‘How are you called?’ sounds unnatural to English speakers.
A subtype of cliché is the pragmateme, a cliché whose restrictions are imposed by the situation of utterance:[clarification needed]
As with clichés, the conventions of the languages in question dictate a particular pragmateme for a particular situation; alternative expressions would be understandable, but would not be perceived as normal.
Although the discussion of phrasemes centres largely on multi-word expressions such as those illustrated above, phrasemes are known to exist on the morphological level as well. Morphological phrasemes are conventionalized combinations of morphemes such that at least one of their components is selectionally restricted. [11] [12] Just as with lexical phrasemes, morphological phrasemes can be either compositional or non-compositional.
Non-compositional morphological phrasemes,[13] also known as morphological idioms,[14] are actually familiar to most linguists, although the term “idiom” is rarely applied to them; instead, they are usually referred to as “lexicalized” or “conventionalized” forms.[15] Good examples are English compounds such as harvestman ‘arachnid belonging to the order Opiliones’ (≠ ‘harvest’ ⊕ ‘man’) and bookworm (≠ ‘book’ ⊕ ‘worm’); derivational idioms can also be found: airliner ‘large vehicle for flying passengers by air’ (≠ airline ‘company that transports people by air’ ⊕ -er ‘person or thing that performs an action’). Morphological idioms are also found in inflection, as shown by these examples from the irrealis mood paradigm in Upper Necaxa Totonac:[16]
Form | Gloss | Translation |
---|---|---|
ḭš-tḭ-tachalá̰x-lḭ | PAST-POT-shatter-PFV | ‘it could have shattered earlier (but didn’t)’ |
ḭš-tachalá̰x-lḭ | PAST-shatter-PFV | ‘it could have shattered now (but hasn’t)’ |
ka-tḭ-tachalá̰x-lḭ | OPT-POT-shatter-PFV | ‘it could shatter (but won’t now)’ |
The irrealis mood has no unique marker of its own, but is expressed in conjunction with tense by combinations of affixes “borrowed” from other paradigms—ḭš- ‘past tense’, tḭ- ‘potential mood’, ka- ‘optative mood’, -lḭ ‘perfective aspect’. None of the resulting meanings is a compositional combination of the meanings of its constituent parts (‘present irrealis’ ≠ ‘past’ ⊕ ‘perfective’, etc.).
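One way to see what non-compositionality means here is to compare a compositional reading, built by adding up the meanings the affixes have in their own paradigms, with the meaning actually listed for each combination. The sketch below is a hypothetical illustration: it keys on the gloss labels (PAST, POT, OPT, PFV) rather than on the affixes themselves, covers only the three forms of tachalá̰x ‘shatter’ cited above, ignores any other uses these affix combinations may have, and the dictionary and function names are invented for the example.

```python
# Meanings the affixes carry in their home paradigms (gloss labels as above).
AFFIX_MEANING = {"PAST": "past", "POT": "potential", "OPT": "optative", "PFV": "perfective"}

# Irrealis forms of 'shatter', stored as wholes: the meaning belongs to the
# combination, not to its parts (toy data covering only the cited forms).
IRREALIS_FORMS = {
    ("PAST", "POT", "PFV"): "it could have shattered earlier (but didn't)",
    ("PAST", "PFV"): "it could have shattered now (but hasn't)",
    ("OPT", "POT", "PFV"): "it could shatter (but won't now)",
}


def compositional_reading(affixes: list) -> str:
    """Sum of the affixes' separate meanings: what a compositional form would mean."""
    return " + ".join(AFFIX_MEANING[a] for a in affixes)


def interpret(affixes: list) -> str:
    """Prefer a stored (idiomatic) combination; otherwise fall back to composition."""
    key = tuple(affixes)
    if key in IRREALIS_FORMS:
        return IRREALIS_FORMS[key]
    return compositional_reading(affixes)


print(compositional_reading(["PAST", "PFV"]))  # past + perfective
print(interpret(["PAST", "PFV"]))              # it could have shattered now (but hasn't)
```

The listed-first lookup mirrors how an idiom must be stored as a unit in the lexicon: the compositional fallback exists only to show what the parts would otherwise predict.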
Morphological collocations are expressions such that not all of their component morphemes are chosen freely: instead, one or more of the morphemes is chosen as a function of another morphological component of the expression, its base. This type of situation is quite familiar in derivation, where selectional restrictions placed by radicals on (near-)synonymous derivational affixes are common. Two examples from English are the nominalizers used with particular verbal bases (e.g., establishment, *establishation; infestation, *infestment; etc.) and the inhabitant suffixes required for particular place names (Winnipegger, *Winnipegian; Calgarian, *Calgarier; etc.). In both cases, the choice of derivational affix is restricted by the base, leaving a morphological gap where the competing affix is excluded, but the derivation itself is compositional. An example of an inflectional morphological collocation is the plural form of nouns in Burushaski:[17]
Meaning | Singular | Plural | Meaning | Singular | Plural |
---|---|---|---|---|---|
‘king’ | thám | thám-u | ‘flower’ | asqór | asqór-iŋ |
‘bread’ | páqu | páqu-mu | ‘plow’ | hárč | harč̣-óŋ |
‘dragon’ | aiždahár | aiždahár-išu | ‘wind’ | tíṣ̌ | tiṣ̌-míŋ |
‘branch’ | táγ | taγ-ášku, taγ-šku | ‘minister’ | wazíir | wazíir-ting |
‘pigeon’ | tál | tál-ǯu | ‘woman’ | gús | guš-íngants |
‘stone’ | dán | dan-ǯó | ‘[a] mute’ | gót | goṭ-ó |
‘enemy’ | dušmán | dušmá-yu | ‘body’ | ḍím | ḍím-a |
‘rock’ (n.) | čár | čar-kó | ‘horn’ | túr | tur-iáŋ |
‘dog’ | húk | huk-á, -ái | ‘saber’ | gaté+nč̣ | gaté-h |
‘wolf’ | úrk | urk-á, urk-ás | ‘walnut’ | tilí | tilí |
‘man’ | hír | hur-í | ‘demon’ | díu | diw-anc |
Burushaski has about 70 plural suffixal morphemes. The plurals are semantically compositional, consisting of a stem expressing the lexical meaning and a suffix expressing PLURAL, but for each individual noun, the appropriate plural suffix has to be learned.
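The split between a compositional meaning and a learned form can be sketched as a per-noun suffix lexicon. This is a hypothetical illustration: it includes only nouns from the table whose plural is the unchanged singular plus a suffix (stem alternations such as hír → hur-í or díu → diw-anc are not modelled), and the names `PLURAL_SUFFIX`, `GLOSS`, and `pluralize` are invented for the example.

```python
# Hypothetical toy lexicon built from the table: singular -> plural suffix, memorized per noun.
PLURAL_SUFFIX = {
    "thám": "u",        # 'king'
    "páqu": "mu",       # 'bread'
    "aiždahár": "išu",  # 'dragon'
    "asqór": "iŋ",      # 'flower'
    "wazíir": "ting",   # 'minister'
    "ḍím": "a",         # 'body'
}
GLOSS = {"thám": "king", "páqu": "bread", "aiždahár": "dragon",
         "asqór": "flower", "wazíir": "minister", "ḍím": "body"}


def pluralize(singular: str) -> tuple:
    """Form: stem + the suffix listed for this noun. Meaning: lexical meaning + PLURAL."""
    form = f"{singular}-{PLURAL_SUFFIX[singular]}"  # restricted choice, looked up per noun
    meaning = f"'{GLOSS[singular]}' + PLURAL"       # compositional: same for every noun
    return form, meaning


for noun in PLURAL_SUFFIX:
    print(*pluralize(noun))
```

The meaning side is computed by one rule for every noun, while the form side can only be looked up, which is exactly what makes these plurals collocations rather than idioms.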
Unlike compositional lexical phrasemes, compositional morphological phrasemes seem only to exist as collocations: morphological clichés and morphological pragmatemes have yet to be observed in natural language. [12]