Steve Pepper
Hatcher-Bourque: Towards a reusable
classification of semantic relations
Abstract: A key feature of binominal lexemes is the unstated (or underspecified)
relation, ℜ, that pertains between the two major constituents. The nature of ℜ – the
kinds of relations – has been the topic of considerable research during recent
decades. While early studies focused almost exclusively on English, the last few
years have seen a spate of work on other languages. Unfortunately, this work has
been uncoordinated and each researcher entering the field has tended to devise
their own classification, making it difficult to compare results and advance our
understanding of the phenomenon. This is a pity, because such an understanding
has the potential to provide insights into the nature of concept combination and the
associative character of human thought. The purpose of this chapter is to present
a well-documented, systematic classification of semantic relations that operates at
multiple levels of granularity and is suitable for reuse across languages. Hatcher-Bourque is based on revisions of two earlier classifications, those of Anna Granville
Hatcher and Yves Bourque, which operate at different levels of granularity. These
are integrated into a single, coherent system, with automatic mapping from one
level to the other. The classification is applied to a set of 3,650 binominals from
106 languages, and an analysis is presented of the frequency and distribution of
semantic relations at both a highly abstract level and a more granular level. The
Hatcher-Bourque classification, and an accompanying, Excel-based tool, the Bourquifier, are offered to the research community in order to encourage collaboration,
and researchers are invited to participate in the Hatcher-Bourque Cake Challenge.
1 Introduction
1.1 Background
The unstated (or underspecified) semantic relation, ℜ, is a defining feature of
binominals (see Introduction). Jackendoff (2016) provides a nice set of examples
to show that the kind of semantic relation can be “hugely varied”, even across
binominals that share a common head, such as cake.
Note: This chapter has been made Open Access in memoriam my parents Harry Pepper (1926–1996)
and Edna Pepper (1932–2022).
Open Access. © 2023 Steve Pepper, published by De Gruyter.
This work is licensed under the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://doi.org/10.1515/9783110673494-010
chocolate cake  ‘a cake made with chocolate in it’
birthday cake   ‘a cake to be eaten as part of celebrating a birthday’
coffee cake     ‘a cake made to be eaten along with coffee and the like’
marble cake     ‘a cake that resembles marble’
layer cake      ‘a cake formed in multiple layers’
cupcake         ‘a little cake made in a cup’
urinal cake     ‘a (nonedible) cake to be placed in a urinal’
The nature of ℜ has been a perennial topic of interest in the study of compounding that can be traced back to the Sanskrit grammarians. Modern treatment of
the topic can be said to originate with Jespersen’s (1942) discussion in Volume 6
of his Modern English Grammar on Historical Principles.1 The year 1960 saw the
publication of three seminal works by Marchand, Lees and Hatcher that inspired
further work in a number of different directions. In the years that followed there
were important contributions by Adams (1973), Downing (1977), Levi (1978),
Warren (1978), Ryder (1994), Jackendoff (2009; 2010; 2016) and Schäfer (2018), all
of which focused on English [eng]. More recently the topic of semantic relations
has been explored in other languages, including French [fra] (Arnaud 2003;
2016; Bourque 2014), Nizaa [sgi] (Pepper 2010), Danish [dan] (Szubert 2012),
Norwegian [nor] (Eiesland 2016) and Spanish [spa] (Toquero 2018). The matter
has also received considerable attention in computational and corpus linguistics (e.g. Vanderwende 1994; Moldovan et al. 2004; Girju et al. 2005; Ó Séaghdha
2008; Tratz & Hovy 2010; Nakov 2013; Schäfer 2018) and was the focus of an NAACL-HLT Workshop on Semantic Evaluations task on “the interpretation of noun
compounds using paraphrasing verbs and prepositions” (Butnariu et al. 2009).
1.2 Towards a reusable classification
The point of departure for the present chapter is three observations regarding this
previous work. The first observation is that opinions differ as to whether the set of
semantic relations found in binominals is finite or infinite. Jespersen (1942: 143)
asserted that “the number of possible logical relations between the two elements
[of a noun-noun compound] is endless” and Downing (1977: 810) concluded that
“the semantic relations that hold between the members of [novel] compounds
cannot be characterized in terms of a finite list of ‘appropriate compounding
relationships’.” However, most researchers have had enough faith in the usefulness
of a finite list that they have taken the trouble to develop one. The position
taken in the present research accords with that of Tratz & Hovy (2010: 679), who
contend that “the vast majority of noun compounds fits within a relatively small
set of categories.” Furthermore, it seems likely that, while the interpretation of
novel compounds depends greatly on context, established compounds do so to a
lesser degree and are more likely to exhibit a fixed set of basic relations.
1 But see also Grimm (1826), Mätzner (1860), Bergsten (1911) and Carr (1939).
The second observation is that among authors who have attempted to enumerate a list of relations, the number of relations varies considerably from four (in the
case of Hatcher), to upwards of 40 or 50 (depending on whether or not subtypes
are included). The position taken in the present paper is that the number of relations one identifies should be a function of the degree of granularity required by
the investigation in question. It can therefore be anything the researcher desires,
from one (as suggested by Bauer 1979) to unlimited (as opined by Jespersen). We
further claim that any relation can be subdivided into more specific relations,
if the need arises and – concomitantly – that any two arbitrary relations can be
combined into a single, more general relation.
For some investigations, a small number of (high-level) relations will suffice;
for others, a larger number of (low-level) relations is required. The advantage
of a granular, low-level classification is that it is more concrete, and thus much
easier to apply in practice; its disadvantage is that it results in a rather fragmentary picture from which it can be difficult to generalize. The advantage of a more
abstract, high-level classification is that the generalizations are built into the
scheme; its disadvantage is that the high level of abstraction makes it extremely
hard to apply consistently.
This suggests that a classification scheme that operates at more than one
level of granularity – with automatic mapping from lower to higher levels – may
prove beneficial. Such a scheme, if based on sound principles, would enjoy both
of the advantages outlined above, and suffer from neither of the disadvantages.
The third observation that is relevant here is that each researcher tends to
construct their own scheme instead of reusing an existing one. That is the case
in almost every one of the studies listed above, and one might legitimately ask
why this should be so. Three possible reasons might be put forward. The first is
simply that the material in question is notoriously slippery. Meaning only exists
in our minds and is therefore hard to pin down. Getting inside someone else’s
head is not easy, and it is made more difficult by the fact that many systems are
rather poorly documented. The second reason is that judgements regarding the
nature of a semantic relation are subjective and dependent on the level of granularity one aspires to: some might regard a system of 12 relations (such as Levi’s)
as too vague, while others (Hatcher, no doubt) would find it too low-level (and
too unsystematic). The third reason is that no system is perfect. It is easy to spot
inconsistencies and errors in others’ work, and when we encounter such errors,
there is a tendency to think that we can do a better job ourselves.
Whatever the reasons may be, the practice of discarding the work of others
and starting from scratch does not seem conducive to the advancement of science.
The position taken in this paper is that a better approach is to build on the work of
earlier researchers, to reuse existing schemes, testing and refining them as necessary, and working incrementally towards the goal of a robust, flexible and easily
reusable system that has been tested against different kinds of data from a large
range of languages. The Hatcher-Bourque classification presented here is such
a system. It is hereby offered to the research community as a basis for further
collaborative work, together with an Excel-based tool for the computer-assisted
analysis of semantic relations, the Bourquifier (Pepper 2021).
1.3 A note on terminology
Before proceeding, it is worth spending time to understand the structure of a
semantic relation and the terminology to be used in this chapter.
The relation ℜ that pertains between the two major constituents of a binominal lexeme, such as honey bee, is by definition binary. It involves two participants,
honey and bee, each of which plays a particular role in the relation. We can characterize the relation here as one of production: a honey bee is a bee that produces
honey; the bee plays the role of producer and the honey plays the role of product.
It is important to distinguish between the role of a participant in a particular
relation and its type, the class to which it belongs and that reflects its essential
being. A bee is primarily an insect, not a producer, and honey is a kind of sweet
fluid rather than just a product. Roles vary depending on the relation in question,
whereas types are constant: the bee in beehive is playing a quite different role
from the bee in honey bee, but it is still a bee.
All binary relations are bidirectional, in the sense that if A is related to B,
then B is perforce related to A. In a symmetric relation, such as that of coordination, B is related to A in the same way as A is related to B (if A is coordinate with
B, then B is coordinate with A). In such relations, the role is the same for both
participants. In an asymmetric relation, such as that of production, B is related
to A in a different way from how A is related to B, and there are two distinct roles.
When a relation is asymmetric, it can take two forms depending on how the
relation is profiled: in honey bee, the constituent denoting the producer (bee) is
the semantic head and the constituent denoting the product (honey) is the modifier. By contrast, in beeswax, the constituent denoting the product is the head
and the constituent denoting the producer is the modifier. Because it is asymmetric, the production relation can be said to consist of two “sub-relations”,
which we might label “producer of” and “produced by”. When both sub-relations
are employed in binominal word-formation, the relation is said to be reversible,
and the terms basic and reversed may be employed to distinguish between the
two. Note that a relation may be asymmetric without necessarily being reversible.
In the following discussion, relations are shown in small caps and roles are
underlined.
1.4 Structure of this chapter
This chapter is structured as follows: Following this introduction, §2 presents
the low-level classification of 25 relations developed by Bourque (2014) that was
chosen as the starting point for the present study; it also details the minor adjustments and extensions that were made to it, resulting in the Bourque29 component (29 refers to the number of relations) of the Hatcher-Bourque classification.
§3 describes Hatcher’s (1960) high-level classification of four relations and how it
was extended by the addition of one more relation in order to cover appositional
as well as non-appositional binominals, resulting in the Hatcher5 component of
the Hatcher-Bourque classification.
§4 describes the two-tiered Hatcher-Bourque system that results from the
integration of Bourque29 with Hatcher5, and how this system relates to Aristotle’s
three principles of remembering. §5 then presents the Bourquifier application and
its use as a computer-assisted tool to expedite the analysis of semantic relations
and ensure more consistent results.
§6 contains a statistical analysis, showing the frequency of various low- and
high-level relations in a sample of 3,650 binominals from 106 languages, and §7
provides a conclusion and a challenge. Documentation for the complete Hatcher-Bourque classification is to be found in the appendix, in the form of detailed
summaries of each relation and a one-page at-a-glance table.
2 Bourque’s low-level classification
2.1 Description of Bourque25
Out of the dozens of classification schemes to be found in the literature, the one
selected for the present study is the one developed by Yves Bourque in his 2014
dissertation Toward a typology of semantic transparency: The case of French compounds (Bourque 2014). This choice was dictated by a number of considerations,
in particular the quality of Bourque’s documentation, which includes templates,
linking material, examples and extensive discussion of overlaps between relations.
A further reason was that the scheme avoids the Anglocentrism of many earlier
studies, for example by providing examples in both English and French, employing
descriptive labels (e.g. purpose instead of Levi’s for), and using the terms ‘non-head’ (or ‘modifier’) and ‘head’ instead of the word-order-dependent ‘A’ and ‘B’ of
Jespersen and Hatcher, or ‘N1’ and ‘N2’ of Levi and Jackendoff. In addition, a study
involving nearly 4,000 binominals (Pepper 2020) shows that this classification
operates at a level of granularity that is both manageable, in terms of the number of
relations (25), and precise, in terms of expressing the nature of the various relations.
Bourque’s classification is furthermore based explicitly on a synthesis of
16 earlier classifications.2 Whereas all but one of these are based on data from
English, Bourque himself tested the system using a large database of French compounds, thus increasing the chance of cross-linguistic coverage.
From the 16 earlier classifications, Bourque synthesizes a set of “retained
relations”, 15 in all, shown in Table 1. Of these 15, ten are considered to be reversible and are indicated by R.
Table 1: Bourque’s (2014:170) retained relations.
coordination      topic           location (R)
hypernymy (R)     similarity      purpose
composition (R)   part (R)        cause (R)
time (R)          function        possession (R)
source (R)        production (R)  use (R)
Each relation is introduced by a summary table such as that exemplified for
production in Figure 1. For reversible relations like production, the summary
table consists of two rows, one for each of the (directed) ‘sub-relations’; these are
labelled Basic and Reversed. Each row then contains a “structure template” in
both English and French, examples (in the form of compounds, i.e. binominals of
type cmp or jxt) from each language, and “linking material”. For non-reversible
relations the second row is empty.
2 Those of Jespersen (1942), Hatcher (1960), Adams (1973), Levi (1978), Downing (1977), Warren
(1978), Shoben (1991), Vanderwende (1994), Lauer (1995), Rosario and Hearst (2001), Arnaud (2003),
Moldovan et al. (2004), Girju et al. (2005), Girju et al. (2009), Ó Séaghdha (2008), Jackendoff (2010).
The structure template consists of a test frame with slots for the head (H)
and modifier (M), respectively. Populating these slots with the constituents of
a binominal results in a paraphrase of the relation that helps the analyst judge
whether that relation is appropriate to the binominal in question. Thus we see
that the paraphrase of honey bee as “a bee that makes honey” (the Basic form of
production) provides a satisfactory reading, whereas that of the Reversed form,
“a bee that honey makes”, does not. Conversely, beeswax is “a wax that bees
make” and not “a wax that makes bees”.
PRODUCTION
Relation Type  Structure Template  Examples        Linking Material
Basic          an H that makes M   honey bee       makes, produces
               un T qui fait M     appareil photo  fait, produit
Reversed       an H that M makes   beeswax
               un T que M fait     jazz manouche
Figure 1: Bourque’s template for production.
The linking material “is meant to draw parallels between the retained relation
and those proposed elsewhere in the literature and may include such items as
verbs (e.g. have, cause, make, etc.), prepositions (e.g. for, from, of, etc.), and even
nouns (e.g. kind, type)” (Bourque 2014: 178).
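The way a structure template yields a paraphrase can be illustrated with a minimal Python sketch. The two template strings are those given for production in Figure 1; the function name, the dictionary layout and the small article adjustment (turning the template's "an" into "a" before a consonant-initial head) are assumptions made purely for illustration and are not part of Bourque's system.

```python
# Illustrative sketch only: template strings come from Figure 1,
# everything else (names, article handling) is a hypothetical helper.
TEMPLATES = {
    ("production", "Basic"): "an H that makes M",
    ("production", "Reversed"): "an H that M makes",
}


def paraphrase(relation, form, head, modifier):
    """Fill the H (head) and M (modifier) slots of a structure template."""
    filled = []
    for position, word in enumerate(TEMPLATES[(relation, form)].split()):
        if word == "H":
            filled.append(head)
        elif word == "M":
            filled.append(modifier)
        elif position == 0 and word == "an":
            # The template says "an H"; pick the article that suits the head.
            filled.append("an" if head[0].lower() in "aeiou" else "a")
        else:
            filled.append(word)
    return " ".join(filled)


if __name__ == "__main__":
    # honey bee: head = bee, modifier = honey
    for form in ("Basic", "Reversed"):
        print(form, "->", paraphrase("production", form, "bee", "honey"))
```

The analyst still has to judge which of the generated paraphrases gives a satisfactory reading; the sketch only automates the slot filling.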
In addition to the summary table, each relation is accompanied by a lengthy discussion that can run to several pages. This covers the precise nature of the relation,
the ways in which it has been treated by earlier researchers, overlaps with other relations, and any other issues. The complete classification is summarized in Table 2.
Bourque’s system of 15 relations (of which 10 are reversible, for a total of
25 “sub-relations”) was sufficient to cater for the varied sample of nearly 4,000
binominals that will be described in §6. However, a few infelicities were discovered in the process, and when it came time to map the system to that of Hatcher,
certain extensions were deemed necessary. The following section (§2.2) describes
the non-substantive changes that were made to the original system, and §2.3
describes the substantive extensions that resulted in the Bourque29 component
of the Hatcher-Bourque classification.
2.2 Non-substantive changes to Bourque
The non-substantive changes to Bourque’s classification involved renaming some
relations, rewording some templates, and changing some examples. They are
presented in the following sections. (Refer to Table 2 for the original formulations
and Table 7, in the Appendix, for the revised version.)
Table 2: The original Bourque25 classification.
Label                        Type   Template                           Linking material              Example
hypernymy                    Basic  an H of kind M                     kind of, type of              oak tree
                             Rev.   an H that M is a kind of                                         bear cub
coordination                        a C is an H and an M               is also, is both / and        boy king
similarity                          an H that is similar to M          similar to, like              ant lion
function                            an H that serves as M              functions, serves as          buffer state
possession                   Basic  an H that possesses M              possess (have / of)           career girl
                             Rev.   an H that M possesses                                            family estate
part                         Basic  an H that is part of M             part of (have / of)           table leg
                             Rev.   an H that M is part of                                           wheelchair
location                     Basic  an H located at/near/in M          at, near, in, etc.            window seat
                             Rev.   an H that M is located at/near/in                                bedroom
composition                  Basic  an H made of M                     composed/made of              sugar cube
                             Rev.   an H that M is made of                                           sheet metal
source                       Basic  an H (made) from M                 (made) from                   cane sugar
                             Rev.   an H that M is (made) from                                       sugar cane
cause                        Basic  an H that causes M                 causes                        sunburn
                             Rev.   an H that M causes                                               motion sickness
production                   Basic  an H that makes M                  makes, produces               honey bee
                             Rev.   an H that M makes                                                beeswax
topic                               an H about M                       about                         history conference
time                         Basic  an H that occurs at/during M       during, at, in, before, etc.  summer job
                             Rev.   an H at/during which M occurs                                    golf season
use                          Basic  an H that uses M                   use / with, by                steamboat
                             Rev.   an H that M uses                                                 hand brake
purpose and proper function         an H intended for M                for                           animal doctor
Changes to names of relations
Previous researchers have employed a variety of strategies for naming relations.
Levi used a mixture of verbs (be, have, make, cause, use) and prepositions
(in, for, from, about); Warren preferred to use role pairings (source-result,
whole-part, part-whole, size-whole, goal-obj, place-obj, time-obj, activity-actor), but had recourse to other means for symmetric and non-reversible
relations (copula, resemblance, purpose); Jackendoff’s “basic functions”
employ a verb-based naming system for the most part, but with the odd adjective,
role or abbreviation thrown in (classify, be, be at/in/on, made from, cause,
make, serves as, have, protect (from), but also similar, kind, part, comp).
Bourque’s system is more consistent, but not entirely so. As Table 1 shows,
all 15 relations are named by nouns. Of these, six are nominalizations of the verb
or adjective typically used to express the relation, e.g. production < produce
(the others are coordination, composition, possession, location and similarity). hypernymy also denotes a relation, but one that is lexical rather than
conceptual. part, cause, source, time and topic, on the other hand, all denote
one of the roles in the relation, while function, use and purpose3 can denote
either a role or a relation.
For the Hatcher-Bourque classification, it was considered desirable that
names should denote relations rather than roles or linguistic means of expression, preferably using nominalizations of relevant verbs. Where this was not
possible, a role-pair was preferred to a single role, but the latter was considered
acceptable for symmetric and non-reversible relations. While it was not possible
to achieve complete consistency, the following improvements were made:
– hypernymy (lexical relation) > taxonomy (conceptual relation)
– part (role) > partonomy (relation)
– cause (role) > causation (relation)
– time (role) > temporality (relation)
– source (role) > source-result (relation)
– use (conversion) > usage (nominalization)
3 Bourque actually uses the name purpose and proper function for this relation, but since
his system already includes a relation called function the name has been shortened. Considerations relating to having been designed to (or supposed to) perform a certain function (Millikan
1984: 17, cited in Jackendoff 2016: 23) are thus relegated to the description of the relation instead
of its name.
The names topic, function and purpose, on the other hand, were retained.
Since these appear to be non-reversible, this is not a major issue, especially since
the latter two can denote relations as well as roles.
Changes to templates
Changes made to Bourque’s templates were motivated by the desire for greater
transparency and/or consistency. The template for Reversed hypernymy is actually incorrect, generating “a bear cub is a cub that bear is a kind of” for Bourque’s
example bear cub. A bear is a kind of animal, and not a kind of (bear) cub. One
way to correct this error would be to adapt Jackendoff’s “an N2 that is a kind of N1”
(i.e. “an H that is a kind of M”), but a simpler solution was to replace both hypernymy templates with the more transparent “(an) M is a kind of H” and “(an) H is a
kind of M”. Thus, an oak is a kind of tree (whereas a tree is not a kind of oak), and
a cub is a kind of bear (whereas a bear is not a kind of cub).
In addition, Bourque’s templates for composition, production and source
(all of which employ the verb ‘make’) were modified to use the verbs ‘compose’
and ‘produce’ and the noun ‘source’, respectively, thereby tying the templates
more closely to the name of the relation. For example, the template for Basic production (e.g. honey bee) was changed from “an H that makes M” to “an H that
produces M”, and that for Basic composition (e.g. sugar cube) was changed from
“an H made of M” to “an H composed of M”.
Changes to examples
Most of the changes to Bourque’s examples were motivated by pedagogical and,
in a couple of cases, aesthetic considerations. Only one of his 25 examples is actually erroneous: the use of sunburn to exemplify Basic cause, paraphrased as “an
H that causes M”. It is, of course, the sun (M) that causes the burn (H), not the
other way round, so this example properly belongs under Reversed cause, with
the paraphrase “(a) burn that (a) sun causes”. A better example for Basic cause is
tear gas: “(a) gas that causes (a) tear”.
It is arguable that the example provided by Bourque for similarity is not
incorrect, but it is certainly suboptimal. An ant lion (or antlion) is not a lion that is
similar to an ant, it is a kind of insect, albeit not exactly an ant. The name appears
to be a left-headed calque from Latin formicaleo, which means that the paraphrase “an ant that is similar to a lion” does in fact work. However, as a highly
exceptional left-headed compound it is unsuitable in an English context for pedagogical reasons (it works fine as Fr. fourmi-lion, which may be how Bourque got
to choose it as his example). It is replaced by kidney bean (a bean shaped like a
kidney), an example taken from Hatcher (1960).
Conservation of space is the main consideration for choosing history book
instead of history conference for topic, and sunburn instead of motion sickness
for Reversed cause; real estate is at a premium not only on paper, but also in the
Bourquifier (see §4.2).
Finally, a number of changes were motivated by the desire to use what David-Antoine Williams4 has dubbed “boathouse words” wherever possible, for pedagogical reasons. These are pairs of words that have the pleasing property of consisting of the same two constituents in reverse order, like Bourque’s examples for
source, cane sugar and sugar cane. For composition Bourque already has sugar
cube, which is complemented nicely by cube sugar. In addition, song bird and
bird song work well for production, as do oil lamp and lamp oil for use, and car
motor and motor car for part. Finding a suitable boathouse pair for location is
more difficult (the closest I have come is house music and music hall), and candidates are still being sought for possession, time, cause and direction.
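The search for boathouse pairs lends itself to a simple automated check. The following Python sketch assumes that each binominal has already been segmented into its two constituents and is represented as a (first constituent, second constituent) tuple; the function name and the input format are hypothetical and are not a feature of the Bourquifier.

```python
def boathouse_pairs(binominals):
    """Return pairs of binominals built from the same two constituents
    in opposite orders, e.g. ('cane', 'sugar') and ('sugar', 'cane')."""
    seen = set(binominals)
    pairs = []
    for first, second in seen:
        # 'first < second' reports each pair once and skips (x, x) forms.
        if first < second and (second, first) in seen:
            pairs.append(((first, second), (second, first)))
    return sorted(pairs)


if __name__ == "__main__":
    sample = [
        ("cane", "sugar"), ("sugar", "cane"),
        ("sugar", "cube"), ("cube", "sugar"),
        ("song", "bird"), ("bird", "song"),
        ("honey", "bee"), ("history", "book"),
    ]
    for pair in boathouse_pairs(sample):
        print(pair)
```

Such a check only finds candidates: whether the two members of a pair actually embody the Basic and Reversed forms of the same relation remains a matter for the analyst.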
2.3 Substantive changes to Bourque’s classification
The more extensive changes to Bourque25 involved the addition of codes for
relations, the provision of names for roles, the enforcement of consistency when
distinguishing between Basic and Reversed forms, and the addition of two new
relations. These changes are described and justified in the following sections.
Addition of codes
In order to represent the data in a database and perform the quantitative study
described in §6, unique identifiers were needed for each relation. Bourque will
have encountered the same need, but he did not publish his codes, so new ones
were created. These take the form of three- or four-letter mnemonic codes for
Basic relations, and the same codes suffixed with -R for reversed relations, thus
POSS for Basic possession, POSS-R for Reversed possession, etc. These codes
are included in the documentation in the Appendix, in order to promote interoperability and to save other users the trouble of devising their own codes.
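The coding convention can be sketched in a few lines of Python. Only the code POSS and the -R suffix are taken from the text; the function names and the validation rule are illustrative assumptions, and the full list of codes is the one given in the Appendix, not reproduced here.

```python
def sub_relation_code(base_code, reversed_form=False):
    """Build the identifier of a (directed) sub-relation from a
    three- or four-letter mnemonic, adding -R for the Reversed form."""
    if not (base_code.isalpha() and 3 <= len(base_code) <= 4):
        raise ValueError("expected a three- or four-letter mnemonic")
    code = base_code.upper()
    return code + "-R" if reversed_form else code


def parse_code(code):
    """Split an identifier into its base mnemonic and its form."""
    if code.endswith("-R"):
        return code[:-2], "Reversed"
    return code, "Basic"


if __name__ == "__main__":
    print(sub_relation_code("POSS"))        # Basic possession
    print(sub_relation_code("POSS", True))  # Reversed possession
    print(parse_code("POSS-R"))
```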
4 https://thelifeofwords.uwaterloo.ca/boathouse-words/ (accessed 2021-02-10).
Addition of explicit roles
A more substantive change is the introduction of explicit names for the roles
played by the participants in each relation, as described in §1.3. Thus, the production relation is supplied with the roles product and producer, the possession relation with the roles possessor and possessum, etc. Every asymmetric
relation (reversible or not) involves two distinct roles, as here. The two symmetric
relations, on the other hand, each involve a single role which is played by both
participants in the relation. For coordination that role is named coordinand,
and for similarity it is named likeness.
These names serve as an aid in conceptualizing asymmetric relations, in
understanding the difference between a Basic and its corresponding Reversed
relation, and in describing and communicating about individual relations. If
one adopts the convention of using the role played by the modifier to characterize a (directed) ‘sub-relation’, every one of the 29 sub-relations of the revised
Bourque29 system can be referred to simply as the ‘X relation’. Thus, ‘possessor relation’ and ‘possessum relation’ can be used instead of the unwieldy terms
Basic possession relation and Reversed possession relation for family estate and
career girl, respectively. This works because it proved possible
to ensure that every role was unique – except for the use of entity as one of the
roles in the topic, function and purpose relations. Since these relations are
non-reversible, there will seldom be a collision between multiple, homonymous
*-entity relations.5
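The naming convention just described amounts to a lookup from a (directed) sub-relation to the role played by its modifier. The Python sketch below covers only the roles named in this section (for production, possession, coordination and similarity); the data structure and function names are assumptions made for illustration, and the possession entries follow the revised Basic/Reversed assignment.

```python
# Role played by the MODIFIER in each sub-relation. Only roles mentioned
# in this section are listed; symmetric relations have no Basic/Reversed
# distinction, which is represented here (hypothetically) by None.
MODIFIER_ROLE = {
    ("production", "Basic"): "product",       # honey bee
    ("production", "Reversed"): "producer",   # beeswax
    ("possession", "Basic"): "possessor",     # family estate
    ("possession", "Reversed"): "possessum",  # career girl
    ("coordination", None): "coordinand",
    ("similarity", None): "likeness",
}


def short_name(relation, form=None):
    """Return the 'X relation' label based on the modifier's role."""
    return MODIFIER_ROLE[(relation, form)] + " relation"


def roles_are_unique(table):
    """The short names are unambiguous only if no role is used twice."""
    roles = list(table.values())
    return len(roles) == len(set(roles))


if __name__ == "__main__":
    print(short_name("possession", "Basic"))
    print(short_name("production", "Reversed"))
    print(roles_are_unique(MODIFIER_ROLE))
```

Adding the entity role of topic, function and purpose to such a table would make the uniqueness check fail, which is the collision discussed above.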
Alignment of Basic and Reversed relations
Asymmetric relations, as already noted, can take two forms, which work in opposite directions to one another (and incidentally are often paraphrased using active
and passive sentences, respectively). As we have seen, Bourque labels these two
forms Basic and Reversed, respectively, but he does not provide any rationale for
choosing one form rather than the other to designate as Basic. It seems that the
choice was essentially arbitrary. While this clearly did not matter to Bourque for
the purpose of his investigation, such arbitrariness is unnecessary, leads to a less
logically consistent result, and may prove confusing.
5 If at some point Reversed forms of topic, function and purpose are found, the classification
(including the relevant role names) will have to be revised.
Consider the relations part, location and composition in Table 2. All of
these are in some sense specializations of a more general relation containment.
If we now focus on the three Reversed examples of these relations (wheelchair,
bedroom and sheet metal), we see that a part (wheel) is in some sense “contained
in” its whole (chair), that a thing located (bed) is “contained in” its location
(room), and that a material (metal) is “contained in” the object of which it is made
(sheet). The wheel, the bed and the metal are thus all “containees” (in a very
general sense), whereas the chair, the room and the sheet are all “containers”.
However, while wheel and bed are denoted by modifiers, metal is denoted by
the head constituent. If we now consider the possession relation (exemplified
by family estate), in which the possessor somehow “contains” the thing possessed, we see that the containee estate, like metal (but unlike wheel and bed),
is denoted by the head constituent.
This inconsistency can be removed by simply inverting Bourque’s possession
and composition relations such that the containees are denoted by the modifier
instead of the head. Thus, the original pair of possession (sub-)relations (before)
becomes the revised pair of (sub-)relations (after).
before:
possession  Basic  an H that possesses M  possess      career girl
            Rev.   an H that M possesses  (have / of)  family estate
after:
possession  Basic  an H that M possesses  possess      family estate
            Rev.   an H that possesses M  (have / of)  career girl
The same applies to composition (and taxonomy) where Basic and Reversed are
likewise inverted.
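The effect of the inversion can be checked mechanically, as in the following Python sketch. The examples and their original Basic/Reversed labels are those of Table 2; the containee judgements for the Reversed rows follow the discussion above, whereas those for the Basic rows are inferred by analogy and should be read as assumptions. All names in the code are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SubRelation:
    relation: str
    form: str       # "Basic" or "Reversed"
    modifier: str
    head: str
    containee: str  # the constituent that is "contained in" the other


# Original Bourque25 labelling (Table 2). Containees of Basic rows are
# inferred by analogy with the Reversed rows discussed in the text.
BOURQUE25 = [
    SubRelation("part", "Basic", "table", "leg", "leg"),
    SubRelation("part", "Reversed", "wheel", "chair", "wheel"),
    SubRelation("location", "Basic", "window", "seat", "seat"),
    SubRelation("location", "Reversed", "bed", "room", "bed"),
    SubRelation("composition", "Basic", "sugar", "cube", "sugar"),
    SubRelation("composition", "Reversed", "sheet", "metal", "metal"),
    SubRelation("possession", "Basic", "career", "girl", "career"),
    SubRelation("possession", "Reversed", "family", "estate", "estate"),
]


def invert(rows, relations):
    """Swap the Basic and Reversed labels of the named relations."""
    flip = {"Basic": "Reversed", "Reversed": "Basic"}
    return [
        replace(row, form=flip[row.form]) if row.relation in relations else row
        for row in rows
    ]


def reversed_forms_aligned(rows):
    """True if every Reversed row expresses the containee as the modifier."""
    return all(
        row.containee == row.modifier for row in rows if row.form == "Reversed"
    )


if __name__ == "__main__":
    print(reversed_forms_aligned(BOURQUE25))
    print(reversed_forms_aligned(invert(BOURQUE25, {"possession", "composition"})))
```

Under these assumptions the original labelling fails the check, while the labelling obtained by inverting possession and composition passes it.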
Similar considerations apply to Bourque’s source and use, which are incompatible with cause and production. Focusing again on the Reversed relations,
we see that in the latter two relations, the participant that constitutes the point
of origin (the motion in motion sickness and the bee in beeswax) is expressed by
the modifier, whereas in the source and use examples, the point of origin (the
cane of sugar cane and the brake in hand brake) is expressed by the head. In the
revised Bourque classification, the sub-relations of source and use are therefore
also inverted.
For consistency with other containment-related relations, forms such as
history book are deemed to embody the Reversed form of the topic relation rather
than the (unattested) Basic form. Of the two paraphrases available for the containment relation, the Reversed form is clearly the more felicitous: cf. “a book
that contains history” vs. the Basic form “*a book that is contained in (a) history”.
The attested forms of the other non-reversible relations, purpose and function,
embody the Basic forms of the relation.
Addition of new relations
Although Bourque’s set of 15 relations was sufficient to cater for the 3,650 binominals examined for the study on which this work is based, two more relations
were added for the sake of completeness, and to facilitate the integration with
Hatcher’s system (described below). These were containment and direction.
The first of these was prompted by consideration of the Hawaiian binominal
pahu meli [box honey] beehive. Is a beehive “a box that honey is part of” (Bourque’s Reversed part) or “a box that honey is located at/near/in” (his Reversed
location)? Of course, location is involved, and one could also (at a pinch) say that
honey is part of the beehive, but it would be more felicitous to say that the box contains honey. Now, although containment is not one of Bourque’s relations, he does
not ignore the matter. He discusses it in depth in the context of the overlap between
part and location, using the example of toolbox. His discussion is quoted here at
some length in order to convey the detail of his discussions in general:
Another issue to consider is that some compounds might be analysed as either part or
location. This dual analysis is related to the fact that location may subsume part: if
something is a part of something else, then it is located at/on/in that thing (cf. Baron &
Herslund 2001). One possible solution is to reserve location for only those compounds that
actually involve a locative noun, as does Adams (1973). The problem, of course, is that one
must treat combinations such as toolbox or treehouse using some other relation, as they do
not, in the strictest sense, involve places. The key distinction that will be used here is one
that views the part relation as a reference to an integral component of the whole, without
which it would either be incomplete, defective, or non-functional. Thus, a negation test may
be used to determine whether the modifier denotes an essential part of the compound. The
formulation in (105) below shows how such a test might apply to compounds in which the
head denotes the whole (cf. 104 above):
(105) a. a C without an M is still a C
      b. un C sans M est toujours un C
A positive response to the above sentence would indicate that the modifying noun is not an
essential component of the object denoted by the compound, but instead a distinguishing
feature. Thus, a toolbox without tools is still a toolbox, which indicates that tools is connected to box via some other relationship (i.e. container-contained). This result is the
same for the French boîte à outils (i.e. une boîte à outils sans outils est toujours une boîte à
outils). When applied to compounds that denote a part-whole association, the test produces
defective or incomplete readings.
(pp. 196–197, emphasis added)
The case of “honey box” (beehive) is parallel to toolbox: a beehive without honey
is indubitably still a beehive. The distinction Bourque makes is useful, but his
conclusion to treat toolbox (and thus also “honey box”) as (mere) location seems
inadequate. It seems better to bite the bullet and add the relation containment
(which even Bourque recognizes as “some other relationship”) to his system, on
the grounds that the ability to perceive containment is a fundamental part of our
cognitive endowment. The relation is reversible and may be exemplified, following Hatcher (1960: 364), by orange seed and seed orange.
The other addition made to Bourque’s set of relations was motivated by one
of Jespersen’s examples: sun worship. Strictly speaking this is not a binominal
since worship denotes an action, not a thing. However, the scope of Bourque’s
classification is noun-noun compounds in general and therefore it should be able
to accommodate sun worship. It turns out that none of Bourque’s relations are
appropriate. Clearly the notion of the sun as some kind of goal is involved, so one
might think that Bourque’s source would do the job, but no amount of tweaking of either the Basic or the Reversed template produces a paraphrase that is
acceptable for both sun worship and cane sugar. This seems to be because goal as
a complement of source is not compatible with result. It seems that a new relation
is unavoidable, but what to call it, and how to make it sufficiently distinct from
source? The answer is provided by Hatcher, who includes sun worship in her
category A←B (to be discussed below), pointing out that “the sun is that toward
which the worship is directed” (see Figure 2; emphasis added). Now, as we have
seen, it is frequently the case that the verb used to express the paraphrase can
serve in nominalized form as the name of the relation itself (recall ‘possess’ >
possession). The solution to the problem of how to name the new relation is thus
given: ‘direct’ > direction, understood as an asymmetric relation which relates a
starting point or origin and an endpoint or goal, and exemplified by sun worship
and sales target, respectively.
Adding such a relation to Bourque’s scheme can be justified on two grounds
(over and above the desire to accommodate sun worship): firstly, it is very general,
and secondly, the ability to conceptualize direction is an important part of the
human cognitive endowment. Further research may show that direction is
rarely encountered in binominals, but it may turn out to be more important when
synthetic compounds and other complex nominals containing an action-root are
considered (as in sun worship).
The classification resulting from the modifications to Bourque’s system
described in the preceding sections consists of 17 relations, two of them symmetric, and three non-reversible, for a total of 29 (directed) ‘sub-relations’, hence the
name “Bourque29”. Documentation for each of these is provided in the Appendix, together with an at-a-glance summary. In the following section we turn to the
high-level classification developed by Anna Granville Hatcher to which Bourque29
will be mapped in §4.
3 Hatcher’s high-level classification
3.1 The critique of Jespersen
Hatcher presents her (1960) four-way classification of non-appositional compounds in the form of a critique of Jespersen’s (1942) attempt to classify semantic
relations. Jespersen concedes that his analysis is incomplete and that there are
many compounds which “do not fit in anywhere”, but he claims that his failure is
simply due to the inherent unclassifiability of his material: “the number of possible logical relations between the two elements is endless” (p. 138); “the analysis
of the possible sense-relations can never be exhaustive” (p. 143).
But, says Hatcher,
it all too often happens that scholars in linguistics proclaim a given problem to be insoluble,
when they themselves have not worked out the categories necessary for its solution; we should,
then, examine the outline offered by Jespersen to see if some of the difficulty he encountered
may not be explained by his method of classification. For example, was his set of categories
constructed with logical rigor: and, before surrendering to the “difficult” types that he mentions,
had he been able, at least, to account for all the “easy” compounds, subdividing these as carefully as his patience and his talent permitted? The subdivision of the obvious may lead to greater
understanding of the less obvious, if one is guided by logically consistent criteria.
(p. 356)
Thereupon, Hatcher sets about dissecting and reordering Jespersen’s system. She
starts by listing seven of Jespersen’s types, omitting one of the original eight (Similarity) on the grounds that it more properly belongs to “apposition”, which she
wants to keep separate. Examining each of these in turn, Hatcher notes a lack of
careful subdivisions, an absence of any principle of symmetry, and the mixing
of two basic criteria, Reference and Relation. Her rearrangement of Jespersen’s
scheme is depicted in Figure 2.
Hatcher chooses to avoid Reference and to base her new scheme exclusively
on Relation, so she starts by separating Jespersen’s types 1–3
(Subject/Object, Place and Time) – all of which are either based on reference or
mixed – from types 4–7 (Purpose, Means, Characterizing Feature and Material),
all of which are relational. The former are set to one side, and to the latter she
adds two relational types found in Mätzner (1860) but absent in Jespersen (α
broomstick and β castor oil). She then proceeds to reorganize these six relational
types into four abstract classes:
(a) A⊂B  “A is contained in B” (notated Ⓐ by Hatcher)
(b) A⊃B  “B is contained in A” (notated Ⓑ by Hatcher)
(c) A→B  “A is the source of B”
(d) A←B  “A is the destination of B”
Figure 2: Hatcher’s reworking of Jespersen’s classification.
Having reduced the six relational categories of Jespersen/Mätzner to two pairs of
mutually exclusive concepts, Hatcher turns her attention to the referential types,
in order to see how they might be accommodated in her new scheme. She starts
with (2) Place, (3) Time and their subdivisions (to, in/at, from and extent), which
map neatly into her scheme, as (d), (b), (c) and (a), respectively.
Finally, the two verbal types (1) Subject and (2) Object are “easy”:
Sunshine and sun worship, these perfect opposites, fall under A→B and A←B, respectively.
Surely the subject is the “source” of its own activity (in putting sunshine under A→B, we are
merely adding Agent to Agency); and in sun-worship (A←B), the sun is that toward which
the worship is directed.
Thus we see that both the referential and the relational types of Matzner-Jespersen can
be included in our two pairs of relational criteria: the static Ⓐ and Ⓑ, and the dynamic A→B
and A←B.
(p. 365)
Hatcher concludes this part of her analysis by pointing out that the scheme she
has developed has two advantages over the one she has just “torn to pieces”.
Firstly, it is logically conceived, and therefore neater and more pleasing aesthetically; and secondly, it is far more comprehensive, and thus may “be able to
account for all possibilities of determinative, non-appositional compounding in
the English language,” which she suggests are surely not “endless” (pp. 365–366).
At the same time she expresses the hope that her work represents not a “result”,
but rather a beginning, and that it will offer “a more spacious framework” within
which research dedicated to the proposition that “all compounds are endowed by
their creators with the right to belong somewhere” may proceed more profitably
and hopefully than before.
3.2 Extending Hatcher’s classification
Hatcher’s work is often cited, but usually dismissed, often on less than scientific
grounds. For example, Søgaard (2005: 320) writes:
such an account is by definition both arbitrary (Bauer 1978; van Santen 1979) and incomplete because of the infinite set of compounding relationships. For illustration, try to place
a compound such as car thief in [Hatcher’s] four-way typology. Is a car thief a ‘car in a thief’,
a ‘thief in a car’, a ‘thief as the goal of a car’ or a ‘thief as the source of a car’?
Unfortunately for Søgaard, the last two paraphrases are incorrect: he has muddled
up the order of A and B. The head of the construction (B) is thief, not car, so these
two paraphrases should read: a ‘car as the goal of a thief’ and a ‘car as the source
of a thief’. With the correct paraphrase, it is obvious that the car is indeed the
goal of the thief (i.e. A←B). Søgaard’s objection must therefore be rejected.
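The role assignment at issue here can be made explicit with a small sketch. The Python below is illustrative only: the names HATCHER4 and paraphrase_all are hypothetical, and the paraphrase strings simply reuse the four glosses given in §3.1, with A bound to the modifier and B to the head.

```python
# Hatcher's four abstract classes, using the glosses from section 3.1.
# A = modifier (non-head), B = head. All names here are illustrative.
HATCHER4 = {
    "A⊂B": "{A} is contained in {B}",
    "A⊃B": "{B} is contained in {A}",
    "A→B": "{A} is the source of {B}",
    "A←B": "{A} is the destination of {B}",
}


def paraphrase_all(modifier: str, head: str) -> dict:
    """Fill every gloss with the modifier as A and the head as B."""
    return {code: gloss.format(A=modifier, B=head)
            for code, gloss in HATCHER4.items()}


if __name__ == "__main__":
    # car thief: the modifier is 'car', the head is 'thief'.
    for code, text in paraphrase_all("car", "thief").items():
        print(f"{code}: {text}")
```

Binding the roles in this order keeps the car in the A slot of every paraphrase, so the A←B gloss treats the car, not the thief, as the destination.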
One researcher who has taken Hatcher seriously is Arnaud (2003; 2016).
Arnaud’s work on categorizing the modification relations in French subordinative NNN compounds is full of interesting observations, examples and discussion. However, in the present context it is noteworthy for the fact that Arnaud
first develops his own highly granular classification, and then attempts to map
it onto Hatcher’s four-way scheme (which Noailly 1990, also working on French
compounds, had arrived at independently).
Arnaud’s classification is based on a database of 949 French binominals of
type cmp and jxt, which he dubs “les composés timbre-poste” (postage stamp
compounds). As none of the then-existing taxonomies of semantic relations
seemed satisfactory, he decided to start from the data up, applying the principles
of cognitive linguistics, “in particular the idea that relations are emergent phenomena which gain psychological existence” (2016: 71). The analysis resulted in
a classification with 58 categories, ranging from the highly abstract (e.g. “Non-head is the goal of Head”) to the very precise (such as the subtype of the location
relation “Non-head is a secondary activity taking place in Head”).
Arnaud now proceeds to map his set of 58 empirically derived (low-level)
relations to Noailly and Hatcher’s set of four logically derived (high-level) relations. For the most part, this is plain sailing:
In most cases, the fine-grained categories were easy to group under these [high-level relations]. For example, the description in (18) was classified as an instance of (19).
(18) It is against the effects of Non-head that Head is made/conceived/set up
     ex.: minimum vieillesse (lit. ‘minimum old-age’, i.e. basic old-age benefits)
(19) non-head ← head
Abstract relation (19) represents the fact that in (18) the denotatum of N2 is, so to say,
aimed at that of N1 (p. 81).6
6 Arnaud’s ‘non-head’ and ‘head’ correspond to Hatcher’s A and B. The high-level relation in
his (19) is therefore equivalent to her A←B. It can be useful to think that A stands for Attribute
(= modifier, non-head) and B for Base (= head).
Arnaud’s bottom-up induction thus melds neatly with Hatcher’s top-down
deduction. Or at least, it almost does. Arnaud experienced difficulties with 12 of
his 58 low-level categories that did not map straightforwardly to Hatcher’s four,
and he felt obliged to extend Hatcher’s system with four more high-level relations: analog, be, head symb non-head and non-head symb head.
The frequencies of all eight high-level categories are shown in Table 3. Arnaud
himself concedes that “[the four new] categories are marginal compared with the
initial four,” but he believes they “show that Noailly erred on the side of abstraction (and Hatcher, too, as equivalent English compounds are easily found)” (p. 81).
Table 3: Frequencies of high-level relations in Arnaud (2016).
Relation             Equiv.  Freq.  %
non-head ← head      A←B     428    38.1
((non-head) head)    A⊂B     295    26.3
non-head → head      A→B     159    14.2
(non-head (head))    A⊃B     126    11.2
analog               –       62     5.5
head symb non-head   –       24     2.1
be                   –       23     2.0
non-head symb head   –       5      0.4
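The percentages in Table 3 can be re-derived from the frequencies, assuming that each percentage is computed over the column total. The following Python sketch does this and also computes the combined share of the four original relations. The variable names are illustrative; the figures are those of Table 3.

```python
# Frequencies copied from Table 3; the first four rows are Hatcher's
# original relations, the last four are Arnaud's additions.
FREQ = {
    "non-head ← head": 428,
    "((non-head) head)": 295,
    "non-head → head": 159,
    "(non-head (head))": 126,
    "analog": 62,
    "head symb non-head": 24,
    "be": 23,
    "non-head symb head": 5,
}
ORIGINAL_FOUR = list(FREQ)[:4]

total = sum(FREQ.values())

# Percentage of each relation, rounded to one decimal as in the table.
shares = {rel: round(100 * n / total, 1) for rel, n in FREQ.items()}

# Combined share of the four relations that map to Hatcher's scheme.
original_share = 100 * sum(FREQ[rel] for rel in ORIGINAL_FOUR) / total

if __name__ == "__main__":
    for rel, pct in shares.items():
        print(f"{rel:20} {pct:5.1f}")
    print(f"original four: {original_share:.1f}% of {total}")
```

On these figures the four original relations account for roughly nine tenths of all classifications, which is consistent with Arnaud’s remark that the new categories are marginal.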
Pepper (2020) examines each of the 12 low-level relations that seemed to Arnaud to
justify the creation of his four new high-level relations and shows that all but one of
them can in fact be accommodated by Hatcher’s four-way system. For example, the
first of Arnaud’s problematic forms, régime jockey, denotes a diet that is typical of
jockeys. But if A (‘jockey’) typifies (or characterizes) B (‘diet’), then it is a characterizing feature of B and therefore belongs, as Figure 2 shows, under Hatcher’s A⊂B, “A
is somehow, to some extent, contained, comprehended in B”. Thus it turns out, in
other words, that Hatcher’s system is broad enough to cater for eleven of the twelve
low-level relations that prompted Arnaud to add four new high-level categories.
The single exception, one of four subtypes of analog, is exemplified by the
form brasse papillon [breast_stroke butterfly] ‘butterfly stroke’, which falls under
Arnaud’s low-level category “Non-head names analogically a perceptual characteristic of Head”. Here there can be no doubt that some kind of analogy is at work.
But brasse papillon is not a non-appositional compound in Hatcher’s terms and
therefore falls outside the scope of her 1960 paper.
If we want to extend Hatcher’s scheme to cover appositional compounds, then
we do indeed need a new high-level relation. However, analogy may not be the
best term for that relation. Hatcher’s two logically defined pairs of reversible relations
are both based on Contiguity, which is one of Aristotle’s “three principles of remembering”, the others being Similarity and Contrast. In Pepper (2020) I suggest that
the relation underlying the types of appositional compound discussed by Hatcher
herself in an earlier paper (Hatcher 1952), i.e. species-genus and cross-classification – as well as Arnaud’s brasse papillon (and incidentally also coordinative compounds) – is Similarity. This is at about the same level of generality or abstraction
as Hatcher’s original two pairs. So her four-way system can be extended to a five-way system consisting of two pairs of asymmetric relations (which Hatcher referred
to as ‘static’ and ‘dynamic’) that account for non-appositional compounds, and a
fifth, symmetric relation that accounts for appositional compounds.
The extended system (Hatcher5) is summarized in Table 4. Following Bourque,
Hatcher’s A and B are replaced with M and H, and machine-readable codes (e.g.
HinM) have been added as alternatives to notations such as M⊃H or Ⓑ. Furthermore, Hatcher’s “static” and “dynamic” have been tentatively recast as containment and direction, respectively.
Table 4: Revised high-level classification (Hatcher5).
Contiguity-based
  containment (“static”)
    M⊃H  HinM  “H is contained in M” (orange seed)
    M⊂H  MinH  “M is contained in H” (seed orange)
  direction (“dynamic”)
    M←H  HtoM  “M is the destination of H” (sugar cane)
    M→H  MtoH  “M is the source of H” (cane sugar)
Similarity-based
  similarity
    M≊H  MisH  “H is similar or identical to M”
4 The Hatcher-Bourque classification
4.1 Description
Mapping the revised Bourque classification to the revised Hatcher system was
quite straightforward. Three of the 17 relations are based on similarity in one way
or another and thus map to the new relation:
– taxonomy equates to what Hatcher (1952) terms the “species-genus” type
(e.g. pumice stone);
– coordination and similarity correspond to two subtypes of her “cross-classification” type (exemplified by fuel oil and butterfly table).
The remaining 14 relations map neatly to Hatcher’s original two pairs of relations
as follows:
– containment, possession, partonomy, location, temporality, composition and topic are subtypes of her “static” relations; Basic forms map consistently to HinM (“B is contained in A”) and Reversed forms to MinH (“A is
contained in B”)
– direction, source-result, causation, production, usage, function and
purpose are subtypes of her “dynamic” relations; Basic forms map consistently to HtoM (“B is the source of A”) and Reversed forms to MtoH (“A is the
source of B”).
As Figure 3 shows, the Hatcher-Bourque classification operates at two main levels
of granularity, labelled Bourque29 and Hatcher5, respectively. Bourque29 consists of the 17 rather granular, low-level relations, indicated by the codes in the
five boxes at the bottom of the diagram. Of these, 12 are reversible, giving a total
of 29 (24+5) low-level (directed) ‘sub-relations’ (hence, Bourque29). The low-level
relations map to the three schematic, high-level relations of Hatcher5, labelled
similarity, containment and direction. Of these, the latter two are reversible,
for a total of five high-level (directed) ‘sub-relations’ (hence Hatcher5).
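The mapping just described is mechanical enough to be written down as a lookup. The following Python sketch is not part of the Bourquifier. The lists, function names and lower-case codes are illustrative assumptions; the codes follow the running text (e.g. part for partonomy, and MisH for similarity as in Table 4). The sketch derives the 29 directed sub-relations from the 17 relations and maps each one to its Hatcher5 code.

```python
# The 17 low-level relations, grouped by high-level relation (section 4.1).
SIMILARITY = ["tax", "coor", "sim"]
CONTAINMENT = ["cont", "poss", "part", "loc", "temp", "comp", "top"]
DIRECTION = ["dir", "src", "caus", "prod", "usg", "func", "purp"]

SYMMETRIC = {"coor", "sim"}      # one undirected form each
BASIC_ONLY = {"purp", "func"}    # non-reversible, Basic form only
REVERSED_ONLY = {"top"}          # non-reversible, Reversed form only


def sub_relations(rel):
    """Return the directed sub-relation codes for one relation."""
    if rel in SYMMETRIC or rel in BASIC_ONLY:
        return [rel]
    if rel in REVERSED_ONLY:
        return [rel + "-r"]
    return [rel, rel + "-r"]


def to_hatcher5(code):
    """Map a Bourque29 code such as 'part-r' to its Hatcher5 code."""
    is_reversed = code.endswith("-r")
    base = code[:-2] if is_reversed else code
    if base in SIMILARITY:
        return "MisH"
    if base in CONTAINMENT:
        return "MinH" if is_reversed else "HinM"
    if base in DIRECTION:
        return "MtoH" if is_reversed else "HtoM"
    raise ValueError(f"unknown relation code: {code}")


BOURQUE29 = [code
             for rel in SIMILARITY + CONTAINMENT + DIRECTION
             for code in sub_relations(rel)]

if __name__ == "__main__":
    print(len(BOURQUE29), "sub-relations")
    for code in BOURQUE29:
        print(f"{code:8} -> {to_hatcher5(code)}")
```

Because the high-level code depends only on the group a relation belongs to and on whether its form is Basic or Reversed, no per-relation table is needed.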
Aristotle3:  Similarity; Contrast; Contiguity
Hatcher5:    SIMILARITY (new): HisM
             CONTAINMENT (“static”): HinM, MinH
             DIRECTION (“dynamic”): HtoM (A←B), MtoH (A→B)
Bourque29:   HisM: TAX, TAX-R, COOR, SIM
             HinM: CONT, POSS, MER, LOC, TEMP, COMP
             MinH: CONT-R, POSS-R, MER-R, LOC-R, TEMP-R, COMP-R, TOP-R
             HtoM: DIR, SRC, CAUS, PROD, USG, FUNC, PURP
             MtoH: DIR-R, SRC-R, CAUS-R, PROD-R, USG-R
Figure 3: The Hatcher-Bourque classification as a hierarchy.
The relations of containment and direction are both based on Contiguity,
one of Aristotle’s three principles of memory, while Similarity constitutes another
of those principles (Koch 2001: 1143). The third principle, Contrast, appears to
play only a very minor role in binominal word-formation and has not yet been
investigated in detail. It is therefore not part of this initial version of Hatcher-Bourque. However, examples do exist, such as Mandarin 东西 dōng.xī [east.west]
‘thing’ (Ceccagno & Scalise 2006: 238). This justifies including a placeholder in
Figure 3.
The complete classification is documented in the Appendix in the form of
descriptions of each individual low-level relation and an at-a-glance summary
table (Table 7).
4.2 The Bourquifier: A piece of cake
Classifying large numbers of binominals can be a daunting and error-prone
task, even with a well-documented classification that includes test frames and
examples. In order to simplify the task and reduce the risk of errors, an Excel
application called the Bourquifier has been created (Pepper 2021). This tool is
designed to assist the analyst, not to replace her. The analyst types the head and
modifier (and optionally the binominal itself) into the relevant cells, upon which
all 29 templates are automatically populated.
These can then be scanned in a matter of seconds to find the most appropriate
relation.
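The Bourquifier itself is an Excel application, but the idea of populating templates can be illustrated in a few lines of Python. This is a hedged sketch, not the tool’s implementation. The function names are hypothetical, only nine of the 29 templates are included, and several of them are reconstructed from paraphrases quoted in this chapter rather than copied from the tool. The choice between (a) and (an) relies on a simple initial-vowel heuristic.

```python
import re

# A subset of templates; H = head, M = modifier. Some are reconstructed
# from paraphrases in the text and are therefore assumptions.
TEMPLATES = {
    "tax-r": "(an) M is a kind of H",
    "part": "(an) H that is part of (an) M",
    "part-r": "(an) H that (an) M is part of",
    "cont-r": "(an) H that contains (an) M",
    "loc-r": "(an) H that (an) M is located at/near/in",
    "caus-r": "(an) H that (an) M causes",
    "usg-r": "(an) H that uses (an) M",
    "purp": "(an) H intended for (an) M",
    "coor": "(an) H that is also (an) M",
}

# Matches a slot (H or M), optionally preceded by the article marker.
_SLOT = re.compile(r"(\(an\) )?\b([HM])\b")


def populate(template, head, modifier):
    """Replace the H and M slots of one template with actual words."""
    words = {"H": head, "M": modifier}

    def fill(match):
        word = words[match.group(2)]
        if match.group(1) is None:
            return word
        # Heuristic only: spelling, not pronunciation, decides the article.
        article = "(an)" if word[0].lower() in "aeiou" else "(a)"
        return f"{article} {word}"

    return _SLOT.sub(fill, template)


def populate_all(head, modifier):
    """Populate every template so that the analyst can scan them."""
    return {code: populate(t, head, modifier)
            for code, t in TEMPLATES.items()}


if __name__ == "__main__":
    for code, text in populate_all("burn", "sun").items():
        print(f"{code:7} {text}")
```

As in the spreadsheet, the choice of relation is left to the analyst; the sketch only generates the candidate paraphrases.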
The interface of the Bourquifier (Figure 4) shows the 17 low-level relations
of Bourque29 listed under the heading Relation, and the roles associated with
each of them in the adjoining column. (Note that for the two symmetric relations,
coordination and similarity, the two roles are the same.) These relations are
grouped according to the three high-level relations of Hatcher5: similarity, containment and direction.
To the right of the column headed Roles the interface is divided into two sections, for Basic and Reversed forms of the relation, respectively. Each section consists of four columns: one for the B29 code (e.g. tax-r), one for the corresponding
H5 code (e.g. MisH), one for the template (e.g. “(an) M is a kind of H”) and one
for the example (e.g. oak tree). For symmetric and non-reversible relations, one
section is blank.
Figure 5 shows how the Bourquifier is used to analyse a specific example, here
sunburn, which it may be recalled from §2.2 was erroneously chosen by Bourque
to exemplify his Basic cause relation (see Table 2). The populated templates in the
Bourquifier make it very clear that sunburn actually belongs under the Reversed
form of the relation, with the paraphrase “(a) burn that (a) sun causes”. (Note
that the highlighting on caus-r is a result of the analyst typing the code into the
red box in the top right-hand corner; it does not happen automatically.)
Figure 4: The Bourquifier interface.
Figure 5: The Bourquifier (‘sunburn’).
It is worth noting at this point that sometimes more than one paraphrase will
apply to a single binominal. For example, motor car (Figure 6) may be analysed as
– “a car that contains a motor” (cont-r: Reversed containment),
– “a car that a motor is part of” (part-r: Reversed partonomy), or
– “a car that a motor is located at/near/in” (loc-r: Reversed location).
Figure 6: The Bourquifier (‘motor car’).
When every candidate relation maps to the same high-level relation (as is the
case here, since all three relations map to MinH), we have a simple case of overlap
between very similar relations. In such cases, any of the candidates may be used, but
the more specific relation (here, partonomy) is usually to be preferred. However,
sometimes the candidate relations map to different high-level relations – as would
be the case if “a car that uses a motor” (usg-r: Reversed usage) were considered
an appropriate paraphrase for motor car – since this relation maps to MtoH. In
such cases the combination of concepts can be considered to be “doubly motivated”; i.e. the combination motor + car is motivated both by the partonomy
relation and by the usage relation. There is nothing untoward about this, since
there is no reason to believe that every combination of concepts should be motivated by a single relation.7
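The distinction between overlap and double motivation can be stated as a simple check on the high-level codes of the candidate relations. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, the relation codes follow the running text, and the label "doubly motivated" is applied whenever the candidates map to more than one Hatcher5 code.

```python
# Low-level relations grouped by high-level relation (section 4.1).
SIMILARITY = {"tax", "coor", "sim"}
CONTAINMENT = {"cont", "poss", "part", "loc", "temp", "comp", "top"}
DIRECTION = {"dir", "src", "caus", "prod", "usg", "func", "purp"}


def to_hatcher5(code):
    """Map a Bourque29 code such as 'usg-r' to its Hatcher5 code."""
    is_reversed = code.endswith("-r")
    base = code[:-2] if is_reversed else code
    if base in SIMILARITY:
        return "MisH"
    if base in CONTAINMENT:
        return "MinH" if is_reversed else "HinM"
    if base in DIRECTION:
        return "MtoH" if is_reversed else "HtoM"
    raise ValueError(f"unknown relation code: {code}")


def motivation(candidates):
    """Classify a set of candidate relations for a single binominal."""
    high_level = sorted({to_hatcher5(code) for code in candidates})
    label = "overlap" if len(high_level) == 1 else "doubly motivated"
    return label, high_level


if __name__ == "__main__":
    # motor car, with the three containment-type candidates.
    print(motivation(["cont-r", "part-r", "loc-r"]))
    # motor car, if Reversed usage is also accepted.
    print(motivation(["part-r", "usg-r"]))
```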
7 Those that have tried the Bourquifier have found it very helpful. Readers can see for themselves
that analysing a binominal is a piece of cake by taking part in the Hatcher-Bourque Cake Challenge. Simply download the Bourquifier (see the URL in the References) and use it to analyse the
seven examples given in §1.1. Send me your results and I will buy you coffee and cake next time
we meet. The results of my own analysis are given at the end of this chapter, but don’t change
your results to fit these. The point of the exercise is to see how much inter-annotator agreement
is achieved using the Bourquifier.
5 Frequency of semantic relations
The Hatcher-Bourque classification was developed as part of a broader investigation into the typology and semantics of binominal lexemes (Pepper 2020), and
it was used to classify 3,738 binominals from 106 languages denoting 100 different concepts.8 Only 83 of these binominals (2.2%) resisted classification.9 Of these, 79
were simply unanalysable, either because of a cranberry morpheme, as
in Chakali [cli] nebi.kaŋkawal [finger.??] ‘thumb’, or because the motivation is
veiled by unfamiliar beliefs or cultural practices, as in Takia [tbc] tamol sos [man
Derris_root] ‘widower’. Four binominals use a numeral modifier to denote a day
of the week (e.g. Iraqw [irk] deelór tám [day:of three] ‘Wednesday’), for which no
appropriate relation exists. However, since such cases are more properly regarded
as instances of property modification rather than object modification, they are
outside the scope of Hatcher-Bourque. The remaining 3,650 binominals were
easily analysed using the Bourquifier. The resulting data lends itself to an analysis of the relative frequency of semantic relations cross-linguistically.
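The counts in this paragraph can be reconciled with a short calculation. The sketch below assumes that the five entries excluded as incorrect (footnote 9) account for the remaining difference between the 3,738 binominals examined and the 3,650 that were analysed. The variable names are illustrative.

```python
# Figures taken from the paragraph above and from footnote 9.
examined = 3738
unanalysable = 79          # cranberry morphemes, veiled motivation
weekday_numerals = 4       # numeral modifiers, outside the scope
incorrect_entries = 5      # excluded as incorrect (footnote 9)

resisted = unanalysable + weekday_numerals
analysed = examined - resisted - incorrect_entries
resisted_pct = round(100 * resisted / examined, 1)

if __name__ == "__main__":
    print(f"resisted classification: {resisted} ({resisted_pct}%)")
    print(f"analysed: {analysed}")
```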
It is not unreasonable to surmise that the frequency with which different
semantic relations are used to motivate the combination of concepts in binominal
word-formation could provide insights into the way in which humans conceptualize the world. This is a topic which has hardly been addressed in the typological literature at all; to my knowledge, the only researcher to even approach the question
from a cross-linguistic perspective is Bauer (2001), who has the following to say:
In a detailed survey of just three languages, Bauer (1978: 147) points out that underlying
semantic relationships of location appear to be the most common relationships in those
languages. The same is true with the sample [of 36 languages] discussed here. Compounds
in which the head is the location of the entity denoted in the modifier (e.g. English furniture store) or where the head denotes an entity located at the modifier (e.g. English bone
cancer) are the types most frequently illustrated or commented on for the languages in my
sample across all areas. The next most frequent type to be illustrated is the type where the
head is made from the material in the modifier (e.g. English sandcastle). Other meanings
are illustrated or commented on far more sporadically. While this does not show that other
meanings are not also in common use, it does suggest that compounds may be used prototypically to indicate location or source (especially if ‘made from’, ‘made by’, ‘belonging to’
and ‘coming from’ are all interpreted as sources).
8 Sources for all material mentioned in this section can be found in Pepper (2020).
9 In addition five entries were considered to be incorrect, in the sense that the form registered in
the database does not express the intended meaning. For example, the Yaqui word muumu jo’ara
[bee house] almost certainly denotes a beehive and not beeswax, as stated in the source. Such
cases could have been analysed in their own terms (in this case as Basic possession), but instead
they were simply excluded, in order not to distort the analysis of individual meanings.
With Hatcher-Bourque, Bauer’s three examples (bone cancer, furniture store
and sandcastle) are classified as Basic location (“a cancer located at/near/
in a bone”), Reversed location (“a store that furniture is located at/near/in”)
and Reversed composition (“a castle composed of sand”). We can now use the
binominals data to test Bauer’s conjecture. The frequencies of the Bourque29
low-level relations are investigated in §5.1, and those of the Hatcher5 high-level
relations in §5.2.
5.1 Frequency of low-level relations
The overall frequency of low-level semantic relations in the database, shown in
Figure 7, can be summarized in the following scale:
part >> purp > coor > loc > comp-r, poss > usg-r > temp > . . .
By far the most frequent relation is one that Bauer does not even mention: part.
This is the Basic partonomy (or whole) relation, in which an entity is modified by
the whole of which it is part, as in car motor. The quite extreme frequency of this
relation may be due to the large number of binominals in the database that denote
body parts, which tend to be based on this relation (as in eyelid). For this reason,
Figure 7 also shows the frequencies when body parts are excluded entirely. Apart
from the greatly reduced frequency of part and a slightly reduced frequency for
loc (the location relation) the differences are minimal. So while it may be the case
that the present data overstate the prevalence of part, it is clearly one of the most
important relations, and probably more frequent than loc and loc-r combined.
Bauer’s suggestion that the next most frequent type is the material relation
(Reversed composition), e.g. sandcastle, is also not supported by the data,
which put it at joint fifth in terms of overall frequency. Instead, the next most frequent relation is purpose, also not mentioned by Bauer. As we will see below, this
relation is especially prevalent in binominals that belong to the domain Modern
World and/or denote entities that fall into the semantic type Advanced technology (or concept).
The third most frequent relation is coordination. In the binominals data, it
is mostly found in items that denote animates of a certain age (Hawaiian [haw]
kao keiki [goat child] kid), gender (Mbyá Guaraní [gun] kavaju kunha [horse
woman] mare), or both (Ket [ket] qīm.dɯ̄l [woman.child] girl). However, it
should be borne in mind that the set of meanings on which these data are based
was designed to exclude many kinds of coordination relation (such as Vietnamese [vie] bố mẹ [father mother] parents), so the prevalence of species-attribute
combinations cannot be taken as fully representative.
Figure 7: Overall frequency of low-level semantic relations.
The location relation (Basic location), found when words denoting eye and
water are combined to denote tear, is only the fourth most frequent. Together
with its inverse, the located relation, for example Hupdë [jup] yɔ̃ ˇh mɔy [medicine
house] hospital, it is found in 428 binominals, i.e. 12% of the data. Thus Bauer’s
suggestion that this is the most common kind of relation is clearly unsupported.
Figure 8: Number of languages that exhibit a particular relation.
We can also look at relations in terms of the number of languages in which each
relation is attested (Figure 8). The frequency scale here is:
part > loc > coor, poss, purp > comp-r > usg-r, loc-r, prod > …
The same six relations predominate in both scales, albeit with slightly different
rankings. Note that the composite relation (comp, e.g. cube sugar) is not attested
at all in the database. Note also the infrequency of a further four – caus, src-r,
temp-r and tax – in which the modifier expresses the effect (tear gas), the source
(cane sugar), the (temporally located) activity (golf season) and the supertype
(bear cub).
The distribution across meanings (Figure 9) shows a generally similar scale,
but now with the tax-r relation displaying far greater prominence. usg now
appears among the top six, with comp-r and poss relegated to joint 9th and 11th
place:
part > coor, purp, tax-r, loc > usg, sim, usg-r, poss, loc-r, comp-r…
This suggests that while the subtype relation, tax-r (oak tree) is not especially
common, it is rather versatile in terms of the range of meanings that it can express.
Conversely, while the material (comp-r) and possessor (poss) relations are rather
frequent, their scope of application is relatively limited. It is also worth noting
that of the 46 binominals that exhibit the subtype relation (Figure 7), 18 employ
the der strategy. In many cases, the gloss indicates an (apparently redundant)
nominalizer or diminutive affixed to a root whose meaning is the same as that
of the derived form, as in Lithuanian [lit] spen.elis [nipple.dim] nipple or teat.
Overall, the data indicate that the most frequent low-level semantic relations
cross-linguistically, at least as far as binominal lexemes are concerned, are as
shown in Table 5.
Table 5: Most frequent low-level semantic relations.
Relation  Modifier role  Template                          Example
part      whole          (an) H that is part of (an) M     car motor
purp      purpose        (an) H intended for (an) M        animal doctor
coor      coordinand     (an) H that is also (an) M        boy king
loc       location       (an) H located at/near/in (an) M  house music
poss      possessor      (an) H that (an) M possesses      family estate
comp-r    material       (an) H composed of (an) M         sugar cube
Figure 9: Number of meanings that exhibit a particular relation.
Figure 10 shows how many of the nine morphosyntactic strategies (as defined
in the binominal typology described in Pepper, a, this volume) are used to express
each kind of relation. Comparison with the overall frequency scale extracted
from Figure 7 (above) shows that the most frequent relations can be expressed
by any one of the nine binominal types. This very strongly suggests that there is
no overall correlation – at the cross-linguistic level – between morphosyntactic
strategies and semantic relations. However, this should not be taken to mean that
there is no such correlation at the level of individual languages. On the contrary,
studies such as Pepper (2010) show that semantic relation can be an important
explanatory factor in the study of intra-linguistic competition between binominal
strategies.
As the data become sparser, the number of strategies associated with each
relation declines; thus, at the lower end of the scale, we find temp-r, src-r and
tax, each of which is expressed by just one or two strategies. However, since each
of these three relations is represented in the database by just two or three exemplars, this does not constitute evidence against the lack of overall correlation.
The frequency of different relations varies according to the semantic type of
the referent. Figure 11 shows the proportional distribution of the six most common
relations – part, purp, coor, loc, comp-r and poss – across seven semantic
types. The results for Animal, Natural phenomenon and Location should be
approached with caution, since these semantic types represent only 7, 5 and 12
of the 100 meanings, respectively, but the variation across the other four types is
striking.
Figure 10: Number of binominal types that exhibit each relation.
Figure 11: Low-level relations and semantic types.
In binominals denoting Body parts the whole relation (part) accounts for
85% of the data; the only significant alternative is location (loc), which is the preferred relation for naming bodily substances, such as earwax and tear. On the
other hand, part is rarely used to denote an Advanced technology (or concept),
such as bicycle pump, keyword or railway; instead, the purpose relation predominates, accounting for over 80% of the data, with material (comp-r) the most
frequently used alternative (as in many words for railway, which is often conceptualised as a road composed of iron). In short, there is a strong tendency to name
(secondary) body parts/fluids in terms of the (primary) body parts they are a part
of/located at, and to name advanced concepts in terms of either their intended
function or the material they are made of.
The semantic type Basic technology (or concept) is more mixed: as with
Advanced technology (or concept), purpose and material are the most widespread
relations, but the two are now equally frequent; however, in contradistinction to
the latter, the whole (part) and location (loc) relations are also quite frequent.
These are also the most widely used relations for Natural phenomena – together
with possessor (poss), which expresses the relation between a spider and its web,
or bees and their hive, as well as phenomena viewed as belonging to some supernatural being, such as Ket Albara kàŋ ‘Milky Way, lit. Alba’s hunting trail’ and
Assamese [asm] ramdhenu ‘rainbow, lit. Lord Rama’s bow’.
Figure 12: Low-level relations and semantic fields.
A similar variation is found across semantic fields. Figure 12 shows the frequency
of the six most common semantic relations across the nine most frequent semantic fields. We note again that part plays the dominant role in The body, but also in
Agriculture and vegetation and Food and drink; and, as expected, Modern world
is dominated by the purpose relation. We see also that the patterning in Animals
and Kinship is remarkably similar: binominals in these fields have an overwhelming preference for either coor or poss. The latter is also widely used in Social and
political relations. Finally, the location relation (loc) that Bauer assumed to be
most widespread is in fact largely confined to the fields of The physical world and
Clothing and grooming.
5.2 Frequency of high-level relations
We turn now from the low-level semantic relations of Bourque29 to the high-level
relations of Hatcher5. For ease of reference, Table 6 provides a summary of the
five high-level relations and the 29 low-level relations that map to them. (Note
that containment and direction were not used to annotate the contents of the
binominals database and therefore do not figure in the statistics of the preceding
section.) As for the low-level relations, the terms in the role column will sometimes be used in the following in order to simplify the discussion. They are in
effect shorthand labels for the Hatcher5 ‘sub-relations’.
Table 6: Summary of mappings between high-level and low-level relations.
Hatcher5  Modifier role  Bourque29
MisH      N/A            tax-r, tax, coor, sim
HinM      container      cont, poss, part, loc, temp, comp, top
MinH      contents       cont-r, poss-r, part-r, loc-r, temp-r, comp-r
HtoM      goal           dir, src, caus, prod, usg, func, purp
MtoH      origin         dir-r, src-r, caus-r, prod-r, usg-r
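The mapping in Table 6 lends itself to a simple lookup. The following Python sketch is illustrative only: the dictionary transcribes Table 6 as printed, while the names `HATCHER5`, `LOW_TO_HIGH` and `to_high_level`, and the six-item sample (the examples from Table 5), are assumptions made for the illustration and say nothing about how the Bourquifier is implemented.

```python
from collections import Counter

# Table 6 transcribed as printed: high-level relation -> low-level relations.
HATCHER5 = {
    "MisH": ["tax-r", "tax", "coor", "sim"],
    "HinM": ["cont", "poss", "part", "loc", "temp", "comp", "top"],
    "MinH": ["cont-r", "poss-r", "part-r", "loc-r", "temp-r", "comp-r"],
    "HtoM": ["dir", "src", "caus", "prod", "usg", "func", "purp"],
    "MtoH": ["dir-r", "src-r", "caus-r", "prod-r", "usg-r"],
}

# Invert the table so that each low-level code points to its high-level relation.
LOW_TO_HIGH = {low: high for high, lows in HATCHER5.items() for low in lows}


def to_high_level(code):
    """Return the Hatcher5 relation for a Bourque29 code (hypothetical helper)."""
    return LOW_TO_HIGH[code]


# Illustrative sample: the six examples of Table 5 with their low-level codes.
sample = {
    "car motor": "part",
    "animal doctor": "purp",
    "boy king": "coor",
    "house music": "loc",
    "family estate": "poss",
    "sugar cube": "comp-r",
}

# Roll the low-level annotations up to the high-level relations.
high_level_counts = Counter(to_high_level(code) for code in sample.values())
print(dict(high_level_counts))
```

Because the roll-up is a pure lookup, any data set annotated with low-level codes can be re-analysed at the high level without further manual work.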
The first four plots in the previous section showed how the low-level relations distribute across the database as a whole (with and without body parts), and across
languages, meanings and morphosyntactic strategies.
Figure 13 provides similar information for the high-level relations. Predictably, the information content is considerably reduced; on the other hand, the
categories are much more balanced and therefore more amenable to statistical
analysis.
The first thing to note is that every one of the nine morphosyntactic strategies
is attested in the data as expressing each of the five high-level relations (plot d);
this provides additional evidence that there is no overall, cross-linguistic correlation between morphosyntactic strategies and semantic relations. (Again, this
does not mean that such correlations do not exist within individual languages.)
The high-level container relation HinM (Hatcher’s “B is contained in A”)
accounts for nearly half of the data (a). This comes as no surprise, given that
this relation subsumes part. If body parts are excluded it has roughly the same
frequency as the goal relation HtoM (Hatcher’s “A is the destination of B”), which
subsumes the rather frequent purpose relation, among others. With body parts
included, the overall scale is as follows (>> denotes very significantly more frequent than; > denotes significantly more frequent than):
HinM >> HtoM > MisH, MinH > MtoH
With body parts excluded, the scale is
HinM, HtoM > MisH, MinH > MtoH
Figure 13: High-level relations across binominals, languages, meanings and strategies.
The two most frequent high-level relations (HinM and HtoM) account for two-thirds
of the data and thus suggest a pronounced tendency for a complex meaning to be
conceptualized in terms of either its container or its goal – both of which should
be interpreted in Hatcher’s very broad sense.
Plot (b) tells us that HinM is ubiquitous, occurring in every language in the
sample. However, the other four high-level relations are also widespread across
languages and they are probably also ubiquitous. The fact that they are not
attested in every language is almost certainly due to the paucity of data for some
languages: it would be highly unlikely that a language that is represented by
fewer than, say, ten data points10 would exhibit all five high-level relations.
The distribution of relations across meanings (c) shows a scale similar to the
two preceding ones –
HinM > HtoM > MisH > MinH > MtoH
– but the values are more spread out: HinM is less dominant, while MisH, the similarity-based relation added to Hatcher’s original four, is higher up the scale (in
the sense that it is significantly more widespread across meanings than MinH).
This reflects what was referred to above as the versatility of the subtype relation
(tax-r). More worthy of mention, though, is the fact that none of the high-level
relations appears suited for conceptualizing anything like the full range of meanings. Even HinM, which is found in every language and accounts for over 45% of
all binominals in the database, is used with only just over half of the 100 meanings: in other words, there are limits to the versatility of conceptualizations that
are based on how an entity is (in the broadest sense) “contained”.
With regard to semantic types, Figure 14 shows clearly that the container relation (HinM) is central to the conceptualization of (secondary) Body parts and also
important for concepts that express Location or that denote Basic technologies
(or concepts) and for entities in the Natural world. On the other hand, it is marginal to the conceptualization of Persons and of almost no use when it comes
to Animals and Advanced technologies (or concepts). With the semantic types
Animal and Person (and only those) the similarity-based MisH relation is most
important, whereas conceptualizations that are goal-oriented – indicated by the
HtoM relation – are most frequent with Advanced technologies (and concepts),
but also encountered with other semantic types (albeit only rarely with Body
parts and Natural phenomena).
10 There are five of these in the database: Gurindji [gue], Puyuma [pyu], Selice Romani [rmc],
Datooga [tcc] and Tuwari [tww].
Figure 14: High-level relations and semantic types.
Conceptualization of an entity in terms of its contents (MinH) is considerably less common than the inverse and never the dominant form; it is found most
often with semantic types that denote Basic and Advanced technologies (and concepts) and Locations, rarely with Body parts and never with Animals. As for origin-based conceptualizations, they are mostly found with Persons (in particular,
professions), Natural phenomena, and Advanced technologies (and concepts).
Similar patterns emerge with respect to semantic fields (Figure 15, the high-level
equivalent of Figure 12). Whereas the low-level plot highlights similarities between
Animals and Kinship, the new one reveals additional commonalities, in particular
between The body, The physical world and Food and drink. In all of these, the
container relation (HinM) predominates: there is a tendency for conceptualizations
where (to quote Hatcher 1960: 363–364) the target concept, B, “is somehow, to
some extent, contained, comprehended in” the modifying concept, A.
In sum, and referring back to the notion of roles, we see that the container
(HinM) is particularly important for The body, Food and drink and The physical
world; the goal (HtoM) for the Modern world; similarity (MisH) for Kinship and
Animals; contents (MinH) for Clothing and grooming and for Social and political
relations; and origin (MtoH) for Agriculture and vegetation.
Figure 15: High-level relations and semantic fields.
5.3 Discussion
The analysis of the data provides insights into the ways in which humans tend to
conceptualize the world. It suggests, contra Bauer, that partonomy and purpose
are far more widespread, and thus more important, than the location relation.
Of the two types of partonomy – Basic (part) and Reversed (part-r) – the former
is far more frequent than the latter, which indicates that the conceptualization of
a complex meaning is much more likely to involve modification by the whole (or,
more generally, the container) than modification by the parts (or, more generally,
the contents). The Basic partonomy relation (part) occurs most frequently with
body parts and in the semantic field of agriculture and vegetation. It can express
about one third of the 100 meanings used in this survey; it is found in all 106 languages of the sample; and it can be expressed using any one of the nine nominal
modification strategies.
Bauer’s suggestion that the next most frequent type is where the head is
made from the material in the modifier is also not supported by the data: both
purpose and coordination are much more common than composition. The
purpose relation is most often encountered in the semantic field Modern world
to denote advanced technological concepts; it only occurs in 89 of the 106 languages, no doubt because some of the languages in the sample do not have words
for concepts of that kind; significantly, the only morphosyntactic strategy that
does not occur with this relation is the classifier strategy, cls, but this is also the
most sparsely populated of all strategies. coordination is used primarily to
denote animates of a certain age, gender or both; it is therefore unsurprising that
it occurs mostly in the domains of kinship, animals, agriculture and vegetation.
Cases such as these account for over 90% of binominals that exhibit this relation,
and once again, every morphosyntactic strategy is attested in the data.
The Basic location relation is the fourth most frequent type overall and
occurs three times as often as its reverse; in other words, it is more usual to conceptualize an object in terms of where it is located than what is located at, near
or in it. It is found in almost all of the languages of the sample (97 out of 106) and
can be expressed by any of the nine strategies. It is most often encountered in the
fields of the natural world and basic technologies and concepts.
The other fairly frequent relations are those of possessor (Basic possession) and material (Reversed composition). The range of meanings that can be
expressed by these two is limited: only 12% in each case; all the same, they can be
expressed by any strategy. On the other hand, the reversed form of possession is
uncommon, and the Basic form of composition does not occur in the data at all.
Apart from the latter, every one of the 25 relations used for annotation was
found in the data, but some were very rare, in particular those involving modification by an effect (e.g. tear gas, caus), a source (cane sugar, src-r), a temporal
activity (golf season, temp-r), or a supertype (bear cub, tax). While these are
fairly peripheral in binominals, they may be more common in other types of compounds, for example those in which the head or the modifier is an action-morph
rather than a thing-morph (see Pepper, this volume, a for the precise definition of
binominal used in the present study).
The data for the low-level relations suggest that there is no overall correlation between morphosyntactic strategy and semantic relation: many relations
are expressed by every strategy, most are expressed by almost every strategy,
and those that are expressed by just a few strategies are those where the data
are sparse. This impression is confirmed by the analysis of high-level relations:
every one of the five relations of Hatcher5 is attested with every one of the nine
morphosyntactic strategies, so we can state quite categorically that there is no
such overall correlation. It is thus not the case that some strategies are used to express
some relations, while other strategies are used for other relations.
However, while this applies cross-linguistically, it does not mean that there
are no such correlations within individual languages. In fact, the opposite is the
case: as I showed in Pepper (2010), the Cameroonian language Nizaa uses left-headed and right-headed compounds for two distinct sets of relations. Zúñiga
(2014) reports something similar for Mapudungun [arn], as does Atoyebi (2010)
for Oko [oks]. Bourque himself (p. 253) compares N N and N à N binominals in
French and shows that the two constructions have very different profiles (for
example, purpose and use account for 48% of all French N à N binominals in his
database, but only 13% of his N N binominals). Some of the contributions in this
volume start to address this issue for other languages, but there is much work to
be done. That work would be much more productive if researchers were to adopt
the same classification system, and that is the purpose of Hatcher-Bourque.
6 Summary and further work
In this chapter I started out by providing a brief overview of previous studies on
semantic relations in compounding. I then described in some detail the systems
developed by Bourque and Hatcher and how they were harnessed in the present
study. Bourque’s classification was revised and extended with two new relations,
containment and direction, for a total of 29 relations, 12 reversible and five
unidirectional. Hatcher’s classification was also revised – by the addition of a
fifth high-level relation, similarity – in order to extend its coverage to appositional as well as non-appositional compounds. The two revised classifications
were then unified to create the two-tiered Hatcher-Bourque classification, and
an Excel-based tool called the Bourquifier was developed to assist in the slippery
task of classifying individual binominals. Both Hatcher-Bourque and the Bourquifier are offered to the research community in order to promote collaboration in
the field of semantic relations.11
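The total of 29 follows from the fact that each reversible relation contributes two codes (Basic and Reversed), whereas each of the remaining five contributes only one. The following Python sketch makes the count explicit; the relation names are taken from the appendix, while the two list names are merely illustrative.

```python
# Relations documented in the appendix with both a Basic and a Reversed form.
REVERSIBLE = [
    "taxonomy", "containment", "possession", "partonomy", "location",
    "temporality", "composition", "direction", "source-result",
    "causation", "production", "usage",
]
# Relations documented with a single form (the five called unidirectional above).
SINGLE_FORM = ["coordination", "similarity", "topic", "function", "purpose"]

total = 2 * len(REVERSIBLE) + len(SINGLE_FORM)
print(len(REVERSIBLE), len(SINGLE_FORM), total)  # 12 5 29
```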
It is important to state that the current version of Hatcher-Bourque (29/5/v1) is a work-in-progress. It needs to be tested against more data from more languages. It may still need refining, through improved examples and templates,
and perhaps even the addition of more relations. Certainly contrast needs to
be fleshed out, and coordination could be subdivided to better handle the
variation currently covered by this category. Perhaps it should be possible to
distinguish between partial and full composition? If so, this can be done by
increasing the granularity. Could one conceive of logical subdivisions between
the two layers of Bourque29 and Hatcher5, such as grouping source-result, causation and production (on the one hand) and use, function and
purpose (on the other) within Hatcher’s ‘dynamic’ pairing of direction-based
relations? And why exactly are some asymmetric relations apparently non-reversible?
11 Hatcher-Bourque Cake Challenge (§4.2). The results of my analysis of the seven cake examples in §1.1 are as follows: chocolate cake: material (comp-r); birthday cake: purpose (purp);
coffee cake: UK material (comp-r) / US purpose (purp); marble cake: likeness (sim); layer cake:
material (comp-r); cup cake: container (cont); urinal cake: location (loc).
Appendix: Documentation for Hatcher-Bourque 29/5/v1
This appendix documents version 1 of the Hatcher-Bourque 29/5 classification.12
In the following presentation, the 17 low-level relations are grouped according to
the three high-level relations, similarity, containment and direction. References in square brackets, e.g. [5.2.2.1], are to the extended discussion of the relation (possibly under another name) in Bourque (2014).
1 Similarity-based relations
The similarity-based relations are those found in what Hatcher (1952) calls “appositional” compounds. Hatcher identified two basic types, “species-genus” (e.g.
pumice stone) and “cross-classification” (e.g. fuel oil, butterfly table). The former
corresponds to taxonomy, and the latter to coordination and similarity,
respectively. These all map to the similarity-based high-level relation, MisH.
TAXONOMY (supertype / subtype)
Basic     tax    MisH  “an H is a kind of M”  bear cub
Reversed  tax-r  MisH  “an M is a kind of H”  oak tree
The relation between a type (e.g., tree) and one of its subtypes (e.g., oak). Both
constituents satisfy the ISA test: an oak tree is an oak, and an oak tree is a tree. In
addition, and crucially, every oak is a tree. In the Basic form, the superordinate
concept is denoted by the modifier (bear in bear cub), and in the Reversed form
by the head (tree in oak tree). The Reversed form of this relation is sometimes
called the species-genus relation, and compounds that exhibit it are sometimes
called pleonastic, epexegetic or subsumptive. [5.2.2.1 hyponymy; inverted]
12 See Pepper (2020) for the earlier version. Changes between the two are documented in Pepper
(2021).
COORDINATION (coordinand)
Symmetric coor MisH “an H that is also an M” boy king
When this relation pertains, both constituents (boy and king) satisfy the ISA test:
a boy king is both a boy and a king. However, there is no type-subtype relation
between the two: it is not the case that every boy is a king, and neither is every
king a boy. This is the crucial difference between the coordination and taxonomy relations. [5.2.2.2]
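The ISA test and the subtype condition that separate taxonomy from coordination can be read as a small decision procedure. The sketch below is one possible rendering in Python; the function name and its boolean arguments are hypothetical, and the judgements themselves (is every oak a tree?) still have to be supplied by the annotator.

```python
def classify_appositional(h_isa, m_isa, every_m_is_h=False, every_h_is_m=False):
    """Choose between tax, tax-r and coor for a binominal with head H and modifier M.

    h_isa / m_isa: does the referent pass the ISA test for the head / the modifier?
    every_m_is_h:  is every M an H (e.g. every oak is a tree)?
    every_h_is_m:  is every H an M (i.e. the head denotes the subtype)?
    Returns None when the ISA test fails for either constituent.
    """
    if not (h_isa and m_isa):
        return None
    if every_m_is_h:
        return "tax-r"  # the head denotes the superordinate concept: oak tree
    if every_h_is_m:
        return "tax"    # the modifier denotes the superordinate concept: bear cub
    return "coor"       # no type-subtype relation in either direction: boy king


print(classify_appositional(True, True, every_m_is_h=True))  # tax-r
print(classify_appositional(True, True, every_h_is_m=True))  # tax
print(classify_appositional(True, True))                     # coor
```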
SIMILARITY (likeness)
Symmetric sim MisH “an H that is similar to an M” kidney bean
In this relation the modifying concept has some characteristic feature in common
with the referent. In the case of kidney bean, it is shape: a kidney bean is a bean
shaped like a kidney. [5.2.2.3]
2 Containment-based relations
The containment-based relations are finer-grained subtypes of Hatcher’s high-level relations, “A is somehow, to some extent, contained, comprehended in
B” (MinH), and its inverse, “B is somehow, to some extent, contained, comprehended in A” (HinM).
CONTAINMENT (container / contents)
Basic     cont    HinM  “an H that is contained in an M”  orange seed
Reversed  cont-r  MinH  “an H that contains an M”         seed orange
The relation between a container and its contents: the seed is contained in the
orange and the orange contains the seed. In orange seed, the modifier denotes the
container, whereas in seed orange, the modifier denotes the contents. (See Pepper
2020: 226–227 for further discussion.)
POSSESSION (possessor / possessum)
Basic     poss    HinM  “an H that is possessed by an M”  family estate
Reversed  poss-r  MinH  “an H that possesses an M”        career girl
The relation between a possessor and a possessum, both in the specific sense of
ownership (family estate) and the more general sense of belonging (career girl).
[5.2.2.5; inverted]
PARTONOMY (whole / part)
Basic     part    HinM  “an H that is part of an M”  car motor
Reversed  part-r  MinH  “an H that an M is part of”  motor car
The relation between a whole and one of its parts. A motor can be specified in
terms of the car of which it is a part (car motor), and a car can be specified in
terms of one of its most salient parts (motor car). [5.2.2.6 part]
LOCATION (location / located)
Basic     loc    HinM  “an H located at/near/in an M”       house music
Reversed  loc-r  MinH  “an H that M is located at/near/in”  music hall
The relation between an entity or activity (the thing located) and its location. A
music hall is a hall in which (a certain kind of) music is (or was) performed. The
origin of the term ‘house music’ is unclear, but it is likely that ‘house’ refers to
the location in which the music was either created or performed. This relation
may be restricted to spatial locations; relations involving a temporal location use
temporality. [5.2.2.7]
TEMPORALITY (time / activity)
Basic     temp    HinM  “an H that occurs at/during an M”  summer job
Reversed  temp-r  MinH  “an H at/during which M occurs”    golf season
The relation between an entity or activity and the time period during which it
occurs, i.e. its temporal location. A summer job is something performed during
the summer; a golf season is the time period during which golf is pursued. [5.2.2.12
time]
COMPOSITION (composite / material)
Basic     comp    HinM  “an H that an M is composed of”  cube sugar
Reversed  comp-r  MinH  “an H composed of an M”          sugar cube
The relation between a composite entity and the material of which it is composed.
The relation inherent in cube sugar and sugar cube is one and the same (the cube
is composed of sugar). The difference is that the one denotes the material (sugar),
the other, the composite object (cube). [5.2.2.8; inverted]
TOPIC (entity / topic)
Basic
Reversed top MinH “an H that is about an M” history book
The relation between an entity or event and the topic that it is “about”: a history
book is a book that is about history. An alternative template – “an H that is concerned with an M” – may produce a more felicitous paraphrase, as in the case of
history department: a department that is concerned with history. [5.2.2.11]
3 Direction-based relations
The direction-based relations in this section are finer-grained subtypes of Hatcher’s high-level relations “A is somehow the source of B” (MtoH) and its inverse “B
is somehow the source of A” (HtoM).
DIRECTION (goal / origin)
Basic     dir    HtoM  “an H whose goal is an M”        sun worship
Reversed  dir-r  MtoH  “an H that is the goal of an M”  sales target
The relation between a point of origin (usually an activity) and its goal. In sun
worship, the sun is the goal towards which the worship is directed, and a sales
target is that towards which a sales activity is directed. (See Pepper 2020: 227–228
for further discussion.)
SOURCE-RESULT (result / source)
Basic     src    HtoM  “an H that is a source of an M”  sugar cane
Reversed  src-r  MtoH  “an H whose source is an M”      cane sugar
The relation between a source and a result – in a general sense that does not
involve either causation or production; in sugar cane, while the cane is the source
of the sugar, it cannot felicitously be said to cause or produce it. [5.2.2.9 source;
inverted]
CAUSATION (effect / cause)
Basic     caus    HtoM  “an H that causes an M”  tear gas
Reversed  caus-r  MtoH  “an H that an M causes”  sunburn
The relation between a cause and an effect. Tear gas is a gas that causes tears;
sunburn is a burn that is caused by the sun. [5.2.2.10 cause]
PRODUCTION (product / producer)
Basic     prod    HtoM  “an H that produces an M”  song bird
Reversed  prod-r  MtoH  “an H that an M produces”  birdsong
The relation between a product and its producer. Both song bird and birdsong
involve the production of song by a bird, but whereas in the former, the modifier
denotes the product, in the latter it denotes the producer. [5.2.2.10]
USAGE (user / used)
Basic     usg    HtoM  “an H that an M uses”  lamp oil
Reversed  usg-r  MtoH  “an H that uses an M”  oil lamp
The relation between something that is “used” and the entity (“user”) that uses
it. An oil lamp uses oil, and its oil is used by the lamp. In lamp oil the modifier
denotes the user, while the modifier of oil lamp denotes the thing used. [5.2.2.13
use; inverted]
FUNCTION (function / entity)
Basic     func  HtoM  “an H that serves as an M”  buffer state
Reversed
The relation between an entity and its function: a buffer state is a state that serves
as a buffer. Unlike purpose (below), this relation does not involve any element of
intentionality. Despite being asymmetric, it does not appear to be reversible. [5.2.2.4]
PURPOSE (purpose / entity)
Basic     purp  HtoM  “an H that is intended for an M”  animal doctor
Reversed
The relation between an entity and its purpose: an animal doctor is a doctor
whose skills are directed towards animals. Unlike function (above), this relation involves an element of intentionality. Despite being asymmetric, it does not
appear to be reversible. [5.2.2.14]
Table 7: The Hatcher-Bourque classification.
Bourque29                              B29     H5    Template                           Example
taxonomy (supertype, subtype)          tax-r   MisH  an M is a kind of H                oak tree
                                       tax     MisH  an H is a kind of M                bear cub
coordination (coordinand, coordinand)  coor    MisH  an H that is also an M             boy king
similarity (likeness, likeness)        sim     MisH  an H that is similar to M          kidney bean
containment (container, contents)      cont    HinM  an H that is contained in an M     orange seed
                                       cont-r  MinH  an H that contains an M            seed orange
possession (possessor, possessum)      poss    HinM  an H that is possessed by an M     family estate
                                       poss-r  MinH  an H that possesses an M           career girl
partonomy (whole, part)                part    HinM  an H that is part of an M          car motor
                                       part-r  MinH  an H that an M is part of          motor car
location (location, located)           loc     HinM  an H located at/near/in an M       house music
                                       loc-r   MinH  an H that M is located at/near/in  music hall
temporality (time, event)              temp    HinM  an H that occurs at/during an M    summer job
                                       temp-r  MinH  an H at/during which M occurs      golf season
composition (composite, material)      comp    HinM  an H that an M is composed of      cube sugar
                                       comp-r  MinH  an H composed of an M              sugar cube
topic (entity, topic)                  top-r   MinH  an H that is about an M            history book
direction (goal, origin)               dir     HtoM  an H whose goal is an M            sun worship
                                       dir-r   MtoH  an H that is the goal of an M      sales target
source (result, source)                src     HtoM  an H that is a source of an M      sugar cane
                                       src-r   MtoH  an H whose source is an M          cane sugar
causation (effect, cause)              caus    HtoM  an H that causes an M              tear gas
                                       caus-r  MtoH  an H that an M causes              sunburn
production (product, producer)         prod    HtoM  an H that produces an M            song bird
                                       prod-r  MtoH  an H that an M produces            birdsong
usage (user, used)                     usg     HtoM  an H that an M uses                lamp oil
                                       usg-r   MtoH  an H that uses an M                oil lamp
function (function, entity)            func    HtoM  an H that serves as an M           buffer state
purpose (purpose, entity)              purp    HtoM  an H that is intended for an M     animal doctor
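The templates in Table 7 can be applied mechanically to produce a paraphrase of a candidate analysis, which is one way of checking whether a relation fits. The following Python sketch covers only three rows of the table; the `TEMPLATES` dictionary, the `paraphrase` function and the naive article rule are assumptions made for illustration and do not describe how the Bourquifier itself works.

```python
# Three templates from Table 7, with "an H" / "an M" replaced by named placeholders.
TEMPLATES = {
    "part": "{H} that is part of {M}",
    "part-r": "{H} that {M} is part of",
    "purp": "{H} that is intended for {M}",
}


def with_article(noun):
    """Prefix 'a' or 'an' using a naive first-letter rule (an assumption, not a grammar)."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def paraphrase(code, modifier, head):
    """Fill the template for a Bourque29 code with the modifier (M) and the head (H)."""
    return TEMPLATES[code].format(H=with_article(head), M=with_article(modifier))


print(paraphrase("part", "car", "motor"))      # a motor that is part of a car
print(paraphrase("part-r", "motor", "car"))    # a car that a motor is part of
print(paraphrase("purp", "animal", "doctor"))  # a doctor that is intended for an animal
```

Mass nouns such as sugar would need the article to be optional, which is what the "(an)" notation in Table 5 allows for; the sketch does not attempt this.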
References
Adams, Valerie. 1973. An introduction to modern English word-formation. London: Longman.
Arnaud, Pierre J.L. 2003. Les composés timbre-poste. Lyon: Presses Universitaires de Lyon.
Arnaud, Pierre J.L. 2016. Categorizing the modification relations in French relational
subordinative [NN]n compounds. In Pius ten Hacken (ed.), The semantics of compounding,
71–93. Cambridge: Cambridge University Press.
Atoyebi, Joseph Dele. 2010. A reference grammar of Oko: A West Benue-Congo language of
North-Central Nigeria. Rüdiger Köppe.
Baron, Irène & Michael Herslund. 2001. Semantics of the verb HAVE. In Irène Baron, Michael
Herslund & Finn Sørensen (eds.), Dimensions of possession, 85–98. Amsterdam: John
Benjamins.
Bauer, Laurie. 1978. The grammar of nominal compounding: With special reference to Danish,
English and French. Odense: Odense University Press.
Bauer, Laurie. 1979. On the need for pragmatics in the study of nominal compounding. Journal
of Pragmatics 3(1). 45–50.
Bauer, Laurie. 2001. Compounding. In Martin Haspelmath, Ekkehard König, Wolfgang
Oesterreicher & Wolfgang Raible (eds.), Language typology and language universals:
An international handbook, 695–707. Berlin: Mouton de Gruyter.
Bergsten, Nils. 1911. A study on compound substantives in English. Uppsala University PhD
dissertation.
Bourque, Yves. 2014. Toward a typology of semantic transparency: The case of French
compounds. University of Toronto PhD dissertation.
Butnariu, Cristina, Su Nam Kim, Preslav Nakov, Diarmuid Ó Séaghdha, Stan Szpakowicz
& Tony Veale. 2009. SemEval-2010 Task 9: The interpretation of noun compounds
using paraphrasing verbs and prepositions. In Proceedings of the Workshop on
Semantic Evaluations: Recent Achievements and Future Directions (DEW ’09), 100–105.
Stroudsburg, PA: Association for Computational Linguistics.
http://dl.acm.org/citation.cfm?id=1621969.1621987.
Carr, Charles Telford. 1939. Nominal compounds in Germanic. London: University of Oxford
Doctoral dissertation.
Ceccagno, Antonella & Sergio Scalise. 2006. Classification, structure and headedness of
Chinese compounds. Lingue e linguaggio V(2). 233–260.
Downing, Pamela. 1977. On the creation and use of English compound nouns. Language 53(4).
810–842.
Eiesland, Eli-Anne. 2016. The semantics of Norwegian noun-noun compounds: A corpus-based
study. University of Oslo PhD dissertation.
Girju, Roxana, Dan Moldovan, Marta Tatu & Daniel Antohe. 2005. On the semantics of noun
compounds. Computer Speech & Language 19(4). 479–496 (Special Issue on Multiword
Expression). https://doi.org/10.1016/j.csl.2005.02.006.
Girju, Roxana, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney & Deniz Yuret.
2009. Classification of semantic relations between nominals. Language Resources and
Evaluation 43(2). 105–121.
Grimm, Jacob. 1826. Deutsche Grammatik: 2. Göttingen: Dieterichsche Buchhandlung.
Hatcher, Anna Granville. 1952. Modern appositional compounds of inanimate reference.
American Speech 27(1). 3–15.
Hatcher, Anna Granville. 1960. An introduction to the analysis of English noun compounds.
Word 16(3). 356–373.
Jackendoff, Ray. 2009. Compounding in the Parallel Architecture and Conceptual Semantics. In
Rochelle Lieber & Pavol Štekauer (eds.), The Oxford handbook of compounding, 105–128.
Oxford: Oxford University Press.
Jackendoff, Ray. 2010. The ecology of English noun-noun compounds. In Ray Jackendoff,
Meaning and the lexicon: The parallel architecture 1975–2010, 413–451. Oxford: Oxford
University Press.
Jackendoff, Ray. 2016. English noun-noun compounds in Conceptual Semantics. In Pius ten
Hacken (ed.), The semantics of compounding, 15–53. Cambridge: Cambridge University
Press.
Jespersen, Otto. 1942. A modern English grammar on historical principles. Part 6: Morphology.
London: George Allen and Unwin.
Koch, Peter. 2001. Lexical typology from a cognitive and linguistic point of view. In Martin
Haspelmath, Ekkehard König, Wolfgang Oesterreicher & Wolfgang Raible (eds.), Language
typology and language universals: An international handbook, 1142–1178. Berlin: Mouton
de Gruyter.
Lauer, Mark. 1995. Designing statistical language learners: Experiments on compound nouns.
Macquarie University PhD dissertation.
Lees, Robert B. 1960. The grammar of English nominalizations. Bloomington: Indiana
University.
Levi, Judith N. 1978. The syntax and semantics of complex nominals. New York: Academic Press.
Marchand, Hans. 1960. The categories and types of present-day English word-formation.
Wiesbaden: Harrassowitz.
Mätzner, Eduard. 1860. Englische Grammatik. Vol. 1 Die Lehre vom Worte. Berlin:
Weidmannsche Buchhandlung.
Moldovan, Dan, Adriana Badulescu, Marta Tatu, Daniel Antohe & Roxana Girju. 2004. Models
for the semantic classification of noun phrases. In Proceedings of the HLT-NAACL Workshop
on Computational Lexical Semantics, 60–67. Association for Computational Linguistics.
Nakov, Preslav. 2013. On the interpretation of noun compounds: Syntax, semantics, and
entailment. Natural Language Engineering 19(3). 291–330.
Noailly, Michèle. 1990. Le substantif épithète. Paris: Presses Universitaires de France.
Ó Séaghdha, Diarmuid. 2008. Learning compound noun semantics. University of Cambridge,
Computer Laboratory.
Pepper, Steve. 2010. Nominal compounding in Nizaa: A cognitive perspective. SOAS University
of London Master’s thesis. https://www.academia.edu/4237937.
Pepper, Steve. 2020. The typology and semantics of binominal lexemes: Noun-noun
compounds and their functional equivalents. Oslo: University of Oslo PhD dissertation.
https://www.academia.edu/42935602.
Pepper, Steve. 2021. The Bourquifier: An application for applying the Hatcher-Bourque
classification. MS Excel. https://www.academia.edu/83122396.
Pepper, Steve. This volume, a. Defining and typologizing binominal lexemes. In Steve Pepper,
Francesca Masini & Simone Mattiola (eds.), Binominal lexemes in cross-linguistic
perspective. Berlin: Mouton de Gruyter.
Rosario, Barbara & Marti A. Hearst. 2001. Classifying the semantic relations in noun
compounds via a domain-specific lexical hierarchy. In Proceedings of the 2001 Conference
on Empirical Methods in Natural Language Processing, 82–90.
Ryder, Mary Ellen. 1994. Ordered chaos: The interpretation of English noun-noun compounds.
Berkeley: University of California Press.
Santen, A. van. 1979. Een nieuw voorstel voor een transformationelle behandeling van
composita en bepaalde adjectief-substantief kombinaties. Spectator 9. 240–262.
Schäfer, Martin. 2018. The semantic transparency of English compound nouns. Berlin:
Language Science Press.
Shoben, Edward J. 1991. Predicating and nonpredicating combinations. In Paula J.
Schwanenflugel (ed.), Psychology of word meanings, 117–135. Hillsdale, NJ: Psychology
Press.
Søgaard, Anders. 2005. Compounding theories and linguistic diversity. In Zygmunt Frajzyngier,
Adam Hodges & David S. Rood (eds.), Linguistic diversity and language theories, 319–337.
Amsterdam: John Benjamins.
Szubert, Andrzej. 2012. Zur internen Semantik der substantivischen Komposita im Dänischen.
Wydawnictwo Naukowe UAM.
Toquero, Luis Miguel. 2018. The semantics of Spanish compounding: An analysis of NN
compounds in the Parallel Architecture. West Virginia University MA thesis.
Tratz, Stephen & Eduard Hovy. 2010. A taxonomy, dataset, and classifier for automatic noun
compound interpretation. In 48th Annual Meeting of the Association for Computational
Linguistics, 678–687. Uppsala: Association for Computational Linguistics.
Vanderwende, Lucy. 1994. Algorithm for automatic interpretation of noun sequences. In
Proceedings of the 15th conference on Computational linguistics, vol. 2, 782–788.
Association for Computational Linguistics.
Warren, Beatrice. 1978. Semantic patterns of noun-noun compounds. Gothenburg: Acta
Universitatis Gothoburgensis.
Zúñiga, Fernando. 2014. Nominal compounds in Mapudungun. In Swintha Danielsen, Katja
Hannss & Fernando Zúñiga (eds.), Word formation in South American languages, 11–31.
Amsterdam: John Benjamins.