Internal reconstruction

Last updated December 14, 2024

Internal reconstruction is a method of reconstructing an earlier state in a language's history using only evidence internal to the language in question.^[1]

The comparative method compares variations between languages, such as in sets of cognates, under the assumption that they descend from a single proto-language. Internal reconstruction, by contrast, compares variant forms within a single language under the assumption that they descend from a single, regular form. Such variants could, for example, be allomorphs of the same morpheme.

The basic premise of internal reconstruction is that a meaning-bearing element that alternates between two or more similar forms in different environments was probably once a single form into which alternation has been introduced by the usual mechanisms of sound change and analogy.^[2]

Language forms that are reconstructed by internal reconstruction are denoted with the prefix pre-, as in Pre-Old Japanese, just as proto- indicates a language reconstructed by means of the comparative method, as in Proto-Indo-European. (However, the pre- prefix is sometimes used for an unattested prior stage of a language, without reference to internal reconstruction.)^[3]

It is possible to apply internal reconstruction even to proto-languages reconstructed by the comparative method. For example, performing internal reconstruction on Proto-Mayan would yield Pre-Proto-Mayan. In some cases, it is also desirable to use internal reconstruction to uncover an earlier form of various languages and then submit those pre- languages to the comparative method. Care must be taken, however, because internal reconstruction performed on languages before the comparative method is applied can remove significant evidence of the earlier state of the language and thus reduce the accuracy of the reconstructed proto-language.

Role in historical linguistics

When undertaking a comparative study of an underanalyzed language family, one should understand its systems of alternations, if any, before tackling the greater complexities of analyzing entire linguistic structures. For example, Type A forms of verbs in Samoan are the citation forms, which are found in dictionaries and word lists, but in making historical comparisons with other Austronesian languages, one should not use Samoan citation forms that have missing parts. (An analysis of the verb sets would alert the researcher to the certainty that many other words in Samoan have lost a final consonant.)

In other words, internal reconstruction gives access, at least in some details, to an earlier stage of the languages being compared. That is valuable because the more time has passed, the more changes have accumulated in the structure of a living language. For the same reason, the earliest known attestations of languages should be used with the comparative method.

When it is not a preliminary to the application of the comparative method, internal reconstruction is most useful where the analytic power of the comparative method is unavailable.

Internal reconstruction can also draw limited inferences from peculiarities of distribution. Even before comparative investigations had sorted out the true history of Indo-Iranian phonology, some scholars had wondered if the extraordinary frequency of the phoneme /a/ in Sanskrit (20% of all phonemes taken together, an astonishing total) might point to some historical fusion of two or more vowels. (In fact, it represents the final outcome of five different Proto-Indo-European syllabics, among which the syllabic states of /m/ and /n/ can be discerned by the application of internal reconstruction.) However, in such cases, internal analysis is better at raising questions than at answering them. The extraordinary frequency of /a/ in Sanskrit hints at some sort of historical event but does not and cannot lead to any specific theory.

Issues and shortcomings

Neutralizing environments

One issue in internal reconstruction is neutralizing environments, which can be an obstacle to historically correct analysis. Consider the following forms from Spanish, spelled phonemically rather than orthographically:

Infinitive	Gloss	Third person singular
bolbér	(re)turn	buélbe
probár	test	pruéba
dormír	sleep	duérme
morír	die	muére
ponér	place	póne
doblár	fold	dóbla
goθár	enjoy	góθa
korrér	run	kórre

One pattern of inflection shows alternation between /o/ and /ue/; the other type has /o/ throughout. Since those lexical items are all basic, not technical, high-register or obvious borrowings, their behavior is likely to be a matter of inheritance from an earlier system, rather than the result of some native pattern overlaid by a borrowed one. (An example of such an overlay would be the non-alternating English privative prefix un- compared to the alternating privative prefix in borrowed Latinate forms, in-, im-, ir-, il-.)

One might guess that the difference between the two sets can be explained by two different native markers of the third-person singular, but a basic principle of linguistic analysis is that one cannot and should not try to analyze data that one does not have. Also, positing such a history violates the principle of parsimony (Occam's Razor) by unnecessarily adding a complication to the analysis whose chief result is to restate the observed data as a sort of historical fact. That is, the result of the analysis is the same as the input. As it happens, the forms as given yield readily to real analysis and so there is no reason to look elsewhere.

The first assumption is that in pairs like bolbér/buélbe, the root vowels were originally the same. There are two possibilities: either something happened to make an original */o/ turn into two different sounds in the third-person singular, or the distinction in the third-person singular is original and the vowels of the infinitives are in what is called a neutralizing environment (an environment in which an original contrast is lost because two or more elements "fall together", or coalesce into one). There is no way of predicting when /o/ breaks to /ué/ and when it remains /ó/ in the third-person singular. On the other hand, starting with /ó/ and /ué/, one can write an unambiguous rule for the infinitive forms: /ué/ becomes /o/. One might notice further, upon looking at other Spanish forms, that the nucleus /ue/ is found only in stressed syllables, and not only in verb forms.

That analysis gains plausibility from the observation that the neutralizing environment is unstressed, whereas the nuclei are different in stressed syllables. That fits a common pattern: vowel contrasts are often preserved differently in stressed and unstressed environments, and there are usually more contrasts in stressed syllables than in unstressed ones because previously distinctive vowels fell together in unstressed environments.

The idea that original */ue/ might fall together with original */o/ is unproblematic, and so, internally, a complex nucleus *ue can be reconstructed that remains distinct when it is stressed and coalesces with *o when it is unstressed.
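The internally reconstructed rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the stem-plus-ending representation and the pre-forms such as buelb- are hypothetical devices for applying the rule "unstressed *ue becomes o" to the eight verbs in the table, not an attested stage of Spanish.

```python
import unicodedata

# Stressed vowels are written with an acute accent, as in the table above.
ACCENTED = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

def stress_last_vowel(stem: str) -> str:
    """Put the stress mark on the last vowel of a stem (buelb -> buélb)."""
    for i in range(len(stem) - 1, -1, -1):
        if stem[i] in ACCENTED:
            return stem[:i] + ACCENTED[stem[i]] + stem[i + 1:]
    raise ValueError(f"no vowel in stem {stem!r}")

def merge_unstressed(form: str) -> str:
    """Proposed rule: *ue coalesces with o when unstressed.

    After NFC normalization, stressed "ué" contains the single character
    "é", so replacing plain "ue" touches only the unstressed nucleus.
    """
    return unicodedata.normalize("NFC", form).replace("ue", "o")

def infinitive(stem: str, theme: str) -> str:
    """Stress falls on the ending, so the root vowel is unstressed."""
    return merge_unstressed(stem + ACCENTED[theme] + "r")

def third_singular(stem: str, ending: str) -> str:
    """Stress falls on the root, so the original contrast survives."""
    return merge_unstressed(stress_last_vowel(stem) + ending)

# Hypothetical reconstructed stems: (stem, theme vowel, 3sg ending).
VERBS = [
    ("buelb", "e", "e"), ("prueb", "a", "a"), ("duerm", "i", "e"),
    ("muer", "i", "e"), ("pon", "e", "e"), ("dobl", "a", "a"),
    ("goθ", "a", "a"), ("korr", "e", "e"),
]

for stem, theme, ending in VERBS:
    print(infinitive(stem, theme), third_singular(stem, ending))
```

Each printed pair should match a row of the table, which is the sense in which the rule is unambiguous: the infinitive is predictable from the reconstructed stem, but not the other way round.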

However, the true history is quite different: there were no diphthongs in Proto-Romance. There was an *o (reflecting Latin ŭ and ō) and an *ɔ (reflecting Latin ŏ). In Spanish the two fell together in unstressed syllables, as in all other Romance languages, but *ɔ broke into the complex nucleus /ue/ in stressed syllables. Internal reconstruction accurately points to two different historical nuclei in unstressed /o/ but gets the details wrong.

Shared innovations

When applying internal reconstruction to related languages prior to applying the comparative method, one must check that the analysis does not remove the shared innovations that characterize subgroups. An example is consonant gradation in Finnish, Estonian, and Sami. A pre-gradation phonology can be derived for each of the three groups by internal reconstruction, but gradation was actually an innovation in the Finnic branch of Uralic, rather than in the individual languages. Indeed, it was one of the innovations defining that branch. That fact would be missed if the comparanda of the Uralic family included as primary data the "degraded" states of Finnish, Estonian, and Sami.^[4]^[5]

Lost conditioning factors

Not all synchronic alternation is amenable to internal reconstruction. Although a secondary split (see phonological change) often results in alternations that signal a historical split, the conditions involved are usually immune to recovery by internal reconstruction. For example, the alternation of voiced and voiceless fricatives in Germanic languages, as described in Verner's law, cannot be explained only by examining the Germanic forms themselves.

Despite that general characteristic of secondary split, internal reconstruction can occasionally work. A primary split is, in principle, recoverable by internal reconstruction whenever it results in alternations, but later changes can make the conditioning irrecoverable.

Examples

English

English has two patterns for forming the past tense of roots ending in the apical stops /t/ and /d/.

Type I
Present	Past
adapt	adapted
fret	fretted
greet	greeted
note	noted
reflect	reflected
regret	regretted
rent	rented
wait	waited
waste	wasted
abide	abided
blend	blended
end	ended
found	founded
fund	funded
grade	graded
plod	plodded

Type II
Present	Past
cast	cast
cut	cut
put	put
set	set
meet	met
bleed	bled
read /riːd/	read /rɛd/
rid	rid
shed	shed
bend	bent
lend	lent
send	sent

Although Modern English has very little affixal morphology, what it has includes a marker of the preterite for verbs other than those with vowel changes of the find/found sort, and almost all verbs that end in /t/ or /d/ take /ɪd/ as that marker, as seen in Type I.

Comparing the verbs of Type I with those of Type II shows that the Type II verbs are all basic vocabulary. (This is a claim about Type II verbs and not about basic verbs, since there are basic verbs in Type I also.) However, no denominative verbs (those formed from nouns, like to gut, to braid, to hoard, to bed, to court, to head, to hand) are in Type II. Nor are there any verbs of Latin or French origin; all stems like depict, enact, denote, elude, preclude, convict are Type I. Furthermore, all new forms are inflected as Type I, and so all native speakers of English would presumably agree that the preterites of to sned and to absquatulate would most likely be snedded and absquatulated.
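The sorting of verbs into the two types can be made mechanical. The following Python sketch works on spelling rather than sound, and its names (regular_past, classify) and its simplified spelling rules (add -d after a final e, double a final consonant after a single vowel letter) are assumptions made for illustration; they cover the verbs listed above but ignore stress, so they would mishandle a verb like visit.

```python
VOWEL_LETTERS = set("aeiou")

def regular_past(present: str) -> str:
    """Spell the Type I past tense (the /ɪd/ suffix) of a verb in -t or -d.

    Simplified orthographic rules, assumed here for illustration only.
    """
    if present.endswith("e"):
        return present + "d"                      # note -> noted
    if (len(present) >= 3
            and present[-1] not in VOWEL_LETTERS
            and present[-2] in VOWEL_LETTERS
            and present[-3] not in VOWEL_LETTERS):
        return present + present[-1] + "ed"       # fret -> fretted
    return present + "ed"                         # greet -> greeted

def classify(present: str, past: str) -> str:
    """Type I if the past is the regular suffixed form, otherwise Type II."""
    return "Type I" if past == regular_past(present) else "Type II"

PAIRS = [
    ("adapt", "adapted"), ("fret", "fretted"), ("note", "noted"),
    ("abide", "abided"), ("plod", "plodded"),
    ("cut", "cut"), ("meet", "met"), ("bend", "bent"), ("shed", "shed"),
]

for present, past in PAIRS:
    print(f"{present:8} {past:10} {classify(present, past)}")

# New coinages are expected to follow Type I.
print(regular_past("sned"), regular_past("absquatulate"))
```

The classifier only restates the observed split; the historical inference comes from the further facts noted above, namely which kinds of verbs are absent from Type II.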

That evidence shows that the absence of a "dental preterite" marker on roots ending in apical stops in Type II reflects a more original state of affairs. In the early history of the language, the "dental preterite" marker was in a sense absorbed into the root-final consonant when it was /t/ or /d/; the affix /ɪd/ after root-final apical stops belongs to a later stratum in the evolution of the language. The same suffix was involved in both types but with a total reversal of "strategy." Other exercises of internal reconstruction would point to the conclusion that the original affix of the dental preterites was /Vd/ (V being a vowel of uncertain phonetics). A direct inspection of Old English would certainly reveal several different stem-vowels involved. In modern formations, stems that end in /t/ or /d/ preserve the vowel of the preterite marker. Odd as it might seem, by the time of the first written evidence the stem vowel had already been lost whenever the root ended in an apical stop.

Latin

Latin has many examples of "word families" showing vowel alternations. Some of them are examples of Indo-European ablaut, that is, inherited from the proto-language: pendō "weigh", pondus "a weight"; dōnum "gift", datum "a given"; caedō "cut", perfect ce-cid-; dīcō "speak", participle dictus (all unmarked vowels in these examples are short). Others, involving only short vowels, clearly arose within Latin: faciō "do", participle factus, but perficiō, perfectus "complete, accomplish"; amīcus "friend" but inimīcus "unfriendly, hostile"; legō "gather" but colligō "bind, tie together", participle collectus; emō "take; buy" but redimō "buy back", participle redemptus; locus "place" but īlicō "on the spot" (< *stloc-/*instloc-); capiō "take, seize", participle captus, but percipiō "lay hold of", perceptus; arma "weapons" but inermis "unarmed"; causa "lawsuit, quarrel" but incūsō "accuse, blame"; claudō "shut" but inclūdō "shut in"; caedō "fell, cut" but concīdō "cut to pieces"; and damnō "find guilty" but condemnō "sentence" (verb). To oversimplify, vowels in initial syllables never alternate in this way, but in non-initial syllables short vowels of the simplex forms become -i- before a single consonant and -e- before two consonants; the diphthongs -ae- and -au- of initial syllables alternate respectively with medial -ī- and -ū-.
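The oversimplified weakening rule just stated can be written out as a short Python sketch. The function names weaken and compound are hypothetical, and the sketch makes several labeled assumptions: it weakens only the first vowel nucleus of the simplex, it applies the short-vowel rule only to a, e and o (the vowels illustrated in the examples), and it ignores prefix assimilation (so it would give conligō rather than colligō) and consonantal i and u.

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize so that each macron vowel is a single character."""
    return unicodedata.normalize("NFC", text)

LONG = set(nfc("āēīōū"))
ALL_VOWELS = set("aeiou") | LONG
WEAKENABLE = set("aeo")  # assumption: only the vowels seen in the examples
DIPHTHONGS = {"ae": nfc("ī"), "au": nfc("ū")}

def weaken(simplex: str) -> str:
    """Weaken the first vowel nucleus of a simplex form, as after a prefix."""
    word = nfc(simplex)
    first = next((i for i, ch in enumerate(word) if ch in ALL_VOWELS), None)
    if first is None:
        return word
    pair = word[first:first + 2]
    if pair in DIPHTHONGS:                      # ae -> ī, au -> ū
        return word[:first] + DIPHTHONGS[pair] + word[first + 2:]
    if word[first] not in WEAKENABLE:           # long vowels, i, u: unchanged
        return word
    end = first + 1
    while end < len(word) and word[end] not in ALL_VOWELS:
        end += 1
    consonants = end - (first + 1)              # consonants after the vowel
    if consonants == 0:
        return word
    reduced = "i" if consonants == 1 else "e"
    return word[:first] + reduced + word[first + 1:]

def compound(prefix: str, simplex: str) -> str:
    """Attach a prefix; the root vowel is then in a non-initial syllable."""
    return nfc(prefix) + weaken(simplex)

for prefix, simplex in [("per", "faciō"), ("per", "factus"), ("red", "emō"),
                        ("in", "amīcus"), ("con", "damnō"), ("in", "claudō"),
                        ("con", "caedō")]:
    print(simplex, "->", compound(prefix, simplex))
```

That a rule this small covers most of the listed pairs is what makes the alternation look like the residue of a single regular sound change.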

As happened here, reduction in contrast in a vowel system is very commonly associated with position in atonic (unaccented) syllables. However, the tonic accent of Latin reficiō and refectus is on the same syllable as in simplex faciō, factus, which is true of almost all of the examples given (cólligō, rédimō and īlicō, with initial-syllable accent, are the only exceptions) and indeed of most examples of such alternations in the language. The reduction in points of contrast in the vowel system (-a- and -o- fall together with -i- before a single consonant and with -e- before two consonants; long vowels replace diphthongs) therefore cannot have had anything to do with the location of the accent in attested Latin.

The accentual system of Latin is well known, partly from statements by Roman grammarians and partly from agreements among the Romance languages on the location of tonic accent: the tonic accent in Latin fell three syllables before the end of any word with three or more syllables unless the second-last syllable (called the penult in classical linguistics) was "heavy" (contained a diphthong or a long vowel or was followed by two or more consonants). In that case, the penult had the tonic accent: perfíciō, perféctus, rédimō, condémnō, inérmis.
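The accent rule can be stated as a small Python function. This is a sketch under simplifying assumptions, not a full account of Latin prosody: the name accent_from_end is hypothetical, long vowels must be written with macrons, only ae, au and oe are treated as diphthongs, and complications such as stop-plus-liquid clusters, qu, x and consonantal i or u are ignored. The function returns the position of the accented syllable counted from the end of the word (3 for the antepenult, 2 for the penult).

```python
import unicodedata

def nfc(text: str) -> str:
    """Normalize so that each macron vowel is a single character."""
    return unicodedata.normalize("NFC", text)

SHORT = set("aeiou")
LONG = set(nfc("āēīōū"))
DIPHTHONGS = ("ae", "au", "oe")  # assumption: the only diphthongs handled

def nuclei(word: str) -> list[tuple[int, int]]:
    """Return the (start, end) span of each vowel nucleus in an NFC word."""
    spans = []
    i = 0
    while i < len(word):
        if word[i:i + 2] in DIPHTHONGS:
            spans.append((i, i + 2))
            i += 2
        elif word[i] in SHORT or word[i] in LONG:
            spans.append((i, i + 1))
            i += 1
        else:
            i += 1
    return spans

def accent_from_end(word: str) -> int:
    """Position of the tonic syllable, counted from the end of the word."""
    word = nfc(word)
    spans = nuclei(word)
    if len(spans) <= 2:
        return len(spans)          # short words: accent on the first syllable
    start, end = spans[-2]         # the penult
    following = spans[-1][0] - end # consonants between penult and final vowel
    heavy = (end - start == 2      # diphthong
             or word[start] in LONG
             or following >= 2)
    return 2 if heavy else 3

for w in ["perficiō", "perfectus", "redimō", "condemnō", "inermis"]:
    print(w, accent_from_end(w))
```

Applied to the compounds discussed above, the function places the accent on the weakened syllable in most cases, which is why the attested accent cannot be what conditioned the weakening.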

If there is any connection between word-accent and vowel-weakening, the accent in question cannot be that of Classical Latin. Since the vowels of initial syllables do not show that weakening (to oversimplify a bit), the obvious inference is that in prehistory, the tonic accent must have been an accent that was always on the first syllable of a word. Such an accentual system is very common in the world's languages (Czech, Latvian, Finnish, Hungarian and, with certain complications, High German and Old English) but was definitely not the accentual system of Proto-Indo-European.

Therefore, on the basis of internal reconstruction within Latin, a prehistoric sound-law can be discovered that replaced the inherited accentual system with an automatic initial-syllable accent, which itself was replaced by the attested accentual system. As it happens, Celtic languages also have an automatic word-initial accent that is subject, like the Germanic languages, to certain exceptions, mainly certain pretonic prefixes. Celtic, Germanic and Italic languages share some other features as well, and it is tempting to think that the word-initial accent system was an areal feature, but that would be more speculative than the inference of a prehistoric word-initial accent for Latin specifically.

There is a very similar set of givens in English but with very different consequences for internal reconstruction. There is pervasive alternation between long and short vowels (the former now phonetically diphthongs): between /aɪ/ and /ɪ/ in words like divide, division; decide, decision; between /oʊ/ and /ɒ/ in words like provoke, provocative; pose, positive; between /aʊ/ and /ʌ/ in words like pronounce, pronunciation; renounce, renunciation; profound, profundity and many other examples. As in the Latin example, the tonic accent of Modern English is often on the syllable showing the vowel alternation.

In Latin, an explicit hypothesis could be framed on the location of word-accent in prehistoric Latin that would account for both the vowel alternations and the attested system of accent. Indeed, such a hypothesis is hard to avoid. By contrast, the alternations in English point to no specific hypothesis but only a general suspicion that word accent must be the explanation, and that the accent in question must have been different from that of Modern English. Where the accent used to be and what the rules, if any, are for its relocation in Modern English cannot be recovered by internal reconstruction. In fact, even the givens are uncertain: it is not possible to tell even whether tonic syllables were lengthened or atonic syllables were shortened (actually, both were involved).

Part of the problem is that English has alternations between diphthongs and monophthongs (between Middle English long and short vowels, respectively) from at least six different sources, the oldest (such as in write, written) dating back to Proto-Indo-European. However, even if it were possible to sort out the corpus of affected words, sound changes after the relocation of tonic accent have eliminated the necessary conditions for framing accurate sound laws. It is actually possible to reconstruct the history of the English vowel system with great accuracy but not by internal reconstruction.

In short, during the atonic shortening, the tonic accent was two syllables after the affected vowel and was later retracted to its current position. However, words like division and vicious (compare vice) have lost a syllable in the first place, which would be an insuperable obstacle to a correct analysis.

Notes

[1] Matthews, P.H. (2014). The Concise Oxford Dictionary of Linguistics (3rd ed.). Oxford University Press. ISBN 9780191753060.
[2] Smith, Jennifer L. (2012-10-31). "LING 202 Lecture Outline" (PDF). The University of North Carolina at Chapel Hill. p. 5. Archived from the original (PDF) on 2014-01-08. Retrieved 7 January 2014.
[3] Campbell, Lyle (2013). Historical Linguistics (3rd ed.). Edinburgh University Press. p. 199. ISBN 978-0-7486-7559-3.
[4] Anttila, Raimo (1989). Historical and Comparative Linguistics. John Benjamins. p. 274. ISBN 978-90-272-86086.
[5] Campbell (2013), pp. 211–212.

References

Philip Baldi, ed. Linguistic change and reconstruction methodology. Berlin-NY: Mouton de Gruyter, 1990.
Campbell, Lyle (2004). Historical Linguistics: An Introduction (2nd ed.). Cambridge (Mass.): The MIT Press. ISBN 0-262-53267-0.
Anthony Fox. Linguistic Reconstruction: An Introduction to Theory and Method. Oxford: Oxford University Press, 1995. ISBN 0-19-870001-6.
T. Givón. “Internal reconstruction: As method, as theory”, Reconstructing grammar: comparative linguistics and grammaticalization, ed. Spike Gildea. Amsterdam–Philadelphia: John Benjamins, 2000, pp. 107–160.
Jerzy Kuryłowicz. “On the Methods of Internal Reconstruction”, Proceedings of the Ninth International Congress of Linguists, Cambridge, Mass., August 27–31, 1962, ed. Horace G. Lunt. The Hague: Mouton, 1964.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
