
Manual:Representing arrays

From mediawiki.org
Revision as of 19:08, 8 September 2024 by Pppery (alt) (talk | contribs) (Clean up links to Meta,)
This page was originally written on Meta-Wiki in 2010 as m:Help:Array, before the installation or development of Scribunto. It was minimally reviewed and transferred to MediaWiki in 2024 as part of Project:MediaWiki documentation on Meta-Wiki. While the techniques described here should in theory still work, it is generally recommended to use Scribunto for complicated computations rather than the series of templates described here.

This page deals with storage of data in pages, and retrieval of these data by the same and other pages. One advantage can be that making a change at one place automatically changes occurrences of the data item at more places. Also it can be easier to make multiple changes if several data are on a single page. This applies in particular if data likely to be changed at the same time (because updates become available together, or because an editor reviews these data together) are on the same page.

A versatile way of storing a data item is as the include part of a page (see also Arrays with a template for each element, below). It can then be used on the same and other pages, independent of other data. Less versatile variants are storing a data item as the part of a page applicable for the representation on the page itself only (such as the noinclude part, with the rest of the page includeonly), and storing a data item B between data items A and C, so that B can in principle not be retrieved separately, but only in the combination ABC. There is some more versatility if the data are stored as parameter values of a template call: by changing the content of the template the use of the data on the page can be changed.

Below are methods to extract subdata from a data item, or put differently, to construct a larger data item (a data structure) from smaller ones such that the smaller ones are independently retrievable. Where a matrix or 2D array is referred to, other terms like record and field may also be applicable. The term table is not used for a data collection, but reserved for the display format.

Extraction of data from a string or number


A substring of up to 500 characters starting at the start of a string can easily be extracted. Extracting other substrings (including single characters other than the first) is expensive, and is therefore mainly feasible if they are short, lie somewhere in the first part of the string, and/or each position draws on a small character set that usually contains the character in that position. See Help:String functions. Whenever possible, a string whose substrings are needed is best stored as a collection of substrings. E.g., if for string ABCD we need the substring CD, then we can store (or pass on as parameter values) AB and CD as separate data items.

Data can also be stored in the form of numbers. For example, the date 17 July 2010 can be stored as 20100717, which is still fairly human-readable. Subdata such as digits (and in the example the day of the month, the month number, and the year) can be extracted by a computation. If the numbers represent discrete values, care should be taken that rounding does not cause wrong retrieval.
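
As a minimal sketch of such a computation (written in Python rather than wikitext, with the hypothetical helper name split_date), integer division and remainder recover the parts of 20100717. Staying with integer arithmetic sidesteps the rounding concern mentioned above.

```python
def split_date(stamp: int) -> tuple[int, int, int]:
    """Split a date stored as YYYYMMDD into (year, month, day)."""
    year = stamp // 10000          # drop the last four digits
    month = (stamp // 100) % 100   # the two digits in the middle
    day = stamp % 100              # the last two digits
    return year, month, day


print(split_date(20100717))  # (2010, 7, 17)
```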

An expression of type integer such as trunc(9134567890e9)+trunc123456789 (length: 34, or if negative 35) can hold 64 bits. For non-negative ones we can use Template:Digit to retrieve the 63 bits, or 16 decimal digits (and also most 17-digit numbers).

An integer of 15 digits (corresponding to 50 bits) can be stored as float, or 53 bits can be stored in a 16-digit number. Again we can use Template:Digit to retrieve bits, decimal digits, etc.

  • {{digit|12345678901234|4}} gives 1 [1] (4th digit from the right)

Retrieval of 2-digit part:

  • {{digit|12345678901234|4|100}} gives 78 (4th 2-digit number from the right)

If a 2-digit number represents a character according to something like the ASCII code, we can decode it with something like Template:Chrfn:

  • {{chrfn|{{digit|12345678901234|4|100}}}} gives N

Thus a string can be stored in coded form, with (in the case of a character set of up to 100 characters) up to 7 characters in one number:

  • "{{chr7|23091109}}" → "wiki" [2]

If a data set is stored in coded form as a collection of numbers we can always after retrieving a number apply some final extraction and/or decoding.
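
The digit retrieval and decoding above can be modelled in Python as follows. This is only a sketch: the function names are hypothetical, and the code table in chr_code is an assumption inferred from the examples on this page (1 to 26 for a to z, 32 to 95 for the ASCII character with the same code); it is not the actual content of Template:Chrfn, and codes outside these two ranges are simply ignored here.

```python
def digit(n: int, k: int, base: int = 10) -> int:
    """Return the k-th base-`base` digit of n, counted from the right (k >= 1)."""
    return (n // base ** (k - 1)) % base


def chr_code(code: int) -> str:
    """Assumed code table: 1-26 are a-z, 32-95 are ASCII, the rest is empty."""
    if 1 <= code <= 26:
        return chr(ord("a") + code - 1)
    if 32 <= code <= 95:
        return chr(code)
    return ""  # includes code 00, which stands for "no character"


def decode7(n: int) -> str:
    """Decode up to 7 two-digit character codes packed into one number."""
    return "".join(chr_code(digit(n, k, 100)) for k in range(7, 0, -1))


print(digit(12345678901234, 4))       # 1
print(digit(12345678901234, 4, 100))  # 78
print(decode7(23091109))              # wiki
```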

Parameter selection templates


Templates that simply return one of the parameters (parameter selection templates, mathematically projections) are e.g. Template:P1, Template:P2, Template:P3,.. (list). They can be used to store data in wikitext of the form {{pindex|data1|data2|..}} where index is something that can vary, e.g. a template parameter, taking some positive integers as values.

Example: {{p{{{1}}}|73130107091405|32013223151812}}

If this is the content of a template, retrieving one value gives a preprocessor node count of 8 + the number of named parameters in the call of template p{{{1}}} (so here just 8)[1], a post-expand include size of twice the size of the result, and a template argument size equal to the length of the index plus the size of the result.

Alternatively the generic Template:Pp can be used, with the index value one more because it takes itself a parameter position:

{{pp|{{{1}}}|73130107091405|32013223151812}}

If this is the content of a template, retrieving one value gives a preprocessor node count of 10 + the number of named parameters in the call of template pp (so here just 10)[1], a post-expand include size of twice the size of the result, and a template argument size equal to twice the length of the index plus the size of the result.

Of the return values only the actual one, if any, is expanded. This is relevant for the template limits and the response time.

A limitation compared to #switch is that no default can be specified. However, an outer #ifexpr can take care of that.
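
A rough Python model of a parameter selection template, assuming hypothetical names (select, item): the data items are passed as unevaluated callables so that, as described above, only the selected one is "expanded", and an outer range check plays the role of the #ifexpr guard that supplies a default.

```python
from typing import Callable


def select(index: int, *items: Callable[[], str], default: str = "") -> str:
    """Return the index-th item (1-based), evaluating only that item."""
    if not 1 <= index <= len(items):
        return default            # plays the role of the outer #ifexpr guard
    return items[index - 1]()     # only the selected item is evaluated


expanded = []  # records which items were actually evaluated


def item(value: str) -> Callable[[], str]:
    """Wrap a value so that its evaluation can be observed."""
    def thunk() -> str:
        expanded.append(value)
        return value
    return thunk


result = select(2, item("73130107091405"), item("32013223151812"))
print(result)    # 32013223151812
print(expanded)  # ['32013223151812']
```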

#switch


Two data items (numbers or strings) can be stored in an if-then-else construct, e.g. {{#if:{{{1}}}|73130107091405|32013223151812}}. More can be stored by nesting these, or using #switch, e.g. {{#switch:{{{1}}}|1=73130107091405|2=32013223151812|4320914322308}}.

Of the return values only the actual one, if any, is expanded. The preprocessor node count has a minimum of 8 in the case of an immediate match, and is 2 more for each extra step[1]. If all cases are equally frequent, a match in a large switch is found on average after half the cases, so the average node count is roughly equal to the number of cases; if not, it is advantageous to order the cases by decreasing frequency. The post-expand include size is twice the size of the result, and the template argument size is the length of the index.

In if-then-else constructs (also nested #ifeq's equivalent with #switch), similarly only the items needed for evaluating the conditions and the actual return value are expanded. However, if a switch is written as nested if-then-else construct, the equivalent of what in the switch is the index is used multiple times, and counted accordingly for the post-expand include size.

For example, {{#switch:A|B=C|D=E|F=G|H=I}} is equivalent with {{#ifeq:A|B|C|{{#ifeq:A|D|E|{{#ifeq:A|F|G|{{#ifeq:A|H|I}}}}}}}}. If A=F then the include size is s(A)+2s(B)+2s(D)+2s(F)+s(G) and 3s(A)+s(B)+s(D)+s(F)+s(G) respectively, with s(T) the include size of T, so either one can be larger, but in the case that B, D, F, and H are simple numbers, like above, this reduces to s(A)+s(G) and 3s(A)+s(G) respectively.

A disadvantage of #switch compared with the projection templates is that the preprocessor node count increases by 2 for every item in the switch list until the match. Retrieving all items from a switch with 1000 items would therefore give a preprocessor node count of about 1,000,000, the maximum for a page. This can be reduced considerably by using nested switches, but this is limited due to the expansion depth limit. With the projection templates there is no need for nesting.
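
The arithmetic behind this figure can be checked with a short Python sketch. It assumes only the counts quoted on this page (8 nodes for an immediate match, 2 more for each extra step); the function names are hypothetical.

```python
NODE_LIMIT = 1_000_000  # per-page maximum quoted in the text


def switch_nodes(position: int) -> int:
    """Node count for a #switch that matches at the given 1-based position."""
    return 8 + 2 * (position - 1)


def total_nodes(cases: int) -> int:
    """Node count for retrieving every item of the switch once."""
    return sum(switch_nodes(k) for k in range(1, cases + 1))


print(total_nodes(1000))               # 1007000
print(total_nodes(1000) > NODE_LIMIT)  # True
```

Under these assumptions the total for 1000 items comes to 1,007,000, i.e. roughly 1000 nodes per retrieval on average.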

If the preprocessor node count is a concern, multiple use of any particular result of #switch should only be done through multiple use of a template parameter to which this result is assigned.

#titleparts


Up to 25 strings or numbers separated by slashes can be stored as a single string, with the individual items and subsequences retrievable by parser function #titleparts. Since it was designed for a page title some restrictions apply (in particular, the amount of memory including that for the slashes is limited to 255 bytes) and some conversions to a canonical form may be applied. Care should be taken that undesired changes do not occur. This may be done by choosing the order of the data (with for example a page name or number in the first position), or not using the first position, leaving it empty.

A limitation compared to #switch is that no default can be specified.

Examples:

  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|3}}" gives "Wed" [3]
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|5}}" gives "Fri" [4]
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|0}}" gives "Mon" [5] (index 0 gives the first item)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|8}}" gives "" [6] (index > (no. of items) gives nothing)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|-2}}" gives "Sat" [7] (index < 0 counts from the right)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|-8}}" gives "Mon" [8] (index ≤ - (no. of items) gives the first item)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|-9}}" gives "Mon" [9] (ditto)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|3.8}}" gives "Wed" [10] (index is rounded toward 0 to an integer)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|1|-3.8}}" gives "Fri" [11] (ditto)
  • "{{#titleparts:Mon/Tue/Wed/Thu/Fri/Sat/Sun|3|2}}" gives "Tue/Wed/Thu" [12] (three items starting with the second)
  • "{{#titleparts:w:Earth/Earth/63e5/6e24|1|1}}" gives "w:Earth" [13]
  • "{{#titleparts:w:Earth/Earth/63e5/6e24|1|2}}" gives "Earth" [14]
  • combined: [[{{#titleparts:w:Earth/Earth/63e5/6e24|1|1}}|{{#titleparts:w:Earth/Earth/63e5/6e24|1|2}}]] has an average density of {{#expr:{{#titleparts:w:Earth/Earth/63e5/6e24|1|4}}/4*3/pi/{{#titleparts:w:Earth/Earth/63e5/6e24|1|3}}^3round0}} kg/m<sup>3</sup> gives "Earth has an average density of 5729 kg/m3". If we put this in a template to do the same for any spherical body we can use "w:Earth/Earth/63e5/6e24" as a single parameter.
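
The index rules illustrated by the examples above can be summarised in a small Python model. This is a sketch that only reproduces the listed results for a positive number of items; it does not model title normalisation, the 255-byte limit or the 25-item limit, and the function name titleparts is merely borrowed for readability.

```python
def titleparts(title: str, count: int, first: float) -> str:
    """Model of {{#titleparts:title|count|first}} for count > 0 only."""
    parts = title.split("/")
    first = int(first)                        # rounds toward zero
    if first > 0:
        start = first - 1                     # 1-based index from the left
    elif first == 0:
        start = 0                             # index 0 gives the first item
    else:
        start = max(0, len(parts) + first)    # from the right, clamped to the start
    return "/".join(parts[start:start + count])


week = "Mon/Tue/Wed/Thu/Fri/Sat/Sun"
print(titleparts(week, 1, 3))     # Wed
print(titleparts(week, 1, -2))    # Sat
print(titleparts(week, 3, 2))     # Tue/Wed/Thu
```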

A difference between #titleparts on the one hand and projection templates and #switch on the other is that the list of data items with slashes is itself a single data item. Thus it can e.g. be passed on as a single parameter. A disadvantage is that, regardless of which items are required, the whole string is evaluated first.

For example, if A expands to 1, 2, 3 or 4, {{#titleparts:C/E/G/I|1|A}} is, in terms of the result, approximately equivalent with {{#switch:A|1=C|2=E|3=G|I}} and {{#ifeq:A|1|C|{{#ifeq:A|2|E|{{#ifeq:A|3|G|I}}}}}}. However, if e.g. A=3 then the include size is s(A)+s(C)+s(E)+s(G)+s(I), s(A)+s(G) and 3s(A)+s(G) respectively.

On the other hand, since the slashes need not be explicit in the wikitext (they can be produced by expansion), there is extra flexibility.

Due to #titleparts the largest amount of data in a single data item from which all details are independently retrievable without expensive string templates is 240 digits, in a string of 16 numbers of 15 digits each, or 238 digits, in a string of 17 numbers of 14 digits each, or 15 numbers of 16 digits, representing 53 bits each, together 795 bits. In each case the numbers are separated by slashes. (This applies because digits and slashes each use 1 byte). With a character set of 99 characters it can represent a string of 120 or 119 characters, with a character set of 999 characters a string of 80 characters. (The codes 00 and 000 are not assigned to a character and displayed as the empty string since e.g. 0012 represents the same number as 12.)
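
The byte budget behind these figures can be verified with a few lines of Python. The sketch assumes only the 255-byte limit and the one-byte-per-character rule stated above; packed_size is a hypothetical helper name.

```python
LIMIT_BYTES = 255  # size limit quoted above, slashes included


def packed_size(count: int, digits: int) -> int:
    """Bytes used by `count` numbers of `digits` digits joined by slashes."""
    return count * digits + (count - 1)


for count, digits in [(16, 15), (17, 14), (15, 16)]:
    size = packed_size(count, digits)
    # columns: numbers, digits each, total digits, total bytes, fits?
    print(count, digits, count * digits, size, size <= LIMIT_BYTES)
```

With two digits per character, 240 and 238 digits give the 120 and 119 characters mentioned above; with three digits per character, 240 digits give 80 characters.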

For example:

  • "{{chr119|73130107091405/32013223151812/04320914322308/09030832052205/18253219091407/12053216051819/15143215143220/08053216120114/05203209193207/09220514320618/05053201030305/19193220153220/08053219211332/15063201121232/08211301143211/14152312050407/0546}}" → "Imagineaworldinwhicheverysinglepersonontheplanetisgivenfreeaccesstothesumofallhumanknowledge." [15][2]

Extracting the 100th character:

  • {{chrfn|{{digit|{{#titleparts:73130107091405/32013223151812/04320914322308/09030832052205/18253219091407/12053216051819/15143215143220/08053216120114/05203209193207/09220514320618/05053201030305/19193220153220/08053219211332/15063201121232/08211301143211/14152312050407/0546|1|{{#expr:ceil(100/7)}}}}|7-(100-1)mod7|100}}}} gives "u".

Compare:

  • 17 data items instead of a single one:
    • {{chrfn|{{digit|{{pp|{{#expr:ceil(100/7)+1}}|73130107091405|32013223151812|04320914322308|09030832052205|18253219091407|12053216051819|15143215143220|08053216120114|05203209193207|09220514320618|05053201030305|19193220153220|08053219211332|15063201121232|08211301143211|14152312050407|0546}}|7-(100-1)mod7|100}}}} gives "u".
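
The same extraction can be traced step by step in Python. This is a sketch under assumptions: the code table in chr_code is inferred from the examples on this page rather than taken from Template:Chrfn, the helper names are hypothetical, and the position formula presumes that the item in question holds a full 7 characters (so it does not apply to the short last item).

```python
import math

DATA = ("73130107091405/32013223151812/04320914322308/09030832052205/"
        "18253219091407/12053216051819/15143215143220/08053216120114/"
        "05203209193207/09220514320618/05053201030305/19193220153220/"
        "08053219211332/15063201121232/08211301143211/14152312050407/0546")


def chr_code(code: int) -> str:
    """Assumed code table: 1-26 are a-z, 32-95 are ASCII, the rest is empty."""
    if 1 <= code <= 26:
        return chr(ord("a") + code - 1)
    if 32 <= code <= 95:
        return chr(code)
    return ""


def char_at(data: str, position: int, per_item: int = 7) -> str:
    """Return the character at the 1-based `position` of the coded string."""
    item = data.split("/")[math.ceil(position / per_item) - 1]  # like #titleparts
    k = per_item - (position - 1) % per_item                    # group counted from the right
    code = (int(item) // 100 ** (k - 1)) % 100                  # like {{digit|item|k|100}}
    return chr_code(code)


print(char_at(DATA, 100))  # u
```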

If the titleparts method is applied to a coded wikitext it is not expanded and xml-style tags are not applied, but table and formatting code are applied as usual:

  • "{{chr119|32272720032929/32393939161718/39393932192021/32601301200862/50942749482960/471301200862}}" → "{{tc}}pqrstu<math>2^{10}</math>" [16]

The ExpandTemplates link shows the rendering after expanding the expanded wikitext again, which is in these cases not the same as the rendering after the regular one-pass expansion.

Nesting data structures


Data structures can be nested. For example, a matrix element can be extracted from a matrix with an outer data structure specifying the row number and for each row an inner data structure specifying a column number.

#titleparts is less suitable as the outer data structure, because that would cause expansion of branches which are not applicable (in terms of the matrix: it would expand the whole column).

For example, we have the matrix

73130107091405 32013223151812
 4320914322308  9030832052205

and extract the number in row {{{1}}} and column {{{2}}}; the remaining 6 methods are:

  • {{p{{{1}}}|{{p{{{2}}}|73130107091405|32013223151812}}|{{p{{{2}}}|4320914322308|9030832052205}}}}
  • {{#switch: {{{1}}}|1={{p{{{2}}}|73130107091405|32013223151812}}|{{p{{{2}}}|4320914322308|9030832052205}}}}
  • {{p{{{1}}}|{{#switch: {{{2}}}|1=73130107091405|32013223151812}}|{{#switch: {{{2}}}|1=4320914322308|9030832052205}}}}
  • {{#switch: {{{1}}}|1={{#switch: {{{2}}}|1=73130107091405|32013223151812}}|{{#switch: {{{2}}}|1=4320914322308|9030832052205}}}}
  • {{p{{{1}}}|{{#titleparts:73130107091405/32013223151812|1|{{{2}}}}}|{{#titleparts:4320914322308/9030832052205|1|{{{2}}}}}}}
  • {{#switch: {{{1}}}|1={{#titleparts:73130107091405/32013223151812|1|{{{2}}}}}|{{#titleparts:4320914322308/9030832052205|1|{{{2}}}}}}}

The switches with only two cases can also be written with #ifeq, e.g.:

  • {{#ifeq: {{{1}}}|1|{{#ifeq:{{{2}}}|1|73130107091405|32013223151812}}|{{#ifeq:{{{2}}}|1|4320914322308|9030832052205}}}}

Using the generic projection templates we have e.g.:

  • {{pp|{{#expr:{{{1}}}+1}}|{{pp|{{#expr:{{{2}}}+1}}|73130107091405|32013223151812}}|{{pp|{{#expr:{{{2}}}+1}}|4320914322308|9030832052205}}}}

or

  • {{ppp|p={{{1}}}|{{ppp|p={{{2}}}|73130107091405|32013223151812}}|{{ppp|p={{{2}}}|4320914322308|9030832052205}}}}

However, unlike for #switch (see above), with projection templates we can also write all data in one list (even if it is a long list):

  • {{p{{#expr:2*{{{1}}}+{{{2}}}-2}}|73130107091405|32013223151812|4320914322308|9030832052205}}
  • {{pp|{{#expr:2*{{{1}}}+{{{2}}}-1}}|73130107091405|32013223151812|4320914322308|9030832052205}}
  • {{ppp|p={{#expr:2*{{{1}}}+{{{2}}}-2}}|73130107091405|32013223151812|4320914322308|9030832052205}}
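
To see that the single-list index 2*row + column - 2 addresses the same element as the nested selection, here is a small Python sketch with hypothetical names; it uses the 2×2 matrix from this section.

```python
MATRIX = [
    ["73130107091405", "32013223151812"],
    ["4320914322308", "9030832052205"],
]
FLAT = [cell for row in MATRIX for cell in row]  # all data in one list
COLUMNS = 2


def nested(row: int, col: int) -> str:
    """Outer selection by row, inner selection by column (both 1-based)."""
    return MATRIX[row - 1][col - 1]


def flat(row: int, col: int) -> str:
    """Single selection with a computed 1-based index."""
    index = COLUMNS * row + col - COLUMNS  # 2*row + col - 2 for two columns
    return FLAT[index - 1]


print(nested(2, 1), flat(2, 1))  # 4320914322308 4320914322308
```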

Labeled section transclusion


Section transclusion requires the use of Extension:Labeled Section Transclusion.


With Template:Short DOW with labeled section transclusion we get:

Single day:

  • "{{#lst:Template:Short DOW with labeled section transclusion|3}}" → "Wed" [17]
  • "{{#lst:Template:Short DOW with labeled section transclusion|5}}" → "Fri" [18]
  • Undefined section:
    • "{{#lst:Template:Short DOW with labeled section transclusion|7}}" → "" [19]

Range of days:

  • "{{#lst:Template:Short DOW with labeled section transclusion|2|4}}" → "TueWedThu" [20]
  • "{{#lst:Template:Short DOW with labeled section transclusion|4|2}}" → "ThuFriSat" [21]

All days except one:

  • "{{#lstx:Template:Short DOW with labeled section transclusion|3}}" → "SunMonTueThuFriSat" [22]
  • Undefined section:
    • "{{#lstx:Template:Short DOW with labeled section transclusion|7}}" → "SunMonTueWedThuFriSat" [23]
  • Text replaced:
    • "{{#lstx:Template:Short DOW with labeled section transclusion|3|-}}" → "SunMonTue-ThuFriSat" [24]

Note that ranges do not work cyclically: day 4 (Thu) through day 2 (Tue) gives only Thu through Sat.

Producing multiple array elements in specified order


Template:For loop can produce not just one array element, as above, but several.

Examples:

{{For loop| |call=Short DOW|3}} gives "Wed"

{{For loop| |call=Short DOW|6|3|5|3}} gives "Sat Wed Fri Wed"

Producing multiple array elements in standard order, without duplicates


In this method the results are in the index order according to the template content, not in the order of the parameters in the template call. Duplicate occurrences are ignored.

Using Template:Short DOW ipv:

{{short DOW ipv| |3}} gives "Wed"

{{short DOW ipv| |6|3|5|3}} gives "Wed Fri Sat"

The template uses Template:Bintodec, which computes an integer of which the binary representation forms the Boolean array indicating which array elements are selected, and Template:Dectobin. This makes it easier to determine where separators are needed (after every selected item, provided that more items are following).
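
The difference between the two orderings, and the role of the Boolean array, can be sketched in Python. The sketch assumes that the days are indexed 0 (Sun) to 6 (Sat), which matches the examples; the function names are hypothetical and the code does not reproduce the actual templates.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]  # assumed indexes 0-6


def in_call_order(indexes: list[int]) -> str:
    """Like the For loop examples: one item per index, duplicates kept."""
    return " ".join(DAYS[i] for i in indexes)


def in_standard_order(indexes: list[int]) -> str:
    """Like the 'ipv' examples: build a bit mask, then read it in index order."""
    mask = 0
    for i in indexes:
        mask |= 1 << i  # setting a bit twice changes nothing, so duplicates vanish
    picked = [DAYS[i] for i in range(len(DAYS)) if (mask >> i) & 1]
    return " ".join(picked)  # separators only between selected items


print(in_call_order([6, 3, 5, 3]))      # Sat Wed Fri Wed
print(in_standard_order([6, 3, 5, 3]))  # Wed Fri Sat
```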

Arrays with a template for each element


With an array that has a template for each element, the advantage of having several data on a single page (multiple changes are easier to make) does not apply. However, this method also has advantages over #switch and similar techniques.

In this case a 1D array has elements which are templates with names easily derived from the index, in particular by concatenation, such as with a name of the form array-name index, and for a 2D array (matrix) names of the form array-name index1 separator index2. The indexes can be any text.

The preprocessor node count is 2[1], the post-expand include size is the size of the result, the template argument size is 0.

One example is the set of system messages MediaWiki:msg-id/xx.

In the case of a 2D array separator and the possible values of index1 and index2 should be chosen such that there is no ambiguity. The separator is needed if just concatenating variable-length indexes could give the same result for different pairs, like (p,qr) and (pq,r). No separator is needed if at least one index is of fixed length, or e.g. if the first index consists of letters and the second of digits. The software does not need to parse index1 separator index2, but for convenient human parsing a separator such as a blank space may be preferred in some cases where it is not strictly needed.
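
The ambiguity can be made concrete with a short Python sketch. The function element_name and the array name "A" are hypothetical; the point is only that plain concatenation maps (p,qr) and (pq,r) to the same template name, while a separator that cannot occur in an index keeps them apart.

```python
def element_name(array: str, index1: str, index2: str, separator: str = "") -> str:
    """Build the template name for one element of a 2D array."""
    return f"{array} {index1}{separator}{index2}"


# Without a separator two different index pairs collide:
assert element_name("A", "p", "qr") == element_name("A", "pq", "r")

# With a separator that cannot occur in an index they stay distinct:
assert element_name("A", "p", "qr", "/") != element_name("A", "pq", "r", "/")

print(element_name("A", "p", "qr", "/"))  # A p/qr
```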

Elements of a 1D array can e.g. be referred to inside a template with {{array-name {{{index}}}}} using parameter index, or similarly with a variable. Also a page can successively call each array element using {{array-name index}} with varying index.

In the case of a 2D array, a template may have a row index as parameter and produce a list based on that row of the matrix, or similarly for columns. See e.g. Template:List of Languages.

A disadvantage of having a template for each array element is that it is extremely cumbersome to copy a large array to another project, unless a bot is used; a sysop may also apply export and import, if those features are enabled.

An advantage is that the absence of a data value shows up as a link to a non-existing template, allowing an individual data value to be added easily. Adding multiple values may be slower than when fewer templates have to be edited. Also, changes affecting multiple array elements are more convenient if they can be made in the wikitext of a single page.

Associative array


Information like "Paris is the capital of France" can be stored in the form of an array (or row or column of an array) with Paris as first and France as second element (or conversely), using one of the methods mentioned above. However, it can also be stored in forms like, on one hand, {{#switch:{{{1}}}|..|Paris=France|..}} or {{{{{1}}}|..|Paris=France|..}}, or with a template Country Paris containing France, or, on the other hand in forms like {{#switch:{{{1}}}|..|France=Paris|..}} or {{{{{1}}}|..|France=Paris|..}}, or with a template Capital France containing Paris. For humans reading the wikitext, or if string functions are available to extract data from it, and in the case of the method of storing one data item in an array, used in combination with Special:AllPages, these methods are equivalent, but otherwise the functionality is quite different: one has to come up with the name of a capital to find the country, or conversely. In a way this functionality is more limited: to find all stored data one can only make a data dump like with msgnw; on the other hand, if the mentioned functionality is what one needs it is more convenient than e.g. searching through the first column of a 2D matrix A(i,j) to find an i with A(i,1)=Paris, which would require an extra switch operation.

The method {{{{{1}}}|..|Paris=France|..}} requires a template {{{1}}} to have a parameter Paris. This means that adding or changing a data item like the word Paris requires at least two changes: one in the template that holds the data, and one in the page/template that uses them.
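
The one-directional nature of such a lookup can be illustrated with a Python dictionary. This is only an analogy, and the second entry (Rome, Italy) is sample data added for illustration: a table keyed by capital answers "which country?" directly, whereas the reverse question needs either a second table or a search through all entries.

```python
from typing import Optional

COUNTRY_OF = {"Paris": "France", "Rome": "Italy"}  # like ..|Paris=France|..

# The reverse lookup needs its own table (like ..|France=Paris|..):
CAPITAL_OF = {country: capital for capital, country in COUNTRY_OF.items()}


def find_capital_by_search(country: str) -> Optional[str]:
    """Without the reverse table, every entry has to be inspected."""
    for capital, stored in COUNTRY_OF.items():
        if stored == country:
            return capital
    return None


print(COUNTRY_OF["Paris"])               # France
print(CAPITAL_OF["France"])              # Paris
print(find_capital_by_search("France"))  # Paris
```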

This method is for example applied in w:Wikipedia:WikiProject Flag Template. w:Category:Country data templates contains ca. 1250 templates, e.g. w:Template:Country data Georgia, containing {{{{{1}}}|..|flag alias=Flag of Georgia.svg|..}}. Thus the set of templates forms an associative array A of 1250 rows with e.g. A(Georgia,flag alias)=Flag of Georgia.svg. Parameter names like "flag alias" (i.e. names of columns of the matrix) cannot easily be changed, as this requires a change in all 1250 templates, as well as in the templates {{{1}}} that use the data (or other pages/templates if this name is passed on as parameter). Alternatively one can introduce a second name for the column, e.g. "flag image", and change the pages/templates that refer to the column name to check both names for a value (unless one wants to reuse the old name for another matrix column). Change of a row name like Georgia requires renaming the template for that row; using the automatically created redirect, changing the templates {{{1}}} that use the data, or other pages/templates if this name is passed on as parameter, is not necessary, unless one wants to reuse the old name for another matrix row.

Summary of counts for template limits

| technique | preprocessor node count[1] | post-expand include size | template argument size | extra for undefined item[3] |
| maximum | 1,000,000 | 2,048,000 | 2,048,000 | |
| call of a template that provides the data items as parameter values of a variable template, retrieving one or more data items (possibly with additional content from the variable template) | 6, plus 1 for each defined parameter used (regardless of how many times each is used) plus 1 for each use of a parameter (including an attempt to use an undefined parameter, getting the default or the code with braces), plus 1 for each named parameter (used or not) | twice the size of the total result | length of the name of the variable template plus the total size of the retrieved data items | preprocessor node count 1 (nothing else if the empty string is the default) |
| in particular, applying this with a parameter selection template, i.e., retrieving just one data item and no other content | 7, plus 1 if the parameter is defined, plus 1 if it is named, plus 1 for each of the (other) named parameters | twice the size of the result | length of the name of the variable template plus the size of the result | not applicable (retrieving one item at a time means the extra cost is the cost mentioned) |
| call of a template that provides the data items as possible result values of a switch | 8 in the case of an immediate match, and 2 more for each extra step | twice the size of the result | length of the index | preprocessor node count 6, plus 2 for each case (excl. the default) |
| call of a template containing a single data item and no other content | 2 | size of the result | 0 | inclusion producing a red link: preprocessor node count 2, post-expand include size: 5+length of full page name; checking existence: more, or expensive parser function count 1/500 |

The last column refers to data items that are missing, or not applicable but not marked as such. It shows that in the case of a sparse matrix (i.e., one where most cells are empty) the first method is preferable.

Note that for the first technique, for row-wise retrieval of a 2D array (or a sub-array consisting of multiple columns), it is advantageous if each data template contains a row of the matrix, rather than a column.

In the case of retrieval according to a regular pattern one may want to use a template anyway (e.g. a row template for a table). In comparing the methods, take into account that the required template of the first technique can serve as such, while using the other techniques the use of a template increases the counts.

Template for a set of data items:

| technique | preprocessor node count[1] | post-expand include size | template argument size |
| maximum | 1,000,000 | 2,048,000 | 2,048,000 |
| call of a template that provides the data items as parameter values of a variable template, retrieving one or more data items (possibly with additional content from the variable template) | 6, plus 1 for each defined parameter used (regardless of how many times each is used) plus 1 for each use of a parameter (including an attempt to use an undefined parameter, getting the default or the code with braces), plus 1 for each named parameter (used or not) | twice the size of the total result | length of the name of the variable template plus the total size of the retrieved data items |
| call of a template that calls a separate template multiple times which provides each time a data item, using a switch | 2, plus for each data item: 8 in the case of an immediate match, and 2 more for each extra step | size of the total result, counting the data items three times | for each data item the length of the index |
| call of a template that calls separate templates containing a single data item each | 2, plus 2 for each use of a data item (including an attempt to use an undefined one, getting a link to the non-existing template) | size of the total result, counting the data items twice | 0 |

Thus in the last case (separate templates) one count gives 0 and the other two are not greater than proportional to the amount of data, while in the second method the preprocessor node count increases with the product of the number of data items used and the total number of data items in the "matrix row". In the first method the preprocessor node count has a term proportional to the amount of retrieved data, and an additional term equal to the number of data retrievals (where in one data retrieval multiple data items can be retrieved), multiplied by the total number of named parameters in the call of the variable template. This does not increase faster than the amount of data retrieved if whole matrix rows are retrieved at once, or if unnamed parameters are used.

Thus the counts are independent of unused data in the following cases:

  • in the first method if unnamed parameters are used
  • in the second method only on a page that retrieves data at the start of the list, and only if all data the page tries to retrieve are defined
  • in the third method

The first and third method allow retrieval of ca. 500,000 different data items on a page, provided that they are small.

Even if we do not need so many data on one page (for example, we just want one column of a matrix with 1000 rows and on average 1000 defined values per row) it follows from the above that if we want to use a data template for each row then the data items have to be values of unnamed parameters. Thus in the case of a sparse matrix many dummy values such as the empty string have to be specified.

If we want on a page one column of a matrix with 1000 rows and on average between 500 and 1000 defined values per row we can also use named parameters. If switch is used, only any one of the first 500 columns can be retrieved. Thus even though on average the count is the same as with named parameters (preprocessor node count of 1 for each element), the maximum is more important than the average in this case, and named parameters are preferable.

See also w:Wikipedia:Template limits.

Comparison of named parameters and switch


The bulk of the code for storing data as values for named parameters is the same as that for a switch, provided that there is no fall-through. As we have seen, the first gives only half the preprocessor node count of the maximum of the second.

Other differences:

  • in the case of duplicate left sides, the last one counts in the first method, and the first one with a switch
  • in the first method a left side with leading zero(s) is distinguished from the same number without them; with switch they are considered the same
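
These two differences can be modelled in Python. The sketch below only mirrors the behaviour described in the list above (text comparison with "last one wins" for named parameters, numeric comparison with "first one wins" for a switch); it is not a reimplementation of the parser, and the function names are hypothetical.

```python
def by_named_parameter(pairs: list[tuple[str, str]], key: str, default: str = "") -> str:
    """Named parameters: names are compared as text, the last duplicate wins."""
    table = {}
    for name, value in pairs:
        table[name] = value  # a later duplicate overwrites an earlier one
    return table.get(key, default)


def by_switch(pairs: list[tuple[str, str]], key: str, default: str = "") -> str:
    """Switch: the first match wins; numeric-looking cases compare as numbers."""
    def same(a: str, b: str) -> bool:
        try:
            return float(a) == float(b)
        except ValueError:
            return a == b

    for name, value in pairs:
        if same(name, key):
            return value
    return default


pairs = [("1", "a"), ("01", "b"), ("1", "c")]
print(by_named_parameter(pairs, "1"), by_named_parameter(pairs, "01"))  # c b
print(by_switch(pairs, "1"), by_switch(pairs, "01"))                    # a a
```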

Arranging data in data templates


Many data can be considered to form a 2D matrix, with for example a row representing an entity and a column a property. If matrix elements are not put in separate templates and not all in one template, then we can choose between two possibilities: each row in a template, or each column, or put differently, we put each row in a template but have to decide whether to use a particular matrix or its transpose. Considerations:

  • It is convenient if data likely to be changed at the same time (because updates become available together, or because an editor reviews these data together) are in the same data template. For example, statistics for all municipalities in a country may periodically be provided together by a national organization, while population data of countries may separately come from the countries themselves, so not together for all countries.
  • With the named parameter method, for row-wise retrieval of a 2D array (or a sub-array consisting of multiple columns), it is advantageous if each data template contains a row of the matrix, rather than a column.

Examples of data arranged by entity:

Examples of data arranged by property:

The page counts matter only for large-scale use of data on one page. In these cases an entity usually forms a row and a property a column (this way also the sorting feature makes more sense: we can sort entities based on a property). This suggests that with the named parameter method templates should preferably be arranged by entity, not by property. This is the case in the country examples. However, for the reasons of convenience mentioned above this is usually not done for municipalities. Thus with 400 to 500 municipalities we can at best have a maximum of circa 4 columns.

Redundancy


Just as making a data item independently retrievable avoids the need to store it in several places, a data item that can easily be derived from independently retrievable data items is typically not stored, but derived from the stored data. Examples of simple derivations are:

  • concatenation
  • simple computations, like addition of a few values, and division to find a percentage or density, or to do a unit conversion

This makes it easier to update data, avoids having larger or more data templates than necessary, and is less prone to errors.
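The simple derivations listed above might look as follows. This is only a sketch: Data name, Data province, Data population and Data area are hypothetical data templates that return the value for the entity given as first parameter, and the numeric ones are assumed to return plain numbers without thousands separators, as required by #expr (ParserFunctions extension).

```wikitext
<!-- Concatenation of two stored items -->
{{Data name|Amsterdam}}, {{Data province|Amsterdam}}

<!-- Density: population divided by area, rounded to a whole number -->
{{#expr: {{Data population|Amsterdam}} / {{Data area|Amsterdam}} round 0}}

<!-- Percentage of the national population, rounded to 1 decimal -->
{{#expr: 100 * {{Data population|Amsterdam}} / {{Data population|Netherlands}} round 1}}
```

Neither the density nor the percentage is stored anywhere, so updating a population figure in one place updates every derived value as well.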

For a complicated derivation, data redundancy may be useful to avoid a long wikitext, or a wikitext that is expensive to expand (in terms of page limits or slowing down the page). For example:

  • the population of a country is usually stored even if the population of each province is also stored
  • the country in which a town is located may be stored, even though the province is stored, and in another data template the country to which the province belongs is stored

How inconvenient a long wikitext is depends, of course, on whether it would be needed in only a few templates or in many places. An additional template can also be created to make the wikitext shorter.
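As an illustration of such an additional template, a derivation can be wrapped once and then called by name. The template names Density, Data population and Data area are hypothetical, and the data templates are assumed to return plain numbers for the entity given as first parameter.

```wikitext
<!-- Content of hypothetical Template:Density -->
{{#expr: {{Data population|{{{1}}}}} / {{Data area|{{{1}}}}} round 0}}

<!-- Wherever the derived value is needed, the call is now short -->
{{Density|Amsterdam}}
```

The expansion cost is the same as for the written-out expression; only the wikitext at the place of use becomes shorter.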

Modifying templates and/or template calls to use a system of automatic retrieval of data


Often a set of articles uses a system in which each article calls a common infobox template and specifies data as parameter values, so that the data are not independently retrievable. When the values for a particular parameter are put in a data template, methods for introducing it include:

  • modifying the infobox template to use the data template; the parameter value is ignored; the parameter definition may be removed when convenient
  • modifying the infobox template to use the data template, and to use the parameter value if the data is missing from the data template; this allows introduction even if the data template is not, or may not be, complete; the parameter definition may be removed when convenient
  • modifying the infobox template to use the data template if the parameter is undefined; the data template is thus used only after the article has been edited, allowing one to check that things work as expected
  • specifying a call to the data template as the parameter value in the articles; this can be useful if the infobox template is used in a larger set of articles than the one for which one wants to make the change, for example if one wants to introduce the system for the municipalities of one country while the infobox is in use internationally
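The four methods above could be sketched as follows. The names Data population, Infobox municipality and the parameter population are hypothetical, and the data template is assumed to return nothing for an entity it does not know. The second variant relies on #if from the ParserFunctions extension.

```wikitext
<!-- 1. In the infobox: always use the data template; the parameter is ignored -->
{{Data population|{{PAGENAME}}}}

<!-- 2. In the infobox: use the data template, fall back to the parameter if it returns nothing -->
{{#if: {{Data population|{{PAGENAME}}}} | {{Data population|{{PAGENAME}}}} | {{{population|}}} }}

<!-- 3. In the infobox: use the parameter if it is defined, otherwise the data template -->
{{{population|{{Data population|{{PAGENAME}}}}}}}

<!-- 4. In the article: the infobox is unchanged, the parameter value is a call to the data template -->
{{Infobox municipality
| population = {{Data population|Amsterdam}}
}}
```

In the third variant a parameter that is specified but left empty still counts as defined, so the data template comes into play only once the parameter has been removed from the article.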

Alternatively, the data are made independently retrievable but remain arranged by entity (if each article is about an entity) rather than by property: for example, all data about Amsterdam are together, rather than all population data about municipalities in the Netherlands. This is done by turning the call to the infobox template into a data template about the subject of the article, with the name of the infobox template replaced by a parameter. If the infobox uses {{PAGENAME}}, this is either filled in or replaced by a parameter that identifies the article within the set, possibly combined with a fixed part. In the article, the original call is replaced by a call to this data template, with just the name of the infobox template as parameter and, if applicable, the extra parameter mentioned.
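A sketch of this conversion, with hypothetical template names (Data Amsterdam, Infobox municipality, Get population) and placeholder figures that are not real data:

```wikitext
<!-- Before: in the article, the data are locked inside the infobox call -->
{{Infobox municipality
| name       = Amsterdam
| population = 123456
}}

<!-- After: content of hypothetical Template:Data Amsterdam;
     the infobox name has become parameter 1 -->
{{ {{{1}}}
| name       = Amsterdam
| population = 123456
}}

<!-- In the article: same display as before -->
{{Data Amsterdam|Infobox municipality}}

<!-- Content of hypothetical Template:Get population -->
{{{population}}}

<!-- On any other page: retrieve a single item -->
{{Data Amsterdam|Get population}}
```

The same data template thus serves both the infobox and any page that needs just one of the values.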

A mix of the two systems is also possible: a data template about Amsterdam could itself contain some data about Amsterdam, but for the population call a population data template.

Instead of moving the infobox call to a data template, it can also be made the include part of the page, with similar modifications. In this case the name of the infobox template is replaced by a wikitext that is effectively a parameter on inclusion, but remains the name of the infobox template on the page itself. This is done simply by making the name of the infobox template the default of the parameter.
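A sketch of the include-part variant, assuming the article is the main-namespace page Amsterdam. Infobox municipality and Get population are hypothetical templates (the latter containing just {{{population}}}), and the figure is a placeholder.

```wikitext
<!-- On the page Amsterdam: parameter 1 is undefined here, so the default
     "Infobox municipality" is used and the infobox is displayed as usual -->
<onlyinclude>{{ {{{1|Infobox municipality}}}
| name       = Amsterdam
| population = 123456
}}</onlyinclude>
Article text follows here.

<!-- On another page: the article is transcluded with parameter 1 defined,
     so only the requested item is rendered -->
{{:Amsterdam|Get population}}
```

The leading colon in the call is needed because the page is in the main namespace rather than the template namespace.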

Wikidata


Wikidata (d:) is a special wiki to store data. A strong point is that the data can be used on all sites on which it has been deployed. At present a weak point is that each "claim" (item-property-value triple) can be used on only one page on each wiki: the page on the item. For example, the fact that Berlin is the capital of Germany is available on each Wikipedia (in localized form, e.g. in Dutch the fact that Berlijn is the capital of Duitsland), but only on the page about Germany.
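As a hedged illustration, on a wiki where the Wikibase client is enabled, such a claim can be retrieved with the #property parser function. The sketch assumes that the call is placed on the local page linked to the Wikidata item for Germany, and that P36 is the identifier of the "capital" property.

```wikitext
<!-- On the local page about Germany -->
The capital is {{#property:P36}}.
```

The value is rendered in localized form, so the same wikitext can be used on wikis in different languages.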

See also

  1. Excluding the fixed amount of 1 for the page.
  2. From m:Vision.
  3. The extra counts for retrieving an extra item, in the case that it turns out not to be defined. Excluding, in a table, a post-expand include size of 4 for 2 pipes, counted twice.