3.2. Integer Coordinates
The algorithm is using integer coordinates. This is the only way to guarantee that all calculations are exact and that there are no inconsistencies between topology and geometry. To do this, the vertex coordinates of the geometric features must first be converted into integers. The coordinates are scaled and translated in such a way that the original relative spatial relations are preserved.
The coordinate systems used are not described in detail, Cartesian coordinates are assumed. For projected coordinates, a conformal coordinate system should be used to avoid distorting the spatial relationships. In the numerical examples it is assumed that meter is used as the unit of length.
The source data, the set of geometric features (
), are represented as WKT simple features. Thus, the coordinates are presented as string. These strings have to be converted lossless into a number format for calculations. There are different options for this depending on the computer platform used. The important issue is to ensure that there is no loss of quality during this step. For example, it would be sufficient to use the
decimal128 type [
17] or an equivalent. It is essential not to use binary floating point numbers, since in this case the coordinates can already be corrupted when they are imported. This can be seen easily in the examples in
Table 2. Caused by the base change it comes with rational numbers inevitably to conversion losses, since many decimal finite rational numbers become infinitely periodic in the binary system.
The imported OGC Simple Features are stored in an equivalent data type (
Point,
Linestring,
Multilinestring,
Polygon). The coordinates are imported lossless (e.g., with
decimal128). In addition, the minimum bounding box of all points from the features in
and their maximum scaling factor
of the decimal places are determined (e.g., The factor
s for the number 12.345 is 1000). The bounding box directly defines the smallest coordinate value (
) and the largest extent
(Equation (
1)).
To store the coordinates as integers, it is necessary to specify the integer type to be used. One could simply use an arbitrary large integer type for the integer coordinates and have no problems converting the coordinates [
6] (pp. 79–81). However, this would not be practical because it would lead to an unpredictable increase in the amount of memory needed which also effects the computation time. So it is reasonable to use native integer types. Consequently, it must first be checked whether this is possible with the usual 32 bit and 64 bit integer types.
For this, first consider the essential formula for the following algorithm. This is the determinant calculation of a matrix of three points (Equation (
2)). It is used to determine whether three points are collinear, or if they are not, to determine the orientation of a triangle made of the three points [
3,
4].
The geometric origin can be used to determine the maximum coordinate value for the determinants calculation.
Figure 3 shows triangles which represent the set of all possible largest triangles within the positive integer coordinate range. The formula for calculating the triangle area (Equation (
3)) makes it easy to deduce that all triangles have the same area.
Thus, the determinant can result in a maximum of twice the value (positive and negative) of the triangle area (Equation (
4)). From this, Equation (
5)) can be derived, by which the maximum possible coordinate value
can be determined from the integer type used for the determinant calculation.
Now we have to consider how integer coordinates are calculated from the given coordinates (Equation (
6)).
Equation (
7)) illustrates that the size of the integer coordinates depends on the maximum extent
and the maximum scale
. If their product is larger than the maximum available coordinate value
, data loss occurs because the coordinates are truncated by their remaining decimal places after scaling. The conversion is no longer bijective in this case.
If we use the 64 bit integer type for the determinant calculation, with a number range from
to
, we get
for the largest possible coordinate value. This number is greater than the maximum positive numerical value of 32 bit integers. Thus, they can be used for the coordinates. Assuming a necessary computational accuracy in the millimeter range (
), the extent of an region must not be larger than 2,147,483.645 m, i.e., about 2000 km, in order to still calculate correctly with 32 bit integers. Thus, this range is only recommended for small-scale applications.
However, in order to be able to use the algorithm in all geospatial cases, it is advisable to work with 64 bit integers for the coordinates. This would even allow an extent of approx. 9,000,000,000,000 km with millimeter accuracy. Since the largest distance possible on Earth is approx. 40,000 km, 64 bit integers can map all coordinates up to a computational accuracy of 11 decimal places (10 pm). Thus, this number range is sufficient for all use cases in the GIS area. The disadvantage is that for the determinants 128 bit integers are needed, which are not always available natively in all programming languages or platforms.
After the conversion, the geometric features are stored with their integer coordinates, and a point set of all integer coordinates
is formed (
Figure 1b). During the generation of integer objects it must be checked that none of the features in
is degenerated (
Figure 4a) or if the orientation has changed (
Figure 4b). A degeneration of a feature can be recognized by the fact that the end points of a line become congruent or it gets an area of zero, where a change of orientation is associated with a sign change of the area. In this case the feature must be discarded and cannot be analyzed with this method. For this reason, we use 64 bit integers. Alternatively, it is possible to decide based on the spatial extent (Equation (
7)). Due to the previously explained usability of 64 bit integers, it can be assumed that degenerated features were already degenerated before the conversion.
3.3. Intersections as Rational Positions
Integer numbers are used for the coordinates of in the basic mesh , but integer numbers cannot be used for intersection points, when reconstructing edges in , since intersection points are integer numbers only in exceptional cases. Even floating point numbers could not represent their positions correctly. Rational numbers are required.
We have decided not to calculate the intersection points as coordinates. We represent them as ordered positions on an edge (
Figure 5) of the base mesh
or a feature
. However, this also requires rational numbers which are represented as a fraction of two integers. The given end points of the edges have integer values as coordinates. As a consequence, the position of the intersection point on these edges can be exactly expressed by a fraction of two integer values. This is true to all intersection points on an edge so that the order of these intersection points on an edge can be determined exactly as well.
Equations (
9) and (
10) demonstrate how these positions can be calculated in the case of two intersecting edges (
and
). The values of
and
represent the relative position of the intersection point on the edges
and
. Thus the fractions are always in the range from 0–1.
Equation (
11) shows an example for the position calculation, which is also depicted in
Figure 6.
Figure 6 also shows the geometric meaning of
and
, whose values were derived from the ratio of the partial distances to the total distances.
Since determinant calculations (similar to Equation (
2)) are needed, the integer type resulting from this calculation (64 bit for 32 bit coordinates resp. 128 bit for 64 bit coordinates) must be used.
It is important to note that the coordinates of the intersection points are not required for our algorithm. The presented algorithm uses only relative positions on edges. However, absolute xy-coordinates can be calculated, for example, for display purposes. But the calculated xy-coordinates of the intersection points should not be used for further calculations, due to potential computational inconsistencies between geometry and topology.
3.4. Space Partition
The integer coordinates
now form the basis of a Delaunay triangulation (
Figure 1c). Theoretically, a simple triangulation would suffice, but it was proved in [
18] that a Delaunay triangulation reduces the number of intersections. This triangulation provides the base mesh and remains structural unchanged in the next upcoming steps.
Then, successively the edges
of the geometric features
can be inserted into the base mesh
(
Figure 7). This step is subdivided into the following substeps:
Since all points are already contained in the base mesh
, the starting point of the edge has to be searched first (
Figure 7a).
Then the base mesh edge
to the right of or on the feature edge
is searched for (
Figure 7b).
If the feature edge lies on a base mesh edge , there are two possibilities. If the end point of the feature edge is identical to the end point of the base mesh edge , the loop for this edge is finished. Otherwise, the position of the endpoint of the base mesh edge on the feature edge is calculated and the loop jumps to item1 with this point as the new start point.
Otherwise, go to the subsequent step with the next edge (in the triangle of the base mesh).
The base mesh edge
intersects the feature edge
(
Figure 7c), thus the positions of the intersection on both edges are calculated.
Then, the opposite point in the neighboring triangle of the base mesh edge is determined and its location in relation to the feature edge is calculated.
If the point is on the feature edge
(
Figure 7d), it is either the end point of the feature edge
and this is finished. If it is not, the position on the feature edge
will be calculated and with this point as new starting point jump to item 1.
Otherwise, with the right respectively left base mesh edge this step (item 3) is processed again.
Once all sector triangle edges are inserted, the intersection points of these edges within the individual triangles of the basic mesh can be determined. The positions of the intersection points are stored as rational numbers as explained in
Section 3.3.
In
Figure 8 the individual steps for identifying edge intersections
with
within the triangles of
are represented. For this purpose, an iteration is performed over the triangles of the base mesh. For each triangle the following substeps are processed:
Determine all intersection points of feature edges on the triangle edge and sort counterclockwise around the triangle. (
Figure 8a).
Determine the inner intersection points according to the scheme in (
Figure 8b). No geometric calculations are necessary for this. Intersections always result when the endpoint indices of one edge lie between the endpoint indices of the other edge.
Calculate the intersection (again with rational numbers) position on both feature edges
(
Figure 8c).
All edges (basic mesh , geometric features ) and the positions of the intersections are then used to create half-edges(), which are then connected to form polygons (facets ).
Now each point (
), each half-edge (
) and each facet (
) is checked to which geometric feature it belongs. Thereby an element can belong to none, one or more geometric features.
Figure 9 shows how to find all elements of an areal geometric feature. First, all half-edges that lie on the boundary of the feature are determined (
Figure 9a). The adjacent facets in the interior (
Figure 9b) belong to the geometric feature. Then, the set of facets is extended by their neighboring facets until no new ones are found (
Figure 9c). Finally, all points, half-edges and facets that were detected or are part of the detected elements are assigned to the feature.
At this stage we have a complete, gap-less and non-overlapping two-dimensional decomposition, generated fully automatically from the original data as seen in
Figure 1.
3.5. Point, Line and Region Types
To increase the efficiency in using the space decomposition to determine the DE-9IM matrices, a preparation step is required.
The comparison to determine the DE-9IM matrices operates on three different types:
Point,
Line, and
Region. They contain fields representing the set of Interior and Boundary elements (
Table 3). The types indicate the geometric dimension of the geometric features to be represented.
Dimension 0Point Points are objects without an extension. They have an inside (the point itself), no border and an outside (the rest). They can be derived directly from the OGC type Point and have only one Interior0 field, which consists of the vertex itself.
Dimension 1Line Lines are line-shaped objects composed of one or more line segments. The interior of these objects are the edges of the line segments and the end vertices of the line segments that belong to multiple line segments. The boundary are the remaining end vertices. Objects without a boundary are also possible if the ring of line segments is closed. They are derived from the OGC types LineString and MultiLineString. Line objects have an Interior0 field with all end vertices that do not belong to the boundary, an Interior1 field with all half-edges, and a Boundary0 field with the associated end vertices.
Dimension 2Region Regions are two-dimensional objects consisting of one or more facets. The interior consists of the regions between a closed ring of line segments. The boundary are all line segments and vertices of the object that are not inside. They are derived directly from the OGC type polygon. These objects have an Interior0 field with all vertices of the facets that are inside, an Interior1 field with all half-edges that are inside, and a 2-dimensional Interior field (Interior2) with the associated facets. Additionally, it has the Boundary1 field with the line segments that are not inside and a Boundary0 field with their end vertices.
Since in the last step of the spatial decomposition the elements of the base mesh were assigned to belong to a geometric feature, now the Point, Line and Region objects can be easily formed, depending on the OGC type of the associated features.
At this stage, the preparation for generating the DE-9IM matrices is complete. Only the generated Point, Line and Region objects with their associated elements are needed.
3.6. DE-9IM Matrices
With the now existing
Point,
Line and
Region objects, one can easily determine the DE-9IM matrices. They have 0, 1, and 2 dimensional Interior (
I) fields, and 0 and 1 dimensional Boundary (
B) fields (
Table 3). The Exterior (
E) is made up of those elements that do not belong to the object.
In Equation (
12) it can be seen how DE-9IM matrices are defined. The set operations of the definition can be applied directly to the previously created
Point,
Line and
Region objects.
The
operator can be implemented in such a way that it always returns the highest dimension that has a non-empty intersection between the comparison objects. The tests in the last column and row of the matrix have to be adapted, because the exterior is not stored directly. However, the tests can be easily adapted. Since we are working in a bounded region, the exterior is simply that part of the overall set which is not part of the interior and the boundary. Therefore, the types always have combined
InteriorBoundary fields to allow easy querying (
Table 3).
With this, the DE-9IM matrix can be easily created. If a simple binary 9IM matrix is needed, it can be easily converted. At all positions occupied by a 0, 1 or 2, a T (true) is set.
Figure 10 demonstrates how these set operations work.
Figure 10a shows the
region object of feature 2 in
Figure 1a and
Figure 10b shows the
region object of feature A.
Figure 10c then depicts the determined intersection. The DE-9IM matrix for this example is presented in Equation (
13). In words, this is an
overlaps relation. The individual cells of the matrix are created as shown in
Table 4.