Understanding URLs and database stanzas
Overview
The relationship between URL components determines which resources need their own database stanzas in the config.txt file and which can be added to the larger umbrella stanza under which they fall. You can generally determine this by examining whether a content provider offers access to multiple resources, and how the content provider's URL relates to the URLs of the individual resources within that provider's database. Many content providers operate databases containing numerous journals or other resources, so a single, carefully constructed database stanza will provide users with access to both the homepage of the database and all the journals it contains. This also means that you may not need to update your config.txt file and database stanzas every time you subscribe to a new resource.
The first step in determining how that single database stanza should be created is to examine the URLs of your content provider's homepage and the resources you subscribe to and then to determine the relationship between them.
URL terminology
The following definitions are used to describe the different parts of a URL. The simplified definitions given are adequate to understand these terms' use within EZproxy documentation, but they are over-generalized from the terms' exact meanings. Understanding the components of a URL will help you to determine the relationship between the databases you subscribe to and the individual resources, and create better database stanzas.
Term | Definition | Examples |
---|---|---|
scheme | The protocol used for retrieval of the URL. Note: Although many other schemes exist, for the purposes of this document, only two schemes, http and https, will be used. | http, https |
hostname | The name or address of the webserver to be accessed. Note: Hostnames are not case sensitive, so two hostnames that differ only in capitalization are equivalent. | |
port | A number used to identify a specific webserver at the provided hostname. When omitted, a scheme-specific default value is used (80 for http, 443 for https). | |
origin | The unique combination of a scheme, hostname, and port, combined as scheme://hostname:port. | |
path | The portion of the URL from a slash (/) following the origin up to the query or fragment. When omitted, the default path / is used. | |
query | The portion of the URL from the first question mark (?), following the path, and up to the fragment. If the first question mark in a URL appears after a hash (#), that section is not the query, but rather part of the fragment. | |
fragment | The portion of the URL from a hash (#) through the end. | |
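The definitions above can be tried out on a concrete URL. The following is a minimal sketch in Python, using the standard library's `urllib.parse.urlsplit`; the function name `split_url` and the `DEFAULT_PORTS` table are illustrative assumptions, not part of EZproxy. Note that `urlsplit` returns the query and fragment without their leading `?` and `#`.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes used in this document (assumption:
# any other scheme is out of scope and raises KeyError below).
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Break a URL into the components named in the terminology table."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    hostname = (parts.hostname or "").lower()  # hostnames are not case sensitive
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return {
        "scheme": scheme,
        "hostname": hostname,
        "port": port,
        "origin": f"{scheme}://{hostname}:{port}",
        "path": parts.path or "/",   # default path when omitted
        "query": parts.query,        # text after "?", empty if absent
        "fragment": parts.fragment,  # text after "#", empty if absent
    }


if __name__ == "__main__":
    sample = "http://www.somedb.com:8080/history?era=darkages"
    for name, value in split_url(sample).items():
        print(f"{name:9} {value}")
```

Running the sketch on a URL whose first question mark follows a hash, such as `http://search.somedb.com:8080/history#?modern`, yields an empty query and the fragment `?modern`, which matches the query definition in the table.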
Examples
How EZproxy reads URL components
The following discussion introduces the similarities and differences between URLs, based on the terminology defined under URL terminology. It also covers how these characteristics affect the way EZproxy determines whether to proxy a resource when reading the config.txt file. For a more detailed discussion of the different directives used within config.txt and how they impact proxying, please see Config.txt Directives: An Introduction to Database Stanzas.
In general, EZproxy ignores the path, query, and fragment when reading the config.txt file and determining whether to proxy a resource. These additional URL components are only needed when creating the Starting Point URLs. For more information about Starting Point URLs, please see Starting point URLs and config.txt.
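The rule of thumb above, that only the origin matters, can be illustrated with a short comparison. This is a minimal sketch in Python that mirrors the rule as stated in the text; it is not EZproxy's actual matching logic, and the helper names `origin` and `same_origin` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: only the two schemes used in this document are handled.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return scheme://hostname:port, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    # parts.hostname is already lowercased by urlsplit.
    return f"{scheme}://{parts.hostname}:{port}"


def same_origin(url_a, url_b):
    """True when two URLs differ only in path, query, or fragment."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    print(same_origin("http://www.somedb.com",
                      "http://www.somedb.com:80/search?q=ancient"))
    print(same_origin("http://www.somedb.com/search?q=ancient",
                      "https://www.somedb.com/search?q=ancient"))
```

The first comparison is true because the path and query are discarded and the missing port defaults to 80. The second is false because the schemes differ.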
Sample URLs and their components | Relationships between URLs |
---|---|
URL 1: http://www.somedb.com | URLs 1 and 2 (http://www.somedb.com and http://www.somedb.com:80) are functionally equivalent even though URL 1 relies on the default port and URL 2 states the port explicitly. Because no port is listed and the scheme for URL 1 is http, the port defaults to 80, so the origin for URL 1, http://www.somedb.com:80, looks just like URL 2. Creating a database stanza using URL 1 would also provide your users with access to URL 2, and vice versa. |
URL 2: http://www.somedb.com:80 | URLs 1, 2, and 3 (http://www.somedb.com, http://www.somedb.com:80, and http://www.somedb.com/search) all use the same origin, even though 1 and 3 depend on the default port, 2 has an explicit port, and 3 has a path. Creating a database stanza using URL 1, 2, or 3 would provide your users with access to any of these URLs. |
URL 3: http://www.somedb.com/search?q=ancient | URLs 3 and 4 (http://www.somedb.com/search?q=ancient and https://www.somedb.com/search?q=ancient) are not functionally equivalent because they use different schemes. These URLs would need to be listed separately in a database stanza in order for users to access them. |
URL 4: https://www.somedb.com/search?q=ancient | |
URL 5: http://www.somedb.com:8080/history?era=darkages | URLs 5 and 6 (http://www.somedb.com:8080/history?era=darkages and http://search.somedb.com:8080/history?era=darkages) are not functionally equivalent because they use different hostnames. Providing access to both of these URLs would require multiple directive lines within a single stanza. |
URL 6: http://search.somedb.com:8080/history?era=darkages | |
URL 7: http://search.somedb.com:8080/history#?modern | URL 7 (http://search.somedb.com:8080/history#?modern) does not have a query because the first question mark (?) appears after the first hash (#). To allow EZproxy to process a URL containing a fragment, please see How to encode a fragment for use with EZproxy. |
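To see at a glance how many distinct origins the seven sample URLs represent, they can be grouped by origin. The following Python sketch assumes the same default ports as the text (80 for http, 443 for https); the names `SAMPLE_URLS`, `origin`, and `group_by_origin` are illustrative only, and the grouping reflects the origin definition above rather than any EZproxy internals.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: only the two schemes used in this document are handled.
DEFAULT_PORTS = {"http": 80, "https": 443}

# The seven sample URLs from the table, keyed by their number.
SAMPLE_URLS = {
    1: "http://www.somedb.com",
    2: "http://www.somedb.com:80",
    3: "http://www.somedb.com/search?q=ancient",
    4: "https://www.somedb.com/search?q=ancient",
    5: "http://www.somedb.com:8080/history?era=darkages",
    6: "http://search.somedb.com:8080/history?era=darkages",
    7: "http://search.somedb.com:8080/history#?modern",
}


def origin(url):
    """Return scheme://hostname:port, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return f"{scheme}://{parts.hostname}:{port}"


def group_by_origin(urls):
    """Map each distinct origin to the numbers of the URLs that share it."""
    groups = defaultdict(list)
    for number, url in urls.items():
        groups[origin(url)].append(number)
    return dict(groups)


if __name__ == "__main__":
    for key, numbers in group_by_origin(SAMPLE_URLS).items():
        print(key, numbers)
```

Under these assumptions the seven URLs reduce to four origins: URLs 1 to 3 share one, URL 4 and URL 5 each stand alone, and URLs 6 and 7 share one because they differ only in query and fragment.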