Finding a value for DC.identifier

Preamble

One of my tasks is to look at metadata that can be sourced from existing Drupal code or the Drupal database. This will be used to provide default values for each page, which the user may overwrite if required. To begin at the beginning, I believe that we need to find a value for DC.identifier, as the URI is the most fundamental piece of data that we have about a page. (Without a URI, there is no way to get to the page in the first place.)

URI Schemes

To work out the URI of a page, we first need to know what URI scheme we are using. These are the things that I think we need to check:

  1. Are we using human (and search-engine) friendly URIs such as http://drupal/node/2, or do we have a crippled installation only capable of working through the query string, like this: http://drupal/?q=node/2 ?
  2. Are we using the default http://drupal/node/[node number] scheme, or have we enabled URI aliases? We can determine this by doing: SELECT status FROM system WHERE filename='modules/path/path.module'; A value of 1 appears to mean that URI aliases are enabled.
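
As a rough sketch of the two checks above, here is a small Python stand-in that runs against an in-memory SQLite database rather than a real Drupal install. The function names and the clean_urls flag are my own inventions for illustration; how Drupal itself records the friendly-URI setting is not covered here, so the flag is simply passed in.

```python
import sqlite3

# Filename of the path module as it appears in the system table.
PATH_MODULE = "modules/path/path.module"


def path_module_enabled(conn):
    """Return True if the path module's row in `system` has status 1."""
    row = conn.execute(
        "SELECT status FROM system WHERE filename = ?", (PATH_MODULE,)
    ).fetchone()
    return row is not None and row[0] == 1


def format_path(fragment, clean_urls):
    """Render a fragment such as 'node/2' under either URI scheme.

    clean_urls is a hypothetical flag supplied by the caller.
    """
    return fragment if clean_urls else "?q=" + fragment


# Stand-in for the Drupal database: a one-row `system` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system (filename TEXT, status INTEGER)")
conn.execute("INSERT INTO system VALUES (?, ?)", (PATH_MODULE, 1))

print(path_module_enabled(conn))
print(format_path("node/2", clean_urls=True))
print(format_path("node/2", clean_urls=False))
```

A missing row is treated the same as a status of 0, on the assumption that a module that is not listed cannot be enabled.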

The url_alias Table

If an alias has been set for a URI, it will appear in the url_alias table. The Drupal default fragment of the URI, such as node/1, appears in the src column and the alias appears in the dst column. So, if we call the Drupal default fragment $DURI, we could say (in pseudo-code):


$DURI_LOOKUP = ("SELECT dst FROM url_alias WHERE src='${DURI}';");
if (${DURI_LOOKUP})
  $DC.identifier = ${DURI_LOOKUP};
else
  $DC.identifier = ${DURI};
endif

Having looked at the code a little more closely, it appears that it is possible to add more than one URI alias (yuck!) per node. The above probably needs to be changed so that if ("SELECT count(*) FROM url_alias WHERE src='${DURI}';") > 1, we just ignore the aliases and go along with ${DURI}. The user can then change the value to whichever alias they want, by hand.
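
To make the revised rule concrete, here is a minimal Python sketch using an in-memory SQLite table with the same src and dst columns. The function name pick_identifier and the sample rows are made up for illustration; a real implementation would go through Drupal's own database layer.

```python
import sqlite3


def pick_identifier(conn, duri):
    """Return the alias for duri only if exactly one alias exists.

    With no alias, or with more than one, fall back to the default
    fragment and leave the choice to the user.
    """
    rows = conn.execute(
        "SELECT dst FROM url_alias WHERE src = ?", (duri,)
    ).fetchall()
    if len(rows) == 1:
        return rows[0][0]
    return duri


# Stand-in for the Drupal database, with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_alias (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO url_alias VALUES (?, ?)",
    [("node/1", "about"), ("node/2", "news"), ("node/2", "latest")],
)

print(pick_identifier(conn, "node/1"))  # one alias
print(pick_identifier(conn, "node/2"))  # two aliases, so the default is kept
print(pick_identifier(conn, "node/3"))  # no alias
```

Fetching the rows and counting them covers both the lookup and the count(*) check in a single query.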

The Rest of the URI

What we put into $DC.identifier above is actually missing the domain and possibly part of the path of the URI. This would appear to be available from the global variable $base_url, which is defined for us in our site's settings.php. This variable should lack a trailing slash, so we need to join $base_url, a slash, and the value we put into DC.identifier above to come up with the full URI.
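
The join itself is trivial, but a short Python sketch shows the one thing worth guarding against: a stray slash on either side. The function name full_uri is hypothetical, and base_url here is just a string standing in for the $base_url value from settings.php.

```python
def full_uri(base_url, identifier):
    """Join the site's base URL and a path fragment with exactly one slash.

    base_url should not end in a slash, but any that are present are
    stripped rather than trusted.
    """
    return base_url.rstrip("/") + "/" + identifier.lstrip("/")


print(full_uri("http://drupal", "node/2"))
print(full_uri("http://drupal/", "about"))
```

Stripping slashes on both sides means a misconfigured base URL still produces a well-formed result.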

Comments

Re: Finding a value for DC.identifier

What I failed to cover above is how I get the URI fragment to look up in url_alias. Looking through the database for this site, it appears that all I really need is the Drupal node ID (nid):


SELECT dst FROM url_alias WHERE src='node/${nid}';

I am, however, getting the impression that it might be less than easy to get the DC data into non-node pages like user/1, etc.