If anyone has a large Web site with pages marked up with Dublin Core metadata that they wouldn't mind having crawled by the DCMetaSpider robot, please get in touch via the contact form.
I would like to crawl a fairly large corpus of pages in order to test some queries against the database, which shares a schema with the Dublin Core for Drupal project. The robot is well behaved, won't make a mess on the carpet and can be constrained to crawl as slowly as 1 page per minute.
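For anyone wondering what such a crawl involves, here is a rough sketch in Python. It is not the DCMetaSpider code; every name in it (`DCMetaParser`, `extract_dc`, `crawl`, the `fetch` callable) is hypothetical. It assumes the pages use the common convention of embedding Dublin Core in HTML as `<meta name="DC.title" content="...">` elements, optionally with `lang` and `scheme` attributes, and it pauses between pages so that the rate never exceeds one page per minute.

```python
import time
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collects <meta> elements whose name starts with "DC." or "DCTERMS."."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names arrive lowercased
        name = a.get("name") or ""
        if not name.lower().startswith(("dc.", "dcterms.")):
            return  # ordinary meta element, not Dublin Core
        self.records.append({
            "term": name,
            "content": a.get("content") or "",
            "lang": a.get("lang") or a.get("xml:lang"),
            "scheme": a.get("scheme"),
        })


def extract_dc(html):
    """Return the Dublin Core meta records found in one HTML document."""
    parser = DCMetaParser()
    parser.feed(html)
    parser.close()
    return parser.records


def crawl(urls, fetch, delay_seconds=60.0, sleep=time.sleep):
    """Fetch each URL in turn, waiting delay_seconds between requests.

    `fetch` is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the page's HTML as a string.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # be polite: at most one page per delay
        results[url] = extract_dc(fetch(url))
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the sketch independent of any particular HTTP library and makes it easy to try out without waiting a minute per page. A real robot would also need to honour robots.txt, which this sketch leaves out.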
The software is only part-written at the moment and my immediate priority is the Drupal module; I'd probably be looking to do the crawling in about three or four weeks from now.
Comments
Re: Wanted: a Dublin Core Metadata-Rich Web Site
This is a really good initiative!! It would be really nice if this was done more as a generic metadata module, for different Dublin Core profiles, IMS LOM, MARC, etc.
Keep up the good work.
Re: Wanted: a Dublin Core Metadata-Rich Web Site
The software itself is actually fairly generic; whilst the database is structured to take into account predicates (terms), language and scheme - based on the requirements of Dublin Core - other vocabularies may be added simply by inserting extra rows into database tables.
If anyone wants to see a particular set of metadata incorporated, I am setting up a template whereby they can contribute the metadata set which will then be bundled as a piece of SQL. These lumps of SQL can then be run so that the database will "know" about the new metadata set. I won't be incorporating this as something that is accessible through the user interface as I don't see it as a day-to-day task, more something that is done at the time of setup.
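To give a feel for the idea, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`vocabulary`, `term`, `prefix`, `label`, `name`) are invented for illustration and are not the real schema, and the IMS LOM term names are only examples. Language and scheme are left out to keep it short. The point is only that a new metadata set arrives as a lump of SQL that inserts rows, with no change to the table structure.

```python
import sqlite3

# Hypothetical schema: one row per vocabulary, one row per term in it.
SCHEMA = """
CREATE TABLE vocabulary (
    id     INTEGER PRIMARY KEY,
    prefix TEXT NOT NULL UNIQUE,
    label  TEXT NOT NULL
);
CREATE TABLE term (
    id            INTEGER PRIMARY KEY,
    vocabulary_id INTEGER NOT NULL REFERENCES vocabulary(id),
    name          TEXT NOT NULL,
    UNIQUE (vocabulary_id, name)
);
"""

# The kind of SQL bundle that might ship at setup time.
DUBLIN_CORE_BUNDLE = """
INSERT INTO vocabulary (prefix, label) VALUES ('dc', 'Dublin Core');
INSERT INTO term (vocabulary_id, name) VALUES
    ((SELECT id FROM vocabulary WHERE prefix = 'dc'), 'title'),
    ((SELECT id FROM vocabulary WHERE prefix = 'dc'), 'creator');
"""

# A contributed bundle for another vocabulary (term names illustrative).
LOM_BUNDLE = """
INSERT INTO vocabulary (prefix, label) VALUES ('lom', 'IMS LOM');
INSERT INTO term (vocabulary_id, name) VALUES
    ((SELECT id FROM vocabulary WHERE prefix = 'lom'), 'general.title'),
    ((SELECT id FROM vocabulary WHERE prefix = 'lom'), 'general.language');
"""


def terms_for(conn, prefix):
    """Return the sorted term names the database knows for a vocabulary."""
    rows = conn.execute(
        "SELECT t.name FROM term t "
        "JOIN vocabulary v ON v.id = t.vocabulary_id "
        "WHERE v.prefix = ? ORDER BY t.name",
        (prefix,),
    )
    return [name for (name,) in rows]
```

Running `conn.executescript(SCHEMA)` and then each bundle in turn on a fresh connection is all the setup there is; before the second bundle is run, `terms_for(conn, "lom")` returns an empty list, and afterwards it returns the two new terms.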