Building a case for semantic URLs (draft post)

(note: this is a work in progress!)

When you look at a finding aid generated through ArchivesUM, the page URL looks something like this:

digital.lib.umd.edu/archivesum/actions.DisplayEADDoc.do?source=/MdU.ead.histms.0011.xml&style=ead

Aside from its length and lack of aesthetic appeal, this URL presents the person viewing the page with a confusing string of letters and numbers that is of no use to them in their research. The display and style parameters reference EAD, which means little or nothing to the non-archivist viewer. And the XML file name in the middle, which is drawn from our Fedora back end, is a unique identifier that helps neither researchers nor archivists, because it does not match any of the other identifiers used for the collection.
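To make the point concrete, here is a rough sketch that pulls the example URL apart using Python's standard library. The function name and the labels in the returned dictionary are mine, not anything ArchivesUM defines; the only input is the URL shown above.

```python
from urllib.parse import urlsplit, parse_qs

def describe_legacy_url(url):
    """Split an ArchivesUM-style URL into the pieces a reader actually sees."""
    # The URL in the post has no scheme; urlsplit needs one to find the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.netloc,
        # Last path segment: the display call exposed to the user.
        "action": parts.path.rsplit("/", 1)[-1],
        # Query parameters: the back-end file name and the stylesheet name.
        "source": query.get("source", [""])[0],
        "style": query.get("style", [""])[0],
    }

info = describe_legacy_url(
    "digital.lib.umd.edu/archivesum/actions.DisplayEADDoc.do"
    "?source=/MdU.ead.histms.0011.xml&style=ead"
)
for label, value in info.items():
    print(f"{label}: {value}")
```

Every piece after the host describes how the page is produced rather than what the collection is.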

By way of contrast, the URL for this blog post probably looks something like this:

icantiemyownshoes.wordpress.com/2014/03/19/date-formats/

WordPress, like many other sites, creates semantic URLs for each of the pages it generates. Each URL clearly identifies the source of the page, the date it was originally posted, and a human-readable form of the title, which the author of the post can alter.
Like the ArchivesUM URL, it is a unique, static identifier for the information contained therein. Unlike its ArchivesUM counterpart, it provides the viewer with several important pieces of information. This affects its findability, both on its own website and when it appears on a Google search results page. Users can quickly determine whether they trust the source, how recent it is, and whether it addresses the topic they are interested in. There is also some evidence that Google’s algorithms give preference to URLs with more human-readable information.

Since a large part of my project has been comparing finding aids from other institutions, I took a look at what they were doing:

Princeton: findingaids.princeton.edu/collections/C0159

Duke: library.duke.edu/rubenstein/findingaids/africanamericanmisc/

In both cases, the last part of the URL is taken from the <eadid> element. Princeton uses its collection number, while Duke uses a shortened version of the collection title. Both are clean and easy to read. It is debatable how useful these URLs are to the average user, but they would certainly be useful to the reference archivist. The same cannot be said for the ArchivesUM URL standard.

I am arguing for a new URL standard that looks something like this:

digital.lib.umd.edu/archivesum/<findingaiddate>/<unitid>/
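As a minimal sketch of how that pattern might be filled in, the snippet below reads the first <unitid> out of an EAD file and combines it with a finding aid date. Several things here are assumptions for illustration only: the function names, the sample EAD fragment and its unitid value, and the choice to pass the date in as an argument rather than guess where a given finding aid records it.

```python
import re
import xml.etree.ElementTree as ET

BASE = "digital.lib.umd.edu/archivesum"

def local_name(tag):
    """Drop any XML namespace so namespaced and plain EAD files both work."""
    return tag.rsplit("}", 1)[-1]

def first_text(root, name):
    """Return the text of the first non-empty element with the given name."""
    for el in root.iter():
        if isinstance(el.tag, str) and local_name(el.tag) == name:
            if el.text and el.text.strip():
                return el.text.strip()
    return None

def slugify(value):
    """Lowercase the value and collapse anything non-alphanumeric to a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")

def semantic_url(ead_xml, finding_aid_date):
    """Build <base>/<findingaiddate>/<unitid>/ from an EAD document string."""
    root = ET.fromstring(ead_xml)
    unitid = first_text(root, "unitid")
    if unitid is None:
        raise ValueError("no <unitid> found in EAD document")
    return f"{BASE}/{slugify(finding_aid_date)}/{slugify(unitid)}/"

# Hypothetical EAD fragment; the unitid value is made up for this example.
SAMPLE = """<ead>
  <eadheader><eadid>example.xml</eadid></eadheader>
  <archdesc level="collection">
    <did><unitid>0011-HIST</unitid></did>
  </archdesc>
</ead>"""

url = semantic_url(SAMPLE, "2014")
print(url)
```

Slugifying both parts keeps the result lowercase and free of spaces or punctuation, though a real implementation would also need a rule for two collections that reduce to the same slug.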

Pros:
-Quickly conveys information about the repository, date of finding aid creation, and collection name
-Provides level of trust to user (which is admittedly hard to quantify)
-Elements for URL are already present in EAD file, so easier to implement
-Easy for reference archivists and researchers to identify collection by URL
-Removes “sausage making” display calls currently in URL

Cons:
-Can be confusing if collections have similar titles
-Has to be a permanent URL to work
-Would need to ensure that this works in ArchivesSpace
-Our legacy finding aid dates may not be accurate

I'm a student in the Archives, Records and Information Management program at the University of Maryland's iSchool. I got to this point after separate careers in theatre, food service, blueprint making, and the corporate sector. I'm as surprised as you are.
