In creating the linked data platform in MarcEdit, I made the decision that nearly all interactions with linked data endpoints and bibliographic data would be controlled via a rules file. The reason for this approach is that rules files could then be created for different flavors of MARC and work equally well, without having to recompile code. So, while I am actively adding and testing new endpoints as they become available, users have the ability to customize their linked data rules files and add their own endpoints to MarcEdit for use. This document will provide some details on how this customization works.
Rules file:
The rules file, linked_data_profile.xml, is located in the application configuration path. Its structure is fairly straightforward: the rules are broken into two parts.
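As a rough illustration of that two-part layout, the Python sketch below uses the standard library's ElementTree to read a profile and list the field rules and collections it contains. The sample XML is hypothetical. Its element names follow the rules described in this document, but the <profile> root element and the exact nesting are assumptions, so check them against your own copy of linked_data_profile.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <profile> root and the exact nesting are assumptions;
# the element names (rules, field, tag, collections, collection, name, label)
# follow the descriptions in this document.
SAMPLE_PROFILE = """
<profile>
  <rules>
    <field type="bibliographic">
      <tag>650</tag>
      <subfield>a</subfield>
      <index>2</index>
      <uri>0</uri>
    </field>
  </rules>
  <collections>
    <collection>
      <name>FAST</name>
      <label>fast</label>
    </collection>
  </collections>
</profile>
"""


def summarize_profile(xml_text: str) -> dict:
    """List (type, tag) for each field rule and the name of each collection."""
    root = ET.fromstring(xml_text.strip())
    fields = [
        (field.get("type", ""), field.findtext("tag", default=""))
        for field in root.iter("field")
    ]
    collections = [
        collection.findtext("name", default="")
        for collection in root.iter("collection")
    ]
    return {"fields": fields, "collections": collections}


print(summarize_profile(SAMPLE_PROFILE))
```

To inspect an installed profile, read linked_data_profile.xml from the application configuration path and pass its text to summarize_profile.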
- Part 1: Field Rules
Field rules are defined within the <rules> block, and the rules for each field to be processed are defined in a <field> block. The field block is where the user defines how the data should be processed. The rules and field blocks use the following elements:
rules block (top level):
- field
  Attributes: type: authority, bibliographic, or authority|bibliographic
  - tag (required). Value: the field tag. Description: the field to process.
  - subfield (required). Value: subfield codes. Description: the subfields to use for matching.
  - ind2 (optional). Attributes: value: the second indicator value; vocab: the vocabulary that indicator value corresponds to (must be a valid vocabulary option).
  - index (optional). Value: a subfield code, or empty. Description: the subfield that identifies the index (vocabulary).
  - sticky (optional). Value: subfield codes that should always be part of an atomized field (e.g., abc). Description: when processing atomized data, some subfields apply to all of the information within a field block. These are sticky values, and they should be marked so that they are replicated across the atomized versions of a field.
  - atomize (optional). Value: 1 or empty. Description: determines whether the field should be broken up for URI disambiguation.
  - special_instructions (optional). Value: name|subject|mixed|linking. Description: special instructions that improve normalization for names and subjects.
  - uri (required). Value: a subfield code. Description: the subfield in which the URI is embedded.
  - vocab (optional). Value: see the supported vocabularies section. Description: when no index is supplied, predefines a supported index.
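To make the relationship between the subfield, index, vocab, and uri elements more concrete, here is a minimal Python sketch that applies one field rule to the example heading =650 \7$aTest$2fast. It is a hypothetical model, not MarcEdit's code. The FieldRule class, the helper names, joining matched subfields with a space, and the example.org URI are all assumptions, and sticky, atomize, ind2, and special_instructions handling is left out.

```python
from dataclasses import dataclass


# Hypothetical model of one <field> rule; not MarcEdit's internal representation.
@dataclass
class FieldRule:
    tag: str
    subfield: str    # subfield codes used for matching, e.g. "a"
    uri: str         # subfield code that receives the URI, e.g. "0"
    index: str = ""  # subfield code naming the vocabulary, e.g. "2"
    vocab: str = ""  # predefined vocabulary used when no index is supplied


def match_key(rule: FieldRule, subfields: list[tuple[str, str]]) -> tuple[str, str]:
    """Return (search terms, vocabulary) for one field under the rule."""
    # Assumption: matched subfields are simply joined with spaces.
    terms = " ".join(value for code, value in subfields if code in rule.subfield)
    # Prefer the vocabulary named in the index subfield; fall back to <vocab>.
    vocab = next((value for code, value in subfields if code == rule.index), rule.vocab)
    return terms, vocab


def add_uri(rule: FieldRule, subfields: list[tuple[str, str]], uri: str) -> list[tuple[str, str]]:
    """Return a copy of the field with the URI appended in the rule's uri subfield."""
    return subfields + [(rule.uri, uri)]


rule = FieldRule(tag="650", subfield="a", uri="0", index="2")
heading = [("a", "Test"), ("2", "fast")]  # =650 \7$aTest$2fast
print(match_key(rule, heading))
print(add_uri(rule, heading, "http://example.org/fast/0000"))
```

Keeping the vocab fallback separate from the index lookup mirrors the rule above: the index subfield wins when the record supplies one, and the predefined vocab applies only when it does not.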
- Part 2: Collections
Collection definitions describe how MarcEdit interacts with a web service to extract data. Currently, a user-configured endpoint must be a SPARQL endpoint that returns JSON. The <collection> block uses the following rules:
collections block (top level):
- collection (no attributes)
  - name (required). Value: the name of the service.
  - label (required). Value: the index name used within the record to identify the vocabulary. For example, in =650 \7$aTest$2fast the label would be defined as fast.
  - uri (required). Value: the URL MarcEdit will use to query the endpoint. Use {search_terms} to mark where MarcEdit should insert the search terms. Note that arguments must be URL-encoded; use http://string-functions.com/urlencode.aspx or another tool to encode the string so that it is transmitted correctly. MarcEdit encodes only the search terms as it injects them into the URI; it assumes that you, the user, have properly encoded any other required data. For an example, see the Japanese Diet Library configuration:
      name: Japanese Diet Library
      uri: http://id.ndl.go.jp/auth/ndla?query=PREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%20SELECT%20*%20WHERE%20%7B%20%3Fsubj%20rdfs%3Alabel%20%22{search_terms}%22%7D&output=json
      path: results.bindings[0].subj.value
  - pattern (optional). Value: used when replacing an identifier in the $0 with a full URI. For examples, see the FAST and GND definitions.
  - path (required for user-defined collections). Value: the JSON object path to the URI. Example: results.bindings[0].subj.value
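The Python sketch below shows what a collection's uri and path values do, under stated assumptions: the search terms are URL-encoded and substituted for {search_terms}, and the dotted path is then followed through the endpoint's JSON response. MarcEdit's own implementation is not shown here. The helper names, the use of urllib.parse.quote (MarcEdit's encoder may differ, for example in how it encodes spaces), and the sample response are assumptions. The sample is shaped like a standard SPARQL JSON result, but its URI is made up, and no request is actually sent.

```python
import json
import re
from urllib.parse import quote

# uri template from the Japanese Diet Library example (already URL-encoded).
JDL_URI = (
    "http://id.ndl.go.jp/auth/ndla?query="
    "PREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E"
    "%20SELECT%20*%20WHERE%20%7B%20%3Fsubj%20rdfs%3Alabel%20%22{search_terms}%22%7D"
    "&output=json"
)


def build_query_url(uri_template: str, search_terms: str) -> str:
    """Insert URL-encoded search terms; the rest of the template is left untouched."""
    return uri_template.replace("{search_terms}", quote(search_terms, safe=""))


# One path segment: a key optionally followed by list indexes, e.g. bindings[0].
SEGMENT = re.compile(r"([^.\[\]]+)((?:\[\d+\])*)")


def resolve_path(data, path: str):
    """Follow a dotted path such as results.bindings[0].subj.value through parsed JSON."""
    node = data
    for segment in path.split("."):
        match = SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"unsupported path segment: {segment!r}")
        key, indexes = match.groups()
        node = node[key]
        for index in re.findall(r"\[(\d+)\]", indexes):
            node = node[int(index)]
    return node


# Made-up response in the SPARQL JSON results shape; the URI is a placeholder.
SAMPLE_RESPONSE = json.loads(
    '{"head": {"vars": ["subj"]},'
    ' "results": {"bindings": [{"subj": {"type": "uri",'
    ' "value": "http://example.org/auth/0000"}}]}}'
)

print(build_query_url(JDL_URI, "Tokyo Japan"))
print(resolve_path(SAMPLE_RESPONSE, "results.bindings[0].subj.value"))
```

Encoding only the search terms matters because the template is already encoded. Running the whole template through an encoder again would double-encode it, for example turning %20 into %2520, and the endpoint would no longer receive a valid query.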
When developing collections, test your profile's query in a browser or SPARQL tool first to confirm that the URI is constructed properly.
Contributing endpoints:
If you create a definition for a new endpoint, or would like a major controlled vocabulary added to the MarcEdit rules file, please contact reeset@gmail.com and provide documentation for the service along with sample data for testing.