Editing MarcEdit’s Linked Data Rules File

In creating the linked data platform in MarcEdit, I decided that nearly all interactions with linked data endpoints and bibliographic data would be controlled via a rules file.  With this approach, rules files can be created for different flavors of MARC and work equally well, without having to recompile code.  So, while I am actively adding and testing new endpoints as they become available, users can customize their linked data rules files and add their own endpoints to MarcEdit.  This document provides some details on how this customization works.

Rules file:

The rules file is located in the application configuration path, in the file:  linked_data_profile.xml.  The structure of the file is fairly straightforward.  The rules are broken into two parts.

  • Part I: Field Rules
    Field rules are defined within the <rules> block, with the rules for each field to be processed defined in its own <field> block.  The field block is where the user defines how the data should be processed.  The rules/field blocks use the following structure:

 

    rules block:
    top level: field
      Attributes:
        type: authority, bibliographic, authority|bibliographic
    tag (required):
      Value: Field value
      Description: field to process
    subfield (required):
      Value: Subfield codes
      Description: subfields to use for matching
    ind2 (optional):
      Attributes:
        value: second indicator value
        vocab: matches a valid vocabulary option
    index (optional):
      Values: subfield code or empty
      Description: subfield that denotes the index
    sticky (optional):
      Values: subfield codes that should always be part of an atomized field (e.g., abc)
      Description: When processing atomized data, some subfields apply to all information within a field block.  These are sticky values, and should be marked to ensure that they replicate between atomized versions of a field set.
    atomize (optional):
      Values: 1 or empty
      Description: determines whether the field should be broken up for URI disambiguation
    special_instructions (optional):
      Values: name|subject|mixed|linking
      Description: special instructions to improve normalization for names and subjects
    uri (required):
      Values: subfield code that will hold the URI
      Description: determines which subfield is used to embed the URI
    vocab (optional):
      Values: see the supported vocabularies section
      Description: when no index is supplied, you can predefine a supported index
  • Part II: Collections
    The collection definitions describe how MarcEdit interacts with a web service to extract data.  Currently, a user-configured endpoint must be a SPARQL endpoint that returns JSON.  The <collection> block uses the following rules:

 

    collections block
    top level: collection
      Attributes: none
    name (required):
      Value: Defines the name of the service
    label (required):
      Value: Defines the index name used within the record to identify the vocabulary.
      Example: =650  \7$aTest$2fast
      The label would be defined as: fast
    uri (required):
      Value: The URL MarcEdit will use to query the endpoint.  Use {search_terms} to denote the placeholder where MarcEdit should insert search terms.  Please note that arguments need to be URL encoded.  Use http://string-functions.com/urlencode.aspx or another tool to URL encode the string to ensure proper communication.  MarcEdit will only encode the search terms as it injects them into the URI.  The tool assumes that you, the user, have properly encoded any other required data.  For an example, see the Japanese Diet Library configuration below:
        name: Japanese Diet Library
        uri: http://id.ndl.go.jp/auth/ndla?query=PREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%20SELECT%20*%20WHERE%20%7B%20%3Fsubj%20rdfs%3Alabel%20%22{search_terms}%22%7D&output=json
        path: results.bindings[0].subj.value
    pattern (optional):
      Value: For use when replacing an identifier in the $0 with a full URI.  For examples, see FAST and GND.
    path (required):
      Value: For user-defined collections, path is required.  This defines the JSON object path to the URI.
      Example: results.bindings[0].subj.value
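
To make the path value more concrete, here is a minimal Python sketch of how a path such as results.bindings[0].subj.value can be followed through a SPARQL JSON response.  This is only an illustration, not MarcEdit's own code: the resolve_path helper is hypothetical, and the sample response (including its URI) is made up for the example.

```python
import json
import re


def resolve_path(document, path):
    """Follow a dotted path such as 'results.bindings[0].subj.value'."""
    current = document
    for part in path.split("."):
        # Each segment is a key, optionally followed by [n] list indexes.
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"Unsupported path segment: {part!r}")
        key, indexes = match.groups()
        current = current[key]
        for index in re.findall(r"\[(\d+)\]", indexes):
            current = current[int(index)]
    return current


# Made-up response in the standard SPARQL JSON results layout.
sample = json.loads("""
{
  "head": {"vars": ["subj"]},
  "results": {"bindings": [
    {"subj": {"type": "uri", "value": "http://example.org/auth/12345"}}
  ]}
}
""")

uri = resolve_path(sample, "results.bindings[0].subj.value")
print(uri)
```

If the endpoint returns no matches, the bindings list is empty and the lookup fails, so a path is only useful when it points at the first result's URI.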

It is recommended when developing collections that you test your profiles in a browser or SPARQL tool to determine the proper construction of the URI.
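
As an alternative to an online encoder, the uri value can also be prepared with a short script.  The Python sketch below percent-encodes a SPARQL query while leaving the {search_terms} placeholder untouched, and reproduces the Japanese Diet Library URI shown above.  The encode_query helper is hypothetical, and the final substitution only imitates what MarcEdit does at query time; how MarcEdit encodes the search terms internally is an assumption here.

```python
from urllib.parse import quote

PLACEHOLDER = "{search_terms}"


def encode_query(sparql):
    """Percent-encode a query but keep the {search_terms} placeholder."""
    # Encode the text on either side of the placeholder, then rejoin,
    # so the placeholder's braces are not turned into %7B and %7D.
    pieces = sparql.split(PLACEHOLDER)
    return PLACEHOLDER.join(quote(piece, safe="*") for piece in pieces)


sparql = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    'SELECT * WHERE { ?subj rdfs:label "' + PLACEHOLDER + '"}'
)

# The value that would go into the uri setting of the collection.
template = (
    "http://id.ndl.go.jp/auth/ndla?query="
    + encode_query(sparql)
    + "&output=json"
)
print(template)

# Stand-in for MarcEdit's substitution, useful for testing in a browser.
test_url = template.replace(PLACEHOLDER, quote("Test term", safe=""))
print(test_url)
```

The second URL printed can be pasted into a browser to confirm that the endpoint responds with JSON before the template is added to the rules file.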

Contributing endpoints:

If you create a definition for a new endpoint, or would like to have a major controlled vocabulary added to the MarcEdit rules file, please contact reeset@gmail.com and provide documentation for the service along with sample data for testing.