As libraries continue to move into linked data spaces, one of the areas that I believe will become increasingly important is the ability to resolve against local controlled vocabularies. These vocabularies could be local to one’s own institution or to another institution, or they could be large national vocabularies that have yet to be defined in the master list of source codes maintained by the Library of Congress, the Subject Heading and Term Source Codes. Because MarcEdit’s linked data resolution has been developed around the concept of a rules file and collections, the tool requires that each collection have a unique “term code” specified before it can be resolved automatically. In MARC, these codes reside in the $2 of subject fields, or they can be defined explicitly for any other field using the <vocab> rules element in MarcEdit. Recently, this situation came up during work at The Ohio State University (OSU), and in thinking through how to develop local terms and namespaces that enable MarcEdit’s linked data reconciliation services, we arrived at a process that I would like to share.
Background
The OSU Libraries have a long history of experimenting with different metadata conventions within the catalog. Our head of technical services, Magda El-Sherbini, quite literally wrote the book on RDA, so it should come as no surprise that she is interested in strategies for embedding linked data. As part of this experimentation, the Libraries began working with linked data services from national libraries whose vocabularies were not yet defined in the Subject Heading and Term Source Codes list, which made it difficult to use MarcEdit to handle automatic URI linking. So we started to think about the problem.
In MarcEdit, collections and linked data resolution are handled through rules. On the field side, rules tell MarcEdit which subfields may be used when resolving a specific element. Special processing notes can also be added to tell MarcEdit whether an item is a subject, a personal name, or mixed content. Take the MARC21 650 field as an example. The current rule looks something like this:
<field type="bibliographic">
<tag>650</tag>
<subfields>abvxyz</subfields>
<ind2 value="0" vocab="lcsh"/>
<ind2 value="1" vocab="lcshac"/>
<ind2 value="2" vocab="mesh"/>
<ind2 value="7" vocab="none"/>
<index>2</index>
<uri>0</uri>
<special_instructions>subject</special_instructions>
</field>
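To make the indicator logic in this rule concrete, here is a minimal Python sketch of how a tool could map a field to a vocabulary label using it. This is not MarcEdit’s own code; the function name resolve_vocab is hypothetical, and it assumes two behaviors the rule implies but does not spell out: vocab="none" means “read the subfield named in <index>”, and a second indicator not listed in the rule yields no vocabulary.

```python
import xml.etree.ElementTree as ET

# The 650 field rule shown above (straight quotes, so it parses as XML).
FIELD_RULE = """<field type="bibliographic">
<tag>650</tag>
<subfields>abvxyz</subfields>
<ind2 value="0" vocab="lcsh"/>
<ind2 value="1" vocab="lcshac"/>
<ind2 value="2" vocab="mesh"/>
<ind2 value="7" vocab="none"/>
<index>2</index>
<uri>0</uri>
<special_instructions>subject</special_instructions>
</field>"""


def resolve_vocab(rule_xml, ind2, subfields):
    """Return the vocabulary label for a field, or None if none applies.

    Hypothetical helper. `subfields` maps a subfield code (e.g. "2") to
    its value. Assumption: vocab="none" defers to the subfield named in
    <index>; an indicator value not listed in the rule yields None.
    """
    rule = ET.fromstring(rule_xml)
    index_code = rule.findtext("index")  # "2" for the rule above
    for ind in rule.findall("ind2"):
        if ind.get("value") == ind2:
            vocab = ind.get("vocab")
            if vocab != "none":
                return vocab  # the indicator alone names the vocabulary
            return subfields.get(index_code)  # e.g. the value of $2
    return None


print(resolve_vocab(FIELD_RULE, "0", {}))             # lcsh
print(resolve_vocab(FIELD_RULE, "7", {"2": "fast"}))  # fast
```

The design point is that only the second indicator 7 case ever consults the record’s own data, which is why the $2 carries so much weight in the rest of this post.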
The field rules establish the processing guidelines that MarcEdit uses when evaluating a particular field. In this case, the rule applies only to bibliographic records, limits subfield processing to the codes listed in the subfields tag, and maps specific second indicator values to specific vocabularies. In the final case, a second indicator of 7 names no vocabulary (vocab="none"); instead, the subfield defined in the index tag (here, the $2) tells MarcEdit which vocabulary was used for the term. These vocab values are what link a field to the collection definitions. Here is an example of a collection definition for the National Diet Library of Japan:
<collection>
<name>Japanese Diet Library</name>
<label>jlabsh/4</label>
<uri>http://id.ndl.go.jp/auth/ndla?query=PREFIX+rdfs%3a+%3chttp%3a%2f%2fwww.w3.org%2f2000%2f01%2frdf-schema%23%3e+%0d%0aPREFIX+xl%3a+%3chttp%3a%2f%2fwww.w3.org%2f2008%2f05%2fskos-xl%23%3e%0d%0aSELECT+%3fsubj+WHERE+%7b+%0d%0a+%7b%3fsubj+rdfs%3alabel+%3fheading%7d+UNION%0d%0a+%7b%3fsubj+xl%3aaltLabel+%5b+xl%3aliteralForm+%3fheading%5d%7d%0d%0a+filter+(%3fheading+%3d+%22{search_terms}%22)%0d%0a+%7d&output=json</uri>
<path>results.bindings[0].subj.value</path>
</collection>
In this case, the label tag must match the value used in the $2 of the record. For example, this field would automatically initiate resolution against the linked data endpoint defined in this collection:
=880 07$6650-05$a日本-歴史-昭和時代(1945年以後).$2jlabsh/4
While this would not:
=880 07$6650-05$a日本-歴史-昭和時代(1945年以後).
The difference between the two fields is the $2, which contains the collection’s label. For MarcEdit to discover the vocabulary automatically, the vocabulary must be named in the record and defined within the rules file.
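The label lookup itself is simple enough to sketch. The Python below reads a field in MarcEdit’s mnemonic format, pulls out the $2, and looks it up in a table of collection labels. It is an illustration under stated assumptions, not MarcEdit’s implementation: the COLLECTIONS table and both function names are hypothetical, the match is assumed to be exact and case-sensitive, and the parser assumes a single space after the tag (as in the examples here) and that a literal dollar sign is written {dollar}, so "$" only ever delimits subfields.

```python
# Hypothetical lookup table built from the <label> and <name> of each
# <collection> in the rules file.
COLLECTIONS = {
    "jlabsh/4": "Japanese Diet Library",
    "local/OSU/kdl": "The National Library of Korea",
}


def parse_mnemonic(line):
    """Split a mnemonic field such as '=880 07$a...$2...' into parts.

    Assumes one space after the tag and that '$' appears only as a
    subfield delimiter ({dollar} stands for a literal dollar sign).
    """
    tag_part, rest = line.split(" ", 1)
    indicators, data = rest[:2], rest[2:]
    subfields = [(chunk[0], chunk[1:]) for chunk in data.split("$") if chunk]
    return tag_part.lstrip("="), indicators, subfields


def find_collection(line):
    """Return the collection whose label exactly matches the field's $2."""
    _, _, subfields = parse_mnemonic(line)
    labels = [value for code, value in subfields if code == "2"]
    return COLLECTIONS.get(labels[-1]) if labels else None


print(find_collection("=880 07$6650-05$a日本-歴史-昭和時代(1945年以後).$2jlabsh/4"))
# Japanese Diet Library
```

Run against the second example field (no $2), find_collection returns None, which mirrors the point above: without a label, there is nothing to resolve against.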
Defining Local Namespaces
So let’s go back to the issue at OSU. The National Library of Korea has established an endpoint that can be used to resolve Korean subjects and names. However, there is currently no entry for it in the official Subject Heading and Term Source Codes list, which means that no value has been established for use in the $2. Since this value is essential for MarcEdit, and since it is likely to be added to the official list in the future, we developed a local namespace that provides a unified label structure for local or not-yet-registered vocabularies. In our case, we established the local namespace local/OSU. Within it, I created the label local/OSU/kdl for use in the MarcEdit rules file. This lets us create rules for vocabularies not yet established in the national registry while still using MarcEdit for URI resolution. Using this process, I created the following definition for the National Library of Korea endpoint (it is included in the current rules file distributed with the tool):
<collection>
<name>The National Library of Korea</name>
<label>local/OSU/kdl</label>
<uri>http://lod.nl.go.kr/sparql?query=select+%3Furi+where+%7B+%3Furi+rdfs%3Alabel+%22{search_terms}%22+%7D%0D%0A&type=json&request_method=get</uri>
<path>results.bindings[0].uri.value</path>
</collection>
</collections>
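Each collection pairs a query template (the uri tag) with a path into the JSON response (the path tag). Here is a hedged Python sketch of those two steps for the National Library of Korea definition: the heading is percent-encoded and substituted for {search_terms}, and the dotted path is walked through a parsed SPARQL JSON result. The helper names are hypothetical, UTF-8 percent-encoding is my assumption rather than documented MarcEdit behavior, and the sample response is illustrative (it reuses a URI from the record below) rather than a captured reply from the endpoint. In a live run you would fetch the URL, for example with urllib.request.urlopen, and hand json.loads of the body to extract_path; an empty bindings list would raise IndexError, which a caller would treat as “no match”.

```python
import re
from urllib.parse import quote

# The <uri> template from the collection definition above.
TEMPLATE = ("http://lod.nl.go.kr/sparql?query=select+%3Furi+where+%7B+%3Furi+"
            "rdfs%3Alabel+%22{search_terms}%22+%7D%0D%0A"
            "&type=json&request_method=get")


def build_query_url(template, heading):
    """Percent-encode the heading (UTF-8) and drop it into {search_terms}."""
    return template.replace("{search_terms}", quote(heading, safe=""))


def extract_path(data, path):
    """Walk a path such as 'results.bindings[0].uri.value' through JSON."""
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        data = data[match.group(1)]  # dictionary key
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            data = data[int(index)]  # list position, e.g. [0]
    return data


# Illustrative response in the W3C SPARQL JSON results shape.
sample = {
    "head": {"vars": ["uri"]},
    "results": {"bindings": [
        {"uri": {"type": "uri",
                 "value": "http://lod.nl.go.kr/resource/KSH00006486"}},
    ]},
}

print(build_query_url(TEMPLATE, "역사[歷史]"))
print(extract_path(sample, "results.bindings[0].uri.value"))
# http://lod.nl.go.kr/resource/KSH00006486
```

Using str.replace rather than str.format is deliberate: the template is treated as opaque text, so nothing else in it is interpreted.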
Now, as catalogers create localized subject headings, we can automatically go back and generate URIs as the final step in the process. Take, for example, this record:
=001 ocn948964079
=003 OCoLC
=005 20170426084620.1
=008 160523s2016 ko a b 001 0 kor
=010 \\$a 2016423183
=040 \\$aDLC$beng$erda$cDLC$dHMY$dOCLCO$dLOA$dOSU$dCNUTO$dRBN$dEYM$dOSU$dOCLCO$dNDD$dBNG
=066 \\$cHani$c{dollar}1
=020 \\$a9791159050503
=035 \\$a(OCoLC)948964079
=042 \\$apcc
=043 \\$aa-kr---
=050 00$aHJ1400.5$b.C55476 2016
=082 \\$aCD934.6
=049 \\$aMAIN
=100 1\$6880-01$aCho, Yŏng-jun$c(Professor of Economics),$eauthor.
=245 10$6880-02$aChosŏn hugi wangsil chaejŏng kwa Sŏul sangŏp =$bRoyal finance and procurement in late Choson Korea /$cCho Yŏng-jun.
=246 31$aRoyal finance and procurement in late Chosun Korea
=250 \\$6880-03$aCh’op’an.
=264 \1$6880-04$aSŏul-si :$bSomyŏng Ch’ulp’an,$c2016.
=300 \\$a365 pages :$billustrations ;$c23 cm.
=336 \\$atext$2rdacontent
=337 \\$aunmediated$2rdamedia
=338 \\$avolume$2rdacarrier
=490 1\$6880-05$aKyujanggak haksul ch’ongsŏ ;$v11
=504 \\$aIncludes bibliographical references and index.
=546 \\$aText in Korean and Korean$b(Hanmun).
=650 \0$aFinance, Public$zKorea$xHistory$y18th century.
=650 \0$aFinance, Public$zKorea$xHistory$y19th century.
=650 \0$aGovernment purchasing$zKorea$xHistory.
=651 \0$aKorea$xKings and rulers$xFinance, Personal.
=651 \0$aKorea$xPolitics and government$y1637-1864.
=650 \7$6880-06$aFinance, Public$2fast
=651 \7$6880-07$aKorea$2fast
=650 \7$6880-08$aGovernment purchasing$2fast
=650 \7$6880-09$aKings and rulers$2fast
=650 \7$6880-10$aFinance, Personal$2fast
=650 \7$6880-11$aPolitics and government$2fast
=648 \7$a1700-1799$2local/osu
=648 \7$a1800-1899$2local/osu
=648 \7$a1637-1864$2local/osu
=655 \7$6880-12$aHistory$2fast
=787 0\$nOCLC Work Id$ohttp://worldcat.org/entity/work/id/3000468792
=830 \0$6880-13$aKyujanggak haksul ch’ongsŏ ;$v11.
=880 1\$6100-01/{dollar}1$a조 영준$c(Professor of Economics),$eauthor.
=880 10$6245-02/{dollar}1$a조선 후기 왕실 재정 과 서울 상업 =$bRoyal finance and procurement in late Choson Korea /$c조 영준.
=880 \\$6250-03/{dollar}1$a초판.
=880 \1$6264-04/{dollar}1$a서울시 :$b소명 출판,$c2016.
=880 1\$6490-05/{dollar}1$a규장각 학술 총서 ;$v11
=880 \7$6650-00/{dollar}1$a조선(국명)[朝鮮]$2local/OSU/kdl
=880 \7$6650-06/{dollar}1$a국가 재정[國家財政]$2local/OSU/kdl
=880 \7$6650-08/{dollar}1$a정부 구매[政府購買]$2local/OSU/kdl
=880 \7$6650-09/{dollar}1$a왕(국왕)[王]$2local/OSU/kdl
=880 \7$6650-10/{dollar}1$a개인 금융[個人金融]$2local/OSU/kdl
=880 \7$6650-11/{dollar}1$a정부(행정부)[政府]$2local/OSU/kdl
=880 \7$6651-07/{dollar}1$a한국(국명)[韓國]$2local/OSU/kdl
=880 \7$6655-12/{dollar}1$a역사[歷史]$2local/OSU/kdl
=880 \0$6830-13/{dollar}1$a규장각 학술 총서 ;$v11.
I can now run the Build Links tool with the updated rules file and automatically generate URIs for every element in the record that can be resolved:
=001 ocn948964079
=003 OCoLC
=005 20170426084620.1
=008 160523s2016 ko a b 001 0 kor
=010 \\$a 2016423183
=040 \\$aDLC$beng$erda$cDLC$dHMY$dOCLCO$dLOA$dOSU$dCNUTO$dRBN$dEYM$dOSU$dOCLCO$dNDD$dBNG
=066 \\$cHani$c{dollar}1
=020 \\$a9791159050503
=035 \\$a(OCoLC)948964079
=042 \\$apcc
=043 \\$aa-kr---
=050 00$aHJ1400.5$b.C55476 2016
=082 \\$aCD934.6
=049 \\$aMAIN
=100 1\$6880-01$aCho, Yŏng-jun$c(Professor of Economics),$eauthor.$0http://id.loc.gov/authorities/names/n2014011732
=245 10$6880-02$aChosŏn hugi wangsil chaejŏng kwa Sŏul sangŏp =$bRoyal finance and procurement in late Choson Korea /$cCho Yŏng-jun.
=246 31$aRoyal finance and procurement in late Chosun Korea
=250 \\$6880-03$aCh’op’an.
=264 \1$6880-04$aSŏul-si :$bSomyŏng Ch’ulp’an,$c2016.
=300 \\$a365 pages :$billustrations ;$c23 cm.
=336 \\$atext$2rdacontent$0http://id.loc.gov/vocabulary/contentTypes/txt
=337 \\$aunmediated$2rdamedia$0http://id.loc.gov/vocabulary/mediaTypes/n
=338 \\$avolume$2rdacarrier$0http://id.loc.gov/vocabulary/carriers/nc
=490 1\$6880-05$aKyujanggak haksul ch’ongsŏ ;$v11
=504 \\$aIncludes bibliographical references and index.
=546 \\$aText in Korean and Korean$b(Hanmun).
=650 \0$aFinance, Public$zKorea$xHistory$y18th century.
=650 \0$aFinance, Public$zKorea$xHistory$y19th century.
=650 \0$aGovernment purchasing$zKorea$xHistory.
=651 \0$aKorea$xKings and rulers$xFinance, Personal.
=651 \0$aKorea$xPolitics and government$y1637-1864.$0http://id.loc.gov/authorities/subjects/sh2010009461
=650 \7$6880-06$aFinance, Public$2fast$0http://id.worldcat.org/fast/924477
=651 \7$6880-07$aKorea$2fast$0http://id.worldcat.org/fast/1206791
=650 \7$6880-08$aGovernment purchasing$2fast$0http://id.worldcat.org/fast/945538
=650 \7$6880-09$aKings and rulers$2fast$0http://id.worldcat.org/fast/987694
=650 \7$6880-10$aFinance, Personal$2fast$0http://id.worldcat.org/fast/924449
=650 \7$6880-11$aPolitics and government$2fast$0http://id.worldcat.org/fast/983330
=648 \7$a1700-1799$2local/osu
=648 \7$a1800-1899$2local/osu
=648 \7$a1637-1864$2local/osu
=655 \7$6880-12$aHistory$2fast$0http://id.worldcat.org/fast/1411628
=787 0\$nOCLC Work Id$ohttp://worldcat.org/entity/work/id/3000468792
=830 \0$6880-13$aKyujanggak haksul ch’ongsŏ ;$v11.
=880 1\$6100-01/{dollar}1$a조 영준$c(Professor of Economics),$eauthor.$0http://id.loc.gov/authorities/names/n2014011732
=880 10$6245-02/{dollar}1$a조선 후기 왕실 재정 과 서울 상업 =$bRoyal finance and procurement in late Choson Korea /$c조 영준.
=880 \\$6250-03/{dollar}1$a초판.
=880 \1$6264-04/{dollar}1$a서울시 :$b소명 출판,$c2016.
=880 1\$6490-05/{dollar}1$a규장각 학술 총서 ;$v11
=880 \7$6650-00/{dollar}1$a조선(국명)[朝鮮]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00058311
=880 \7$6650-06/{dollar}1$a국가 재정[國家財政]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00028603
=880 \7$6650-08/{dollar}1$a정부 구매[政府購買]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00597253
=880 \7$6650-09/{dollar}1$a왕(국왕)[王]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00021841
=880 \7$6650-10/{dollar}1$a개인 금융[個人金融]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00011647
=880 \7$6650-11/{dollar}1$a정부(행정부)[政府]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00010204
=880 \7$6651-07/{dollar}1$a한국(국명)[韓國]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00116553
=880 \7$6655-12/{dollar}1$a역사[歷史]$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00006486
=880 \0$6830-13/{dollar}1$a규장각 학술 총서 ;$v11.
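Comparing the two records, the change is purely additive: each resolved field gains a $0 holding the URI at the end, and everything else is left alone. Here is a minimal Python sketch of that final step, assuming the URI has already been looked up. The helper name is hypothetical, and skipping fields that already carry a $0 is my assumption rather than documented Build Links behavior.

```python
def append_uri(line, uri):
    """Append $0<uri> to a mnemonic field unless it already has a $0.

    Assumes '$' is used only as a subfield delimiter ({dollar} is the
    literal dollar sign in the mnemonic format).
    """
    codes = [chunk[:1] for chunk in line.split("$")[1:]]  # subfield codes
    if "0" in codes:
        return line  # already linked; leave the field untouched
    return f"{line}$0{uri}"


before = r"=880 \7$6655-12/{dollar}1$a역사[歷史]$2local/OSU/kdl"
print(append_uri(before, "http://lod.nl.go.kr/resource/KSH00006486"))
```

The printed line matches the final 655 linked field in the record above.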
This process lets us integrate local controlled vocabulary resolution into our current workflow, and it provides a unified label that can later be used to update the $2 with the official term code once one is established (as we assume it will be).
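When an official code is eventually registered, the migration is a targeted edit of the $2. Here is a hedged Python sketch that rewrites only $2 values exactly matching the local label and leaves every other subfield, including any $0 already added, untouched. The function name is hypothetical, and because no official code exists yet for this vocabulary, the replacement value in the example is a placeholder, not a real term code.

```python
def relabel_vocab(line, old_label, new_label):
    """Replace a $2 value equal to old_label; leave other subfields alone.

    Assumes a mnemonic line ('=TAG II$a...') in which '$' appears only
    as a subfield delimiter.
    """
    head, *chunks = line.split("$")  # head is '=TAG II'
    rebuilt = [
        "2" + new_label if chunk == "2" + old_label else chunk
        for chunk in chunks
    ]
    return "$".join([head, *rebuilt])


PLACEHOLDER = "official-code"  # hypothetical stand-in for the future code
field = (r"=880 \7$6651-07/{dollar}1$a한국(국명)[韓國]"
         r"$2local/OSU/kdl$0http://lod.nl.go.kr/resource/KSH00116553")
print(relabel_vocab(field, "local/OSU/kdl", PLACEHOLDER))
```

Matching the whole subfield value, rather than doing a plain string replace, keeps the 648 fields labeled local/osu from being touched by accident.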
Beyond this example, I’ve become convinced through our own local work that, as libraries move into linked data spaces, local vocabularies will be as important as, or more important than, those provided by the large national organizations, mostly because this is where the expertise lives. LCSH, for example, is great for general subjects, but localized vocabularies for specific domain areas exist elsewhere, and they enrich and enhance the description of our records. Today, these terms tend to be placed in local subject fields. Imagine instead a linked data world where these vocabularies are made available and placed alongside their national cousins. It could be an exciting future, and for those walking down that path, this is one way MarcEdit can be used to support it.