{"id":448,"date":"2016-01-05T19:14:08","date_gmt":"2016-01-05T19:14:08","guid":{"rendered":"http:\/\/marcedit.reeset.net\/learning_marcedit\/?page_id=448"},"modified":"2016-01-05T21:15:27","modified_gmt":"2016-01-05T21:15:27","slug":"chapter-4-merging-marc-data-sets","status":"publish","type":"page","link":"https:\/\/marcedit.reeset.net\/learning_marcedit\/9-2\/chapter-4-merging-marc-data-sets\/","title":{"rendered":"Chapter 4: Merging MARC Data Sets"},"content":{"rendered":"<h2>In this Chapter:<\/h2>\n<ul>\n<li><strong>Getting Started<\/strong><\/li>\n<li><strong>Use Cases<\/strong><\/li>\n<li><strong>Merging Data<\/strong><\/li>\n<\/ul>\n<h3>Getting Started<\/h3>\n<p> With metadata originating from so many different sources, one of the many challenges catalogers face is merging new data with existing records without a significant amount of manual editing. \u00a0New records may add new subjects, new descriptive information, or new control numbers, so finding ways to automate the capture and enhancement of existing records is an important workflow. \u00a0And it&#8217;s a hard one. \u00a0While OCLC and other cooperative catalogs have made cataloging materials much easier, these systems introduce their own problems as they merge records and deprecate control numbers. \u00a0There was a time when the OCLC control number could reliably be counted on as the best match point in a record, but as OCLC cleans and merges data, the 001 has become less meaningful, at least on its own, and other fields like the 019 have become even more important for automated record evaluation.<\/p>\n<p>To help catalogers automate workflows around record and data merging, MarcEdit introduced a Merge Records tool. \u00a0This function has undergone a number of revisions, providing users with a wide range of merge and record matching options. 
\u00a0Is it a perfect tool? Not by a long shot, but it provides catalogers with a reliable way to merge record data. <\/p>\n<h3>Use Cases<\/h3>\n<p> The Merge Records tool was created to support a very specific set of use cases, and while users have found ways to extend the program to support other parts of their workflows, the most common use cases for this tool are as follows:<\/p>\n<p><em>As a cataloger, I receive multiple e-journal files. \u00a0These MARC records cover many of the same titles, and rather than load an individual record for each vendor, I&#8217;d like to just merge the record sets together and keep the different URLs. \u00a0<\/em><\/p>\n<p><em>As a cataloger, I received a set of modified records from a vendor. \u00a0We already have a local file in our database with lots of local information. I&#8217;d like to merge information from the vendor record to my local record.<\/em><\/p>\n<p><em>As a consortial manager, I have a set of records from a new member that need to be merged into the catalog. \u00a0These records are almost all duplicates, but the OCLC numbers don&#8217;t match. \u00a0Many of these records have updated OCLC numbers that no longer match the older value in our catalog. \u00a0<\/em><\/p>\n<p><em>As a cataloger, I&#8217;ve received a single file that contains a lot of duplication. \u00a0Some of these duplicates have unique data in the 700 and 856 fields. \u00a0I&#8217;d like to merge the records in the file together, and only merge unique data from these two fields into the final record set.<\/em><\/p>\n<p>The above use cases reference common questions asked on the MarcEdit ListServ, or that I have received personally from users. \u00a0The general thread of these queries is: I have a set of records that are like an existing set that includes data I don&#8217;t want to lose. \u00a0Is it possible to merge data from the new records into the old records? 
Fortunately, the answer is generally yes.<!--nextpage--><\/p>\n<p>MarcEdit&#8217;s Merge Records tool was developed specifically for the purpose of finding like records and merging the bibliographic information. \u00a0The tool provides a few different options for defining how it will make a &#8220;match&#8221;. \u00a0The most common is matching by a control number or controlled key. \u00a0The tool predefines a series of control values that may be used when matching record data. \u00a0The user also has the option to select their own field\/subfield combination for matching.<\/p>\n<p>[table]<a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-26\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\" alt=\"tip\" width=\"85\" height=\"75\" \/><\/a>[attr style=&#8221;width:90px&#8221;],&#8221;When defining your own field\/subfield combination for record matching, it is important to note that MarcEdit treats this data as a hybrid control number. \u00a0This configures the matching algorithm to work much like the way it handles ISBN\/ISSN matching, though without the added matching logic reserved for working with ISBN\/ISSN\/control data. &#8220;[\/table]<\/p>\n<p>The final matching option is defined as MARC21. \u00a0This is a set of ~20 values that are queried against a set of records to determine a confidence match. \u00a0This option is most commonly used when there is no shared control data between a particular set of records. <\/p>\n<h3>Merge Records Tool<\/h3>\n<p> The Merge Records Tool can be accessed from the Main MarcEdit Window by selecting the Tools menu option. 
\u00a0This tool can also be added to the Main Window as a shortcut by making the appropriate changes to the preferences in the Main Window section.<\/p>\n<div id=\"attachment_455\" style=\"width: 655px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/memergerecords1.png\" rel=\"attachment wp-att-455\"><img aria-describedby=\"caption-attachment-455\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-455\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/memergerecords1.png\" alt=\"Figure 1: Merge Records Dialog\" width=\"645\" height=\"380\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/memergerecords1.png 645w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/memergerecords1-300x177.png 300w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/memergerecords1-624x368.png 624w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><p id=\"caption-attachment-455\" class=\"wp-caption-text\">Figure 1: Merge Records Dialog<\/p><\/div>\n<h4><strong>Processing Records<\/strong><\/h4>\n<p> When the Merge Records Tool is used, MarcEdit makes a handful of assumptions based on how the user defines their data within the Source, Merge, and Save File fields. \u00a0These assumptions are as follows:<\/p>\n<p><strong>Source File:\u00a0<\/strong> The Source File represents the primary record. \u00a0This is the record that will have data merged into it.<\/p>\n<p><strong>Merge File:\u00a0<\/strong> The Merge File represents the secondary record. \u00a0This is the record that contains data that should be merged into the source record. 
\u00a0When the <strong>Source <\/strong>and\u00a0<strong>Merge\u00a0<\/strong>files are different, the tool will only merge data into the <strong>Source\u00a0<\/strong>file from the\u00a0<strong>Merge\u00a0<\/strong>file\u00a0<strong>*if*\u00a0<\/strong>the data matches. \u00a0The tool will not copy records that are present in the\u00a0<strong>Merge\u00a0<\/strong>file but missing from the\u00a0<strong>Source\u00a0<\/strong>file. \u00a0However, if the\u00a0<strong>Source <\/strong>and\u00a0<strong>Merge\u00a0<\/strong>files are identical, the program will assume that the user wants to collapse and merge like records within the file together.<\/p>\n<p><strong>Save File:\u00a0<\/strong> The Save File is where the merged records are saved. \u00a0MarcEdit will not change either the <strong>Source<\/strong> or the\u00a0<strong>Merge\u00a0<\/strong>data files.<\/p>\n<p>Once the data sources for the merge are defined, the user must choose how the application decides whether a record should be merged. \u00a0Presently, the tool provides seven options: <\/p>\n<ol>\n<li>001<\/li>\n<li>010$a<\/li>\n<li>020$a<\/li>\n<li>022$a<\/li>\n<li>035$a<\/li>\n<li>MARC21<\/li>\n<li>User-defined<\/li>\n<\/ol>\n<p> To provide a better set of outcomes related to merging records, MarcEdit predefines some common field match points. \u00a0This is done so that the program can add additional logic around the merging of records when one of these items is selected. \u00a0When one of the predefined options is selected, MarcEdit makes the following assumptions:<!--nextpage--><\/p>\n<p><strong>001: <\/strong>MarcEdit recognizes this option as a match on OCLC record number. \u00a0When this match point is selected, MarcEdit will expand the fields queried when evaluating records for like OCLC numbers, looking at both the 035$a field with an OCLC qualifier and the 019 field. 
\u00a0The 019 field is especially important, as this field notes all the OCLC numbers that have been associated with this record. \u00a0This allows MarcEdit to match on records whose OCLC numbers have changed because OCLC&#8217;s Quality Processing tools merged records or deleted duplicates.<\/p>\n<p><strong>010$a: <\/strong>The tool recognizes this field as being the LCCN control number. \u00a0This allows MarcEdit to match on records using rules related to how LCCNs have been formatted over time.<\/p>\n<p><strong>020$a, 022$a, 035$a:<\/strong> MarcEdit treats these fields roughly the same, in that they are treated as keyed indexes. \u00a0The tool normalizes the data found in the records, removing qualifiers found in the 020 and 022, which allows the tool to match on a wide range of variations.<\/p>\n<p><strong>MARC21<\/strong><strong>:<\/strong> The MARC21 option is a highly customized algorithm that examines specific areas of a bibliographic record to determine the probability that the records are the same. 
\u00a0Users have the ability to customize the depth of that interrogation by customizing which fields are examined by the tool.<\/p>\n<div id=\"attachment_457\" style=\"width: 389px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/mergemarc21options.png\" rel=\"attachment wp-att-457\"><img aria-describedby=\"caption-attachment-457\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-457\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/mergemarc21options.png\" alt=\"Figure 2: MARC21 Merge Options\" width=\"379\" height=\"303\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/mergemarc21options.png 379w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/mergemarc21options-300x240.png 300w\" sizes=\"(max-width: 379px) 100vw, 379px\" \/><\/a><p id=\"caption-attachment-457\" class=\"wp-caption-text\">Figure 2: MARC21 Merge Options<\/p><\/div>\n<p>[table]<a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-26\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\" alt=\"tip\" width=\"85\" height=\"75\" \/><\/a>[attr style=&#8221;width:90px&#8221;],&#8221;When using the MARC21 merge option, it is generally recommended that users not change the matching criteria. \u00a0The tool is smart enough to only evaluate data present in both records. \u00a0When customizing the matching criteria, users should closely examine the results. \u00a0MarcEdit&#8217;s MARC21 matching algorithm will attempt to rebalance itself each time a value is removed from the matching criteria. 
\u00a0Depending on the data in the records, removing specific criteria (which is record dependent) may unbalance the equation, causing the tool to fail.&#8221;[\/table]<\/p>\n<p>Once a user has selected their data and defined how the data is to be matched for merging, they will need to select what data will be merged. \u00a0Clicking Next takes the user to a screen where they define the data to be merged.<\/p>\n<div id=\"attachment_458\" style=\"width: 654px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/merged_windows.png\" rel=\"attachment wp-att-458\"><img aria-describedby=\"caption-attachment-458\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-458\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/merged_windows.png\" alt=\"Figure 3: Select Merge Data\" width=\"644\" height=\"365\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/merged_windows.png 644w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/merged_windows-300x170.png 300w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/merged_windows-624x354.png 624w\" sizes=\"(max-width: 644px) 100vw, 644px\" \/><\/a><p id=\"caption-attachment-458\" class=\"wp-caption-text\">Figure 3: Select Merge Data<\/p><\/div>\n<p>Here the user has a handful of options available to them. \u00a0On the left, there is a list of fields. \u00a0These are the fields that can be identified for merge. \u00a0This list includes the LDR and fields 001-999. \u00a0If a user has records that include other alphanumeric field labels, they simply need to add that label to the Select Field box and click the green button.<\/p>\n<p>Users can select individual\u00a0fields from the list or enter the field number that they wish to capture in the Select Field Textbox. \u00a0The Field list includes the LDR and fields 001-999. 
\u00a0Users whose records use other alphanumeric field labels not found in the list can capture that data by entering the field label into the Select Field Textbox and then clicking the green button, which moves the selected fields or the data defined in the Select Field box into the Merge Fields list.<!--nextpage--><\/p>\n<p>If the user is merging data that might already be found in the merged record, the user has the option to select the Merge Unique Items checkbox. \u00a0This will configure MarcEdit to test all instances of a merged field to determine that the data doesn&#8217;t duplicate information already found in a record.<\/p>\n<p>Once the user has finished configuring the application, they simply click Next and the program will process the data, outputting the results to the Save File location.<\/p>\n<h3>Step-by-Step<\/h3>\n<p>So, that was a lot of text. \u00a0In a nutshell, what are the steps to making this work?<\/p>\n<ol>\n<li>Select your Source File (this is the file data is merged into)<\/li>\n<li>Select your Merge File (this is the file data is merged from)<\/li>\n<li>Select your Save File (this is where the final merged records are saved). \u00a0This\u00a0<strong>must\u00a0<\/strong>be different from the Source or Merge File paths.<\/li>\n<li>Select your merge criteria<\/li>\n<li>Click Next<\/li>\n<li>Select the Fields to Merge<\/li>\n<li>Determine if the Merge Unique Items checkbox is applicable<\/li>\n<li>Click Next<\/li>\n<li>Wait for the results<\/li>\n<\/ol>\n<p>And that&#8217;s it. 
\u00a0The Merge Records tool will provide a running status message so users can monitor the matching and merging process.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Chapter: Getting Started Use Cases Merging Data Getting Started With metadata originating from so many different sources, one of the many challenges faced by catalogers is the ability to merge new data with existing records and do so in a way that doesn&#8217;t require a significant amount of manual editing. \u00a0New records may [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":9,"menu_order":3,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/448"}],"collection":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/comments?post=448"}],"version-history":[{"count":7,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/448\/revisions"}],"predecessor-version":[{"id":461,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/448\/revisions\/461"}],"up":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/9"}],"wp:attachment":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/media?parent=448"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}