{"id":356,"date":"2016-01-04T21:59:02","date_gmt":"2016-01-04T21:59:02","guid":{"rendered":"http:\/\/marcedit.reeset.net\/learning_marcedit\/?page_id=356"},"modified":"2016-01-05T22:00:53","modified_gmt":"2016-01-05T22:00:53","slug":"slice-dice-and-join-your-records-again","status":"publish","type":"page","link":"https:\/\/marcedit.reeset.net\/learning_marcedit\/9-2\/slice-dice-and-join-your-records-again\/","title":{"rendered":"Chapter 3:  Slice, Dice, and Join your Records Again"},"content":{"rendered":"<h3>In this Chapter<\/h3>\n<ul>\n<li><strong>Getting Started<\/strong><\/li>\n<li><strong>Splitting MARC Records<\/strong><\/li>\n<li><strong>Joining MARC Records<\/strong><\/li>\n<li><strong>Beyond Split and Join<\/strong><\/li>\n<\/ul>\n<h3>Getting Started<\/h3>\n<p> If your library is like most libraries I work with, you get a lot of your bibliographic metadata from content providers. \u00a0E-Books, E-Journals, Shelf-Ready&#8230;it seems like libraries are constantly having to deal with files of MARC records. \u00a0Some of these files are incredibly large, with thousands of records, while others may have just a couple of records, but may be part of a zip with thousands of these individual record files. \u00a0Whatever the case, catalogers have developed lots of strategies for handling vendor data &#8212; and one of those strategies is slicing, dicing, and joining record data together. \u00a0Whether we are talking about a really large file that needs to be cut into more manageable chunks, or thousands of individual files that need to be joined into one single file &#8212; catalogers need tools, and MarcEdit provides them with options. <\/p>\n<h3>Splitting MARC Files<\/h3>\n<p> Imagine the following use case:<\/p>\n<p><em>I&#8217;m a cataloger who has just received a vendor file containing a current set of purchased e-books. \u00a0The file contains approximately 10,000 records, but they are of questionable quality. 
\u00a0I&#8217;d like to have the ability to break this file into 5 smaller files to simplify editing.<\/em><\/p>\n<p>The above use case tends to be pretty common. \u00a0The size of the files will vary, but one of the more common questions that I receive deals with turning large files into more manageable files. \u00a0This is what the MARCSplit tool was designed to do.<\/p>\n<div id=\"attachment_360\" style=\"width: 655px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcsplit.png\" rel=\"attachment wp-att-360\"><img aria-describedby=\"caption-attachment-360\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-360\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcsplit.png\" alt=\"Figure 1: MARCSplit Window\" width=\"645\" height=\"365\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcsplit.png 645w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcsplit-300x170.png 300w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcsplit-624x353.png 624w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><p id=\"caption-attachment-360\" class=\"wp-caption-text\">Figure 1: MARCSplit Window<\/p><\/div>\n<p>The MARCSplit tool can be called from the MARC Tools Window, from the Main MarcEdit Program Window (Tools menu\/MARCSplit), or can be added to the Main Window as a link via the preferences. \u00a0As one might guess from the name, the purpose of this tool is to break large MARC files into groups of smaller files.<\/p>\n<p>The MARCSplit utility has a couple of different modes that are worth highlighting. \u00a0First, by default, the tool splits a file based on the number of records per file defined by the user. \u00a0By default, the tool sets a Records per file limit of 1000. 
\u00a0This means that the tool will generate files of up to 1000 records. \u00a0Users can adjust this value to meet their particular needs.<\/p>\n<p>Secondly, users have an option to select the # of files to split. \u00a0This option allows users to only generate a specified number of files via a split operation. \u00a0What it does not do is automatically split an entire file into a set number of files. \u00a0Let me explain&#8230;<\/p>\n<p>Say that we have a file with 100,000 records. \u00a0I&#8217;d like to split that into files of 1000, but I only want to see 10 files. \u00a0Rather than creating 100 files, with 1000 records in each file &#8212; I can check the # of files option, set the Records per file to 1000, and tell MarcEdit that I only want 10 files. \u00a0MarcEdit will then generate just 10 files, each with 1000 records.<!--nextpage--><\/p>\n<p>Say I have that same file of 100,000 records and I&#8217;d like to generate 5 files from this larger record set. \u00a0What I cannot do is check # of files, set it to 5, and expect MarcEdit&#8217;s MARCSplit tool to automatically calculate how many records that would be per file. \u00a0The tool never knows how many total records are in a file until the process has completed. \u00a0In this case, you&#8217;d have to know that you have 100,000 records, and set the Records per file to 20,000 (100,000 divided by 5) in order to get just 5 smaller files.<\/p>\n<p>Finally, when MarcEdit&#8217;s MARCSplit tool generates new files &#8212; it uses the filename: msplit0000000.mrc. \u00a0The tool increments the file number until it completes the process. \u00a0However, if you are splitting a large file into a bunch of individual files with one record per file, MarcEdit will prompt you during processing to see if you&#8217;d like to use one of the values in the MARC record to name the file (i.e., a control number like the 001, or the 245$a). 
<\/p>\n<h3>Joining MARC Data<\/h3>\n<p> It would stand to reason that if catalogers want to split large files into smaller ones, they may need to be able to join smaller files into a large one. \u00a0The MARCJoin tool provides a simple interface for joining individual files or entire directories of files into a single file. \u00a0The tool also provides the ability to append data into an existing file, not just overwrite existing data. \u00a0For example:<\/p>\n<p><em>As a cataloger, I need to join files in 10 directories together. \u00a0Using the MARCJoin tool, I join all the files in my first directory into a new file. \u00a0I then join all the files in subsequent directories, joining the data into the existing file defined in step one. \u00a0When finished, I&#8217;ve run the join tool 10 times, but I&#8217;ve only created one file as each process appends the new data into the existing file.<\/em><\/p>\n<div id=\"attachment_362\" style=\"width: 655px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin1.png\" rel=\"attachment wp-att-362\"><img aria-describedby=\"caption-attachment-362\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-362\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin1.png\" alt=\"Figure 2: MARC Join Window\" width=\"645\" height=\"365\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin1.png 645w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin1-300x170.png 300w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin1-624x353.png 624w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><p id=\"caption-attachment-362\" class=\"wp-caption-text\">Figure 2: MARC Join Window<\/p><\/div>\n<p>By default, MARCJoin is configured to join individual files together. 
\u00a0The Join Individual Files checkbox configures the File(s) to Join button to open an individual file selection dialog box &#8212; this box allows users to individually select multiple files to join. \u00a0However, if I have a directory of content that I want to join, I can forgo the file selection by unchecking the Join Individual Files checkbox. \u00a0This reconfigures the File(s) to Join button to open a folder selection dialog box. \u00a0I&#8217;m also given the option to define the file extension of the files I wish to join together (Figure 3).<\/p>\n<div id=\"attachment_363\" style=\"width: 655px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin2.png\" rel=\"attachment wp-att-363\"><img aria-describedby=\"caption-attachment-363\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-363\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin2.png\" alt=\"Figure 3: MARCJoin by directory\" width=\"645\" height=\"365\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin2.png 645w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin2-300x170.png 300w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/marcjoin2-624x353.png 624w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><p id=\"caption-attachment-363\" class=\"wp-caption-text\">Figure 3: MARCJoin by directory<\/p><\/div>\n<p>This option is much more efficient when working with large sets of files that you wish to join. \u00a0While large, the File(s) to Join textbox has a limit to the number of characters that it can contain. \u00a0This limit is approximately 2 GB (in theory), but the gist here is that there are limits (and performance implications) when using the select individual files option and then selecting thousands of files to join. 
\u00a0In those cases, isolating the data to join into a directory and then just joining the data in the directory together represents the most efficient and performant option. <\/p>\n<h3>Beyond Split and Join<\/h3>\n<p> MarcEdit provides a number of other options for users looking to isolate specific records for edit. \u00a0Probably the most versatile is the Extract Selected Records Tool.<!--nextpage--><\/p>\n<div id=\"attachment_365\" style=\"width: 579px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/extractselectedrecords.png\" rel=\"attachment wp-att-365\"><img aria-describedby=\"caption-attachment-365\" decoding=\"async\" loading=\"lazy\" class=\"size-full wp-image-365\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/extractselectedrecords.png\" alt=\"Figure 4: Extract Selected Records Tool\" width=\"569\" height=\"529\" srcset=\"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/extractselectedrecords.png 569w, https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2016\/01\/extractselectedrecords-300x279.png 300w\" sizes=\"(max-width: 569px) 100vw, 569px\" \/><\/a><p id=\"caption-attachment-365\" class=\"wp-caption-text\">Figure 4: Extract Selected Records Tool<\/p><\/div>\n<p>The Extract Selected Records tool is accessible from the Main program Window. \u00a0Users should select Tools\/Select MARC Records\/Extract Selected Records. \u00a0The tool allows users to search across their file and pull together groups of data that can then be extracted for further edits. \u00a0The tool works by defining a Display Field. \u00a0By default, this value is set to the defined Title tag in the preferences. \u00a0Once defined, the user selects the file that they want to evaluate and then imports the records. 
\u00a0The user then has the option to search across the display field, or across all record data, for specific data elements. \u00a0Users can stack queries by running a search, and then checking the &#8220;Retain Checked Items&#8221; option. \u00a0This tells MarcEdit that the user wants to combine the search results together. \u00a0In this way, users could do multiple queries for groups of items, and then extract them all at the end.<\/p>\n<p>When searching for data using this tool, MarcEdit provides a number of different search options. \u00a0Users can query using the following options\/search mnemonics: <\/p>\n<ol>\n<li>By default, search happens across the display field and as an in-string match.<\/li>\n<li>Users can query the display field data using a regular expression by checking the use regular expression option.<\/li>\n<li>Users can query across all record data (not just the information in the display field) by selecting the Query all Record data option. \u00a0By checking use regular expressions, users can make this query a regular expression.<\/li>\n<li>Record Number search: \u00a0By placing: R#:[<em>record numbers<\/em>] into the search box, the tool will select the defined record number or range of numbers. \u00a0Example: R#:1,3-7,55,77-90 \u00a0will select record 1, 3 through 7, 55, and 77 through 90. \u00a0Please remember, record number count starts at zero.<\/li>\n<li>Field\/Subfield Data search: By default, the Extract Selected Records utility searches just the data in the 245$a.\u00a0 However, you can search on just a field or a specific subfield within a field by using the following syntax &#8212; F#:[fielddata]$[subfield][space].\u00a0\u00a0 <b>Please note: This search is case-insensitive.<\/b>\u00a0 The below example would select all records that contain a 650$v with Maps. 
<i>Example: F#:650$v Maps<\/i><\/li>\n<li>Batch Search: FILE#:[Path to file].\u00a0 Enter each search on its own line.<\/li>\n<\/ol>\n<p> &nbsp;<\/p>\n<p>[table]<a href=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\"><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/marcedit.reeset.net\/learning_marcedit\/wp-content\/uploads\/2013\/05\/tip.png\" alt=\"tip\" width=\"85\" height=\"75\" \/><\/a>\u00a0[attr style=&#8221;width:90px&#8221;], &#8220;One of the most common questions that comes up when using this tool is how to find records that do not have a specific field or subfield. \u00a0The Extract Selected Records tool can easily help a cataloger identify these records. \u00a0First, set the display field to the field or field\/subfield that one wants to find. \u00a0Select the file and import the data. \u00a0In the record list, the Display Field will be represented with the text, Display field not found &#8212; if the display field isn&#8217;t present. \u00a0To capture all records missing the display field, you can either search for &#8216;Display field not found&#8217; or you can click on the Does not match link. \u00a0This will select all the items that are missing the display field. \u00a0If you instead want all the items that have the display field, click on Invert Selections; this will change the selection from the records marked Display field not found to those records where the display field is present.&#8221;[\/table]<!--nextpage--><\/p>\n<p>Once the user has selected their content, they can export their data. \u00a0The tool provides two export options: Export, and Export Random. \u00a0Export Random will prompt users to give a percentage of the records to export, and then will randomly select records for export until that percentage has been selected. \u00a0Export will just export the selected items from the file and generate a new file. 
\u00a0Lastly, users are prompted on export if they wish to delete the extracted data from the source file.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this Chapter Getting Started Splitting MARC Records Joining MARC Records Beyond Split and Join Getting Started If your library is like most libraries I work with, you get a lot of your bibliographic metadata from content providers. \u00a0E-Books, E-Journals, Shelf-Ready&#8230;it seems like libraries are constantly having to deal with files of MARC records. \u00a0Some [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":9,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/356"}],"collection":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/comments?post=356"}],"version-history":[{"count":9,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/356\/revisions"}],"predecessor-version":[{"id":464,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/356\/revisions\/464"}],"up":[{"embeddable":true,"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/pages\/9"}],"wp:attachment":[{"href":"https:\/\/marcedit.reeset.net\/learning_marcedit\/wp-json\/wp\/v2\/media?parent=356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}