Figure 1: Characterset Conversions Window

The MarcEdit Characterset Conversions tool provides a batch method for converting data from one characterset to another. The program makes two assumptions:

1. All of the data in the file is in the same characterset.
2. The user knows the original characterset encoding.
Tip: MarcEdit does not have an automated tool for determining the characterset of your file. This is partly because, at the binary level, many charactersets look the same.
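To illustrate why automatic detection is unreliable, here is a minimal Python sketch. It is not part of MarcEdit, and the sample bytes and the helper name `decode_with_each` are made up for illustration. The same byte string decodes without error under several single-byte charactersets, yet produces different text under each one.

```python
def decode_with_each(raw, codecs):
    """Decode the same bytes under each candidate codec; return {codec: text}."""
    return {name: raw.decode(name) for name in codecs}


# Hypothetical sample: "caf", then byte 0xE9, a space, and byte 0x96.
sample = b"caf\xe9 \x96"
results = decode_with_each(sample, ["cp1252", "iso-8859-1", "cp1251"])

# Every codec succeeds, so nothing in the bytes themselves says which is right.
for name, text in results.items():
    print(name, repr(text))
```

Because none of the decodes fails, a program has no structural signal to choose between them; only a person who knows what the text should say can tell which result is correct.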
By default, MarcEdit’s Original Encoding and Final Encoding dropdown boxes are populated with the most common charactersets encountered by the MarcEdit user community.
Figure 2: Available Charactersets
Charactersets are identified by their codepage number (e.g., 1252), followed by a human-readable description. The codepage number is important: although this list represents the charactersets most commonly encountered by MarcEdit users, it is by no means exhaustive. Each operating system provides its own set of supported charactersets, and MarcEdit can use any characterset supported by the operating system so long as the codepage number is entered into the Original Encoding or Final Encoding box. For Windows users, Microsoft makes a list of supported codepages available in the Windows knowledge-base (https://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx).
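As a rough illustration of what "identified by codepage number" means, the Python sketch below maps a few Windows codepage numbers to Python codec names and converts a short byte string between them. The `CODEPAGE_TO_CODEC` table and the `convert_text` helper are assumptions made for this example, not MarcEdit's implementation. The sketch works on a bare string only: it does not read MARC records, does not recompute record or field lengths, and does not handle MARC8, which Python's standard library does not include.

```python
# Hypothetical lookup from Windows codepage numbers to Python codec names.
CODEPAGE_TO_CODEC = {
    1252: "cp1252",       # US/Western European
    28591: "iso-8859-1",  # ISO-8859-1
    65001: "utf-8",       # UTF-8
}


def convert_text(raw, original_cp, final_cp):
    """Re-encode raw bytes from one codepage to another (plain text only)."""
    source = CODEPAGE_TO_CODEC[original_cp]
    target = CODEPAGE_TO_CODEC[final_cp]
    return raw.decode(source).encode(target)


# "caf" + 0xE9 in codepage 1252, converted to UTF-8 (codepage 65001).
converted = convert_text(b"caf\xe9", 1252, 65001)
print(converted)
```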
To use this function, follow these steps:

1. Set the Final Encoding. This is the encoding that the data should end up in.
Important Notes

Switching between charactersets can be tricky, and a handful of issues can cause hard-to-find problems.
1. When converting to a characterset that is not UTF8 or MARC8, it is generally better to convert your data to UTF8 first, and then convert to the final encoding. You can go straight to your desired encoding, but if something goes wrong, it’s easier to debug the process in two steps.
2. MarcEdit’s character conversion tool will not work if any structural errors exist in the record. Encoding changes require the evaluation of very precise byte sequences; if the file structure is incorrect, I can all but guarantee the conversion will be as well.
3. Remember, MARC field length and record length are calculated in bytes, not characters. In MARC8, the two are the same thing; in UTF8, they are not. For records already approaching the record or field limits, translating large numbers of diacritics to UTF8 could cause structural errors (though this doesn’t happen often, and mostly occurs when moving data from XML into MARC).
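The byte-versus-character point in the last note can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper name `char_and_byte_length` and the sample string are made up, and the two limits are my reading of the MARC record structure (a 5-digit record length in the leader and a 4-digit field length in each directory entry), not values taken from MarcEdit.

```python
MAX_RECORD_BYTES = 99999  # record length is a 5-digit value in the MARC leader
MAX_FIELD_BYTES = 9999    # field length is a 4-digit value in the directory


def char_and_byte_length(text):
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample with two precomposed accented letters.
title = "Dvo\u0159\u00e1k"
chars, utf8_bytes = char_and_byte_length(title)
print(chars, utf8_bytes)

# A field is at risk when its UTF-8 byte count, not its character count,
# approaches MAX_FIELD_BYTES.
print(utf8_bytes <= MAX_FIELD_BYTES)
```

Each accented letter here takes two bytes in UTF-8, so the byte count exceeds the character count, and the gap grows with the number of diacritics in the data.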
Getting Help

So you have a data file and you want to encode it as UTF8, but you don’t know the original character encoding. That is a problem. Unfortunately, MarcEdit can’t automatically detect a file’s characterset, partly because many charactersets look identical at the binary level. However, I work with enough of them that I may be able to offer some help.
- From Z39.50: Surprisingly, many Z39.50 servers return data in 28591 (ISO-8859-1) format (especially Innovative Interfaces servers, unless requested otherwise). This looks a lot like MARC8, but it isn’t. If you try converting data to UTF8 from MARC8 and the results aren’t correct, try using this setting (or 1252 (US/Western European) if the 28591 codepage doesn’t work).
- My codepage is no longer in use: This sometimes happens. If you look at the MarcEdit supported codepages, you’ll see one such codepage on the list: 5426 (ISO-5426). Codepage 5426 (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=11468) is an extension of the Latin alphabet used primarily in minor European languages and obsolete typography. However, a significant number of UNIMARC records in France are stuck in this legacy format. To support the transition to UTF8, I added it to MarcEdit. So, if you run across a codepage that is no longer supported, let me know. There may be options.
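When choosing between 28591 and 1252, it can help to know that the two codepages agree on every byte except those in the range 0x80 to 0x9F. The Python sketch below, which is independent of MarcEdit and uses the made-up helper names `choice_matters` and `decode_both`, checks whether a byte string contains any of those bytes and decodes it both ways for visual comparison. Python's `cp1252` codec leaves a few bytes in that range undefined, so the sketch decodes with `errors="replace"` to avoid an exception.

```python
def choice_matters(raw):
    """True if raw has bytes in 0x80-0x9F, where 28591 and 1252 disagree."""
    return any(0x80 <= byte <= 0x9F for byte in raw)


def decode_both(raw):
    """Decode raw under both candidate codepages for side-by-side review."""
    return {
        28591: raw.decode("iso-8859-1"),
        1252: raw.decode("cp1252", errors="replace"),
    }


# Hypothetical sample: bytes 0x93 and 0x94 are curly quotes in 1252 only.
sample = b"\x93quoted\x94"
print(choice_matters(sample))
for codepage, text in decode_both(sample).items():
    print(codepage, repr(text))
```

If `choice_matters` returns False, either setting gives the same text; if it returns True, the decoded output still has to be reviewed by someone who knows what the data should contain.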
Finally, the best way to get information about a file’s characterset encoding is to ask the individual or agency that provided you with the file. Good luck!