Łukasz Wójcik

Making Life Easier for Programmers: A Simple Solution to Diacritical Marks in Text Files (66 languages)

A program that replaces diacritical marks in plain-text files encoded in UTF-8.

Have you ever had trouble with diacritics when working with text files? If so, this program can help you! It is a simple utility written in Go that automatically replaces letters carrying diacritics (accent marks / ogonki / diacritiques / tildes / tails / acentos) with their plain base letters in text files. This helps you avoid errors caused by diacritics being displayed or interpreted incorrectly on various operating systems, for example when compiling programs.

Application

The diacritics swap program can be useful in the following situations:

  1. Working with Multiple Text Files:
    If you have multiple text files that contain diacritics and you want to modify them all, the program can speed up the process considerably. Just pass the file paths as arguments, and it will replace the diacritics in every file.

  2. Working with different languages:
    This program supports languages such as: Albanian, Ao, Aromanian, Aymara, Basque, Bosnian, Breton, Catalan, Cornish, Corsican, Creole, Crimean Tatar, Croatian, Czech, Dagestani, Danish, Dutch, Estonian, Faroese, French, Finnish, Frisian, Galician, Gagauz, Gascon, Greek, Hungarian, Icelandic, Irish, Italian, Karaim, Kashubian, Kazakh, Kurdish, Kurin, Latin, Latvian, Lemko, Lithuanian, Livonian, Luxembourgish, Maltese, Moldovan, Mixe-Zoque, Moksha, Norwegian, Occitan, Old Prussian, Polish, Portuguese, Quechua, Romanian, Romansh, Romani, Sardinian, Scottish Gaelic, Serbian, Slovak, Slovenian, Spanish, Swedish, Swiss German, Swiss Italian, Turkish, Upper Sorbian, Võro, Vietnamese, Welsh.

    The -lang flag selects the language whose diacritics are replaced. For example, to replace diacritics in Polish text files, start the program with -lang pl; other languages work the same way. To find the code for your language, run the program with the help flag, e.g.: OGONKI -help.

  3. Process automation:
    If replacing diacritics is a frequent task in your work or project, you can call the program from a script to automate it. You can use it as part of a larger data-processing pipeline or as a step in continuous integration (CI) or continuous delivery (CD).

Usage

To use the diacritics swap program, just follow a few simple steps:

  1. Download the program:
    Download the diacritics swap program from the GitHub repository or compile it yourself if you have Go installed on your computer.

  2. Specify input files:
    Run the program with arguments that contain paths to files where you want to replace diacritic marks. You can specify multiple files at once by separating them with a space.

  3. Choose a language:
    The program uses the Polish language by default. If you want to replace diacritics in other languages, you can use the -lang flag and specify the language code (ao, ay, ba, baq, br, ch-it, co, cr, crh, crn, csb, cy, cz, di, dk, el, es, et, eu, fi, fiu, fo, fr, fy, ga, gd, gl, gn, gsw, hr, hsb, hu, is, it, kaz, kv, ku, kw, la, lb, liv, lt, lv, mdf, md, mxz, nl, no, oc, op, pl, pt, qu, rm, rom, rup, sc, sk, sl, sq, sr, sv, tr, tut, vi, x-lmk). For example, -lang cz will replace diacritical marks in Czech-language text files.

  4. Run the program:
    Run the program and see how it replaces the diacritics in the given files. The program creates new output files with the diacritics replaced. Each new file name has _modified inserted before the extension of the original name, so example.txt becomes example_modified.txt.

PACKAGES

The code imports several packages that are used in the program:

flag: The flag package provides functions for parsing command-line arguments.

fmt: The fmt package provides functions for formatting and printing data.

time: The time package provides functions for time (time parsing, time formatting, time zone operations and others).

io/ioutil: The ioutil package provides functions for I/O operations such as reading and writing files.

strings: The strings package provides functions for manipulating text strings.

path/filepath: The filepath package provides functions for working with file paths.

import (
    "flag"
    "fmt"
    "time"
    "io/ioutil"
    "strings"
    //adding support for access paths
    "path/filepath"
)

MAP

The program defines the variable ogonki as a map in which each key is a language code and each value is a string of the diacritics and special characters specific to that language.

The following language codes are defined in this map:

sq: contains Albanian diacritics and special characters, such as çéëÇËÉ,
eu: contains Basque diacritics and special characters, such as áéíóúüñÁÉÍÓÚÜÑ,
br: contains Breton diacritics and special characters, such as âãäæçéèêëêôöœûüùñŷýÿìïîŕśÂÄÃÆÇÉÈÊËÔÖŒÑÙÛÜŶÌÏÎŔŚÝŸ,
ca: contains Catalan diacritics and special characters, such as áàà́èéè́íìòóò́ïúüùýÁÀÀ́ÈÉÈ́ÏÍÌÒÒ́ÓÚÜÙÝ,
hr: contains Croatian diacritics and special characters, such as čćđšžČĆĐŠŽ,
cz: contains Czech diacritics and special characters, such as áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ,
dk: contains Danish diacritics and special characters, such as æøåÆØÅ,
nl: contains Dutch/Netherlands diacritics and special characters, such as äáéëïíóöüúÄÁÉËÍÏÓÖÜÚ,
et: contains Estonian diacritics and special characters, such as äöõüšžÄÖÕÜŠŽ,
fi: contains Finnish diacritics and special characters, such as äåöÄÅÖ,
fr: contains French diacritics and special characters, such as àâäçéèêëîïôûùüÿœæÀÄÂÇÉÈÊËÎÏÔÛÙÜŸŒÆ,
gl: contains Galician diacritics and special characters, such as áéíóúñüÁÉÍÓÚÑÜ,
hu: contains Hungarian diacritics and special characters, such as áéíóöőúüűÁÉÍÓÖŐÚÜŰ,
is: contains Icelandic diacritics and special characters, such as áðéíóöúýþæÁÐÉÍÓÖÚÝÞÆ,
ga: contains Irish diacritics and special characters, such as áéíóúàèéìòóùúċḃḋḟġṁṗṡṫāēīíōūǽǿÁÉÍÓÚÀÈÍÌÒÓÙÚĊḂḊḞĠṀṖṠṪĀĒĪŌŪǼǾ,
it: contains Italian diacritics and special characters, such as àèéìòóùüÀÈÉÌÒÓÙÜ,
lv: contains Latvian diacritics and special characters, such as āčēģīķļņšūžĀČĒĢĪĶĻŅŠŪŽ,
lt: contains Lithuanian diacritics and special characters, such as ąčęėįį̇šųūžĄČĘĖĮĮ̇ŠŲŪŽ,
mt: contains Maltese/Malta diacritics and special characters, such as ċġħżĊĠĦŻ,
no: contains Norwegian diacritics and special characters, such as áéíæøóåúýÆØÁÅÉÍÓÚÝ,
pl: contains Polish diacritics and special characters, such as ąćęłńóśźżĄĆĘŁŃÓŚŹŻ,
ro: contains Romanian diacritics and special characters, such as ăâîșțţĂÂÎȘȚŢ,
sk: contains Slovak diacritics and special characters, such as áäčďéíĺľňóôŕšťúýžÁÄČĎÉÍĹĽŇÓÔŔŠŤÚÝŽ,
sl: contains Slovenian diacritics and special characters, such as ą̊ą̇ąáčćđéęíįĺľńŕšśóúųžźĄ̊Ą̇ĄÁČĆĐÉĘÍĮĹĽŃŔŠŚÓÚŲŽŹ,
es: contains Spanish diacritics and special characters, such as áéíóúüñÁÉÍÓÚÜÑ,
tr: contains Turkish diacritics and special characters, such as şığüöçŞİĞÜÖÇ,
vi: contains Vietnamese diacritics and special characters, such as áàãảạăắằẵẳặâấầẫẩậđéèẽẻẹêếềễểệíìĩỉịóòõỏọôốồỗổộơớờỡởợúùũủụưứừữửựýỳỹỷỵÁÀÃẢẠĂẮẰẴẲẶÂẤẦẪẨẬĐÉÈẼẺẸÊẾỀỄỂỆÍÌĨỈỊÓÒÕỎỌÔỐỒỖỔỘƠỚỜỠỞỢÚÙŨỦỤƯỨỪỮỬỰÝỲỸỶỴ,
cy: contains Welsh diacritics and special characters, such as äâêëîïôûŵŷÂÄÊËÎÏÔÛŴŶ,
ba: contains Bosnian diacritics and special characters, such as ćčđšžıĆČĐŠŽ,
el: contains Greek diacritics and special characters, such as άαβγδεέζηθικλμνξοπρστήίϊΐυύΰϋόφχψωώΆΓΔΘΛΈΞΠΉΊΪΣΌΦΨΏΩΎΫ,
sr: contains Serbian diacritics and special characters, such as čćđšžüČĆĐŠŽÜ,
la: contains Latin diacritics and special characters, such as áąāéęēīōūȳǖǘǚǜċćġħǰŝźćģķļłņńŗśźżǎěǐǒóúǔǖǘǚǜǧğıíįķļʼnǫřśŝţťūǔůẏýžæœÁĄĀĒÉĘĪŌÓŪȲǕǗǙǛĊĆĠĦŜŹĆĢĶĻŁŅŃŖŚŹŻǍĚÍǏǑǓǕǗǙǛǦĞİĮĶĻŊǪŘŚŜŢŤÚŪǓŮẎÝŽÆŒ,
ku: contains Kurdish diacritics and special characters, such as âáçéêğîíışţôûýÁÂÇÉÊĞÎÍŞŢÔÛÝ,
fo: contains Faroese diacritics and special characters, such as áðéíóúýæøÁÐÉÍÓÚÝÆǾØ,
fy: contains Frisian diacritics and special characters, such as áäâêéëïîíôóöøûúüÿýÁÄÂÊÉËÏÎÍÖÓÔØÚÜÛÝŸ,
hsb: contains Upper Sorbian diacritics and special characters, such as ǎáćčďđěéęȟíj́ḱĺłḿńʼnóöőǫǒřŕśšŭůw̌ŵźžӯǍÁĆČĎĐĚÉȞÍJ̌J́ḰĹŁḾŃŊÓǪǑӦŐǓŬŮŔŘŚŠŴW̌ӮȲŹŽ,
csb: contains Kashubian diacritics and special characters, such as ąâãćêěëęèéé̄łïńôòóó̄ò́ò̂ôśś́üùûżźĄÂÃĆÈÊĚËĘÉÉ̄ŁÏŃÔÒÒ́Ò̂ÓÓ̄ÔÜÙÛŚŚ́ŻŹ,
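sq: contains Kosovan diacritics and special characters, such as çëë̈f̈g̈šïx̌ÿžÇËË̈F̈G̈ŠÏŽX̌ŸŽ (this repeats the Albanian key sq; Go rejects duplicate keys in a map literal, so the entry needs a distinct key to compile),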
kw: contains Cornish diacritics and special characters, such as āâēêëôōîīïûūŵẃŷȳÿǣǽĀÂĒÊËÔŌÎĪÏÛŪŴẂŸȲŶǢǼ,
crh: contains Crimean Tatar diacritics and special characters, such as âäƀçèéëğıìíñöşûüÂÄÇÈÉËĞÌÍÑÖŞÛÜ,
lb: contains Luxembourgish diacritics and special characters, such as âäéëîïôöûüŷÂÄÉËÎÏÔÖÛÜŶ,
md: contains Moldovan diacritics and special characters, such as ăâîșşțţĂÂÎȘŞȚŢ,
rom: contains Romani diacritics and special characters, such as ạảấầẩẫậắằẳẵặáa̧ǎăȃâãǣćčđȩéěĕȇêẹ̄ẻếềểễệêëîỉịíi̧ǐĭȋîị̄ļłňỏốồổỗộớờởỡợóo̧ǒŏȏôọ̄ǿöőřśšţťüůűùúu̧ǔŭȗûụ̄ụủứừửữựýźžẠÃẢẤẦÁẨẪẬẮẰẲẴẶA̧ǍĂȂÂÃǢĆČĐÉȨĚĔÊẸ̄ẺẾỀỂỄỆĚËỈÎỊI̧ÍǏĬȊÎỊ̄ĻŁŇỎỐỒỔỖỘỚỜỞỠỢÓO̧ǑŎȎÔỌ̄ǾÖŐŘŚŠŤŢŮÜŰÚU̧ǓŬUÙÛỤ̄ỤỦỨỪỬỮỰÝŹŽ,
gsw: contains Swiss German diacritics and special characters, such as àäëïöüÿÀÄËÏÖÜŸ,
ch-it: contains Swiss Italian diacritics and special characters, such as àèéìòóùÀÈÉÌÒÓÙ,
x-lmk: contains Lemko diacritics and special characters, such as åäǎąćďęěįǐĺľłńňóǒöøŕřśšťųüůýźžÅÄǍĄĆĎĘĚǏĮĹĽŁŃŇÓǑÖØŔŘŚŠŤŮŲÜÝŹŽ,
ao: contains Ao language diacritics and special characters, such as ǎǐěǒǔñńňřłǍĚǑǓÑŃŇŘŁ,
gn: contains Gascon diacritics and special characters, such as âéèêëïîòôöüùûÂÉÈÊËÏÎÒÔÖÜÙÛ,
op: contains Old Prussian diacritics and special characters, such as ąčęėēģīķłńõōšūžĄČĘĖĒĢĪĶŁŃÕŌŠŪŽ,
kv: contains Kurin diacritics and special characters, such as āēīōūǣǫœġḥḳṣṭŋĀĒĪŌŪǢǪŒĠḤḲṢṬŊ,
di: contains Dagestani diacritics and special characters, such as āə̄ēīōūȳĀƏ̄ĒĪŌŪȲ,
sv: contains Swedish diacritics and special characters, such as åäöÅÄÖ,
oc: contains Occitan diacritics and special characters, such as áàâéèêíìîóòôúùûëïüçÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛËÏÜÇ,
sc: contains Sardinian diacritics and special characters, such as àèéìòùÀÈÉÌÒÙ,
co: contains Corsican diacritics and special characters, such as àèìòùÀÈÌÒÙ,
rm: contains Romansh diacritics and special characters, such as áàâǎäąéèêěëęíìîǐïįóòôǒöǫúùûǔüųçčśŝšţñņňŋåŧðłÁÀÂǍÄĄÉÈÊĚËĘÍÌÎǏÏĮÓÒÔǑÖǪÚÙÛǓÜŲÇČŚŜŠŢÑŅŇŊÅŦÐŁ,
rup: contains Aromanian diacritics and special characters, such as âăćĕëîńŏřśţŭźÂĂĆĔËÎŃŎŘŚŢŬŹ,
gd: contains Scottish Gaelic diacritics and special characters, such as àáâäéèêëìíîïòóôöùúûüýÀÁÂÄÈÊËÉÌÍÎÏÔÓÒÖÙÚÛÜÝ,
crn: contains Cornish diacritics and special characters, such as âêĵôûŵŷÂÊĴÔÛŴŶ,
liv: contains Livonian diacritics and special characters, such as áčėęė́ĩįį́į̃ĺņņ̌õóšųų̃ų̄ų̃̄ų̃̌ų̄̌ỹžÁČĖĘĖ́ĨĮĮ́Į̃ĹŅŅ̌ÕÓŠŲŲ̃Ų̄Ų̃̄Ų̃̌Ų̄̌ỸŽ,
mdf: contains Moksha diacritics and special characters, such as ĺćńśẃẁŕḿǵźj́ɗ́ɗťĹĆŃŚẂẀŔḾǴŹJ́Ɗ́ƊŤ,
fiu: contains Võro diacritics and special characters, such as äöüõšžåÄÖÜÕŠŽÅ,
kaz: contains Kazakh diacritics and special characters, such as äïöüÄÏÖÜ,
tut: contains Gagauz diacritics and special characters, such as ăâäêıîöőûüűĂÂÄÊİÎÖŐÛÜŰ,
kdr: contains Karaim diacritics and special characters, such as ėäöüńşźáéíóúĖÄÖÜŃŞŹÁÉÍÓÚ,
baq: contains Basque diacritics and special characters, such as áéíóúüñÁÉÍÓÚÜÑ,
qu: contains Quechua diacritics and special characters, such as áéíóúñÁÉÍÓÚÑ,
ay: contains Aymara diacritics and special characters, such as ąęįǫųĄĘĮǪŲ,
pt: contains Portuguese diacritics and special characters, such as áâàãçéêèíîìóôòõôúûùüñÂÁÂÀÃÇÉÊÈÍÎÌÓÔÒÕÚÙÛÜÑ,
cr: contains Creole diacritics and special characters, such as áéíóúñüÁÉÍÓÚÑÜ,
mxz: contains Mixe-Zoque diacritics and special characters, such as āäȧḁēëėẹ̄īïï̈ị̈ōöö̈ọ̈ūüü̈ụ̈ɨɨ̄ɨ̈ɨ̈̈ɨ̣ʉʉ̄ʉ̈ʉ̈̈ʉ̣ɛɛ̄ɛ̈ɛ̈̈ɛ̣ɔɔ̄ɔ̈ɔ̈̈ɔ̣,

Each string lists the characters that the program searches for and replaces when the corresponding language is selected with -lang.

var ogonki = map[string]string{
     //-------------------------------------------------------
      //Albański | Albanian (sq) 
        "sq": "çéëÇËÉ",
      //Baskijksi | Basque (eu)
        "eu": "áéíóúüñÁÉÍÓÚÜÑ",
      //Bretoński | Breton (br) 
        "br": "âãäæçéèêëêôöœûüùñŷýÿìïîŕśÂÄÃÆÇÉÈÊËÔÖŒÑÙÛÜŶÌÏÎŔŚÝŸ",
      //Kataloński | Catalan (ca) 
        "ca": "áàà́èéè́íìòóò́ïúüùýÁÀÀ́ÈÉÈ́ÏÍÌÒÒ́ÓÚÜÙÝ",
      //Chorwacki | Croatian (hr)
        "hr": "čćđšžČĆĐŠŽ",
     //Czeski | Czech (cz) 
        "cz":"áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ",
     //Duński | Danish (dk) 
        "dk":"æøåÆØÅ",
     //Niderlandzki (Holenderski) | Dutch/Netherlands (nl) 
        "nl": "äáéëïíóöüúÄÁÉËÍÏÓÖÜÚ",
     //Estoński | Estonian (et) 
        "et": "äöõüšžÄÖÕÜŠŽ",
     //Fiński | Finnish (fi) 
        "fi":"äåöÄÅÖ",
     //Francuski | French (fr)
        "fr":"àâäçéèêëîïôûùüÿœæÀÄÂÇÉÈÊËÎÏÔÛÙÜŸŒÆ",
     //Galicyjski/Galisyjski | Galician (gl)
        "gl": "áéíóúñüÁÉÍÓÚÑÜ",
     //Węgierski | Hungarian (hu)
        "hu":"áéíóöőúüűÁÉÍÓÖŐÚÜŰ",
     //Islandzki | Icelandic (is)
        "is" : "áðéíóöúýþæÁÐÉÍÓÖÚÝÞÆ",
     //Irlandzki | Irish (ga) 
        "ga": "áéíóúàèéìòóùúċḃḋḟġṁṗṡṫāēīíōūǽǿÁÉÍÓÚÀÈÍÌÒÓÙÚĊḂḊḞĠṀṖṠṪĀĒĪŌŪǼǾ",
     //Włoski | Italian (it) 
        "it":"àèéìòóùüÀÈÉÌÒÓÙÜ",
     //Łotewski | Latvian (lv)
        "lv": "āčēģīķļņšūžĀČĒĢĪĶĻŅŠŪŽ",
     //Litewski | Lithuanian (lt)
        "lt": "ąčęėįį̇šųūžĄČĘĖĮĮ̇ŠŲŪŽ",
     //Maltański/Maltyjski | Maltese/Malta (mt)
        "mt": "ċġħżĊĠĦŻ",
     //Norweski | Norwegian (no)
        "no":"áéíæøóåúýÆØÁÅÉÍÓÚÝ",
     //Polski | Polish (pl)
        "pl":"ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
     //Rumuński | Romanian (ro)
        "ro": "ăâîșțţĂÂÎȘȚŢ",
     //Słowacki | Slovakia (sk)
        "sk": "áäčďéíĺľňóôŕšťúýžÁÄČĎÉÍĹĽŇÓÔŔŠŤÚÝŽ",
     //Słoweński | Slovenian (sl) 
        "sl": "ą̊ą̇ąáčćđéęíįĺľńŕšśóúųžźĄ̊Ą̇ĄÁČĆĐÉĘÍĮĹĽŃŔŠŚÓÚŲŽŹ",
     //Hiszpański | Spanish (es)
        "es":"áéíóúüñÁÉÍÓÚÜÑ",
     //Turecki | Turkish (tr)
        "tr": "şığüöçŞİĞÜÖÇ",
     //Wietnamski | Vietnam (vi) 
        "vi": "áàãảạăắằẵẳặâấầẫẩậđéèẽẻẹêếềễểệíìĩỉịóòõỏọôốồỗổộơớờỡởợúùũủụưứừữửựýỳỹỷỵÁÀÃẢẠĂẮẰẴẲẶÂẤẦẪẨẬĐÉÈẼẺẸÊẾỀỄỂỆÍÌĨỈỊÓÒÕỎỌÔỐỒỖỔỘƠỚỜỠỞỢÚÙŨỦỤƯỨỪỮỬỰÝỲỸỶỴ",
     //Walijski | Welsh (cy) 
        "cy": "äâêëîïôûŵŷÂÄÊËÎÏÔÛŴŶ",
     //Bośniacki | Bosnia and Herzegovina (ba)
        "ba": "ćčđšžıĆČĐŠŽ",
     //Grecki | Greek (el)
        "el":"άαβγδεέζηθικλμνξοπρστήίϊΐυύΰϋόφχψωώΆΓΔΘΛΈΞΠΉΊΪΣΌΦΨΏΩΎΫ",
     //Serbski | Serbian (sr)
        "sr": "čćđšžüČĆĐŠŽÜ", 
     //Łaciński | Latin (la)
        "la": "áąāéęēīōūȳǖǘǚǜċćġħǰŝźćģķļłņńŗśźżǎěǐǒóúǔǖǘǚǜǧğıíįķļʼnǫřśŝţťūǔůẏýžæœÁĄĀĒÉĘĪŌÓŪȲǕǗǙǛĊĆĠĦŜŹĆĢĶĻŁŅŃŖŚŹŻǍĚÍǏǑǓǕǗǙǛǦĞİĮĶĻŊǪŘŚŜŢŤÚŪǓŮẎÝŽÆŒ",
     //Kurdyjski | Kurdish (ku)
        "ku": "âáçéêğîíışţôûýÁÂÇÉÊĞÎÍŞŢÔÛÝ",
     //Farerski | Faroese (fo)
        "fo": "áðéíóúýæøÁÐÉÍÓÚÝÆǾØ",
     //Fryzyjski | Frisian (fy)
        "fy": "áäâêéëïîíôóöøûúüÿýÁÄÂÊÉËÏÎÍÖÓÔØÚÜÛÝŸ",
     //Górnołużycki | Upper Sorbian (hsb)
        "hsb": "ǎáćčďđěéęȟíj́ḱĺłḿńʼnóöőǫǒřŕśšŭůw̌ŵźžӯǍÁĆČĎĐĚÉȞÍJ̌J́ḰĹŁḾŃŊÓǪǑӦŐǓŬŮŔŘŚŠŴW̌ӮȲŹŽ",
     //Kaszubski | Kashubian (csb)
        "csb": "ąâãćêěëęèéé̄łïńôòóó̄ò́ò̂ôśś́üùûżźĄÂÃĆÈÊĚËĘÉÉ̄ŁÏŃÔÒÒ́Ò̂ÓÓ̄ÔÜÙÛŚŚ́ŻŹ",
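     //Kosowski | Kosovan (the source reuses the key "sq" here; Go rejects duplicate keys in a map literal, so the illustrative key "sq-xk" is used instead)
        "sq-xk": "çëë̈f̈g̈šïx̌ÿžÇËË̈F̈G̈ŠÏŽX̌ŸŽ", 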
     //Kornijski | Cornish (kw)
        "kw": "āâēêëôōîīïûūŵẃŷȳÿǣǽĀÂĒÊËÔŌÎĪÏÛŪŴẂŸȲŶǢǼ",
     //Krymskotatarski | Crimean Tatar (crh) 
        "crh": "âäƀçèéëğıìíñöşûüÂÄÇÈÉËĞÌÍÑÖŞÛÜ",
     //Luksemburski | Luxembourgish (lb)
        "lb": "âäéëîïôöûüŷÂÄÉËÎÏÔÖÛÜŶ",
     //Mołdawski    | Moldovan (md)
        "md": "ăâîșşțţĂÂÎȘŞȚŢ", 
     //Romski/Cygański | Romani (rom) 
        "rom": "ạảấầẩẫậắằẳẵặáa̧ǎăȃâãǣćčđȩéěĕȇêẹ̄ẻếềểễệêëîỉịíi̧ǐĭȋîị̄ļłňỏốồổỗộớờởỡợóo̧ǒŏȏôọ̄ǿöőřśšţťüůűùúu̧ǔŭȗûụ̄ụủứừửữựýźžẠÃẢẤẦÁẨẪẬẮẰẲẴẶA̧ǍĂȂÂÃǢĆČĐÉȨĚĔÊẸ̄ẺẾỀỂỄỆĚËỈÎỊI̧ÍǏĬȊÎỊ̄ĻŁŇỎỐỒỔỖỘỚỜỞỠỢÓO̧ǑŎȎÔỌ̄ǾÖŐŘŚŠŤŢŮÜŰÚU̧ǓŬUÙÛỤ̄ỤỦỨỪỬỮỰÝŹŽ",
     //Szwajcarski niemiecki | Swiss German (gsw)
        "gsw": "àäëïöüÿÀÄËÏÖÜŸ",
     //Szwajcarski włoski | Swiss Italian (ch-it) 
        "ch-it": "àèéìòóùÀÈÉÌÒÓÙ",
     //Łemkowski | Lemko (x-lmk) 
        "x-lmk": "åäǎąćďęěįǐĺľłńňóǒöøŕřśšťųüůýźžÅÄǍĄĆĎĘĚǏĮĹĽŁŃŇÓǑÖØŔŘŚŠŤŮŲÜÝŹŽ",    
     //Ao (języka Naga) | Ao language (ao)
        "ao": "ǎǐěǒǔñńňřłǍĚǑǓÑŃŇŘŁ",
     //Gaskoński | Gascon (gn)
        "gn" : "âéèêëïîòôöüùûÂÉÈÊËÏÎÒÔÖÜÙÛ",
     //Staropruski | Old Prussian (op)
        "op": "ąčęėēģīķłńõōšūžĄČĘĖĒĢĪĶŁŃÕŌŠŪŽ",
     //Kuriński | Kurin (kv) 
        "kv": "āēīōūǣǫœġḥḳṣṭŋĀĒĪŌŪǢǪŒĠḤḲṢṬŊ",
     //Degestański | Dagestani (di)
        "di": "āə̄ēīōūȳĀƏ̄ĒĪŌŪȲ",
     //Szwedzki | Swedish (sv)
        "sv": "åäöÅÄÖ",
     //Prowansalski | Occitan (oc)
        "oc": "áàâéèêíìîóòôúùûëïüçÁÀÂÉÈÊÍÌÎÓÒÔÚÙÛËÏÜÇ",
     //Sardyński | Sardinian (sc)
        "sc": "àèéìòùÀÈÉÌÒÙ",
     //Korsykański | Corsican (co)
        "co": "àèìòùÀÈÌÒÙ",
     //Retoromański | Romansh (rm)
        "rm": "áàâǎäąéèêěëęíìîǐïįóòôǒöǫúùûǔüųçčśŝšţñņňŋåŧðłÁÀÂǍÄĄÉÈÊĚËĘÍÌÎǏÏĮÓÒÔǑÖǪÚÙÛǓÜŲÇČŚŜŠŢÑŅŇŊÅŦÐŁ",
     //Arumuński | Aromanian (rup)
        "rup": "âăćĕëîńŏřśţŭźÂĂĆĔËÎŃŎŘŚŢŬŹ",
     //Szkocki | Scottish Gaelic (gd)
        "gd": "àáâäéèêëìíîïòóôöùúûüýÀÁÂÄÈÊËÉÌÍÎÏÔÓÒÖÙÚÛÜÝ",
     //Kornwalijski | Cornish (crn)
        "crn": "âêĵôûŵŷÂÊĴÔÛŴŶ",
     //Liwski | Livonian (liv)
        "liv": "áčėęė́ĩįį́į̃ĺņņ̌õóšųų̃ų̄ų̃̄ų̃̌ų̄̌ỹžÁČĖĘĖ́ĨĮĮ́Į̃ĹŅŅ̌ÕÓŠŲŲ̃Ų̄Ų̃̄Ų̃̌Ų̄̌ỸŽ",
     //Mordwiński | Moksha (mdf)
        "mdf": "ĺćńśẃẁŕḿǵźj́ɗ́ɗťĹĆŃŚẂẀŔḾǴŹJ́Ɗ́ƊŤ",
     //Woro | Võro (fiu)
        "fiu": "äöüõšžåÄÖÜÕŠŽÅ",
     //Kazachski | Kazakh (kaz)
        "kaz": "äïöüÄÏÖÜ",
     //Gagauski | Gagauz (tut)
        "tut": "ăâäêıîöőûüűĂÂÄÊİÎÖŐÛÜŰ",
     //Karaimski | Karaim (kdr)
        "kdr": "ėäöüńşźáéíóúĖÄÖÜŃŞŹÁÉÍÓÚ",
     //Baskijski | Basque (baq)
        "baq": "áéíóúüñÁÉÍÓÚÜÑ",
     //Keczucki | Quechua (qu)
        "qu": "áéíóúñÁÉÍÓÚÑ",
     //Ajmarski | Aymara (ay)
        "ay": "ąęįǫųĄĘĮǪŲ",
     //Portugalski | Portuguese (pt)
        "pt": "áâàãçéêèíîìóôòõôúûùüñÂÁÂÀÃÇÉÊÈÍÎÌÓÔÒÕÚÙÛÜÑ",
     //Kreolski | Creole (cr)
        "cr": "áéíóúñüÁÉÍÓÚÑÜ",
     //Mixe-Zoque (mxz)
        "mxz": "āäȧḁēëėẹ̄īïï̈ị̈ōöö̈ọ̈ūüü̈ụ̈ɨɨ̄ɨ̈ɨ̈̈ɨ̣ʉʉ̄ʉ̈ʉ̈̈ʉ̣ɛɛ̄ɛ̈ɛ̈̈ɛ̣ɔɔ̄ɔ̈ɔ̈̈ɔ̣",   
     //-----------------------------------------------------
}
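One detail worth keeping in mind when reading these strings: some entries appear to be a base letter followed by one or more combining marks, so a single visible character can consist of several Unicode code points. The sketch below is only an illustration with a made-up two-entry map (demo) and a hypothetical helper (runeCount), not the program's own data. It compares byte and rune counts to show the difference.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// demo is a hypothetical miniature of the ogonki map.
// "e\u0301" is the letter e followed by COMBINING ACUTE ACCENT.
var demo = map[string]string{
	"pl":       "\u0105\u0119", // ą and ę as single precomposed code points
	"combined": "e\u0301",
}

// runeCount reports how many code points the string for lang holds,
// and whether lang is present in the map at all.
func runeCount(lang string) (int, bool) {
	letters, ok := demo[lang]
	if !ok {
		return 0, false
	}
	return utf8.RuneCountInString(letters), true
}

func main() {
	for _, lang := range []string{"pl", "combined", "xx"} {
		n, ok := runeCount(lang)
		fmt.Printf("%s: known=%v bytes=%d runes=%d\n", lang, ok, len(demo[lang]), n)
	}
	// Ranging over a string yields one rune at a time, so the
	// combining accent arrives as its own element.
	for i, r := range demo["combined"] {
		fmt.Printf("byte index %d: %U\n", i, r)
	}
}
```

Because a range loop over a string yields individual runes, a loop of that kind never sees a multi-code-point sequence as one unit.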

MAIN FUNCTION

func main() {
...
}

This code snippet defines the main function, which is the entry point of the program.

The main function uses the flag package to handle command line arguments.

The variable flagHelp is defined as a pointer to bool and represents the -help flag. By default it is set to false.

flagHelp := flag.Bool("help", false, "View Help")

The variable flagLang is defined as a pointer to a string and represents the -lang flag. By default, it is set to pl.

flagLang := flag.String("lang", "pl", "The language of diacritic marks: \n (ao, ay, ba, baq, br, ch-it, co, cr, crh, crn, csb, cy, cz, di, dk, el, es, et, eu, fi, fiu, fo, fr, fy, ga, gd, gl, gn, gsw, hr, hsb, hu, is, it, kaz, kv, ku, kw, la, lb, liv, lt, lv, mdf, md, mxz, nl, no, oc, op, pl, pt, qu, rm, rom, rup, sc, sk, sl, sq, sr, sv, tr, tut, vi, x-lmk)")

The function flag.Parse() is called to parse command-line arguments and assign values to flag variables.

flag.Parse()

It then checks whether the -help flag has been set to true.
If so, the printHelp() function is called (which is not shown in this code snippet), and the program exits with a return statement. Otherwise, the program continues executing the code.

    if *flagHelp {
        printHelp()
        return
    }

The flag.Args() function returns a list of command-line arguments not identified as flags.

inputFiles := flag.Args()

In this case, the result of the flag.Args() function is assigned to the inputFiles variable.
The inputFiles variable is therefore a slice of the non-flag command-line arguments. This is useful because the program expects the files to be processed to be passed as command-line arguments.
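The flag handling described so far can be tried out in isolation. The sketch below uses a private flag.FlagSet instead of the package-level flags so that it can parse any slice of arguments. The helper name parseArgs and the shortened usage string for -lang are assumptions made for this example.

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs is a hypothetical helper that defines the same two flags as
// main, but on its own FlagSet so it can be called with any argument slice.
func parseArgs(args []string) (help bool, lang string, files []string, err error) {
	fs := flag.NewFlagSet("OGONKI", flag.ContinueOnError)
	flagHelp := fs.Bool("help", false, "View Help")
	flagLang := fs.String("lang", "pl", "The language of diacritic marks")
	if err = fs.Parse(args); err != nil {
		return false, "", nil, err
	}
	// fs.Args() holds whatever is left after the flags: the file paths.
	return *flagHelp, *flagLang, fs.Args(), nil
}

func main() {
	help, lang, files, err := parseArgs([]string{"-lang", "cz", "a.txt", "b.txt"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(help, lang, files)
}
```

Note that Go's flag package stops parsing at the first non-flag argument, so -lang has to come before the file paths on the command line.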

Let's move on...

Let's check if the inputFiles list is empty, i.e., if no file path is given as a command line argument.

    if len(inputFiles) == 0 {
        fmt.Println("OGONKI => Error: No file path specified. Displaying help: OGONKI -help")
        return
    }

If the length of the inputFiles list is 0, then no file path was specified. In this case, the program displays an error message using the fmt.Println() function, informing the user that the path to the file has not been specified. In addition, the program suggests displaying help with the OGONKI -help command.

Then, the program exits with a return statement to avoid further code execution when no file path is specified.

FILE EXTENSIONS

    for _, inputFile := range inputFiles {
        fileExt := strings.ToLower(filepath.Ext(inputFile))
            switch fileExt {
            case ".ads", ".adb", ".as", ".asm", ".asp", ".aspx", ".au3", ".avs", ".avsi", ".awk", ".bash" ,".bash_profile" ,".bashrc", ".bat" ,".bb" ,".bi" ,".c" ,".cba" ,".cbf", ".cbh" ,".cfg", ".cbl", ".cd", ".cl" ,".cln", ".cmd", ".cob", ".copy", ".cpy", ".cs", ".csd", ".csh", ".ctg", ".csv", ".cw", ".cxx" ,".d" ,".diff" ,".em", ".epd" ,".erl", ".f", ".f2k", ".f23", ".f77", ".f90", ".f95", ".fen" ,".for", ".forth" ,".gd", ".git", ".gitconfig", ".go" ,".groovy", ".gui", ".h", ".hh", ".hrl", ".hcl", ".hws", ".html", ".hta", ".hex", ".hs",".inf", ".info", ".ini" ,".ino", ".iss", ".java" ,".js", ".jsm", ".json5", ".jsonc", ".jsp", ".kix" ,".kml" ,".kt", ".las", ".lhs", ".lisp", ".log", ".lst", ".lua", ".lpr" ,".lsp", ".mak", ".m", ".matlab" ,".md", ".mib", ".ml" ,".mli", ".mm" ,".mms" ,".mot", ".mxml", ".nfo", ".nim", ".nsi" ,".nsh", ".nt", ".nosql", ".orc", ".osx" ,".out", ".pack" , ".pas", ".pb", ".p", ".php", ".php3" ,".php4" ,".php5", ".phps", ".phpt", ".phtml" ,".plx" , ".pl", ".pm", ".pp" , ".properties" ,".ps1", ".psd1", ".psm1", ".ps" ,".pgn", ".py" ,".pyw",  ".r" ,".r2", ".r3", ".raku", ".rb", ".rbw", ".reg" ,".reb", ".rs" ,".rust", ".s", ".scm" ,".smd", ".shell", ".sh", ".si4", ".sml" ,".splus", ".srt", ".sql" ,".sqlite", ".src", ".srec" ,".ss" ,".stp", ".st", ".sty", ".svg", ".swift", ".shtm" ,".shtml", ".t2t", ".tab" , ".tcl", ".tek", ".tex" ,".thy", ".tsq", ".ts" ,".tsx", ".txt", ".url", ".vb" ,".vba" ,".vbs" ,".v", ".vala" ,".vhdl", ".vh", ".vhd" ,".wer", ".xhtml", ".xht" ,".xml" ,".xsd", ".xsl", ".xslt", ".xul" ,".yaml", ".yml":
            content, err := ioutil.ReadFile(inputFile)
            if err != nil {
                fmt.Printf("Error loading file %s: %s\n", inputFile, err)
                continue
            }

We use a for loop to iterate over the elements of the inputFiles list that contain file paths passed as command line arguments.

For each file path, the function strings.ToLower(filepath.Ext(inputFile)) is called, which returns the file extension in lowercase.
The result is assigned to the variable fileExt.

Then, in the switch block, the file extension is checked using the case statement. If the extension is one of those listed in the case clause (for example ".c", ".go", ".md", ".txt", ".xml" or ".yml"), the code inside the case block is executed.

Inside the case block, the ioutil.ReadFile(inputFile) function is called to read the contents of the file. The result is assigned to the content and err variables. If there was an error while reading the file, an error message is displayed using the fmt.Printf() function.

When an error is displayed, the continue statement proceeds to the next iteration of the for loop, skipping the rest of the code inside the loop for that file.

This code snippet allows you to load the contents of files with the extensions listed in the case block and handle errors while loading these files.
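As an alternative to one very long case clause, the same check can be written as a lookup in a set. This is a sketch of that alternative, not the author's approach. The names supported and isSupported are invented here, and the set holds only a short illustrative subset of the extensions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supported is a deliberately short, illustrative subset of the
// extensions accepted by the program.
var supported = map[string]bool{
	".go": true, ".md": true, ".txt": true, ".xml": true, ".yml": true,
}

// isSupported reports whether path has one of the listed extensions,
// ignoring the case of the extension.
func isSupported(path string) bool {
	return supported[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"notes.TXT", "src/main.go", "photo.png", "Makefile"} {
		fmt.Printf("%-12s supported=%v\n", p, isSupported(p))
	}
}
```

A missing key in a map[string]bool yields false, so unknown extensions and files without any extension are both rejected without extra code.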

We will use the replaceOgonki function to replace ogonki and special characters in the text.

newContent := replaceOgonki(string(content), *flagLang)

The conversion string(content) turns the contents of the file (stored in the content variable as a byte slice) into a string.

Then, the replaceOgonki function is called with two arguments: the converted file content and the value of the -lang flag (obtained by dereferencing the flagLang pointer with *flagLang).

The result returned by the replaceOgonki function is assigned to the newContent variable. The replaceOgonki function replaces diacritics and special characters with their plain equivalents for the language selected by the -lang flag.

MODIFIED

We create the name of the output file based on the name of the input file.

It uses the function strings.TrimSuffix(inputFile, fileExt), which removes the extension (for example .txt) from the end of the input file name held in the inputFile variable. Then _modified and the extension are appended to the result to create the name of the output file.

For example, if the input file name is example.txt, the resulting output file will be example_modified.txt.
This code snippet is useful when we want to create an output file with modified content based on the name of the input file.

outputFile := strings.TrimSuffix(inputFile, fileExt) + "_modified" + fileExt
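One edge case is worth a quick experiment. fileExt was lowercased earlier, so strings.TrimSuffix does not strip an upper-case extension such as .TXT, and the output name then keeps the original extension in the middle. The sketch below is one possible variant that trims the extension exactly as it appears in the path. The function name outputName is an assumption for this example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outputName inserts "_modified" between the base name and the extension.
// It uses the extension as written in the path (not lowercased), so the
// suffix is always found and removed.
func outputName(inputFile string) string {
	ext := filepath.Ext(inputFile)
	return strings.TrimSuffix(inputFile, ext) + "_modified" + ext
}

func main() {
	for _, p := range []string{"example.txt", "NOTES.TXT", "docs/readme.md"} {
		fmt.Printf("%s -> %s\n", p, outputName(p))
	}
}
```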

Next, the processed content is saved to the output file.

        err = ioutil.WriteFile(outputFile, []byte(newContent), 0644)
        if err != nil {
            fmt.Printf("Error saving file %s: %s\n", outputFile, err)
            continue
        }

The function ioutil.WriteFile(outputFile, []byte(newContent), 0644) is called to write the new content (stored in the newContent variable) to a file named outputFile.

The arguments of the WriteFile function are:

outputFile - name of the output file to which the content is to be written.

[]byte(newContent) - content to be saved as a byte slice. The []byte() conversion turns the string into a byte slice.

0644 - file permission bits (in this case, 0644 means that the owner of the file has read and write permissions, and other users have read-only permissions).

Then, the code checks for an error while writing the file using the condition if err != nil. If an error occurred, fmt.Printf() displays an error message using the outputFile and err values.

continue goes to the next iteration of the loop, skipping further instructions in the current iteration.
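To see what an octal permission value such as 0644 stands for, it can be printed in the familiar symbolic form through os.FileMode. This is a small sketch, and the helper name describe is made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// describe returns the symbolic (ls-style) form of a permission value.
func describe(perm os.FileMode) string {
	return perm.String()
}

func main() {
	fmt.Println(describe(0644)) // prints "-rw-r--r--"
	fmt.Println(describe(0600)) // prints "-rw-------"
}
```

On Unix-like systems the permissions of a newly created file are additionally filtered by the process umask, so the file on disk may end up with fewer bits set than requested.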

We now display a message saying that the diacritics swap on the file was successful and giving the name of the new file that has been saved. fmt.Printf("Diacritics swap completed successfully. New file saved as %s.\n", outputFile)

fmt.Printf() is used for formatted display on the console.
In this case, %s is where the value of outputFile will be inserted.
A message is displayed on the console to indicate that the diacritics swap was successful and the name of the new file.
Otherwise, if the file format is not supported, it displays an appropriate message on the console:
default: fmt.Printf("Unsupported file format: %s\n", fileExt).

default is the clause of a switch statement that handles every value not matched by any of the listed cases.

fmt.Printf() is used to display an unsupported file format message. The value of the fileExt variable is used, which specifies the file extension. The message is displayed on the console and informs users about the unsupported file format.

fmt.Printf("Diacritics swap completed successfully. New file saved as %s.\n", outputFile)
default:
fmt.Printf("Unsupported file format: %s\n", fileExt) 

Now let's define the function replaceOgonki, which takes two arguments: text (string type) and lang (string type). The function is designed to replace diacritics with their equivalents without diacritics in the given text.

func replaceOgonki(text string, lang string) string {
    ogonkiLetters, ok := ogonki[lang]
    if !ok {
        fmt.Printf("Error: Unknown diacritics language %s.\n", lang)
        return text
    }

The first line declares the variables ogonkiLetters and ok, which are used to check whether the ogonki map (defined earlier) has a value for the given language lang. ogonkiLetters, ok := ogonki[lang].

ogonki[lang] means that the function tries to retrieve the value for the given language from the diacritics map. If the value exists, it is assigned to the variable ogonkiLetters, and the variable ok is set to true. If the value does not exist, ok is set to false. The function then checks whether ok is false, which means that the given language is not supported. if !ok { fmt.Printf("Error: Unknown diacritics language %s.\n", lang) return text }

!ok is true when the value of the variable ok is false.
If the condition is met, the specified diacritics language is not supported.
In this case, the function displays an error message on the console stating that the diacritics language is unknown.
Then, the function returns the original text (text) without replacing any diacritics.
If the specified language is supported, the function continues and replaces the diacritics with their plain equivalents.

Next comes a for loop that replaces the diacritics in the text with their equivalents.

for _, ogonkiLetter := range ogonkiLetters {
   replacement := string(ogonkiLetter)
   text = strings.ReplaceAll(text, replacement, 
   getReplacement(replacement))
}
return text

The for loop performs operations on each element (diacritics letter) in the ogonkiLetters variable. for _, ogonkiLetter := range ogonkiLetters { replacement := string(ogonkiLetter) text = strings.ReplaceAll(text, replacement, getReplacement(replacement)) }.

range ogonkiLetters means that the loop should iterate over each element in the variable ogonkiLetters. _ means we don't need the index of the item in this loop. ogonkiLetter is a variable where we store the current item iterated in each iteration of the loop. Inside the loop, for each diacritic letter, it creates a replacement variable that stores the diacritic letter as a string. replacement:= string(Letterogonki).

string(ogonkiLetter) converts the diacritic letter to its string representation. strings.ReplaceAll() then replaces every occurrence of that letter in the text with its equivalent without the diacritic: text = strings.ReplaceAll(text, replacement, getReplacement(replacement)).

strings.ReplaceAll(text, replacement, getReplacement(replacement)) replaces all occurrences of the letter replacement (i.e. the diacritic letter) in the text text with the value returned by the getReplacement(replacement) function.

getReplacement(replacement) is a function that returns the equivalent of a diacritic letter without the diacritic. After the loop ends, the function returns the modified text, in which the diacritic letters have been replaced by their plain equivalents: return text.
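
To make the loop easier to try in isolation, here is a small self-contained sketch. It assumes the letters for a language are held in a string, so that range yields one rune per letter; the map plain and the function stripLetters are hypothetical stand-ins for the program's getReplacement and replacement function, limited to a few Polish letters.

```go
package main

import (
	"fmt"
	"strings"
)

// plain is an illustrative subset of the replacement table.
var plain = map[string]string{
	"ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
	"ó": "o", "ś": "s", "ź": "z", "ż": "z",
}

// stripLetters replaces every letter listed in letters with its plain
// equivalent, one letter at a time.
func stripLetters(text, letters string) string {
	// Ranging over a string yields runes (Unicode code points), not bytes,
	// so multi-byte letters such as "ł" arrive whole.
	for _, r := range letters {
		letter := string(r) // rune -> one-character string
		if repl, ok := plain[letter]; ok {
			text = strings.ReplaceAll(text, letter, repl)
		}
	}
	return text
}

func main() {
	// prints "zazolc gesla jazn"
	fmt.Println(stripLetters("zażółć gęślą jaźń", "ąćęłńóśźż"))
}
```

Each pass of the loop scans the whole text once, so the cost grows with the number of letters registered for the language. For short lists that is rarely a concern.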

CHANGING LETTERS

The getReplacement function accepts a single argument, letter, of type string and returns a string: the plain Latin equivalent of that letter.
It looks the letter up in the replacements map, which pairs each supported diacritic letter with its replacement.
If the letter is found in the map, the corresponding replacement is returned. If it is not found, the original letter is returned unchanged. This code includes support for alphabets such as Polish, Romani, Latin, French, Spanish, Italian, Czech, Hungarian, Swedish, Danish, Norwegian, and many others.

func getReplacement(letter string) string {
    replacements := map[string]string{
    "ą": "a", "à": "a", "á": "a", "ã": "a", "ả": "a", "ạ": "a", "ắ": "a", "ằ": "a", "ẵ": "a", "ẳ": "a", "ặ": "a", "â": "a", "ấ": "a", "ầ": "a", "ẫ": "a", "ẩ": "a", "ậ": "a", "ä": "a", "å": "a", "α": "a", "ă": "a", "ā": "a", "ǎ": "a", "a": "a", "a̱": "a", "â": "a", "ä̂": "a", "ạ": "a", "à́": "a", "a̧": "a", "ȃ": "a", "ḁ": "a", "ą̇": "a", "Ą": "A", "À": "A", "Á": "A", "Ã": "A", "Ả": "A", "Ạ": "A", "Ắ": "A", "Ằ": "A", "Ẵ": "A", "Ẳ": "A", "Ặ": "A", "Â": "A", "Ấ": "A", "Ầ": "A", "Ẫ": "A", "Ẩ": "A", "Ậ": "A", "Ä": "A", "Å": "A", "Ă": "A", "Ā": "A", "Ǎ": "A", "Ạ": "A", "A̱": "A", "Ä": "A", "Â": "A", "Ä̂": "A", "Ậ": "A", "À́": "A", "A̧": "A", "Ȃ": "A", "Ą̇": "A", "ƀ": "b", "β": "b", "ḃ": "b", "ß": "B", "Ḃ": "ẞ", "č": "c", "ć": "c", "ç": "c", "ċ": "c", "ĉ": "c", "Č": "C", "Ć": "C", "Ç": "C", "Ĉ": "C", "Ċ": "C", "ð": "d", "đ": "d", "ď": "d", "δ": "d", "ɗ́": "d", "ɗ": "d", "ḋ": "d", "Ď": "D", "Đ": "D", "Δ": "D", "Ɗ́": "D", "Ɗ": "D", "Ḋ": "D", "ė": "e", "ę": "e", "è": "e", "é": "e", "ê": "e", "ë": "e", "ẽ": "e", "ẻ": "e", "ẹ": "e", "ε": "e", "ē": "e", "ě": "e", "ẹ": "e", "e̱": "e", "ë̂": "e", "ệ": "e", "é̄": "e", "è́": "e", "ë̈": "e", "ȩ": "e", "ĕ": "e", "ȇ": "e", "ẹ̄": "e", "ế": "e", "ề": "e", "ể": "e", "ễ": "e", "ệ": "e", "ė́": "e", "̣ɛ": "e", "ɛ̄": "e", "ɛ̈": "e", "ɛ̈̈": "e", "̈̈̈̈ɛ": "e", "Ė": "E", "Ę": "E", "Ē": "E", "È": "E", "É": "E", "Ë": "E", "Ẽ": "E", "Ẻ": "E", "Ẹ": "E", "Ě": "E", "E̱": "E", "Ë̂": "E", "Ệ": "E", "É̄": "E", "È́": "E", "Ë̈": "E", "Ȩ": "E", "Ĕ": "E", "Ê": "E", "Ẹ̄": "E", "Ế": "E", "Ề": "E", "Ể": "E", "Ễ": "E", "Ệ": "E", "Ė́": "E", "f̈": "f", "φ": "f", "ḟ": "f", "F̈": "F", "Φ": "F", "Ḟ": "F", "ģ": "g", "ġ": "g", "γ": "g", "ğ": "g", "g̈": "g", "Ģ": "G", "Ġ": "G", "Γ": "G", "Ğ": "G", "Ǧ": "G", "G̈": "G", "ȟ": "h", "ħ": "h", "ḥ": "h", "Ȟ": "H", "Ħ": "H", "Ḥ": "H", "į": "i", "i̇": "i", "ì": "i", "í": "i", "î": "i", "î": "i", "ï": "i", "ĩ": "i", "ỉ": "i", "ị": "i", "ī": "i", "ι": "i", "ı": "i", "ǐ": "i", "i̱": "i", "ï̂": "i", "i̧": "i", "ĭ": "i", "ȋ": "i", 
"ị̄": "i", "į́": "i", "į̃": "i", "ï̈": "i", "ị̈": "i", "ɨ": "i", "ɨ̄": "i", "ɨ̈": "i", "į̇": "i", "Į": "I", "İ": "I", "Ì": "I", "Í": "I", "Î": "I", "Ï": "I", "Ĩ": "I", "Ỉ": "I", "Ị": "I", "Ǐ": "I", "I̱": "I", "Ï̂": "I", "I̧": "I", "Ĭ": "I", "Ȋ": "I", "Ī": "I", "Į́": "I", "Į̣̃": "I", "Į̇": "I", "ǰ": "j", "ь": "j", "ĵ": "j", "j́": "j", "Ь": "J", "Ĵ": "J", "J́": "J", "κ": "k", "ķ": "k", "ḳ": "k", "Ķ": "K", "Ḳ": "K", "ĺ": "l", "ľ": "l", "ļ": "l", "ł": "l", "λ": "l", "Ĺ": "L", "Ľ": "L", "Ļ": "L", "Ł": "L", "Λ": "L", "ḿ": "m", "μ": "m", "ṁ": "m", "Ḿ": "M", "Ṁ": "M", "ń": "n", "ñ": "n", "ň": "n", "ņ": "n", "ʼn": "n", "ņ̌": "n", "Ń": "N", "Ñ": "N", "Ň": "N", "Ņ": "N", "Ŋ": "N", "Ņ̌": "N", "õ": "o", "ø": "o", "ó": "o", "ô": "o", "ò": "o", "ö": "o", "ő": "o", "ō": "o", "ω": "o", "ȯ": "o", "ȱ": "o", "ỏ": "o", "ọ": "o", "ố": "o", "ồ": "o", "ỗ": "o", "ổ": "o", "ộ": "o", "ơ": "o", "ớ": "o", "ờ": "o", "ỡ": "o", "ở": "o", "ợ": "o", "ǒ": "o", "ǫ": "o", "o̱": "o", "ö̂": "o", "ó̄": "o", "ò́": "o", "ò̂": "o", "ŏ": "o", "ȏ": "o", "ọ̄": "o", "ö̈": "o", "̣ɔ": "o", "ɔ̄": "o", "ɔ̈": "o", "ɔ̈̈": "o", "ɔ̣̣": "o", "Õ": "O", "Ø": "O", "Ó": "O", "Ô": "O", "Ò": "O", "Ö": "O", "Ő": "O", "Ō": "O", "Ω": "O", "Ȯ": "O", "Ȱ": "O", "Ỏ": "O", "Ọ": "O", "Ố": "O", "Ồ": "O", "Ỗ": "O", "Ổ": "O", "Ộ": "O", "Ơ": "O", "Ớ": "O", "Ờ": "O", "Ỡ": "O", "Ở": "O", "Ợ": "O", "Ǫ": "O", "Ǒ": "O", "O̱": "O", "Ö̂": "O", "Ó̄": "O", "Ò́": "O", "Ò̂": "O", "Ŏ": "O", "Ȏ": "O", "Ọ̄": "O", "π": "p", "ṗ": "p", "Π": "P", "Ṗ": "P", "ř": "r", "ŕ": "r", "ŗ": "r", "σ": "r", "Ř": "R", "Ŕ": "R", "Ŗ": "R", "Σ": "R", "š": "s", "ś": "s", "ş": "s", "ș": "s", "τ": "s", "ŝ": "s", "ś́": "s", "ṣ": "s", "ṡ": "s", "Š": "S", "Ś": "S", "Ş": "S", "Ș": "S", "Ŝ": "S", "Ś́": "S", "Ṣ": "S", "Ṡ": "S", "ṭ": "t", "ť": "t", "ţ": "t", "ț": "t", "ŧ": "t", "ṫ": "t", "Ť": "T", "Ţ": "T", "Ț": "T", "Ṭ": "T", "Ŧ": "T", "Ṫ": "T", "ü": "u", "ú": "u", "ů": "u", "û": "u", "ù": "u", "ũ": "u", "ų": "u", "ū": "u", "ű": "u", "υ": "u", "ǘ": "u", "ǚ": "u", "ǜ": "u", "ǔ": 
"u", "ŭ": "u", "ǖ": "u", "ụ": "u", "u̱": "u", "ü̂": "u", "ъ": "u", "u̧": "u", "ȗ": "u", "ụ̄": "u", "ụ": "u", "ứ": "u", "ừ": "u", "ử": "u", "ữ": "u", "ự": "u", "ų̃": "u", "̄ų̃̌": "u", "ų̄": "u", "ü̈": "u", "ụ̈": "u", "̣ʉ": "u", "ʉ̄": "u", "ʉ̈": "u", "ʉ̈̈": "u", "ʉ": "u", "Ü": "U", "Ú": "U", "Ů": "U", "Û": "U", "Ù": "U", "Ũ": "U", "Ų": "U", "Ū": "U", "Ű": "U", "Ǖ": "U", "Ǘ": "U", "Ǚ": "U", "Ǜ": "U", "Ǔ": "U", "Ụ": "U", "U̱": "U", "Ŭ": "U", "Ü̂": "U", "Ъ": "U", "U̧": "U", "Ụ̄": "U", "Ụ": "U", "Ứ": "U", "Ừ": "U", "Ử": "U", "Ữ": "U", "Ų̃": "U", "Ų̄": "U", "Ų̃̄": "U", "Ų̃̌": "U", "Ų̄̌": "U", "w̌": "w", "ŵ": "w", "ẃ": "w", "ẁ": "w", "W̌": "W", "Ŵ": "W", "Ẃ": "W", "Ẁ": "W", "x̌": "x", "ξ": "x", "X̌": "X", "Ξ": "X", "ý": "y", "ỳ": "y", "ỹ": "y", "ỷ": "y", "ỵ": "y", "ŷ": "y", "ȳ": "y", "ẏ": "y", "ÿ": "y", "Ӯ": "Y", "Ý": "Y", "Ỳ": "Y", "Ỹ": "Y", "Ỷ": "Y", "Ỵ": "Y", "Ŷ": "Y", "Ȳ": "Y", "Ẏ": "Y", "Ÿ": "Y", "ż": "z", "ź": "z", "ž": "z", "ζ": "z", "Ż": "Z", "Ź": "Z", "Ž": "Z", "Θ": "TH", "Þ": "TH", "Ψ": "PS", "η": "ee", "θ": "th", "þ": "th", "χ": "ch", "ψ": "ps", "ѓ": "gj", "ќ": "kj", "љ": "lj", "њ": "nj", "џ": "dz", "ǿ": "OE", "Ǿ": "OE", "æ": "ae", "ǣ": "ae", "ǽ": "ae", "Æ": "AE", "Ǣ": "AE", "Ǽ": "AE", "œ": "oe", "Œ": "OE", "ə̄": "ə", "Ə̄": "Ə",
    }
    replacement, ok := replacements[letter]
    if ok {
        return replacement
    }
    return letter
}
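
One possible variation, shown here only as a sketch, is to declare the table once at package level so that it is not rebuilt on every call. The names replacementTable and lookupReplacement are hypothetical, and the table holds just a handful of the entries listed above.

```go
package main

import "fmt"

// replacementTable is built once when the package is initialised.
// Only a few illustrative entries are shown.
var replacementTable = map[string]string{
	"ą": "a", "é": "e", "ł": "l", "ñ": "n",
	"æ": "ae", "Æ": "AE", "þ": "th",
}

// lookupReplacement mirrors the behaviour described above: a known letter
// maps to its plain form, anything else is returned unchanged.
func lookupReplacement(letter string) string {
	if r, ok := replacementTable[letter]; ok {
		return r
	}
	return letter
}

func main() {
	for _, l := range []string{"ł", "æ", "x"} {
		fmt.Printf("%q -> %q\n", l, lookupReplacement(l))
	}
}
```

Note that a replacement may be longer than the original letter ("æ" becomes "ae"), so the output text can differ in length from the input.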

HELP

The printHelp() function, which is responsible for displaying the program's help, is defined in this code snippet.

The banner is printed line by line with a 300 ms pause between lines. Near the end of the function, flag.PrintDefaults() is called; it prints the program's flags together with their default values and descriptions.

func printHelp() {
    fmt.Println()                                                                
    fmt.Println("     _/_/      _/_/_/    _/_/    _/      _/  _/    _/  _/_/_/   ")
    time.Sleep(300 * time.Millisecond)  
    fmt.Println("  _/    _/  _/        _/    _/  _/_/    _/  _/  _/      _/     ")
    time.Sleep(300 * time.Millisecond)
    fmt.Println(" _/    _/  _/  _/_/  _/    _/  _/  _/  _/  _/_/        _/     ")
    time.Sleep(300 * time.Millisecond)
    fmt.Println("_/    _/  _/    _/  _/    _/  _/    _/_/  _/  _/      _/     ")
    time.Sleep(300 * time.Millisecond)
    fmt.Println(" _/_/      _/_/_/    _/_/    _/      _/  _/    _/  _/_/_/   ")
    time.Sleep(300 * time.Millisecond)
    fmt.Println()
    fmt.Println("OGONKI v1.0 (c) by Łukasz Wójcik 2023")
    fmt.Println("Program for converting diacritics in unformatted text files.")
    fmt.Println()
    fmt.Println("     Use: ogonki [-help] [-lang flag] [file1.txt file2.txt ...]")
    fmt.Println()
    fmt.Println("    Flag: [ao]  [ay] [ba] [baq] [br] [ch-it] [co] [cr] [crh] [crn]")
    fmt.Println("          [csb] [cy] [cz] [di]  [dk] [el] [es] [et] [eu] [fi]")
    fmt.Println("          [fo]  [fr] [fy] [ga]  [gd] [gl] [gn] [gsw] [hr] [hsb]")
    fmt.Println("          [hu]  [is] [it] [kaz] [kv] [ku] [kw] [la] [lb] [liv]")
    fmt.Println("          [lt]  [lv] [mdf] [md] [mxz] [nl] [no] [oc] [op] [pl]")
    fmt.Println("          [pt]  [qu] [rm] [rom] [rup] [sc] [sk] [sl] [sq] [sr]")
    fmt.Println("          [sv]  [tr] [tut] [vi] [x-lmk] [fiu]")
    fmt.Println()
    fmt.Println(" Example: ogonki -lang fr file.txt")
    flag.PrintDefaults()
    fmt.Println()
    fmt.Println("   Site := [https://github.com/lukaszwojcikdev/ogonki.git]")
    fmt.Println("License := [MIT]")
}
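
For readers who have not used the flag package, the sketch below shows one way the -lang and -help flags from the usage line could be declared and parsed. It uses a separate flag.FlagSet rather than the global flag set called in printHelp, and the default value "pl" and the flag descriptions are assumptions, not taken from the program.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// newFlags builds a flag set resembling the usage line printed by printHelp.
// The flag names come from that usage line; the default value and the
// descriptions are assumptions made for this sketch.
func newFlags(out io.Writer) (*flag.FlagSet, *string, *bool) {
	fs := flag.NewFlagSet("ogonki", flag.ContinueOnError)
	fs.SetOutput(out) // a nil writer falls back to os.Stderr
	lang := fs.String("lang", "pl", "language code of the diacritics to replace")
	help := fs.Bool("help", false, "show help and exit")
	return fs, lang, help
}

func main() {
	fs, lang, help := newFlags(os.Stdout)
	if err := fs.Parse(os.Args[1:]); err != nil {
		os.Exit(2)
	}
	if *help {
		fs.PrintDefaults() // prints each flag with its default and description
		return
	}
	// Everything after the flags is treated as a list of file paths.
	fmt.Println("language:", *lang, "files:", fs.Args())
}
```

Run as ogonki -lang fr file.txt, the sketch would report the language fr and the single file path file.txt.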

ADVANTAGES

The application for replacing diacritical marks in unformatted UTF-8 text files can be helpful in various circumstances.

Here are a few instances:

  1. Data preparation for text analysis:
    This tool can assist you in removing or replacing diacritical marks from large text files with the appropriate characters.
    It will make it easier to compare words and analyze the text.

  2. Database data processing:
    Diacritical marks can sometimes cause issues when importing data into a database. By replacing diacritics with their clean equivalents using this application, you can speed up the import process.

  3. Data preparation for indexing:
    If you are using a text search engine or index, this tool can help you remove diacritics to ensure the consistency and accuracy of your search results.
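
As an illustration of the indexing case, the following sketch compares text and query in a diacritics-free, lower-case form. The helper searchKey and the short list of Spanish letters are hypothetical; strings.NewReplacer is used here instead of the program's loop because it performs all replacements in a single pass.

```go
package main

import (
	"fmt"
	"strings"
)

// folder replaces a few accented letters in a single pass. The pairs are an
// illustrative subset, not the article's full table.
var folder = strings.NewReplacer(
	"á", "a", "é", "e", "í", "i", "ó", "o", "ú", "u", "ñ", "n", "ü", "u",
)

// searchKey lower-cases the input and strips the listed diacritics so that
// indexed text and queries are compared in the same form.
func searchKey(s string) string {
	return folder.Replace(strings.ToLower(s))
}

func main() {
	docs := []string{"Canción de otoño", "Último tren", "Cancion sin tilde"}
	query := searchKey("CANCIÓN")
	for _, d := range docs {
		if strings.Contains(searchKey(d), query) {
			fmt.Println("match:", d)
		}
	}
}
```

Both "Canción de otoño" and "Cancion sin tilde" match the query, which is the consistency the item above refers to. The original text should still be stored alongside the key so that it can be displayed with its diacritics.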

APPLICATION

The program can be used in various places and situations.

For example:

  1. When creating scripts or automatic text processing tools.
    The program can be used as part of a larger tool that performs various operations on text files, such as removing diacritics before text analysis.

  2. When developing web applications or websites.
    The program can be used to remove or replace diacritics in text entered by users to ensure data consistency and avoid potential problems with unsupported diacritics.

  3. When processing large sets of textual data, such as linguistic corpora.
    The program can be used to standardize and normalize data prior to analysis or manipulation.

  4. When creating tools to automatically convert text files between different formats.
    The program can be used to remove diacritics before converting a text file to another format that does not support diacritics.

  5. Processing of Internet content:
    If you run a website or blog, this program can help you remove or replace diacritics in your content, making it easier to read and find.

  6. Linguistic analysis:
    If you perform language analysis, this program can help you normalize text by removing diacritics and standardizing word forms.

  7. Setting up documents for printing:
    Diacritical characters in text files might occasionally cause issues for printers. To prevent printing issues, you can swap out diacritical marks with their equivalents using this application.

  8. Handling documents in various character encodings:
    The program expects UTF-8 input. If your documents come in several character encodings, convert them to UTF-8 first; the program can then replace the diacritics with plain letters, which are represented identically in most common encodings.
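
When the replacement is used as one step of a larger tool, it can be convenient to wrap it as a filter between an io.Reader and an io.Writer. The sketch below is one way to do that under stated assumptions: the names filter and filterString and the five-letter table are hypothetical, and the real program works on file paths passed as arguments instead.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// plainLetters is an illustrative subset of the replacement table.
var plainLetters = strings.NewReplacer("ą", "a", "ę", "e", "ł", "l", "ó", "o", "ś", "s")

// filter copies r to w line by line, replacing the listed letters.
// bufio.Scanner limits lines to 64 KiB by default; longer lines make
// Scan stop and sc.Err() report an error.
func filter(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if _, err := fmt.Fprintln(w, plainLetters.Replace(sc.Text())); err != nil {
			return err
		}
	}
	return sc.Err()
}

// filterString is a convenience wrapper for text that is already in memory,
// for example input submitted through a web form.
func filterString(s string) (string, error) {
	var sb strings.Builder
	err := filter(strings.NewReader(s), &sb)
	return sb.String(), err
}

func main() {
	// Usage as a pipe: some-command | filter-program > output.txt
	if err := filter(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Because every output line is written with a trailing newline, input that does not end in a newline gains one.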

All in all, this program can be very useful for programmers who need a simple solution to remove or replace diacritics in text files. It can be used in various fields and situations where texts must be processed in a coherent and consistent manner.

DISADVANTAGES

Here are some potential disadvantages of a diacritic replacement program for text files:

  1. Possibility of losing information:
    Removing or replacing diacritics may lead to the loss of certain information or change the meaning of words. For example, in some languages, diacritics may indicate differences in pronunciation or meaning of words. In such cases, the program may alter the meaning of the text, which can be problematic.

  2. Uncertainty about the correctness of the task:
    The program may not always behave as expected, especially with more complex diacritic patterns. Therefore, when dealing with more advanced applications, it is important to test and verify the program before widespread use.

  3. A variety of languages:
    A program that aims to eliminate or replace diacritics must take into account different languages and their specific diacritical requirements. Some languages may have more complex diacritic systems that the program must handle correctly. If the program does not consider these differences, it may not function properly for texts in these languages.

  4. Possibility of formatting violation:
    A diacritical mark remover may cause text to be formatted incorrectly. For example, if diacritics have been used to denote headings or accents, removing these characters may result in a loss of structure and coherence in the text.

  5. Dependency on context:
    In some cases, the correct replacement depends on the context in which a word appears, not only on the diacritic itself. The program replaces letters one by one and does not take that context into account, so some replacements may be wrong.
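
Two of these points are easy to reproduce. The sketch below uses a hypothetical stripDiacritics helper with a three-entry table: it shows two distinct Polish words collapsing into one, and a decomposed letter ("e" followed by the combining acute accent U+0301) slipping past a table that only lists the precomposed form U+00E9.

```go
package main

import (
	"fmt"
	"strings"
)

// table is an illustrative subset keyed on precomposed letters only.
var table = strings.NewReplacer("ą", "a", "ł", "l", "\u00e9", "e")

// stripDiacritics applies the table to s.
func stripDiacritics(s string) string {
	return table.Replace(s)
}

func main() {
	// Loss of information: "łaska" (grace) and "laska" (cane) become
	// indistinguishable after replacement.
	fmt.Println(stripDiacritics("łaska") == stripDiacritics("laska")) // true

	// The same visible letter can be encoded in two ways.
	precomposed := "caf\u00e9" // é as a single code point
	decomposed := "cafe\u0301" // e + combining acute accent
	fmt.Println(stripDiacritics(precomposed) == "cafe") // true
	fmt.Println(stripDiacritics(decomposed) == "cafe")  // false: the accent remains
}
```

Text copied from different sources can mix both encodings, so it is worth testing the tool on representative input before relying on it.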

It is important to always thoroughly test such a program and understand the potential downsides and limitations that can occur depending on your specific use case.

IS IT WORTH IT?

In many cases, replacing diacritics in text files with a program can be beneficial, especially if the diacritics are unwanted or interfere with text processing.

Here are several justifications for using such a program:

  1. Facilitating text analysis:
    Removing diacritics or replacing them with appropriate characters without diacritics can make text analysis, word comparison, and information retrieval easier, especially if diacritics have no semantic meaning in the language.

  2. Avoiding Encoding Problems:
    Replacing diacritics in UTF-8 can help avoid encoding problems when processing text, especially when other systems or software do not support the full range of Unicode characters.

  3. Facilitating printing and data import:
    Removing diacritics or replacing them with appropriate characters without diacritics can make it easier to print documents or import data into a database, especially if some systems or printers have problems handling diacritics.

  4. Unifying the form of words:
    Replacing diacritics can help standardize the form of words across documents, which is useful for linguistic analysis or text indexing.

The benefit of using such a tool ultimately depends on your unique scenario and requirements. To prevent information loss or misunderstanding, it is important to be aware of potential errors and exercise caution when processing text.

SUMMARY

The diacritics replacement program is a useful tool for automating the replacement of diacritics in text files. It helps you avoid errors caused by incorrectly displayed diacritics on various operating systems. Thanks to its support for multiple languages, the application is flexible and can be used in various situations.

This application is worth trying if you work with diacritical marks in text files. It will significantly facilitate your work, and the process of changing diacritics will be faster.

FULL SOURCE CODE => GitHub

THANK YOU

'Making Life Easier for Programmers: A Simple Solution to Diacritical Marks in Text Files' was a lengthy article, so I appreciate your time. I hope you enjoyed it and learned something from it. I will gladly answer any questions or remarks you may have. Once more, thank you for reading.
