Proposing a Centimal Number System for Geotagging and Geocoding
Introduction
Geotagging of photos and other data items today is a tedious and error-prone affair, not least because latitude and longitude have to be entered 100% correctly. One wrong digit and you may be many meters or many miles off. Certainly there are numerous tools to make these tasks easier, and new ones are released almost daily in this age of geotagging craze. Even with these products, having to deal with high-precision numbers, positive and negative, matching times etc. can be quite daunting. So, as with domain names (which obviate remembering IP addresses), wouldn’t it be easier to describe the latitude and longitude numbers with a simple mnemonic “babble language” consisting of “friendly names” for each location? Some alternative ways have been tried to do away with the numerical notation of location coordinates:
Grid Systems
There are numerous grid systems using letters as well as numbers, the best-known example being the Universal Transverse Mercator (UTM). Grid systems are not directly supported by modern GPS devices, which use the WGS84 datum. So for easy conversion, latitude and longitude should be kept in the established decimal numeric form.
Numbers into Words
It would be desirable to have a system that expresses numbers as pronounceable words, which could be understood in any language and in spoken conversation, such as over the telephone.
Bubble Babble
The Bubble Babble encoding uses alternating consonants and vowels, facilitating public key validation in verbal form, for example over the telephone, as the resulting string can be pronounced and understood more easily than hexadecimal sequences of letters and numbers. Here’s a port of Bubble Babble to C#.
Koremutake
Another take on a related problem is Koremutake, which devises a system to express large integer numbers as a sequence of syllables (phonemes). It is a binary encoding with a base of 128, using 128 distinct syllables to encode integers. While this system works fine for integers, trying to expand it to fractions quickly runs into the well-known problems of converting decimal numbers into binary format. Displaying a simple decimal number like 0.1 “would need an infinitely recurring binary fraction”. So while the Koremutake-encoded integer 65535 is an easy-to-remember BOTRETRE, the floating-point number 65535.1 would be encoded as BOTRETRE.FAFRAMOHESUDRA if Koremutake covered floating-point numbers, and precision would still be only adequate. As a side note, here’s a port of Koremutake to C#.
Decimal Encoding
To ward off the rounding and approximation issues of floating-point numbers, a modified system based on decimal numbers should be used. First of all, decimal numbers are what users are familiar with and, more importantly, they are supported by most operating systems and programming languages today (C#, C++, Java, Visual Basic), and even ECMAScript/JavaScript will support them in the near future.
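To make the rounding issue concrete, here is a small Python sketch (Python and its standard `decimal` module are my choice for illustration, not something the system depends on). It shows that a binary float does not hold 0.1 exactly, while a decimal type does.

```python
from decimal import Decimal

# Decimal(float) exposes the value a binary float really stores for 0.1.
stored = Decimal(0.1)
exact = Decimal("0.1")

print(stored == exact)   # False: the float is only an approximation
print(0.1 + 0.2 == 0.3)  # False: the approximation errors add up
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```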
Centimal System
A base-100 (centimal, centesimal?) system is a tall order if only ASCII vowels and consonants are to be used. Citing the ancient major system, one suggested centimal system uses both numbers and syllables made up of consonants and vowels to encode integers. Then again, combinations of numbers and letters are notoriously hard to remember, probably because numbers and letters are stored and processed in different parts of the brain. Numbers mixed with letters also hamper the reading flow and introduce local translations (for the numbers) into the flow of (language-agnostic) syllables.
Expanding the Vowels to 10
Living in Thailand, I’m used to using 18 consonants and 6 diphthongs, so I expanded the usual a, e, o, u with “ai”, “an”, “ao”, “in”, “oi” and “on” to make 10 vowels. Sorted alphabetically, this gives the sequence “a”, “ai”, “an”, “ao”, “e”, “in”, “o”, “oi”, “on”, “u”. The artificial Ithkuil language also uses the “expand the vowel stock” approach, but that’s a different ball game entirely. ‘I’ and ‘y’ are not used as standalone vowels (unlike in Koremutake), since ‘i’ is pronounced like the English ‘ee’ sound in many languages including German, Dutch and Italian.
Ten Consonants
We now need only 10 consonants to form all our base-100 “digits”. For international compatibility, ‘c’, ‘j’ (pronounced like i in German and Dutch) and ‘w’ (pronounced like v in German, not used in some Romance languages) are excluded from the consonants, and so is ‘v’. Furthermore, ‘r’ is left out because it is not distinguished from ‘l’ in many spoken Asian languages. The ten consonants used are B, D, F, G, L, M, P, S, T and Y. The result reminds one of Chinese, Vietnamese or Thai, due mainly to the expansion of vowels. But luckily, unlike the aforementioned languages, our system is not tonal ;). So now we have a coherent and predictable 10 x 10 base-100 number system consisting of syllables that are easy to read and easy to pronounce. Two issues remain:
Negative Numbers
What about negative numbers? One way to express them would be to use lower case for negative numbers and capitalize all letters for positive numbers. However, this goes against the established customs for identifiers like web URLs or e-mail addresses, where case does not matter; consequently, many users will not distinguish much between lower and upper case. Instead, the prefix letter A is prepended to negative numbers. It is one of only two letters in the alphabet (the other being H) that has a “full-width” horizontal line. This also fits well into the flow of the language, with a leading A generally easy to pronounce before all the syllables we will use.
Fractions (Non-Integers)
The dot of fractional numbers should be replaced by a distinctive syllable. The word “point” is too specific to English, and many European languages use a comma instead of a point as the decimal separator. I was thinking of using “Poi” as a short form of “Point”, but I want to use both P as a consonant and oi as a vowel in the number system. So I settled for “Ha”, which is also shorter. Each syllable after “Ha” encodes two decimal places, so 1.2 (that is, 1.20) becomes BaiHaFa, while BaiHaBan stands for 1.02. Here are the 100 distinct syllables of the centimal system:
| | a | ai | an | ao | e | in | o | oi | on | u |
| B | Ba 0 | Bai 1 | Ban 2 | Bao 3 | Be 4 | Bin 5 | Bo 6 | Boi 7 | Bon 8 | Bu 9 |
| D | Da 10 | Dai 11 | Dan 12 | Dao 13 | De 14 | Din 15 | Do 16 | Doi 17 | Don 18 | Du 19 |
| F | Fa 20 | Fai 21 | Fan 22 | Fao 23 | Fe 24 | Fin 25 | Fo 26 | Foi 27 | Fon 28 | Fu 29 |
| G | Ga 30 | Gai 31 | Gan 32 | Gao 33 | Ge 34 | Gin 35 | Go 36 | Goi 37 | Gon 38 | Gu 39 |
| L | La 40 | Lai 41 | Lan 42 | Lao 43 | Le 44 | Lin 45 | Lo 46 | Loi 47 | Lon 48 | Lu 49 |
| M | Ma 50 | Mai 51 | Man 52 | Mao 53 | Me 54 | Min 55 | Mo 56 | Moi 57 | Mon 58 | Mu 59 |
| P | Pa 60 | Pai 61 | Pan 62 | Pao 63 | Pe 64 | Pin 65 | Po 66 | Poi 67 | Pon 68 | Pu 69 |
| S | Sa 70 | Sai 71 | San 72 | Sao 73 | Se 74 | Sin 75 | So 76 | Soi 77 | Son 78 | Su 79 |
| T | Ta 80 | Tai 81 | Tan 82 | Tao 83 | Te 84 | Tin 85 | To 86 | Toi 87 | Ton 88 | Tu 89 |
| Y | Ya 90 | Yai 91 | Yan 92 | Yao 93 | Ye 94 | Yin 95 | Yo 96 | Yoi 97 | Yon 98 | Yu 99 |

| Kind | Notation | Examples |
| Positive Numbers | syllables from the grid, most significant first | Bai 1, Da 10, Yu 99, BaiBa 100 |
| Negative Numbers | the same syllables with a leading “A” | Abai -1, Ada -10, Ayu -99, AbaiBa -100 |
| Fractions (Floats) | “Ha” in place of the decimal point | Ba 0, BaHaBai 0.01, BaHaDa 0.10, BaHaMa 0.50, BaHaYu 0.99, Bai 1.00 |
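As a minimal sketch of the table above (this is not the C# implementation), the 100 syllables can be generated from the ten consonants and ten vowels, and a number converted by repeated division by 100. The function names `encode_int`, `encode_decimal` and `decode_int` are made up for this illustration, and lower-casing the first syllable after the leading “A” is a rule I inferred from table entries such as AbaiBa.

```python
import re

CONSONANTS = "BDFGLMPSTY"
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
# SYLLABLES[n] is the centimal "digit" for n, e.g. SYLLABLES[42] == "Lan".
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
VALUES = {s.lower(): n for n, s in enumerate(SYLLABLES)}
TOKEN = re.compile(r"[bdfglmpsty][aeinou]+")


def negate(word):
    """Prefix 'A' and lower-case the first syllable, as in 'AbaiBa'."""
    return "A" + word[0].lower() + word[1:]


def encode_int(n):
    """Encode an integer, most significant base-100 digit first."""
    if n < 0:
        return negate(encode_int(-n))
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def encode_decimal(text):
    """Encode a decimal string; 'Ha' replaces the decimal separator."""
    negative = text.startswith("-")
    whole, _, frac = text.lstrip("+-").partition(".")
    frac += "0" * (len(frac) % 2)  # pad to whole pairs of decimal places
    word = encode_int(int(whole or "0"))
    if frac:
        pairs = (frac[i:i + 2] for i in range(0, len(frac), 2))
        word += "Ha" + "".join(SYLLABLES[int(p)] for p in pairs)
    return negate(word) if negative else word


def decode_int(word):
    """Decode a centimal integer, ignoring case."""
    text = word.lower()
    negative = text.startswith("a")
    if negative:
        text = text[1:]
    tokens = TOKEN.findall(text)
    if not tokens or "".join(tokens) != text or any(t not in VALUES for t in tokens):
        raise ValueError(f"not a centimal integer: {word!r}")
    value = 0
    for token in tokens:
        value = value * 100 + VALUES[token]
    return -value if negative else value


print(encode_int(100))         # BaiBa
print(encode_int(-99))         # Ayu
print(encode_decimal("0.10"))  # BaHaDa
print(decode_int("AbaiBa"))    # -100
```

Splitting at the consonants is unambiguous because none of the letters used in the vowels (a, e, i, n, o, u) is also one of the ten consonants.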
Application to Geocoding and Geotagging
Latitude and longitude
So how do we apply this encoding or notation to geocoding and geotagging, and especially, what do we do with the (decimal) point? How can these solutions facilitate geocoding and geotagging, which often involve verbalizing or memorizing latitude and longitude information? The classic notation of latitude and longitude is sexagesimal (example: 52°37′48.92″ N, 111°57′8.55″ W), dividing the 90 degrees of latitude and the 180 degrees of longitude into minutes and seconds, with accuracy further enhanced by decimal fractions of the seconds. Historically the nautical mile is tied to this system: a degree of latitude spans approximately 60 nautical miles, making a minute of latitude about one nautical mile.
This sexagesimal system is especially hard to handle for the layperson. For geotagging and geocoding, the decimal system (degrees with a decimal fraction) holds sway today, being supported by maps.google.com, MapPoint, AutoRoute etc.
Still, numbers with six digits after the decimal separator are hard to remember.
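For reference, converting the sexagesimal notation to decimal degrees is simple arithmetic: minutes are divided by 60, seconds by 3600, and southern or western values are negated. A rough Python sketch follows; the function name `dms_to_decimal` and its parameters are my own invention for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N, S, E or W to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are expressed as negative numbers in decimal notation.
    return -value if hemisphere in ("S", "W") else value


# The sexagesimal example from the text: 52°37′48.92″ N, 111°57′8.55″ W
lat = dms_to_decimal(52, 37, 48.92, "N")
lon = dms_to_decimal(111, 57, 8.55, "W")
print(f"{lat:.6f} {lon:.6f}")
```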
So we will apply our above-mentioned number system to the decimal notation, with a leading “A” denoting negative latitude or longitude. We will get rid of the fractional part (the digits after the decimal separator) by multiplying the values by 1 million, which delivers an accuracy of a millionth of a degree, a level of precision that is generally accepted for GIS applications.
This way we can express every location in the world with a maximum of just nine syllables (four for latitude, five for longitude), plus a leading “A” for negative latitude and longitude values. So we can denote every place on the planet with a resolution of roughly 0.1 meter (a millionth of a degree of latitude spans about 0.11 meters) using about 6.48 × 10^16 different words (180 million latitude values times 360 million longitude values), each consisting of a “first name” for the latitude and a “last name” for the longitude separated by a dot, colon, pound sign (#) or at sign (@).
The at sign is preferable here because, having been introduced and used daily for e-mail, it can be pronounced by people around the world regardless of their native language.
These “geonames” are language-agnostic, making it easy for speakers of different languages to exchange geographic information, or any kind of numeric information for that matter. For example, phone numbers can be exchanged more easily using the proposed system. Leading zeros are provided for by prepending the syllable for zero for each leading 0. Likewise, a four-digit credit card PIN becomes just two syllables, which is definitely easier to remember than four digits, while also providing an (admittedly minimal) level of “encryption”. The system also lends itself to speech recognition, facilitating software applications for geotagging where the user simply pronounces the latitude and longitude in centimal as delivered by the GPS device or camera.
Examples:
| Site | Latitude, Longitude | Centimal Geonaming Format |
| Statue of Liberty | Lat: 40.689440 Long: -74.044698 | LaPonYeLa@AseBeLoYon |
| Top of Mount Everest | Lat: 27.988130 Long: 86.925141 | FoiYonTaiGa@ToYanMaiLai |
| Sydney Opera House | Lat: -33.891788 Long: 151.176251 | AgaoTuDoiTon@BaiMaiDoiPanMai |
| Royal Observatory Greenwich | Lat: 51.477818 Long: 0 | MaiLoiSonDon@Ba |
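The examples in this table can be reproduced with a few lines of Python. This is only a sketch under my own naming (`to_microdegrees`, `geoname`); it takes the coordinates as strings and scales them with the standard `decimal` module, so that multiplying by 1 million does not reintroduce floating-point rounding.

```python
from decimal import Decimal

CONSONANTS = "BDFGLMPSTY"
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # index = value 0..99


def encode_int(n):
    """Encode an integer in centimal; negatives get a leading 'A'."""
    if n < 0:
        word = encode_int(-n)
        return "A" + word[0].lower() + word[1:]
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def to_microdegrees(text):
    """Scale a decimal-degree string to an integer count of millionths."""
    return int((Decimal(text) * 1_000_000).to_integral_value())


def geoname(latitude, longitude, separator="@"):
    """Join the encoded latitude and longitude with the chosen separator."""
    return (encode_int(to_microdegrees(latitude))
            + separator
            + encode_int(to_microdegrees(longitude)))


print(geoname("40.689440", "-74.044698"))   # LaPonYeLa@AseBeLoYon
print(geoname("-33.891788", "151.176251"))  # AgaoTuDoiTon@BaiMaiDoiPanMai
print(geoname("51.477818", "0"))            # MaiLoiSonDon@Ba
```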
Other geographic information
Geographic information includes other data items, such as the time and the direction (bearing) of movement.
Time can be expressed in the format yyyyMMddhhmmss (year, month, day, hours, minutes, seconds). Leaving off the seconds, the last minute of 2007 gives a value of 200712312359. This number is rather large and has to be expressed as FaBoiDanGaiFaoMu in our system. That is six syllables (seven with the seconds included) and, while still easier for humans to handle than the numeric expression, it goes beyond the memorization capability of most people.
Another approach might be a count of the seconds since a point in time, say January 1st, 1900. However, the number of seconds since the beginning of 1900 is now also pushing 3,500,000,000, which could be expressed with five syllables: a minor advantage, with the tradeoff of not being able to handle earlier dates. So the yyyyMMddhhmmss format should be used, and the above examples can simply be extended with further @ signs, such as LaPonYeLa@AseBeLoYon@FaBoiDanGaiFaoMu, an example of convention over configuration. Or the separator could be changed for time, to the pound sign for example.
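Both time notations can be compared with a short Python sketch. The helper names and the `with_seconds` switch are assumptions of mine for illustration; the first call reproduces the 12-digit value 200712312359 (year through minute) used in the example above.

```python
from datetime import datetime

CONSONANTS = "BDFGLMPSTY"
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # index = value 0..99


def encode_int(n):
    """Encode a non-negative integer, most significant digit first."""
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def encode_timestamp(moment, with_seconds=False):
    """Encode a date and time as its yyyyMMddhhmm(ss) digit string."""
    pattern = "%Y%m%d%H%M%S" if with_seconds else "%Y%m%d%H%M"
    return encode_int(int(moment.strftime(pattern)))


def encode_seconds_since_1900(moment):
    """Encode the count of seconds elapsed since January 1st, 1900."""
    elapsed = moment - datetime(1900, 1, 1)
    return encode_int(int(elapsed.total_seconds()))


moment = datetime(2007, 12, 31, 23, 59)
print(encode_timestamp(moment))           # FaBoiDanGaiFaoMu
print(encode_seconds_since_1900(moment))  # five syllables
```

A side benefit of the digit-string format is that each syllable maps directly to a date field (century, year, month, day, hour, minute), which the seconds count does not offer.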
The system is open source and can be used by all for any purpose.
And here’s a demo for your GPS-enabled Pocket PC (requires .NET Compact Framework 2.0).
Update (June 16, 2010): A Ruby port will be posted on github in the next few days.