Proposing a Centimal Number System for Geotagging and Geocoding

Introduction

Geotagging photos and other data items today is a tedious and error-prone affair, not least because latitude and longitude have to be entered 100% correctly. One wrong digit and you may be many meters or many miles off. There are certainly numerous tools to make this task easier, and new ones are released almost daily in this age of the geotagging craze. Even with these products, dealing with high-precision numbers, positive and negative values, matching timestamps etc. can be quite daunting. So, as with domain names (which obviate remembering IP addresses), wouldn’t it be easier to describe latitude and longitude with a simple mnemonic “babble language” consisting of “friendly names” for each location? Some alternative ways have been tried to do away with the numerical notation of location coordinates:

Grid Systems

There are numerous grid systems using letters as well as numbers, the best-known example being the Universal Transverse Mercator (UTM) system. Grid systems are not directly supported by modern GPS devices, which use the WGS84 datum. So, for easy conversion, latitude and longitude should be kept in the established decimal numeric form.

Numbers into Words

It would be desirable to have a system that expresses numbers as pronounceable words, which could be understood in any language and in spoken conversation, such as over the telephone.

Bubble Babble

The bubble babble encoding uses alternating consonants and vowels, facilitating public key validation in verbal form, like over the telephone, as the resulting string can be more easily pronounced and understood than hexadecimal sequences of letters and numbers. Here’s a port of Bubble Babble to C#.

Koremutake

Another take on a related problem is Koremutake, which expresses large integers as a sequence of syllables. It is a base-128 encoding, using 128 distinct syllables to encode integers. While this system works fine for integers, extending it to fractions quickly runs into the well-known problems of converting decimal numbers into binary format: representing a simple decimal number like 0.1 “would need an infinitely recurring binary fraction”. So while the Koremutake-encoded integer 65535 is an easy-to-remember BOTRETRE, the floating point number 65535.1 would be encoded as BOTRETRE.FAFRAMOHESUDRA if Koremutake covered floating point numbers, and the precision would still be only adequate. As a side note, here’s a port of Koremutake to C#.

Decimal Encoding

To ward off the rounding and approximation issues of binary floating point numbers, a modified system based on decimal numbers should be used. First of all, decimal numbers are what users are familiar with. More importantly, decimal arithmetic is available in most operating systems and programming languages today (C#, C++, Java, Visual Basic), and a decimal type has been proposed for ECMAScript/JavaScript as well.
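The difference between binary and decimal representation is easy to see in practice. The following Python sketch uses the standard `decimal` module purely as an illustration; the helper name `scale_to_int` is made up for this example and is not part of any existing port.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so even simple sums pick up rounding errors.
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Constructing a Decimal from a float exposes the binary approximation;
# constructing it from text keeps the digits exactly as written.
print(Decimal(0.1) == Decimal("0.1"))                     # False


def scale_to_int(text, places):
    """Shift the decimal point `places` digits to the right and return an
    exact integer (hypothetical helper; extra digits are truncated)."""
    return int(Decimal(text) * 10 ** places)


print(scale_to_int("65535.1", 1))                         # 655351
```

Working from the textual form of a number means that every digit the user typed is preserved before any encoding takes place.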

Centimal System

A base-100 (centimal, centesimal?) system is a tall order if only ASCII vowels and consonants are to be used. Citing the ancient major system, one suggested centimal system uses both numbers and syllables made up of consonants and vowels to encode integers. Then again, combinations of numbers and letters are notoriously hard to remember, probably because numbers and letters are stored and processed in different parts of the brain. Numbers mixed with letters also hamper the reading flow and introduce local translations (for the numbers) into the flow of (language-agnostic) syllables.

Expanding the Vowels to 10

Living in Thailand, I’m used to using 18 consonants and 6 diphthongs, so I expanded the usual a, e, o, u with “ai”, “an”, “ao”, “in”, “oi” and “on” to make 10 vowels. Sorted alphabetically, this gives the sequence “a”, “ai”, “an”, “ao”, “e”, “in”, “o”, “oi”, “on”, “u”. The artificial language Ithkuil also uses the “expand the vowel stock” approach, but that’s a different ball game entirely. ‘i’ and ‘y’ are not used as vowels on their own (unlike in Koremutake), since ‘i’ is pronounced like the English ‘ee’ sound in many languages, including German, Dutch and Italian.

Ten Consonants

We now need only 10 consonants to form all our base-100 “digits”. For international compatibility, ‘c’, ‘j’ (pronounced like i in German and Dutch) and ‘w’ (pronounced like v in German, not used in some Romance languages) are excluded from the consonants, and so is ‘v’. Furthermore, ‘r’ is left out because it is not distinguished from ‘l’ in many spoken Asian languages. The ten consonants used are B, D, F, G, L, M, P, S, T and Y. The result reminds one of Chinese, Vietnamese or Thai, due mainly to the expansion of vowels. But luckily, unlike the aforementioned languages, our system is not tonal ;). So now we have a coherent and predictable 10 x 10 base-100 number system consisting of syllables that are easy to read and pronounce. Two issues remain:

Negative Numbers

What about negative numbers? One way to express them could be to use lower case for negative numbers and capitalize all letters for positive numbers. However, this goes against the established customs for identifiers like web URLs or e-mail addresses, where case does not matter. Consequently, many users will not pay much attention to the difference between lower and upper case. Instead, the prefix letter A is prepended to negative numbers: it is one of only two letters in the alphabet (the other being H) that have a “full-width” horizontal bar. This also fits well into the flow of the language, since a leading A is generally easy to pronounce before all the syllables we will use. In the table below, the syllable directly after the A starts in lower case (Abai for -1).

Fractions (Non-Integers)

The dot of fractional numbers is replaced by a distinctive syllable. The word “point” is too specific to English; many European languages use a comma instead of a point as the decimal separator. I was thinking of using “Poi” as a short form of “Point”, but I want to use both P as a consonant and oi as a vowel in the number system. So I settled for “Ha”, which is also shorter. The digits after the separator are grouped in pairs just like those before it, so 1.2 (that is, 1.20) becomes BaiHaFa, while 1.02 becomes BaiHaBan. Here are the 100 distinct syllables of the centimal system:

Positive Numbers, Negative Numbers and Fractions (Floats), listed one after the other:
Bai 1
Ban 2
Bao 3
Be 4
Bin 5
Bo 6
Boi 7
Bon 8
Bu 9
Da 10
Dai 11
Dan 12
Dao 13
De 14
Din 15
Do 16
Doi 17
Don 18
Du 19
Fa 20
Fai 21
Fan 22
Fao 23
Fe 24
Fin 25
Fo 26
Foi 27
Fon 28
Fu 29
Ga 30
Gai 31
Gan 32
Gao 33
Ge 34
Gin 35
Go 36
Goi 37
Gon 38
Gu 39
La 40
Lai 41
Lan 42
Lao 43
Le 44
Lin 45
Lo 46
Loi 47
Lon 48
Lu 49
Ma 50
Mai 51
Man 52
Mao 53
Me 54
Min 55
Mo 56
Moi 57
Mon 58
Mu 59
Pa 60
Pai 61
Pan 62
Pao 63
Pe 64
Pin 65
Po 66
Poi 67
Pon 68
Pu 69
Sa 70
Sai 71
San 72
Sao 73
Se 74
Sin 75
So 76
Soi 77
Son 78
Su 79
Ta 80
Tai 81
Tan 82
Tao 83
Te 84
Tin 85
To 86
Toi 87
Ton 88
Tu 89
Ya 90
Yai 91
Yan 92
Yao 93
Ye 94
Yin 95
Yo 96
Yoi 97
Yon 98
Yu 99
BaiBa 100

Negative Numbers
AbaiBa -100
Ayu -99
Ayon -98
Ayoi -97
Ayo -96
Ayin -95
Aye -94
Ayao -93
Ayan -92
Ayai -91
Aya -90
Atu -89
Aton -88
Atoi -87
Ato -86
Atin -85
Ate -84
Atao -83
Atan -82
Atai -81
Ata -80
Asu -79
Ason -78
Asoi -77
Aso -76
Asin -75
Ase -74
Asao -73
Asan -72
Asai -71
Asa -70
Apu -69
Apon -68
Apoi -67
Apo -66
Apin -65
Ape -64
Apao -63
Apan -62
Apai -61
Apa -60
Amu -59
Amon -58
Amoi -57
Amo -56
Amin -55
Ame -54
Amao -53
Aman -52
Amai -51
Ama -50
Alu -49
Alon -48
Aloi -47
Alo -46
Alin -45
Ale -44
Alao -43
Alan -42
Alai -41
Ala -40
Agu -39
Agon -38
Agoi -37
Ago -36
Agin -35
Age -34
Agao -33
Agan -32
Agai -31
Aga -30
Afu -29
Afon -28
Afoi -27
Afo -26
Afin -25
Afe -24
Afao -23
Afan -22
Afai -21
Afa -20
Adu -19
Adon -18
Adoi -17
Ado -16
Adin -15
Ade -14
Adao -13
Adan -12
Adai -11
Ada -10
Abu -9
Abon -8
Aboi -7
Abo -6
Abin -5
Abe -4
Abao -3
Aban -2
Abai -1
Ba 0

Fractions (Floats)
Ba 0
BaHaBai 0.01
BaHaBan 0.02
BaHaBao 0.03
BaHaBe 0.04
BaHaBin 0.05
BaHaBo 0.06
BaHaBoi 0.07
BaHaBon 0.08
BaHaBu 0.09
BaHaDa 0.10
BaHaDai 0.11
BaHaDan 0.12
BaHaDao 0.13
BaHaDe 0.14
BaHaDin 0.15
BaHaDo 0.16
BaHaDoi 0.17
BaHaDon 0.18
BaHaDu 0.19
BaHaFa 0.20
BaHaFai 0.21
BaHaFan 0.22
BaHaFao 0.23
BaHaFe 0.24
BaHaFin 0.25
BaHaFo 0.26
BaHaFoi 0.27
BaHaFon 0.28
BaHaFu 0.29
BaHaGa 0.30
BaHaGai 0.31
BaHaGan 0.32
BaHaGao 0.33
BaHaGe 0.34
BaHaGin 0.35
BaHaGo 0.36
BaHaGoi 0.37
BaHaGon 0.38
BaHaGu 0.39
BaHaLa 0.40
BaHaLai 0.41
BaHaLan 0.42
BaHaLao 0.43
BaHaLe 0.44
BaHaLin 0.45
BaHaLo 0.46
BaHaLoi 0.47
BaHaLon 0.48
BaHaLu 0.49
BaHaMa 0.50
BaHaMai 0.51
BaHaMan 0.52
BaHaMao 0.53
BaHaMe 0.54
BaHaMin 0.55
BaHaMo 0.56
BaHaMoi 0.57
BaHaMon 0.58
BaHaMu 0.59
BaHaPa 0.60
BaHaPai 0.61
BaHaPan 0.62
BaHaPao 0.63
BaHaPe 0.64
BaHaPin 0.65
BaHaPo 0.66
BaHaPoi 0.67
BaHaPon 0.68
BaHaPu 0.69
BaHaSa 0.70
BaHaSai 0.71
BaHaSan 0.72
BaHaSao 0.73
BaHaSe 0.74
BaHaSin 0.75
BaHaSo 0.76
BaHaSoi 0.77
BaHaSon 0.78
BaHaSu 0.79
BaHaTa 0.80
BaHaTai 0.81
BaHaTan 0.82
BaHaTao 0.83
BaHaTe 0.84
BaHaTin 0.85
BaHaTo 0.86
BaHaToi 0.87
BaHaTon 0.88
BaHaTu 0.89
BaHaYa 0.90
BaHaYai 0.91
BaHaYan 0.92
BaHaYao 0.93
BaHaYe 0.94
BaHaYin 0.95
BaHaYo 0.96
BaHaYoi 0.97
BaHaYon 0.98
BaHaYu 0.99
Bai 1.00
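
The table above is regular enough to be generated rather than typed in. The following Python sketch is one possible reading of the rules, not the author's C# implementation. The names `SYLLABLES` and `encode_number` are invented for this example, and the handling of negative fractions (an A prefix on the whole word) is an assumption, since the table does not show that case.

```python
# Ten consonants times ten vowels give the 100 base-100 "digits":
# index 0 is "Ba", index 1 is "Bai", ... index 99 is "Yu".
CONSONANTS = ["B", "D", "F", "G", "L", "M", "P", "S", "T", "Y"]
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def encode_unsigned(n):
    """Write a non-negative integer in base 100, one syllable per digit."""
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def encode_number(text):
    """Encode a decimal string such as "-99" or "0.25" (hypothetical helper)."""
    negative = text.startswith("-") and any(ch in "123456789" for ch in text)
    whole, _, frac = text.lstrip("+-").partition(".")
    word = encode_unsigned(int(whole or "0"))
    frac = frac.rstrip("0")          # 1.00 is written like 1
    if frac:
        if len(frac) % 2:            # .2 is read as .20
            frac += "0"
        pairs = [frac[i:i + 2] for i in range(0, len(frac), 2)]
        word += "Ha" + "".join(SYLLABLES[int(p)] for p in pairs)
    if negative:                     # "A" prefix, next letter in lower case
        word = "A" + word[0].lower() + word[1:]
    return word


for sample in ["1", "100", "-100", "-99", "0.01", "0.10", "1.2"]:
    print(sample, encode_number(sample))
```

Fractions are encoded from their textual digits, so no binary floating point value is involved at any point.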

Application to Geocoding and Geotagging

Latitude and longitude

So how do we apply this notation to geocoding and geotagging, which often involve verbalizing or memorizing latitude and longitude information? And in particular, what do we do with the decimal point? The classic notation of latitude and longitude is sexagesimal (example: 52°37′ 48.92″ N, 111°57′ 8.55″ W): each of the 90 degrees of latitude (north or south) and the 180 degrees of longitude (east or west) is divided into 60 minutes, and each minute into 60 seconds, with accuracy further enhanced by decimal fractions of the seconds. Historically, the nautical mile was defined as one minute of latitude, so a degree of latitude corresponds to about 60 nautical miles.
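
For readers who only have sexagesimal coordinates at hand, the conversion to decimal degrees is a short calculation. The Python sketch below uses the example coordinates from the paragraph above; the function name `dms_to_decimal` and the rounding to six decimal places are choices made for this illustration.

```python
from decimal import Decimal


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    `seconds` is passed as text so that its decimal digits stay exact.
    South and west become negative values.
    """
    value = Decimal(degrees) + Decimal(minutes) / 60 + Decimal(seconds) / 3600
    if hemisphere in ("S", "W"):
        value = -value
    return round(value, 6)


print(dms_to_decimal(52, 37, "48.92", "N"))    # 52.630256
print(dms_to_decimal(111, 57, "8.55", "W"))    # -111.952375
```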

This sexagesimal system is especially hard to handle for the layperson. For geotagging and geocoding, the decimal system (degrees with a decimal fraction) holds sway today, being supported by maps.google.com, MapPoint, AutoRoute etc.

Still, numbers with six digits after the decimal separator are hard to remember.

So we will apply the number system described above to the decimal notation, with a leading “A” denoting a negative latitude or longitude. We get rid of the fractional part (the digits after the decimal separator) by multiplying each coordinate by 1 million, which delivers an accuracy of a millionth of a degree, a level of precision that is generally accepted for GIS applications.

This way we can express every location in the world with a maximum of just nine syllables (four for the latitude and five for the longitude), plus a leading “A” for negative latitude and longitude values. So we can denote every place on the planet with an accuracy of better than 1 meter using some 64,800,000,000,000,000 different words (180 million latitude values times 360 million longitude values), each consisting of a “first name” for the latitude and a “last name” for the longitude separated by a dot, colon, pound-sign (#) or at-sign (@).

The at-sign is preferable here because, having been introduced by and used daily for e-mail, it can be pronounced by people around the world regardless of their native language.
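
The construction of a geoname can be sketched in a few lines of Python. This is an illustration under assumptions rather than the reference implementation: the function names are hypothetical, the coordinates are taken as text to keep the decimal digits exact, and the range checks are an addition of this example.

```python
from decimal import Decimal

CONSONANTS = ["B", "D", "F", "G", "L", "M", "P", "S", "T", "Y"]
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # "Ba" = 0 ... "Yu" = 99


def encode_int(n):
    """Base-100 syllables for an integer, with an "A" prefix if negative."""
    parts, m = [], abs(n)
    while True:
        m, digit = divmod(m, 100)
        parts.append(SYLLABLES[digit])
        if m == 0:
            break
    word = "".join(reversed(parts))
    return "A" + word[0].lower() + word[1:] if n < 0 else word


def encode_geoname(lat, lon):
    """Join the encoded latitude and longitude (decimal strings) with "@"."""
    lat_value, lon_value = Decimal(lat), Decimal(lon)
    if not -90 <= lat_value <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= lon_value <= 180:
        raise ValueError("longitude out of range")
    # Multiply by one million to drop the decimal separator.
    lat_micro = int(lat_value * 1_000_000)
    lon_micro = int(lon_value * 1_000_000)
    return encode_int(lat_micro) + "@" + encode_int(lon_micro)


print(encode_geoname("40.689440", "-74.044698"))  # LaPonYeLa@AseBeLoYon
```

Here 40.689440 becomes the integer 40689440, whose base-100 digits 40, 68, 94, 40 map to La, Pon, Ye, La.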

These “geonames” are language-agnostic, making it easy for speakers of different languages to exchange geographic information, or any kind of numeric information for that matter. For example, phone numbers can be exchanged more easily using the proposed system. Leading zeros are provided for by prepending the syllable for zero (“Ba”) for each leading 0. Or your credit card PIN is just two syllables, which is definitely easier to remember than four digits, while also providing an (admittedly minimal) level of “encryption”. The system also lends itself to speech recognition, facilitating software applications for geotagging where the user simply pronounces the latitude and longitude in centimal as delivered by the GPS device or camera.

Examples:

Site: Latitude, Longitude, Geoname (centimal format)
Statue of Liberty: Lat: 40.689440 Long: -74.044698 Geoname: LaPonYeLa@AseBeLoYon
Top of Mount Everest: Lat: 27.988130 Long: 86.925141 Geoname: FoiYonTaiGa@ToYanMaiLai
Sydney Opera House: Lat: -33.891788 Long: 151.176251 Geoname: AgaoTuDoiTon@BaiMaiDoiPanMai
Royal Observatory Greenwich: Lat: 51.477818 Long: 0 Geoname: MaiLoiSonDon@Ba
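
Going the other way, a geoname can be turned back into coordinates by splitting it into syllables. The Python sketch below is one way this might be done; `decode_int` and `decode_geoname` are names invented for this example, and the parsing relies on every syllable starting with an upper-case consonant.

```python
import re
from decimal import Decimal

CONSONANTS = ["B", "D", "F", "G", "L", "M", "P", "S", "T", "Y"]
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
INDEX = {syllable: value for value, syllable in enumerate(SYLLABLES)}


def decode_int(word):
    """Read a centimal word back into an integer."""
    negative = word.startswith("A")   # no syllable starts with "A"
    if negative:
        word = word[1].upper() + word[2:]
    parts = re.findall(r"[A-Z][a-z]*", word)
    if "".join(parts) != word or any(p not in INDEX for p in parts):
        raise ValueError("not a centimal word: " + word)
    value = 0
    for part in parts:
        value = value * 100 + INDEX[part]
    return -value if negative else value


def decode_geoname(name):
    """Split at "@" and scale both parts back to decimal degrees."""
    lat_word, lon_word = name.split("@")
    lat = Decimal(decode_int(lat_word)) / 1_000_000
    lon = Decimal(decode_int(lon_word)) / 1_000_000
    return lat, lon


print(decode_geoname("LaPonYeLa@AseBeLoYon"))
```

Decoding the four example geonames yields the latitude and longitude values listed next to them.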

Other geographic information

Geographic information includes other data items, such as the time and the direction (bearing) of movement.

Time can be expressed in the format yyyyMMddHHmm (year, month, day, hours, minutes), which gives a value of 200712312359 for the last minute of 2007. However, this number is rather large and has to be expressed as FaBoiDanGaiFaoMu in our system. This is six syllables and, while still easier to handle for humans than the numeric expression, it goes beyond the memorization capability of most humans.

Another approach might be a count of the seconds since a fixed point in time, say January 1st, 1900. However, the number of seconds since the beginning of 1900 is now also pushing 3,500,000,000, which could be expressed with five syllables: a minor advantage that comes with the tradeoff of not being able to handle earlier dates. So the yyyyMMddHHmm format should be used, and the above examples can simply be extended with a further @ sign, as in LaPonYeLa@AseBeLoYon@FaBoiDanGaiFaoMu, an example of convention over configuration. Alternatively, the separator could be changed for the time, to the pound sign for example.
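
Both time notations discussed above can be compared with a short Python sketch. The function names are hypothetical, and the timestamp is assumed to carry a four-digit year so that the formatted number has twelve digits.

```python
from datetime import datetime

CONSONANTS = ["B", "D", "F", "G", "L", "M", "P", "S", "T", "Y"]
VOWELS = ["a", "ai", "an", "ao", "e", "in", "o", "oi", "on", "u"]
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]


def encode_unsigned(n):
    """Write a non-negative integer in base 100, one syllable per digit."""
    parts = []
    while True:
        n, digit = divmod(n, 100)
        parts.append(SYLLABLES[digit])
        if n == 0:
            break
    return "".join(reversed(parts))


def encode_timestamp(moment):
    """Format as year, month, day, hours, minutes and encode the number."""
    return encode_unsigned(int(moment.strftime("%Y%m%d%H%M")))


def encode_seconds_since_1900(moment):
    """Alternative: encode the number of seconds elapsed since 1900-01-01."""
    elapsed = moment - datetime(1900, 1, 1)
    return encode_unsigned(int(elapsed.total_seconds()))


last_minute = datetime(2007, 12, 31, 23, 59)
print(encode_timestamp(last_minute))                  # FaBoiDanGaiFaoMu
print("LaPonYeLa@AseBeLoYon@" + encode_timestamp(last_minute))
print(encode_seconds_since_1900(last_minute))         # five syllables
```

The calendar format costs one syllable more than the seconds count, but each syllable maps directly to a readable field: century, year, month, day, hour and minute.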

The system is open source and can be used by all for any purpose.

Try it here!

And here’s a demo for your GPS-enabled Pocket PC (requires .NET Compact Framework 2.0).

Update (June 16, 2010): A Ruby port will be posted on github in the next few days.
