I have been quite intrigued by how URL shortener scripts work, and surprisingly, most of them use an ingenious trick to compress a long URL into a short one.
Code:
http://example.com/fe45 -----> http://geekworld.co.in/blah/page.htm
The answer is base36 encoding. Why base36? Because its output can use the 26 letters of the alphabet plus the 10 digits. It is a surprisingly simple way of producing a short code for a URL.
Base 36 conversion is nothing more than repeatedly dividing a number by 36, collecting the remainders (the modulo at each step), and mapping each remainder to its character in a table of digits and letters (0-9, then a-z). The remainders are read from last to first. See a base 36 table.
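To make the divide-and-collect-remainders idea concrete, here is a minimal sketch in PHP. The function name to_base36 is made up for this illustration. For non-negative integers it should give the same result as the built-in base_convert($n, 10, 36).

```php
<?php
// Hypothetical helper, not from the original post: base-36 encoding by
// repeated division, for non-negative integers only.
function to_base36(int $number): string
{
    if ($number < 0) {
        throw new InvalidArgumentException('Expected a non-negative integer.');
    }

    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $encoded = '';

    do {
        $remainder = $number % 36;                    // next digit, least significant first
        $encoded = $alphabet[$remainder] . $encoded;  // prepend so the digits end up in order
        $number = intdiv($number, 36);                // carry on with the quotient
    } while ($number > 0);

    return $encoded;
}

// 10099 = 7*36^2 + 28*36 + 19, and 7 -> "7", 28 -> "s", 19 -> "j".
echo to_base36(10099), "\n"; // prints 7sj
```

In a real script you would normally just call base_convert; the loop is only here to show what happens under the hood.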
This is how most URL shortener scripts work:
1. First, insert the long URL into the database and get the ID of the new row. The ID should be a unique primary key; in most cases it can simply be an auto-increment column.
URLs are stored in the database with a unique ID:
Code:
-----------------------------------------------
ID       URL
-----------------------------------------------
10099    http://geekworld.co.in/
14566    http://geekworld.co.in/blah/page.htm
2. Now take the ID stored for that URL in the database. Since the ID contains only the digits 0-9 (base 10), convert it to base 36 with the PHP function base_convert, from base 10 to base 36.
Code:
<?php
$id = "10099";
// Convert the decimal ID to its base-36 representation.
echo base_convert($id, 10, 36);
?>
Output:
1099   -----> uj
10099  -----> 7sj
100099 -----> 258j
As the output shows, smaller numbers produce fewer characters, and even the millionth ID yields a code only 4 characters long, made up of mixed letters and digits. Any ID below 36^4 = 1,679,616 fits in 4 characters.
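If you want to check the length claim yourself, a quick sketch like the following prints the code and its length for a few sample IDs. The helper name code_length and the choice of sample IDs are mine, picked only for illustration.

```php
<?php
// Illustrative only: how long is the base-36 code for a given ID?
// code_length() is a hypothetical helper, not part of the original post.
function code_length(int $id): int
{
    return strlen(base_convert((string) $id, 10, 36));
}

foreach ([1099, 10099, 100099, 1000000, 1679615, 1679616] as $id) {
    printf("%8d -> %-5s (%d chars)\n", $id, base_convert((string) $id, 10, 36), code_length($id));
}
// 36^4 = 1679616, so 1679615 ("zzzz") is the last ID that still fits in 4 characters.
```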
3. Store that base36 output in the database in a separate field alongside the ID, so the shortened version of the URL becomes:
Code:
http://example.com/7sj
for the URL http://geekworld.co.in with ID 10099 stored in the database.
The short URL http://example.com/7sj is rewritten with mod_rewrite to the PHP page
Code:
http://example.com/7sj -----> http://example.com/short.php?baseid=7sj
which queries the database for the destination URL matching the stored base36 value and then redirects to it.
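As a rough idea of what the lookup in short.php could look like, here is a sketch that uses a plain PHP array as a stand-in for the database table, keyed by the stored base36 value. The names find_destination and $urlsByCode are hypothetical, and a real script would run a query against the database instead. The redirect itself is left as a comment so the snippet can be run from the command line.

```php
<?php
// Stand-in for the database: stored base36 value => destination URL.
// The keys are computed from the IDs in the table shown in step 1.
$urlsByCode = [
    base_convert('10099', 10, 36) => 'http://geekworld.co.in/', // "7sj"
    base_convert('14566', 10, 36) => 'http://geekworld.co.in/blah/page.htm',
];

// Hypothetical helper: return the destination URL for a short code,
// or null when the code is malformed or unknown.
function find_destination(string $baseId, array $urlsByCode): ?string
{
    // Accept only lowercase base-36 characters before doing the lookup.
    if (!preg_match('/\A[0-9a-z]+\z/', $baseId)) {
        return null;
    }

    return $urlsByCode[$baseId] ?? null;
}

$target = find_destination('7sj', $urlsByCode);

if ($target !== null) {
    // In a web request, short.php would redirect here instead of printing:
    // header('Location: ' . $target, true, 301);
    echo $target, "\n"; // prints http://geekworld.co.in/
} else {
    echo "Unknown short code\n";
}
```

In the real page the code would come from the baseid query parameter rather than the hard-coded '7sj' used above.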
This is a very simple technique; there are many more techniques for URL shortening, for which I recommend the resources referenced below.
Enjoy!