ryodo - domain parser

I just created a tiny gem for parsing a given domain string and retrieving relevant information such as the TLD, the registered/registrable domain and the subdomain.


Project on GitHub: github.com/asaaki/ryodo

RubyGems: rubygems.org/gems/ryodo

Gemfile: gem "ryodo"

What is it good for?

Read the explanation from publicsuffix.org:

The Public Suffix List is a cross-vendor initiative to provide an accurate list of domain name suffixes, maintained by the hard work of Mozilla volunteers and by submissions from registries, to whom we are very grateful.


Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list. This is the aim of the Public Suffix List.

In short: it is not easy to figure out which part of a given domain string is the registered one, because the TLDs/suffixes have different registration rules.
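To make the problem concrete, here is a minimal sketch of a suffix-list lookup in plain Ruby. The three-entry SUFFIXES set, the split_domain helper and the NAIVE_GUESS constant are made up for illustration. The real Public Suffix List is far longer and also has wildcard and exception rules, which this sketch ignores, and it is not meant to show how ryodo works internally.

```ruby
require "set"

# Hypothetical mini suffix list; the real list is far longer.
SUFFIXES = Set.new(%w[jp co.jp com]).freeze

# Returns [subdomain, registrable_domain, suffix], or nil if the name
# is itself a suffix or no known suffix matches.
def split_domain(name)
  labels = name.downcase.split(".")
  return nil if SUFFIXES.include?(labels.join("."))

  # Try the longest candidate suffix first, so "co.jp" wins over "jp".
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join(".")
    next unless SUFFIXES.include?(suffix)

    subdomain = labels[0...(i - 1)].join(".")
    domain    = labels[(i - 1)..-1].join(".")
    return [subdomain, domain, suffix]
  end
  nil
end

# Naive guess: "the last two labels are the domain".
NAIVE_GUESS = "my.awesome.domain.co.jp".split(".").last(2).join(".")

p split_domain("my.awesome.domain.co.jp")
p NAIVE_GUESS
```

The naive guess yields "co.jp", which is a suffix and not something anyone could register, while the list-based lookup finds "domain.co.jp".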

There are implementations for several languages, and a Ruby version is available as well, called public_suffix.

When I started with some tests, playground stuff and finally a gem with a C extension (not fully working), I decided to do a pure Ruby version, without knowing that one already existed.

In the end I think it was not so bad to do it on my own, because some benchmarks showed that my implementation is much faster.

How to use it

A few code examples:


dom = Ryodo.parse("my.awesome.domain.co.jp")
#=> Ryodo::Domain

                  #    SUBDOMAIN  DOMAIN   TLD
dom.tld           #=>                   "co.jp"
dom.domain        #=>            "domain.co.jp"
dom.subdomain     #=> "my.awesome"
dom               #=> "my.awesome.domain.co.jp"
dom.fqdn          #=> "my.awesome.domain.co.jp."

More formats

# all parts are also reversible
# mostly used on domain/FQDN
dom.reverse            #=> "jp.co.domain.awesome.my"
dom.fqdn.reverse       #=> ".jp.co.domain.awesome.my"

dom.to_a               #=> ["my","awesome","domain","co","jp"]
dom.domain.to_a        #=> ["domain","co","jp"]
dom.subdomain.to_a     #=> ["my","awesome"]
dom.fqdn.to_a          #=> ["my","awesome","domain","co","jp",""]

# .to_a also usable with parameter :reverse (or shorthand :r)
dom.domain.to_a(:reverse) #=> ["jp","co","domain"]
dom.fqdn.to_a(:reverse)   #=> ["","jp","co","domain","awesome","my"]
dom.fqdn.to_a(:r)         #=> ["","jp","co","domain","awesome","my"]
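For intuition, the reversed and array forms above correspond to plain label operations on the string. The sketch below uses only core String and Array methods. The helper names labels_of and reversed are made up here, and this mirrors the outputs shown above rather than claiming how ryodo produces them.

```ruby
# Splits a name into its labels. The negative limit keeps the empty
# label that follows the trailing dot of an FQDN.
def labels_of(name)
  name.split(".", -1)
end

# Joins the labels back together in reverse order.
def reversed(name)
  labels_of(name).reverse.join(".")
end

name = "my.awesome.domain.co.jp"
fqdn = name + "." # the trailing dot marks the DNS root

p labels_of(name) # five labels
p labels_of(fqdn) # the same five labels plus a trailing ""
p reversed(name)  # "jp.co.domain.awesome.my"
p reversed(fqdn)  # ".jp.co.domain.awesome.my"
```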

You can also call ryodo in different ways:


String extension

It is required automatically.


URI extension

Has to be explicitly required.


gem "ryodo", :require => ["ryodo","ryodo/ext/uri"]


require "ryodo/ext/uri" # if not required via Gemfile

uri = URI.parse("http://my.awesome.domain.jp:5555/path")
uri.host
#=> "my.awesome.domain.jp"

uri.host.class
#=> Ryodo::Domain
# but decorates the String class transparently

uri.host.domain
#=> "domain.jp"


Now the tiny benchmark I did:


The domain input list is taken from publicsuffix.org (the checkPublicSuffix test script under publicsuffix.org/list/). I added some very long domain names with many parts (to see how look-up time scales).

Some of them are also invalid (to test whether your implementation works correctly).

In total: 72 entries to check.

Ruby: 1.9.3-p194, no special patches

We only do a basic parsing and retrieve the registered/registrable domain. (Should hit the most important code of the gems.)

Test script snippet

require "benchmark"
require "ryodo"
require "public_suffix"

# DOMAINS is the array of domain entries

LOOPS = 1_000

Benchmark.bmbm do |b|

  b.report "ryodo" do
    LOOPS.times do
      DOMAINS.each do |domain|
        Ryodo.parse(domain).domain # returns nil if not valid
      end
    end
  end

  b.report "public_suffix" do
    LOOPS.times do
      DOMAINS.each do |domain|
        PublicSuffix.parse(domain).domain rescue nil # it raises if not valid in any way, so we rescue it
      end
    end
  end

end



PublicSuffix.parse(…) raises an error if the domain input is invalid (e.g. not a registrable domain). That is why I had to add the rescue statement.

Ryodo.parse(…) won't raise but returns nil values for invalid input (it only raises if the input is not a String, of course).


Rehearsal -------------------------------------------------
ryodo           1.800000   0.000000   1.800000 (  1.809521)
public_suffix  21.880000   0.020000  21.900000 ( 21.907808)
--------------------------------------- total: 23.700000sec

                    user     system      total        real
ryodo           1.770000   0.000000   1.770000 (  1.769734)
public_suffix  22.320000   0.010000  22.330000 ( 22.346013)

As you can see, ryodo is more than 10 times faster (roughly 12 times in this run).

(Fun fact: my first approach was 6 times slower than public_suffix, so that is an improvement by a factor of about 60!)


public_suffix is completely okay. If you don't have to query a lot, it will do its job.

ryodo is the better choice if you expect to parse lots of domain data in a short time. I will use it for another project where I expect that kind of parsing volume.

I will also try to extend it with an optional C extension to make it even faster. The current implementation can handle ~40,000 domains/sec, which is quite okay for API usage.
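The throughput figure follows from the benchmark numbers above: 72 entries times 1,000 loops, divided by the measured wall-clock time. Here is a quick check in Ruby, using the "real" column of the second (non-rehearsal) run. The constant and method names are made up for this sketch.

```ruby
ENTRIES            = 72
LOOPS_RUN          = 1_000
RYODO_REAL         = 1.769734  # seconds, from the benchmark output
PUBLIC_SUFFIX_REAL = 22.346013 # seconds, from the benchmark output

# Parsed domains per second for a given wall-clock time.
def domains_per_second(real_seconds)
  (ENTRIES * LOOPS_RUN) / real_seconds
end

puts "ryodo:         %d domains/sec" % domains_per_second(RYODO_REAL)
puts "public_suffix: %d domains/sec" % domains_per_second(PUBLIC_SUFFIX_REAL)
puts "speedup:       %.1fx" % (PUBLIC_SUFFIX_REAL / RYODO_REAL)
```

This works out to a bit over 40,000 domains/sec for ryodo and a bit over 3,000 for public_suffix, a ratio of between 12 and 13.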

A simple .split(".") could also do the job if you don't need to know what type each part is. In my case that is not enough.