Huy's site

Implementing DICT protocol: Part 1

DICT Protocol

What is DICT protocol?

The Dictionary Server Protocol (DICT) is a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases.

DICT Protocol - RFC 2229

Notable implementations include dict(d) and GNU dico(d); the former is the reference implementation and supports multiple database formats, as listed in dictfmt(1).

I intend to implement a server and multiple clients (CLI, GUI, web) for this protocol, as well as some tools to easily create a dictd-readable database.

Why?

No practical reason, but dict is one of the first command line tools introduced to me and easily one of my favorites, along with curl and jq. It’s basically just a dictionary app, but it’s cool:

  • works perfectly in terminal
  • easily self-hostable
  • fast
  • has cool dictionaries (though only Debian, Arch and derivatives distribute those)

Also, I’m writing dictionaries for my conlangs and I want to distribute them via this protocol. Clearly, implementing a server that is already implemented doesn’t help, but I tend to go down rabbit holes.

I also like to explore non-web protocols, and starting with something simple like DICT might be a good idea.

Reading the spec

The spec (linked at the top of this post) is shorter and easier to read than I thought. Ignoring the introduction, examples and citations, it’s less than 20 pages. There are four classes of commands:

  • Querying the database: DEFINE, MATCH
  • SHOW metadata about the servers and the databases
  • Utilities: announce the CLIENT name, check STATUS, show HELP, set an OPTION, and QUIT
  • Authentication: AUTH and SASLAUTH

The authentication commands are optional, and I don’t find them useful, so I won’t implement them. That leaves the first three categories.

Handling TCP

DICT is based on TCP, and there is a neat interactive TCP tool called telnet, which I used for testing the commands.

telnet

DICT runs on port 2628:

$ telnet dict.org 2628
Trying 199.48.130.6...
Connected to dict.org.
Escape character is '^]'.
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>

Let’s try out some commands to understand how this works. Note that I prefix each command with ~> here so that it stands out from the response, and truncate long results with [...].

Let’s first show what databases there are:

~> SHOW DB
110 166 databases present
[...]
.
250 ok

There are a lot of dictionaries here, including GCIDE, WordNet, The Jargon File, V.E.R.A., FOLDOC, but most of them are FreeDict dictionaries.

To match a word, the syntax is:

~> MATCH database strategy word

Strategy is how the server will match the word you’re looking up. To list all strategies available, send the command:

~> SHOW STRATEGIES

There are various strategies supported by dictd, for example substring, which matches if the entry has the queried word as a substring:

~> MATCH jargon substring program
152 13 matches found
jargon "c programmer's disease"
jargon "cargo cult programming"
jargon "mickey mouse program"
jargon "perfect programmer syndrome"
jargon "program"
[...]
.
250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]

This command only shows which words in the database, if any, satisfy the match, without showing the definition. To actually view a definition, one has to supply the dictionary name to the DEFINE command. Note that you can also use * as the database for both DEFINE and MATCH, which will define/match across all dictionaries.
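
As a rough sketch of how a client might assemble these two commands, the helpers below build MATCH and DEFINE lines. The function names are made up for illustration. The sketch assumes that wrapping the word in double quotes (with backslash escapes) is an acceptable way to send multi-word entries, and that each command is terminated with CRLF, as the spec describes.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteWord wraps a word in double quotes so that multi-word entries such as
// "cargo cult programming" travel as a single parameter.
// Assumption: escaping backslashes and double quotes is enough for this sketch.
func quoteWord(word string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	return `"` + r.Replace(word) + `"`
}

// matchCommand builds a MATCH command line terminated with CRLF.
func matchCommand(database, strategy, word string) string {
	return fmt.Sprintf("MATCH %s %s %s\r\n", database, strategy, quoteWord(word))
}

// defineCommand builds a DEFINE command line terminated with CRLF.
func defineCommand(database, word string) string {
	return fmt.Sprintf("DEFINE %s %s\r\n", database, quoteWord(word))
}

func main() {
	fmt.Print(matchCommand("jargon", "substring", "program"))
	fmt.Print(defineCommand("*", "cargo cult programming"))
}
```

The resulting strings can be written to the TCP connection as-is.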

~> DEFINE * programming
150 3 definitions retrieved
151 "programming" wn "WordNet (r) 3.0 (2006)"
programming
    [...]
.
151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
programming
 n.

    [...]

.
151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
programming

.
250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
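
Each definition in the transcript above is introduced by a 151 line carrying the word, the database name and a quoted description. A minimal sketch of pulling those fields apart is shown here; the type and function names are hypothetical, and the splitter only handles plain double-quoted fields (no escapes), which is all the sample output needs.

```go
package main

import (
	"fmt"
	"strings"
)

// definitionHeader holds the fields of a "151" line.
type definitionHeader struct {
	Word        string
	Database    string
	Description string
}

// splitFields splits a line on spaces, keeping double-quoted runs together.
// Assumption: no backslash escapes inside the quotes.
func splitFields(line string) []string {
	var fields []string
	var cur strings.Builder
	inQuotes, hasField := false, false
	for _, r := range line {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			hasField = true
		case r == ' ' && !inQuotes:
			if hasField {
				fields = append(fields, cur.String())
				cur.Reset()
				hasField = false
			}
		default:
			cur.WriteRune(r)
			hasField = true
		}
	}
	if hasField {
		fields = append(fields, cur.String())
	}
	return fields
}

// parseDefinitionHeader parses a line such as
// 151 "programming" wn "WordNet (r) 3.0 (2006)".
func parseDefinitionHeader(line string) (definitionHeader, error) {
	f := splitFields(strings.TrimRight(line, "\r\n"))
	if len(f) != 4 || f[0] != "151" {
		return definitionHeader{}, fmt.Errorf("not a 151 header: %q", line)
	}
	return definitionHeader{Word: f[1], Database: f[2], Description: f[3]}, nil
}

func main() {
	h, err := parseDefinitionHeader(`151 "programming" wn "WordNet (r) 3.0 (2006)"`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (from %s): %s\n", h.Word, h.Database, h.Description)
}
```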

That’s the gist of how to look up words with the DICT protocol. You can find more commands with:

~> HELP
[...]
.
250 ok

Finally, to end the session, the command is:

~> QUIT
221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]

Note that the body of a response always ends with a line containing a single period, followed by a 250 ok status line, which is equivalent to HTTP’s 200 OK. QUIT is the exception: it is answered with 221 instead. These response codes are defined in the protocol specification.
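
To make the status lines a bit more concrete, here is a small sketch that splits a line into its three-digit code and the remaining text. The names are invented for this example. The classification by first digit follows the FTP/SMTP-style scheme that the spec borrows, so treat the labels as my reading of it rather than anything dictd-specific.

```go
package main

import (
	"fmt"
	"strings"
)

// status is a parsed status line: a three-digit code plus free text.
type status struct {
	Code int
	Text string
}

// parseStatus expects three digits, then either end of line or a space.
func parseStatus(line string) (status, error) {
	line = strings.TrimRight(line, "\r\n")
	if len(line) < 3 {
		return status{}, fmt.Errorf("line too short for a status code: %q", line)
	}
	code := 0
	for i := 0; i < 3; i++ {
		c := line[i]
		if c < '0' || c > '9' {
			return status{}, fmt.Errorf("no status code at start of %q", line)
		}
		code = code*10 + int(c-'0')
	}
	if len(line) > 3 && line[3] != ' ' {
		return status{}, fmt.Errorf("no space after status code in %q", line)
	}
	return status{Code: code, Text: strings.TrimSpace(line[3:])}, nil
}

// kind labels a code by its first digit.
func kind(code int) string {
	switch code / 100 {
	case 1:
		return "positive preliminary"
	case 2:
		return "positive completion"
	case 3:
		return "positive intermediate"
	case 4:
		return "transient negative"
	case 5:
		return "permanent negative"
	}
	return "unknown"
}

func main() {
	for _, line := range []string{"152 13 matches found", "250 ok", "221 bye"} {
		st, err := parseStatus(line)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d (%s): %s\n", st.Code, kind(st.Code), st.Text)
	}
}
```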

Commands other than HELP have some additional statistics in their final status line, though this is optional. I figured out that d means definitions, m means matches, and s is probably the time it took to query (why are they always zero, though?), but I have no clue what c, r, and u mean. I might check the source code to figure that out, but let’s leave it for another time.
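
Even without knowing what every letter stands for, the trailer is regular enough to parse. The sketch below assumes it always has the exact shape seen in the transcripts above; the struct fields are simply named after the letters, with no interpretation attached, and the names are mine rather than dictd’s.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statsRe matches a trailer like [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s].
var statsRe = regexp.MustCompile(`\[d/m/c = (\d+)/(\d+)/(\d+); ([0-9.]+)r ([0-9.]+)u ([0-9.]+)s\]`)

// stats mirrors the letters in the trailer; their meaning is not assumed here.
type stats struct {
	D, M, C int
	R, U, S float64
}

// parseStats returns the trailer values and true, or false if the line has none.
func parseStats(line string) (stats, bool) {
	m := statsRe.FindStringSubmatch(line)
	if m == nil {
		return stats{}, false
	}
	var st stats
	for i, p := range []*int{&st.D, &st.M, &st.C} {
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return stats{}, false
		}
		*p = n
	}
	for i, p := range []*float64{&st.R, &st.U, &st.S} {
		f, err := strconv.ParseFloat(m[i+4], 64)
		if err != nil {
			return stats{}, false
		}
		*p = f
	}
	return st, true
}

func main() {
	line := "250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]"
	if st, ok := parseStats(line); ok {
		fmt.Printf("d=%d m=%d c=%d r=%.3f u=%.3f s=%.3f\n", st.D, st.M, st.C, st.R, st.U, st.S)
	}
}
```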

Go

Of course we are not going to make the users type these commands (though they’re not too unintuitive and can be easily remembered). I chose Go to build the CLI client, though without any conscious consideration of fitness. I’m trying out new things[1] after all.

From the net package documentation, we can figure out how to make a TCP connection.

conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
	// handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')
// ...

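Let’s copy that and swap the HTTP request for DICT commands:

package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "dict.org:2628")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	buf := bufio.NewReader(conn)
	// The spec terminates each command with CRLF.
	fmt.Fprintf(conn, "MATCH jargon word programming\r\n")
	fmt.Fprintf(conn, "QUIT\r\n")

	// Print the raw response line by line until the server closes the connection.
	for {
		response, err := buf.ReadString('\n')
		if err != nil {
			// oftentimes this is EOF error
			fmt.Println(err)
			break
		}
		// Print, not Printf: the response is data, not a format string.
		fmt.Print(response)
	}
}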

Running this code, we get the response:

220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89266600.1914.1642341395@dict.dict.org>
152 4 matches found
jargon "cargo cult programming"
jargon "programming"
jargon "programming fluid"
jargon "voodoo programming"
.
250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
EOF

which is a good start.

There is a problem with this code: currently we are reading line by line, rather than reading the whole response to each command. This way, we can’t tell whether line 3 belongs to the response to the first command or the second. One solution is to check whether a line is prefixed with a status code, but is there a better one?

Let’s wait till next week!


  1. Not really, I’ve written a CLI client for the Wiktionary API with Go before.

