Google's SyntaxNet - HTTP API for Portuguese
In a previous post I explained how to load the syntactic and morphological information produced by SyntaxNet into NLTK structures by parsing the standard output. Although useful, this approach does not scale when one wants to process thousands of sentences. Finally, I've found a Docker image to set up SyntaxNet as a web service.
It turns out this is simple and straightforward. Here are the steps to do it and to set it up for Portuguese:
1. Install Docker
The first thing to do is to install Docker; the easiest way is to follow a tutorial. I found a simple and straightforward one for Ubuntu 16.04.
2. Install SyntaxNet-API Docker Image
The next step is to build a Docker image which already contains TensorFlow together with SyntaxNet and exposes the models through a web API.
```bash
git clone https://github.com/askplatypus/syntaxnet-api
cd syntaxnet-api
docker build . -t syntaxnet-api
```
Before building the Docker image (i.e., running `docker build . -t syntaxnet-api`) you can specify which language you want SyntaxNet to parse. This is done by updating the following line in `flask_server.py`, in this case for Portuguese:
```diff
# Overrides available languages map
-language_code_to_model_name['en'] = 'English'
+language_code_to_model_name['pt'] = 'Portuguese'
```
3. Run the Docker Image
Next, after building the Docker image, you just need to run it. Type `docker images` to see a list of the current images, find the `IMAGE ID` for syntaxnet-api, and pass it to the following command:

```bash
docker run -i -p 7000:7000 -t IMAGE_ID_value
```
This should expose a web service on your localhost on port 7000, similar to http://syntaxnet.askplatyp.us/v1#/default. The `-p 7000:7000` forwards port 7000 on your host to the same port on the running image; the `-i` parameter forces the output to be shown on stdout. You may replace it with `-d` to make the image run in the background, detached from the shell.

You can also call the service from the command line with `curl`:
```bash
curl -X POST --header 'Content-Type: text/plain; charset=utf-8' --header 'Accept: text/plain' --header 'Content-Language: pt' -d 'Olá mundo, teste!' 'http://0.0.0.0:7000/v1/parsey-universal-full'
```
This should output something like this:
```
1	Olá	_	INTJ	in	Gender=Masc|Number=Sing	0	ROOT
2	mundo,	_	NOUN	n|M|S	Gender=Masc|Number=Sing	1	nsubj
3	teste!	_	ADJ	adj|M|S	Gender=Masc|Number=Sing	2	amod
```
NOTE: I omitted some of the output fields for each word to make everything fit on one line :)
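The same request can be sent from Python, which is handier when processing many sentences. Below is a minimal sketch using the `requests` library; the endpoint and headers are copied from the `curl` call above, and the tab-separated response is split into per-token fields matching the CoNLL-style output shown above:

```python
import requests

# Same endpoint and headers as in the curl example above.
URL = 'http://0.0.0.0:7000/v1/parsey-universal-full'
HEADERS = {
    'Content-Type': 'text/plain; charset=utf-8',
    'Accept': 'text/plain',
    'Content-Language': 'pt',
}

response = requests.post(URL, headers=HEADERS,
                         data='Olá mundo, teste!'.encode('utf-8'))
response.raise_for_status()

# Each non-empty line describes one token; the columns are
# tab-separated, CoNLL-style, as in the sample output above.
for line in response.text.splitlines():
    if line.strip():
        print(line.split('\t'))
```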
Alternatives (a bit faster)
After running a few experiments in batch mode I noticed this was still a bit slow, probably because I was running it on a machine without any GPUs.
Looking through more SyntaxNet Docker images, I found another one, a fork of the image described above, which pre-loads the models and makes batch processing a bit faster.
The only problem I found was that it was loading the models for all the languages, which would take around 10GB of RAM! So I created a new image by removing all the models except the one for Portuguese, and built it using the commands described above.
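To give an idea, the change amounts to keeping a single entry in the language-to-model map of `flask_server.py`, the same map edited in step 2 (a minimal sketch; the rest of the file stays unchanged):

```python
# Sketch: keep only the Portuguese entry in flask_server.py, so that
# only this model gets pre-loaded at start-up, instead of the ~10GB of
# RAM needed to hold the models for all the languages.
language_code_to_model_name = {
    'pt': 'Portuguese'
}
```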
I ran an experiment comparing both images, measuring the time taken to process 500 sentences in Portuguese.
| Docker Image | Time |
|---|---|
| Original | 35m12.131s |
| Updated | 25m56.689s |
Using the image that pre-loads only the Portuguese model is about 10 minutes faster. Note that this was done on a machine without any GPUs; I believe that with GPUs this would be much faster.
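For reference, a benchmark like this one can be reproduced with a simple loop over the web service. A minimal sketch, assuming a hypothetical `sentences.txt` file with one sentence per line and the service running locally as set up above:

```python
import time
import requests

# Assumed local endpoint, as exposed by `docker run -p 7000:7000` above.
URL = 'http://0.0.0.0:7000/v1/parsey-universal-full'
HEADERS = {
    'Content-Type': 'text/plain; charset=utf-8',
    'Accept': 'text/plain',
    'Content-Language': 'pt',
}

# Hypothetical input file: one sentence per line.
with open('sentences.txt', encoding='utf-8') as f:
    sentences = [line.strip() for line in f if line.strip()]

start = time.time()
for sentence in sentences:
    response = requests.post(URL, headers=HEADERS,
                             data=sentence.encode('utf-8'))
    response.raise_for_status()
elapsed = time.time() - start
print('Parsed {} sentences in {:.1f}s'.format(len(sentences), elapsed))
```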
The image that loads only the Portuguese model is here:
https://github.com/davidsbatista/syntaxnet-api
NOTE: After experimenting with building and running Docker images you might want to clean up and free some space on your machine; these links might help:
- https://lebkowski.name/docker-volumes/
- https://www.digitalocean.com/community/tutorials/how-to-remove-docker-images-containers-and-volumes