Jason Milliken

flan-t5

using flan-t5 for sequence-to-sequence tasks

English to JSON?

In many cases, important information is communicated through natural language. As an experiment, I wanted to explore using flan-t5 for translating English into JSON. For example, if someone asked you to pick up groceries, they might say something like, "Can you run by the store and get milk, eggs, two loaves of bread and apples? Make sure you get almond milk."

What we want to communicate:

  • 1 Almond Milk
  • 1 Eggs
  • 2 Bread
  • ? Apples

We would want to be able to reason over this data and ask clarifying questions - "How many apples?", "What type of apples?". To do that, we need to get the information out of natural language and express it as data. For that we will use flan-t5.

Step 1 - Build a training dataset

We need a CSV file with the request sentences and the corresponding JSON format. I had to write a script to generate the training data, as I did not find a pre-existing dataset.

info,label
store: Please pick up milk,[{"item":"milk", "qty":1}]
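A generator script for rows like this might look something like the sketch below. The item names, sentence templates, and file name are all hypothetical - the post doesn't show the actual script - but it illustrates the template-filling approach:

```python
import csv
import random

# Hypothetical vocabulary and templates for illustration only;
# the real script's contents are not shown in the post.
ITEMS = ["milk", "eggs", "bread", "apples", "butter"]
TEMPLATES = [
    "Please pick up {items}",
    "Can you run by the store and get {items}",
    "Grab {items} on your way home",
]

def phrase(item, qty):
    # Render one item as it might appear in a sentence.
    return item if qty == 1 else f"{qty} {item}"

def make_row(rng):
    # Pick a few items with quantities and render both the
    # natural-language sentence and the JSON label.
    picked = rng.sample(ITEMS, rng.randint(1, 3))
    qtys = [rng.randint(1, 3) for _ in picked]
    sentence = rng.choice(TEMPLATES).format(
        items=" and ".join(phrase(i, q) for i, q in zip(picked, qtys)))
    label = "[" + ", ".join(
        f'{{"item":"{i}", "qty":{q}}}' for i, q in zip(picked, qtys)) + "]"
    return "store: " + sentence, label

rng = random.Random(0)
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["info", "label"])
    for _ in range(1000):
        writer.writerow(make_row(rng))
```

Each row pairs a templated request sentence with its serialized label, which is all the summarization-style fine-tuning needs.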

Step 2 - Training the flan-t5 model

Huggingface has a number of ready-made scripts for fine-tuning on a variety of tasks. (Thanks, Huggingface!) I am using the Summarization script.
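An invocation of the run_summarization.py example script might look like this - the model size, paths, and hyperparameters here are illustrative assumptions, not necessarily what was used:

```shell
# Hypothetical invocation of Hugging Face's run_summarization.py example;
# column names match the CSV format above, other values are placeholders.
python run_summarization.py \
  --model_name_or_path google/flan-t5-base \
  --do_train \
  --train_file train.csv \
  --text_column info \
  --summary_column label \
  --output_dir model/tst-summarization \
  --per_device_train_batch_size 8 \
  --num_train_epochs 3 \
  --overwrite_output_dir
```

The --text_column and --summary_column flags let the summarization script read the info/label CSV directly.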

However, the results were trash. It turns out that the curly brackets {} are not part of the t5 tokenizer vocabulary, which means the model can't produce JSON. This is a known issue with t5.

Step 3 - Switch to s-expressions

While JSON is my favorite serialization format, it isn't the only one in use. Instead, I used s-expressions. Now my training data looks like this:

info,label
store: Please pick up milk and bread,((:item "milk" :qty 1)(:item "bread" :qty 1))

Step 4 - Update training dataset and re-train the model

Now, with updated training data, we can retrain the model.

Step 5 - Inferencing

How does it perform?

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_order = AutoModelForSeq2SeqLM.from_pretrained("model/tst-summarization")
tokenizer = AutoTokenizer.from_pretrained("model/tst-summarization")

text = "Please pick up milk and 2 loaves of bread"
inputs = tokenizer("store: " + text, return_tensors="pt")
outputs = model_order.generate(**inputs, max_new_tokens=256)
result = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(result)

It worked great!

  • Please pick up milk and 2 loaves of bread
((:item "milk" :qty 1)(:item "bread" :qty 2))

Step 6 - But s-expressions aren't JSON

To convert the s-expression to JSON, we can use an s-expression-to-JSON converter available on GitHub.
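Alternatively, since the model only ever emits this one flat shape, a minimal converter can be written by hand. This is a sketch for this specific output format (flat plists of :key value pairs), not a general s-expression parser:

```python
import json
import re

def sexpr_to_json(sexpr):
    """Convert model output like
    ((:item "milk" :qty 1)(:item "bread" :qty 2))
    into a JSON string. Handles only flat :key value plists,
    where values are quoted strings or bare integers.
    """
    items = []
    # Each innermost (...) group is one record.
    for group in re.findall(r"\(([^()]*)\)", sexpr):
        entry = {}
        # Pairs look like :key "value" or :key 123
        for key, val in re.findall(r':(\w+)\s+("[^"]*"|\d+)', group):
            entry[key] = val.strip('"') if val.startswith('"') else int(val)
        items.append(entry)
    return json.dumps(items)
```

For the earlier example output, sexpr_to_json('((:item "milk" :qty 1)(:item "bread" :qty 2))') gives back the JSON format we originally wanted.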