Parsing from text to graph using dictionaries, part 2

If my life depended on timely blogging, I’d be dead ten times over. Regardless, there’s stuff to talk about, and it involves parsing text into graphs. I updated the dictionary we talked about in the last post, and I’m currently in the middle of another update. The short story is that using dictionary lookups for parsing got me much faster parse times. I also found yet more flaws in how I was representing the words, which led to another format and to moving the dictionary into a database instead of a flat file that is parsed at startup.

Where we left off with parsing to graphs…

Last time, I had a pretty neat parser that looped through words and looked them up in a dictionary. We did this to tell both what kind of word each one was and what kind of relationship it had with other words, according to the datatype it represented. Aside from the usual datatypes such as timestamps, numbers, and colors, semantic gradients were also widely used. We use that type for words that fall along gradients, like the words that fall between “good” and “evil”.
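To make the gradient idea concrete, here’s a minimal sketch of what gradient entries could look like. The field names (`type`, `datatype`, `gradient`, `position`), the in-between words, and the numeric positions are all assumptions for illustration, not the actual dictionary format.

```javascript
// Hypothetical dictionary entries. Gradient words carry the name of the
// gradient they sit on and a position between its two ends.
const entries = [
  { word: "evil",   type: "adjective", datatype: "gradient", gradient: "good-evil", position: -1.0 },
  { word: "wicked", type: "adjective", datatype: "gradient", gradient: "good-evil", position: -0.5 },
  { word: "decent", type: "adjective", datatype: "gradient", gradient: "good-evil", position: 0.5 },
  { word: "good",   type: "adjective", datatype: "gradient", gradient: "good-evil", position: 1.0 },
  { word: "red",    type: "adjective", datatype: "color" },
];

// Return the words on one gradient, ordered from one end to the other.
function wordsOnGradient(gradient) {
  return entries
    .filter((e) => e.datatype === "gradient" && e.gradient === gradient)
    .sort((a, b) => a.position - b.position)
    .map((e) => e.word);
}

console.log(wordsOnGradient("good-evil"));
```

Storing a position rather than just a label means two gradient words can be compared directly, which is the point of treating the gradient as a datatype.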

Then, we used some simple pattern matching of vector rules against the word vectors generated by the sentences. That gave us things like this:

[Image: Parsing text to a graph.]
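The rule matching described above can be sketched roughly like this. It is a toy version under stated assumptions: the word list, the two rules, the edge shape (`from`, `label`, `to`), and the choice to drop articles are all hypothetical, and the real rules are certainly richer than this.

```javascript
// Hypothetical word-type lookup for a handful of words.
const wordTypes = new Map([
  ["the", "article"],
  ["cat", "noun"],
  ["dog", "noun"],
  ["is", "verb"],
  ["chased", "verb"],
  ["black", "adjective"],
]);

// Each rule pairs a pattern of word types with a function that turns the
// matched words into a graph edge.
const rules = [
  { pattern: ["noun", "verb", "adjective"], build: ([n, v, a]) => ({ from: n, label: v, to: a }) },
  { pattern: ["noun", "verb", "noun"], build: ([s, v, o]) => ({ from: s, label: v, to: o }) },
];

function parseToEdges(sentence) {
  // Tokenize, keep only known words, and drop articles (an assumption
  // made here to keep the patterns short).
  const words = sentence
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((w) => wordTypes.has(w) && wordTypes.get(w) !== "article");
  const types = words.map((w) => wordTypes.get(w));

  // Slide each rule's pattern across the type vector and emit an edge
  // wherever every position matches.
  const edges = [];
  for (const rule of rules) {
    const len = rule.pattern.length;
    for (let i = 0; i + len <= types.length; i++) {
      const matches = rule.pattern.every((t, j) => types[i + j] === t);
      if (matches) edges.push(rule.build(words.slice(i, i + len)));
    }
  }
  return edges;
}

console.log(parseToEdges("The cat is black."));
console.log(parseToEdges("The dog chased the cat"));
```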

That’s all well and good, but there were still problems with words that could be both a noun and a verb or adjective, such as “cross” (“don’t cross me”/“a cross hung on the wall”). It was also relatively slow, since I used for() loops to iterate over basically everything. Not that for() loops are bad, but they can be slow when iterating over thousands of dictionary entries dozens of times just to find words.

Lookup, the parse times are falling!

One of the best ways to fix the slow parse times was to load the words into a lookup object keyed by word, instead of scanning a list for each one. In JavaScript, that’s basically just making an object. Parse times, as you can see in the screenshot below, fell quite a bit:

[Screenshot: Parsing times outlined in red.]
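For anyone who wants the difference spelled out, here’s a minimal sketch of a linear scan next to a keyed lookup. The entries are made up, and I’m using a `Map` here rather than a plain object (a plain object works too; `Map` just avoids collisions with inherited keys like `constructor`). Mapping each word to a list of entries is also an assumption, chosen so that a word like “cross” can keep both its noun and verb entries.

```javascript
// Hypothetical dictionary entries; "cross" appears twice on purpose.
const entries = [
  { word: "cross", type: "verb" },
  { word: "cross", type: "noun" },
  { word: "wall", type: "noun" },
  { word: "hung", type: "verb" },
];

// Loop style: every lookup walks the whole list.
function findByLoop(word) {
  const found = [];
  for (const entry of entries) {
    if (entry.word === word) found.push(entry);
  }
  return found;
}

// Lookup style: pay the cost of walking the list once, up front.
function buildIndex(list) {
  const index = new Map();
  for (const entry of list) {
    if (!index.has(entry.word)) index.set(entry.word, []);
    index.get(entry.word).push(entry);
  }
  return index;
}

const index = buildIndex(entries);

// After that, each lookup is a single keyed access.
function findByLookup(word) {
  return index.get(word) || [];
}

console.log(findByLookup("cross"));
```

Both functions return the same entries; the lookup version simply stops paying for a full pass over the dictionary on every word.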

From the version 1 dictionary’s average parse time of 26ms, we fell to 3ms. That was really good! But I wasn’t done. Next, I structured the rules into arrays, which got rid of a lot of duplicated code, and parse times fell under 1ms, which felt a lot better. Not only am I looking to parse text into graphs that show relationships between data, but I need to be performant about it as well. At about this time, I also made the tool a bit snazzier looking, and decided to parse the whole dictionary to a graph to visualize it:

[Image: It took a while to load…]
[Image: Words are grouped by their root word.]
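Grouping words by root is simple to sketch. This assumes each entry carries a `root` field and that the graph is just a list of nodes plus a list of edges; both of those, along with the `form` edge label, are placeholders rather than the tool’s real structures.

```javascript
// Hypothetical entries, each pointing at its root word.
const entries = [
  { word: "run", root: "run" },
  { word: "running", root: "run" },
  { word: "ran", root: "run" },
  { word: "good", root: "good" },
  { word: "better", root: "good" },
];

// Build one node per distinct word and one edge from each root to each
// of its other forms, so every root ends up at the center of a cluster.
function dictionaryToGraph(list) {
  const nodes = new Set();
  const edges = [];
  for (const { word, root } of list) {
    nodes.add(root);
    nodes.add(word);
    if (word !== root) edges.push({ from: root, to: word, label: "form" });
  }
  return { nodes: [...nodes], edges };
}

const graph = dictionaryToGraph(entries);
console.log(graph.nodes.length, graph.edges.length);
```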

All that done, I began to look at the persistence of the information. At this point, I was loading a file in and parsing it, holding it in memory, and then writing it back to a file if I wanted to edit it in the tool. It worked, but the drawback was that there was no place to “remember” the parsed information between sessions. Also, editing a flat file is a pain in the ass. So it was time to begin working on version 3 of the dictionary.

The new, new dictionary

I fell back on MySQL, not only because that’s what I know, but also because a lot of the graph databases seem to be built around triples and such, and what I was representing in my data was a lot less neat than that. Interrogative already uses a hierarchical representation of its knowledge, and these graphs are eventually going to supersede that representation. So, I combined the techniques so that we had lookup tables for words and gradients, and the dictionary entries got stored in a table that could be maintained much more easily than a flat file.

That’s a low bar to meet right there. With these tables in place, I wrote a data access layer based on previous work in that patiently-waiting demo game, and as of this writing, I’m chugging along on getting that integrated to see what the parse times look like.
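As a rough sketch of the part of a data access layer that matters for parse times, here’s how rows from a words table and a gradients lookup table could be folded into an in-memory dictionary once at startup. The arrays stand in for query results, and every table and column name (`gradient_id`, `position`, and so on) is hypothetical; this is not the actual schema.

```javascript
// Stand-ins for rows returned by two queries. Column names are assumptions.
const wordRows = [
  { id: 1, word: "good", type: "adjective", gradient_id: 1, position: 1.0 },
  { id: 2, word: "evil", type: "adjective", gradient_id: 1, position: -1.0 },
  { id: 3, word: "cross", type: "noun", gradient_id: null, position: null },
];
const gradientRows = [{ id: 1, name: "good-evil" }];

// Resolve each word's gradient id against the gradients lookup table and
// index the resulting entries by word for fast parsing later.
function buildDictionary(words, gradients) {
  const gradientNames = new Map(gradients.map((g) => [g.id, g.name]));
  const dictionary = new Map();
  for (const row of words) {
    const entry = {
      word: row.word,
      type: row.type,
      gradient: row.gradient_id === null ? null : gradientNames.get(row.gradient_id),
      position: row.position,
    };
    const list = dictionary.get(row.word) || [];
    list.push(entry);
    dictionary.set(row.word, list);
  }
  return dictionary;
}

const dictionary = buildDictionary(wordRows, gradientRows);
console.log(dictionary.get("good"));
```

The database handles persistence and editing, while the parser only ever touches the in-memory map, so the lookup speed shouldn’t depend on where the entries came from.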

Where to?

With the new dictionary version being set up, I’m looking to see what the performance looks like, though I imagine it won’t be too much slower. I’m still loading the dictionary into memory, so it’s still using lookups, and will still be fast. But that’s not the main concern anymore.

My main concern is to begin implementing more advanced usage. Now that we can persist parsed text into knowledge graphs, we can:

  • Update those graphs by parsing more text.
  • Update those graphs by bolting on a conversational interface a la Interrogative (this is something of a must).
    • Use the above conversational interface to query the graph like a chat bot.
    • Overlay the Personality-based-AI discussed here to give that conversation a bit more personality.
  • Parse multiple texts in parallel into multiple knowledge graphs pertaining to the same object/subject.
    • Compare, contrast, and merge the knowledge graphs.
  • Begin addressing information that changes over time (movement, state changes, time, etc.).
  • Begin working on reasoners to iterate over knowledge that represents actions.
  • If I can find time, I’d love to dabble with some machine learning pumping its output into these knowledge graphs. Much of ML/DL these days seems to be moving in the direction of related things such as memory, attention, etc. A knowledge graph is really just top-level knowledge.
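To give a feel for the compare, contrast, and merge item in the list above, here’s a minimal sketch over two small graphs about the same subject. The graph shape (`nodes` plus `edges` with `from`, `label`, `to`) and the function names are assumptions for illustration; real merging would also have to deal with conflicting facts, which this ignores.

```javascript
// Identify an edge by its three parts so duplicates can be detected.
function edgeKey(edge) {
  return JSON.stringify([edge.from, edge.label, edge.to]);
}

// Merge: union of nodes, union of edges with duplicates removed.
function mergeGraphs(a, b) {
  const nodes = [...new Set([...a.nodes, ...b.nodes])];
  const seen = new Set();
  const edges = [];
  for (const edge of [...a.edges, ...b.edges]) {
    const key = edgeKey(edge);
    if (!seen.has(key)) {
      seen.add(key);
      edges.push(edge);
    }
  }
  return { nodes, edges };
}

// Contrast: edges that appear in the first graph but not the second.
function edgesOnlyIn(a, b) {
  const other = new Set(b.edges.map(edgeKey));
  return a.edges.filter((edge) => !other.has(edgeKey(edge)));
}

// Two hypothetical graphs parsed from different texts about one sword.
const graphA = {
  nodes: ["sword", "sharp"],
  edges: [{ from: "sword", label: "is", to: "sharp" }],
};
const graphB = {
  nodes: ["sword", "sharp", "old"],
  edges: [
    { from: "sword", label: "is", to: "sharp" },
    { from: "sword", label: "is", to: "old" },
  ],
};

const merged = mergeGraphs(graphA, graphB);
console.log(merged.nodes, merged.edges.length);
console.log(edgesOnlyIn(graphB, graphA));
```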

Until next time!

I wish I had more time to work on this! I’m not treading too much new ground here, this being a mish-mash of semantic web, ontologies, and game AI techniques. All of that, just to suit my needs for NPCs that have a coherent world model in one place. It may not end up ticking every checkbox (it won’t), but it should yield some good techniques for more advanced NPCs and games, especially where narrative needs a good knowledge representation system.

Next time: Another update on the dictionary and tool, and hopefully some movement on getting it integrated into the demo game NPCs.
