<h1>Using Simon Willison's LLM Package to Extract a Knowledge Graph</h1>
<p><em>Posted October 24, 2023 by shawngraham, X-Lab, Carleton University</em></p>
<p>Simon Willison's <a href="https://llm.datasette.io/en/stable/index.html">LLM package</a> is a lovely little command-line utility that allows you to work with many different large language models. In this post, we use LLM to extract a knowledge graph from a mermaid diagram sketch.</p>
<p>1. Sketch out the basics of your knowledge graph. On paper &ndash; yes, on paper! It's quicker that way. What kinds of entities do you have? What are their properties? What are the relationships between the entities, and what are their properties?
Once you have this sketched, you can translate the sketch into the more formal language of an entity relationship diagram using the <a href="https://mermaid.js.org/syntax/entityRelationshipDiagram.html">mermaid conventions</a>.</p>
<pre><code>erDiagram
AUCTIONHOUSE ||--o{ ARTIFACT : auctions
AUCTIONHOUSE ||--o{ PERSON : sells_to
AUCTIONHOUSE ||--o{ MUSEUM : sells_to
MUSEUM {
    string name
    string city
    string country
}
AUCTIONHOUSE {
    string name
    string city
    string country
}
</code></pre>
<p>This, for instance, specifies some relationships between auction houses and artifacts, persons, and museums; then we specify some of the properties of MUSEUM and AUCTIONHOUSE, and so on.</p>
<p>I write my diagram at <a href="https://mermaid.live/edit">https://mermaid.live/edit</a> so I can also see the result, making sure it conforms to my initial sketch:</p>
<div><img decoding="async" loading="lazy" class="aligncenter wp-image-159 size-full" src="https://carleton.ca/xlab/wp-content/uploads/Screen-Shot-2023-10-24-at-10.59.34-AM.png" alt="The entity relationship diagram rendered at mermaid.live" width="4166" height="1930" /></div>
<p>2. I install LLM into a new python environment. Assuming you've created and activated your environment, at the terminal:</p>
<p><code>$ pip install llm</code></p>
<p>3. I install the llm-gpt4all plugin, which makes available a number of models optimized to run on consumer-grade machines:</p>
<p><code>$ llm install llm-gpt4all</code></p>
<p>You can then see what's available by running:</p>
<p><code>$ llm models list</code></p>
<p>If you want to use any particular model, you just use its name from that list after the <code>-m</code> flag; the first time you use it, LLM will download the relevant model. Test it now:</p>
<p><code>$ llm -m orca-mini-7b '3 names for a pet cow'</code></p>
<p>Now, for our purposes, we want to use that sketch of the ontology of the antiquities trade as the guide for the model. We want the model to do one thing, and one thing well: identify the entities, their properties, and the relationships that we've already specified in the sketch. To do that, we'll make a template file.</p>
<p>4. Find the templates directory with <code>$ llm templates path</code>. Open your code or text editor of choice, create an empty file called <code>extract.yaml</code>, and save it in that folder. We're now going to define a system key, which will tell the model what kind of persona to adopt, and a prompt key, which will tell it what to do. <a href="https://gist.githubusercontent.com/shawngraham/5224367dda62a320085858ba5c45260c/raw/80e02cc3b9e39e5ed21f74a646ff5117c0018462/extract.yaml">Here's mine.</a> Notice you can use variables in your template; in mine, I have <code>$input</code> for the text being processed.</p>
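<p>The linked gist has the real thing; as a rough sketch only (the wording below is illustrative, not the actual contents of the gist), such a template pairs a <code>system:</code> key and a <code>prompt:</code> key, with <code>$input</code> standing in for the piped-in text:</p>
<pre><code># extract.yaml - an illustrative sketch, not the gist's actual wording
system: You are an expert on the antiquities trade who extracts structured data from text.
prompt: >
  Identify ONLY the entities and relationships defined in this pattern,
  and return them as (subject, predicate, object) triples:
  erDiagram
  AUCTIONHOUSE ||--o{ ARTIFACT : auctions
  AUCTIONHOUSE ||--o{ PERSON : sells_to
  AUCTIONHOUSE ||--o{ MUSEUM : sells_to
  The text to process: $input
</code></pre>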
<p>Here's <a href="https://llm.datasette.io/en/stable/templates.html#additional-template-variables">more about templates and variables</a>; I no doubt could be more elegant here.</p>
<p>5. Now that we have a model downloaded and a template to guide it, we'll run it against a file. We could grab webpages and feed them directly to the model using a combination of tools such as curl and Willison's strip-tags utility, which would be really elegant. However, I already have a folder on my machine with the information I'm after. To feed a single file to the prompt, we use the <code>cat</code> command:</p>
<p><code>$ cat giacomo.txt | llm -m 4 -t extract &gt; outputgraph.csv</code></p>
<p>That one-liner says: take the text file called <code>giacomo.txt</code>, run it through the GPT-4 large language model using <code>extract.yaml</code> as the prompt and guide, then write the output to <code>outputgraph.csv</code>.</p>
<p>We get:</p>
<pre><code>1. ("Giacomo Medici", "sells_to", "Sotheby's London")
2. ("Sotheby's London", "auctions", "Onesimos kylix")
3. ("Hydra Gallery", "sells_to", "Sotheby's London")
4. ("Christian Boursaud", "works_with", "Giacomo Medici")
5. ("Giacomo Medici", "works_with", "Robert Hecht")
6. ("J. Paul Getty Museum", "buys", "Onesimos kylix")
7. ("Robert Hecht", "buys", "Euphronios (Sarpedon) krater")
8. ("Giacomo Medici", "sells_to", "J. Paul Getty Museum")
9. ("Giacomo Medici", "controls", "Hydra Gallery")
10. ("Giacomo Medici", "obtains_from", "Christian Boursaud")
11. ("Medici", "has_posession_of", "Euphronios (Sarpedon) krater")
12. ("Editions Services", "is_instance_of", "ORGANIZATION")
13. ("Giacomo Medici", "controls", "Editions Services")
14.
 ("Sotheby's", "auctions", "looted Apulian vases")
15. ("Sotheby's", "sells_to", "J. Paul Getty Museum")
16. ("Sotheby's", "sells_to", "Metropolitan Museum of Art")
17. ("Sotheby's", "sells_to", "Cleveland Museum of Art")
18. ("Sotheby's", "sells_to", "Boston Museum of Fine Arts")
19. ("Maurice Tempelsman", "buys_at", "Sotheby's")
20. ("Shelby White", "buys_at", "Sotheby's")
21. ("George Ortiz", "buys_at", "Sotheby's")
22. ("José Luis Várez Fisa", "buys_at", "Sotheby's")
23. ("Lawrence Fleischman", "buys_at", "Sotheby's")
24. ("Giacomo Medici", "sells_to", "Barbara Fleischman")
25. ("Medici", "has_possesion_of", "sarcophagus")
26. ("Medici", "has_possesion_of", "illegally-excavated artefacts")
27. ("Giacomo Medici", "sells_to", "Maurice Tempelsman")
28. ("Giacomo Medici", "sells_to", "Shelby White")
29. ("Giacomo Medici", "sells_to", "Leon Levy")
30. ("Giacomo Medici", "sells_to", "George Ortiz")
31. ("Giacomo Medici", "sells_to", "José Luis Várez Fisa")
32. ("Medici", "donates_to", "J. Paul Getty")
</code></pre>
<p>Ta da! A knowledge graph. (The output is reproduced verbatim, misspelled predicates like "has_posession_of" and all.) To iterate over everything in our folder, we can use this one-liner, replacing <code>/path/to/directory</code> of course:</p>
<p><code>$ find /path/to/directory -type f -exec sh -c 'cat {} | llm -m 4 -t extract' \; &gt;&gt; outputgraph.csv</code></p>
<p>Now we <em>could</em> specify that the output follow Turtle RDF conventions, in which case we would also get all of the properties for the entities and relationships.
To do that, we change the prompt portion of our <code>extract.yaml</code>:</p>
<pre><code>prompt: 'Extract ONLY entities as indicated in the pattern, and ONLY predicates as indicated in the pattern. Return subject, predicate, object triples, as an ontology using RDF-Turtle for the input text, using the following guidelines: 1 - Denote subjects and objects using relative hash-based hyperlinks, i.e., negating the use of example.com. 2 - Output response to a code-block. 3 - Place ## Turtle Start ## and ## Turtle End ## around the code within the code-block.
erDiagram
AUCTIONHOUSE ||--o{ ARTIFACT : auctions
</code></pre>
<p>... and so on, with the rest of the diagram following as before.</p>
<p>We could instead specify that the output be written as Cypher statements, meaning we could load the data directly into a Neo4j database. Another important thing to note is that all interactions are logged to a local SQLite database for further use; see <a href="https://llm.datasette.io/en/stable/logging.html">https://llm.datasette.io/en/stable/logging.html</a>. If you have <a href="https://datasette.io/">datasette</a> installed, you can explore the results of your experiments by running <code>datasette "$(llm logs path)"</code>.</p>
<p>Handy, eh?</p>
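<p>One last housekeeping note: the numbered triples the model returns aren't quite a CSV yet. A few lines of Python can tidy <code>outputgraph.csv</code> into a proper source/relation/target edge list for a tool like Gephi or a Neo4j import. This is a sketch of my own, not part of the LLM package; the regex and column names are my choices:</p>

```python
import csv
import io
import re

# Matches numbered triples like: 1. ("Giacomo Medici", "sells_to", "Hydra Gallery")
TRIPLE_RE = re.compile(r'^\s*\d+\.\s*\("(.*?)",\s*"(.*?)",\s*"(.*?)"\)')

def triples_to_csv(raw: str) -> str:
    """Convert the model's numbered triple list into a source,relation,target CSV."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["source", "relation", "target"])
    for line in raw.splitlines():
        m = TRIPLE_RE.match(line)
        if m:  # skip any chatter the model emits around the list
            writer.writerow(m.groups())
    return out.getvalue()

sample = """1. ("Giacomo Medici", "sells_to", "Hydra Gallery")
2. ("Hydra Gallery", "sells_to", "Sotheby's London")"""
print(triples_to_csv(sample))
```

<p>Run over the real output file with <code>triples_to_csv(open("outputgraph.csv").read())</code>; unparseable lines are simply dropped, which is usually what you want with model output.</p>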