<h1>Further Adventures with LLM-GPT4All and Templates</h1>
<p><em>shawngraham, 25 October 2023</em></p>
<div id="attachment_168" class="wp-caption alignleft" style="width: 240px"><img class="size-medium wp-image-168" src="https://carleton.ca/xlab/wp-content/uploads/Screen-Shot-2023-10-25-at-9.59.54-AM-240x206.png" alt="Model results visualized as a network graph in Gephi Lite" width="240" height="206" /><p class="wp-caption-text">Model results turned into .gexf using another series of prompts and then visualized in Gephi Lite</p></div>
<p>Yesterday Simon Willison updated the LLM-GPT4All plugin, which has let me download several large language models to explore how they work, and how we can use the LLM package's templates to guide our knowledge graph extraction.</p>
<p>For instance, using GPT-4, we can pipe a text file through the model and give it this instruction:</p>
<p><code>cat giacomo.txt | llm -m 4 'You are an excellent natural language processor trained on data relating to the antiquities trade. Extract entities and relationships and return them as [subject],[predicate],[object] triples'</code></p>
<p>This duly sorts things out, grabbing the relevant phrases, but we want <em>structured</em> output; hence the example template from <a href="https://carleton.ca/xlab/2023/using-simon-willisons-llm-package-to-extract-a-knowledge-graph/">the previous post</a>. With very clear instructions, and a bit of the ol' prompt engineering magic, you end up with something very close to what you want.</p>
<p>The tricky thing is that different models expect the template in different ways. Some models can take a 'system' prompt, which gives the model a kind of persona or an area of its training data to home in on, and then a 'prompt' that tells it exactly what to do. These can contain variables, like this:</p>
<p><code>system: 'You are an excellent natural language processor in the domain of the antiquities trade. Take a step back and consider the critical information presented to you in the $input.'<br />
prompt: 'Extract the most salient ENTITIES and use ONLY predicates from the $example'</code></p>
<p>or this:</p>
<p><code>system: 'You speak like an excitable Victorian adventurer'<br />
prompt: 'Summarize this: $input'</code></p>
<p>Models like GPT-4 can do that. Say that template was called 'victorian.yaml'. You'd invoke it like so:</p>
<p><code>cat giacomo.txt | llm -m 4 -t victorian</code></p>
<p>The text file on the left of the | character becomes the $input for the prompt. (Although I think you should be able to do it like this: <code>llm -m 4 -t victorian -p input giacomo.txt</code>, but that doesn't seem to work and I don't understand why.)</p>
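<p>A quick aside before looking at the result of that victorian run: the <code>[subject],[predicate],[object]</code> lines the extraction prompt asks for are easy to pull into Python for downstream processing. This is a sketch of my own, not part of the post's workflow; <code>parse_triples</code> is an invented helper:</p>

```python
# Sketch (not from the post): parse a model's
# "[subject],[predicate],[object]" lines into Python tuples,
# skipping anything that doesn't match the expected shape.
import re

TRIPLE = re.compile(r"\[([^\]]+)\],\[([^\]]+)\],\[([^\]]+)\]")

def parse_triples(text):
    """Return (subject, predicate, object) tuples found in model output."""
    return [m.groups() for m in TRIPLE.finditer(text)]

output = """[Joe Smith],[sells_to],[Ottawa Art Gallery]
[Joe Smith],[buys],[Agathobulus Painter Vase]"""
print(parse_triples(output))
```

<p>Malformed lines simply produce no match, which is handy given how loosely these models follow formatting instructions.</p>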
<p>The result of the victorian template run:</p>
<blockquote><p>Oh, my dear friend, listen well as I recount this most insidious tale! Our story hinges upon a gentleman named Giacomo Medici, an Italian purveyor of antiquities. Alas, not a man of honour, for he was found guilty in the year of our Lord 2005 of the most heinous of crimes! Namely, handling stolen goods, exporting possessions unlawfully, and hatching schemes to traffick.</p></blockquote>
<p>Other models, like <a href="https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b">Nous Hermes</a>, follow this kind of template:</p>
<p><code>### Instruction:<br />
### Input:<br />
### Response:</code></p>
<p>So for those, you end up with this kind of .yaml template:</p>
<p><code>prompt: &gt;<br />
### Instruction:<br />
You are an excellent natural language processor in the domain of the antiquities trade. Here is an ontology for this domain:<br />
AUCTIONHOUSE ||--o{ ARTIFACT : auctions<br />
AUCTIONHOUSE ||--o{ PERSON : sells_to<br />
AUCTIONHOUSE ||--o{ MUSEUM : sells_to<br />
ART_WORK ||--o{ ARTIFACT : is_instance_of<br />
ORGANIZATION ||--|{ GOVERNMENT_AGENCY : is_instance_of<br />
ORGANIZATION ||--|{ GALLERY : is_instance_of<br />
GOVERNMENT_AGENCY ||--o{ ARTIFACT : repatriates<br />
GALLERY ||--o{ ARTIFACT : has_possession_of<br />
MUSEUM ||--o{ ARTIFACT : has_possession_of<br />
PERSON ||--o{ ARTIFACT : has_possession_of<br />
PERSON ||--o{ ARTIFACT : buys<br />
PERSON ||--o{ MUSEUM : donates_to<br />
PERSON ||--o{ PERSON : works_with<br />
PERSON ||--o{ ORGANIZATION : employed_by<br />
PERSON ||--o{ ORGANIZATION : controls<br />
PERSON ||--o{ PERSON : spouse_of<br />
PERSON ||--o{ AUCTIONHOUSE : buys_at<br />
PERSON ||--o{ PERSON : obtains_from<br />
PERSON ||--o{ ARTIFACT : stole<br />
Extract entities and relationships as [subject],[predicate],[object] triples from the $input.<br />
Here is an example of the desired output: "Joe Smith purchased the Agathobulus Painter Vase before selling it to the Ottawa Art Gallery"<br />
Result: [Joe Smith],[sells_to],[Ottawa Art Gallery]<br />
[Joe Smith],[buys],[Agathobulus Painter Vase]<br />
### Response:</code></p>
<p>Right now, this particular model rather misses the objective. We invoke it with <code>cat giacomo.txt | llm -m nous-hermes-llama2-13b -t extract-nous</code>, but the first time I ran it I forgot to put $input in the prompt, writing 'provided text' instead, and so got this:</p>
<p><code>"The Metropolitan Museum of Art repatriated a collection of artifacts to the Namibian government" Result: [Metropolitan Museum of Art],[repatriates],[Namibian government]</code></p>
<p>&#8230;It is fully making things up because it never read the input text. Bah. Fixing that error, we get:</p>
<p><code>["Giacomo Medici", "is_instance_of", "ARTIFACT"]<br />
["Antiquaria Romana", "controls", "Giacomo Medici"]<br />
["Hydra Gallery", "works_with", "Christian Boursaud"]</code></p>
<p>Almost. Almost. So confused, poor wee model.</p>
<p>I used that same template with the much smaller mistral-7b-instruct-v0 model and got something closer:</p>
<p><code>1. Giacomo Medici ||--o{ ARTIFACT : deals_in<br />
2. Rome ||--o{ AUCTIONHOUSE : has_possesion_of<br />
3. July 1967 ||--o{ MEDICI : convicted_of<br />
4. Italy ||--o{ PERSON : sells_to<br />
5. December 1971 ||--o{ MEDICI : buys<br />
6. Switzerland ||--o{ ARTIFACT : sold_to<br />
7. Robert Hecht ||--o{ MEDICI : supplies_antiquities_to</code></p>
<p>Which is sometimes almost right. This output could be turned into a mermaid diagram, I suppose.</p>
<p>Anyway, there are other models to try, but so far, GPT-4 is winning hands-down. Which is too bad, because I'd rather not pay for access.</p>
<p><strong>Postscript</strong></p>
<p>In an earlier post, I used GPT-3 to create an RDF schema from a mermaid diagram.</p>
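<p>An aside on that mermaid idea from above: the extracted triples really can be converted mechanically into a mermaid diagram. This is a rough sketch of mine, not the post's code; <code>triples_to_mermaid</code> and its node-id scheme are invented for illustration:</p>

```python
# Sketch (not from the post): render (subject, predicate, object)
# tuples as a mermaid flowchart, one directed labeled edge per triple.
def triples_to_mermaid(triples):
    ids = {}  # map each entity name to a short stable node id
    lines = ["graph TD"]
    for s, p, o in triples:
        for name in (s, o):
            if name not in ids:
                ids[name] = f"n{len(ids)}"
        lines.append(f'    {ids[s]}["{s}"] -->|{p}| {ids[o]}["{o}"]')
    return "\n".join(lines)

triples = [("Robert Hecht", "supplies_antiquities_to", "Giacomo Medici")]
print(triples_to_mermaid(triples))
```

<p>Pasting the output into any mermaid renderer gives a quick visual sanity check on what the model actually extracted.</p>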
<p>We can use that RDF schema with LLM and, say, GPT-4 like this. First, we create a new template:</p>
<p><code>system: You are an excellent assistant with deep knowledge of research in the field of illegal and illicit antiquities.<br />
prompt: $example is an ontology structuring knowledge about the field; using the ontology exclusively, please create specific instances and data about individuals within the antiquities trade from the following encyclopedia $input.</code></p>
<p>Then we can pass our .ttl RDF ontology to the prompt and model like this:</p>
<p><code>cat giacomo.txt | llm -m 4 -t ontology -p example antiquities.ttl</code></p>
<p>which starts generating output like this:</p>
<p><code>@prefix : &lt;http://www.semanticweb.org/illicit_antiquities_ontology#&gt; .<br />
@prefix xsd: &lt;http://www.w3.org/2001/XMLSchema#&gt; .<br />
:Giacomo_Medici a :Individual ;<br />
:name "Giacomo Medici"^^xsd:string ;<br />
:nationality "Italy"^^xsd:string ;<br />
:profession :Antiquities_Dealer ;<br />
:criminal_record<br />
[ a :Criminal_Record ;<br />
:year_of_conviction "2005"^^xsd:int ;<br />
:charges "Receiving stolen goods, illegal export of goods, conspiracy to traffic"^^xsd:string ] ;<br />
:started_profession "1960s"^^xsd:decade .</code></p>
<p>which we can dump into something like <a href="https://www.ldf.fi/service/rdf-grapher">rdf-grapher</a> and end up with this:</p>
<p><img class="aligncenter wp-image-172 size-full" src="https://carleton.ca/xlab/wp-content/uploads/rdf-grapher.png" alt="RDF graph of the Giacomo Medici data produced by rdf-grapher" width="4385" height="1781" /></p>
<p>So, if we use the <a href="https://carleton.ca/xlab/2023/using-simon-willisons-llm-package-to-extract-a-knowledge-graph/">one-liner from the previous post</a>, we can use GPT-4 at least to process raw text about the antiquities trade and end up with a knowledge graph.</p>
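<p>The caption at the top mentions turning model results into .gexf for Gephi Lite; the post did that 'using another series of prompts', but a deterministic alternative is only a few lines of Python. A stdlib-only sketch of mine (<code>triples_to_gexf</code> is an invented helper, not the post's tooling):</p>

```python
# Sketch (not the post's actual method): write (subject, predicate,
# object) triples to a minimal GEXF 1.2 file that Gephi Lite can open.
import xml.etree.ElementTree as ET

def triples_to_gexf(triples, path):
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="directed")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    ids = {}  # entity name -> node id
    for s, p, o in triples:
        for name in (s, o):
            if name not in ids:
                ids[name] = str(len(ids))
                ET.SubElement(nodes, "node", id=ids[name], label=name)
        # the predicate becomes the edge label
        ET.SubElement(edges, "edge", id=str(len(edges)),
                      source=ids[s], target=ids[o], label=p)
    ET.ElementTree(gexf).write(path, encoding="utf-8", xml_declaration=True)

triples_to_gexf([("Giacomo Medici", "sells_to", "Robert Hecht")], "graph.gexf")
```

<p>Opening the resulting file in Gephi Lite shows the triples as a directed graph, which is roughly what the screenshot at the top of this post depicts.</p>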