{"id":133,"date":"2023-08-10T15:41:21","date_gmt":"2023-08-10T19:41:21","guid":{"rendered":"https:\/\/carleton.ca\/xlab\/?p=133"},"modified":"2023-08-10T15:41:21","modified_gmt":"2023-08-10T19:41:21","slug":"a-follow-up-to-mermaid-diagram-to-ontology-via-gpt3-for-the-illicit-antiquities-trade","status":"publish","type":"post","link":"https:\/\/carleton.ca\/xlab\/2023\/a-follow-up-to-mermaid-diagram-to-ontology-via-gpt3-for-the-illicit-antiquities-trade\/","title":{"rendered":"A follow-up to &#8216;Mermaid Diagram to Ontology via GPT3 for the illicit antiquities trade&#8217;"},"content":{"rendered":"<p><\/p>\n<div id=\"attachment_139\" class=\"wp-caption alignright\" style=\"width: 346px\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-139 \" src=\"https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5.png\" alt=\"\" width=\"346\" height=\"255\" srcset=\"https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5.png 800w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5-240x177.png 240w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5-400x295.png 400w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5-160x118.png 160w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5-768x566.png 768w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/gephi-lite-5-360x266.png 360w\" sizes=\"(max-width: 346px) 100vw, 346px\" \/><p class=\"wp-caption-text\">Colours = subgroups, size = betweenness.<\/p><\/div>\n<p>Using the <a href=\"https:\/\/carleton.ca\/xlab\/2023\/mermaid-diagram-to-ontology-via-gpt3-for-the-illicit-antiquities-trade\/\">ontology crafted in the previous post<\/a> I fed 129 Trafficking Culture articles through GPT4. I used a script to pass the ontology, with<\/p>\n<blockquote><p>You are an excellent assistant with deep knowledge of research in the field of illegal and illicit antiquities. 
> Below is an ontology structuring knowledge about the field; using the ontology exclusively, please create specific instances and data about individuals within the antiquities trade from the following encyclopedia text.

followed by the ontology as my prompt.

This is the script to do that. Note that the texts you're working on need to be about 6 KB or smaller, so you have to split any larger ones into smaller pieces first (by hand or by script); as a guard, the script also skips any file where the combined prompt and text run over 2,500 tokens:

```python
import os
import sys
import openai
import tiktoken
from tenacity import retry, wait_random_exponential

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

def write_results_to_file(output_filename, content):
    with open(output_filename, 'w') as f:
        f.write(content)

# Retry with a randomized exponential backoff between attempts
@retry(wait=wait_random_exponential(multiplier=1, max=10))
def get_openai_response(prompt, input_text):
    openai.api_key = os.getenv("OPENAI_API_KEY")
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": input_text},
            {"role": "assistant", "content": ""},
        ],
        temperature=0,
        max_tokens=4820,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=["END "]
    )
    return response

def main():
    if len(sys.argv) != 3:
        print("Usage: python your_script_name.py prompt_file.txt input_file.txt")
        sys.exit(1)

    prompt_file = sys.argv[1]
    input_file = sys.argv[2]

    with open(prompt_file, 'r') as f:
        prompt = f.read()

    with open(input_file, 'r') as f:
        input_text = f.read()

    combined_text = prompt + input_text
    encoding_name = "cl100k_base"

    max_tokens_per_chunk = 2500
    combined_tokens = num_tokens_from_string(combined_text, encoding_name)
    print(combined_tokens)
    if combined_tokens > max_tokens_per_chunk:
        print(f"{input_file} skipped, too big!")
        return

    # Combined prompt fits within the limit, so make the API call
    print("Working on " + input_file)
    response = get_openai_response(prompt, input_text)
    result_content = response["choices"][0]["message"]["content"]
    output_filename = f"result_{input_file}.txt"
    write_results_to_file(output_filename, result_content)
    print("Done")

if __name__ == "__main__":
    main()
```

Then I joined the results together at the command line with `cat *txt >> output.ttl`. I added the prompt text (the original ontology) to the start of the output.ttl file to make sure everything was present and accounted for. Ta da, a ttl file of the antiquities trade!

But I wanted a csv for opening in Gephi and so on.
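(An aside before the conversion: the splitting of oversized texts mentioned earlier can itself be scripted. This is a minimal sketch of my own, not part of the original pipeline; the paragraph-boundary splitting and the 6 KB default are illustrative choices, and a single paragraph larger than the limit would become its own oversized piece.)

```python
def split_text(text, max_bytes=6000):
    """Split text into pieces no larger than max_bytes,
    breaking on blank-line paragraph boundaries.

    Caveat: a single paragraph longer than max_bytes is
    emitted as its own (oversized) piece.
    """
    pieces, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                pieces.append(current)
            current = para
    if current:
        pieces.append(current)
    return pieces

# Demo on a synthetic text of eight ~500-byte paragraphs:
text = "\n\n".join("Paragraph %d: " % i + "lorem ipsum " * 40 for i in range(8))
pieces = split_text(text, max_bytes=1000)
print(len(pieces), "pieces:", [len(p.encode("utf-8")) for p in pieces])
```

Each returned piece can then be written out to its own file and fed through the script above one at a time.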
This little snippet converts the ttl file into a csv of triples:

```python
import csv
from rdflib import Graph

def ttl_to_csv(ttl_file_path, csv_file_path):
    # Parse the .ttl file with rdflib
    g = Graph()
    g.parse(ttl_file_path, format='ttl')

    # Open the csv file for writing
    with open(csv_file_path, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)

        # Write the header row
        csv_writer.writerow(["subject", "predicate", "object"])

        # Write each triple in the graph as a row
        for s, p, o in g:
            csv_writer.writerow([s, p, o])

if __name__ == "__main__":
    ttl_to_csv("output.ttl", "output.csv")
```

After that, I just needed to clean out a handful of statements where things like my predicates were declared against a namespace, that sort of thing. [All of which is available here.](https://gist.github.com/shawngraham/fa7b3146f66b29c2aea96e0383c217c8) When I run it through AmpliGraph, I get a pretty good MRR score too, without a lot of the futzing that I've had to do previously.

So the next thing to do is to run this process again, but against the 3k newspaper articles that we have. Pretty happy about this.
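For context on that score: MRR, mean reciprocal rank, averages the reciprocal of the rank a trained embedding model assigns to each true held-out triple among its corruptions, so higher is better and 1.0 means every true triple was ranked first. The metric itself is simple to compute from a list of ranks (this sketch is the generic formula, not AmpliGraph's own code):

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all held-out test triples.

    Each entry in `ranks` is the rank the model gave the true
    triple among its corruptions (1 = ranked first, i.e. best).
    """
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three test triples ranked 1st, 2nd, and 4th:
print(mean_reciprocal_rank([1, 2, 4]))  # → 0.5833...
```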