{"id":251,"date":"2024-10-18T10:18:15","date_gmt":"2024-10-18T14:18:15","guid":{"rendered":"https:\/\/carleton.ca\/xlab\/?p=251"},"modified":"2024-10-18T10:18:15","modified_gmt":"2024-10-18T14:18:15","slug":"transcribing-handwriting-with-a-multimodal-model","status":"publish","type":"post","link":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/","title":{"rendered":"Transcribing Handwriting with a Multimodal Model"},"content":{"rendered":"<p><em>by Shawn Graham<\/em><\/p>\n<p>Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials so that one can work with them is time-consuming and difficult. If you&#8217;re an English speaker trained in North American cursive, you&#8217;re probably better off than the average undergrad today who didn&#8217;t learn cursive&#8230; but then throw Italian cursive into the loop:<\/p>\n<div id=\"attachment_253\" class=\"wp-caption aligncenter\" style=\"width: 400px\"><img decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-253\" src=\"https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-400x557.png\" alt=\"\" width=\"400\" height=\"557\" srcset=\"https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-400x557.png 400w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-240x334.png 240w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-160x223.png 160w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-768x1069.png 768w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-1104x1536.png 1104w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0-360x501.png 360w, https:\/\/carleton.ca\/xlab\/wp-content\/uploads\/page0.png 1174w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><p class=\"wp-caption-text\">A page from a list of stratigraphy at Gabii<\/p><\/div>\n<p>&#8230;well, that&#8217;s a challenge. 
Or consider the more tractable challenge &#8211; context recording sheets that certainly do structure the data, but are filled in by hand:<\/p>\n<div class=\"wp-caption aligncenter\" style=\"width: 317px\"><a href=\"https:\/\/archaeologywkas.com\/\"><img decoding=\"async\" loading=\"lazy\" class=\"\" src=\"https:\/\/archaeologywkas.com\/wp-content\/uploads\/2015\/10\/img187-744x1024.jpg\" width=\"317\" height=\"436\" \/><\/a><p class=\"wp-caption-text\">Single context recording sheet from the West Kent Archaeological Society<\/p><\/div>\n<p>This too can be difficult and time-consuming. So we have been exploring what might be done with so-called &#8216;multimodal models&#8217;. These are LLMs that have had visual and\/or audio data mapped into the same embedding space as the associated text. Some of these models are then further fine-tuned to work primarily with images. One can ask of such models, after feeding them an image, &#8216;Describe the scene&#8217; and so on.<\/p>\n<p>We&#8217;ve been experimenting with the Qwen2-VL-Instruct model (you can launch a chat interface for it via Google Colab from this <a href=\"https:\/\/github.com\/camenduru\/Qwen-VL-Chat-colab\">user&#8217;s GitHub repository<\/a>, if you&#8217;d like a go); we feed it an image, then prompt it with &#8216;extract text&#8217;. Simon Willison <a href=\"https:\/\/simonwillison.net\/2024\/Sep\/4\/\">writes about this here, too<\/a>. It works well, but getting it to load and run consistently has been a problem. We&#8217;re still experimenting. But we&#8217;ve also tried Google&#8217;s Gemini model, again because of Simon Willison&#8217;s <a href=\"https:\/\/simonwillison.net\/2024\/Oct\/17\/video-scraping\/\">experimentation<\/a> where he feeds it <em>video<\/em> and gets structured text out!<\/p>\n<p>Ok, so we&#8217;re not at the playing-with-archaeological-video-to-see-what-we-can-see stage &#8211; yet. 
But I did sort out a <a href=\"https:\/\/colab.research.google.com\/drive\/13XUD_a8TJ2IbvLi9o7hHNzTvK_d7d9cP?usp=sharing\">Google notebook that uses Gemini<\/a> (you&#8217;ll need an API key, available through the &#8216;secrets&#8217; button in the notebook) that will take images and transcribe them. I fed it that first page from Gabii telling it &#8216;Extract HIGH QUALITY text from the italian handwriting in the image&#8217; and it returned:<\/p>\n<pre>ELENCO DEGLI STRATI\r\n \r\nUS-0 = US-1 15\/10\/98 - da 0 e 50-60cm\r\n \r\nUS-2 = Si pu\u00f2 individuare alla [illegible] di un 60-70 cm dal piano di\r\ncampagna -\r\nSi compone di un livello terroso, con file presenze di malta\r\ndiverse, frammenti di mat. ceramici, pezzi di tegoli, pezzi di\r\npietre [illegible].\r\nSi raccolgono 6 monete di bronzo. 13\/10\r\n \r\nUS-3 = Strato terroso, friabile, di colore nerastro.\r\n(ex US-A) Tale strato si mette in luce sotto nel livello formato\r\nda US-2\r\n\u00c8 ricchissimo di frammenti di materiale ceramico, frammenti\r\ndi vetro, ossa animali, alcune monete, frammenti di\r\nmetalli da costruzione.\r\n \r\nUS-4 = Strato terroso, pi\u00f9 compatto rispetto a US-3, di colore\r\n(ex US-B) grigio-biancastro. Tale colorazione \u00e8 dovuta all'elevata\r\nconcentrazione di malta o numerose minute disfettie\r\nSi concentra in zone vicine ed alle sterne\r\nprete di US-3\r\n \r\nUS-5 = Taglio rilevato nello strato terroso [illegible] US-3\r\n(ex US-D) Il taglio, dalla forma meglio conservata ma leggermente visibile [illegible] di\r\n1 e 1,50 m.\r\n \r\nUS-6 = Riempimento taglio US-5 composto da un sedimento terroso di colore\r\n(ex US-C) marrone-grigiastro ([illegible] di malta e [illegible]).\r\n \r\nUS-7 = Forme di forma tondeggiante scavate\r\n(ex US-E) nell'US-3.\r\nDi forma si individuano solo\r\nal margine NORD.\r\n<\/pre>\n<p>Pretty slick, eh? 
And then, because of the way my code runs, I inadvertently gave it an extra prompt, &#8216;Please process this page&#8217; (instead of the placeholder text &#8216;insert_input_here&#8217;) and it <em>translated<\/em> the page too:<\/p>\n<pre>**US-0 = US-1**  15\/10\/98 - from 0 to 50-60cm\r\n \r\n**US-2** = It can be distinguished at the edge of a pit about 60-70 cm from the wall of the \"muro\" (wall).\r\nComposed of a loose layer, with the presence of various mortars, fragments of ceramic material, pieces of tiles, pieces of green glass.\r\n6 bronze coins were collected. 13\/10\r\n \r\n**US-3** (ex US-A) = Loose, brittle, dark-colored layer.\r\nThis layer is visible just below the level formed by US-2.\r\n \r\nIt is rich in fragments of ceramic material, glass fragments, animal bones, some coins, fragments of construction materials.\r\n \r\n**US-4** (ex US-B) = Compact layer, more compact than US-3, greyish-whitish in color. This coloring is due to the high concentration of mortar or various dispersed materials.\r\nIt can be found in small areas near and under US-3\r\n \r\n**US-5** (ex US-D) = Cut located in the earthen layer from north-west to US-3\r\nThe cut, of better defined but slightly curved shape, measures about 1 by 1.50 m\r\n \r\n**US-6** (ex US-C) = Filling of cut US-5, made up of a grayish-greenish earthen deposit (mortar remains).\r\n \r\n**US-7** (ex US-E) = Traces of a rounded shape dug into US-3\r\nThis trace is only visible on the NORTH edge\r\n\r\n<\/pre>\n<p>I fed it the context sheet. It returned:<\/p>\n<pre>Grid Square(s):\r\n\r\nDEPOSIT\r\n1. Compaction: FRIABLE\r\n2. Colour: BROWNISH BLACK\r\n3. Composition \/ Particle size (over 10%): SANDY SILT\r\n4. Inclusions (under 10%): GREY CLAY, OCCASIONAL TUFA, MODERATE\r\n5. Thickness &amp; extent: 0.6m x 1.3m x\r\n6. Other comments: POSS: ANIMAL BURROWS PULLING TOPSOIL INTO CONTEXT. SIMILAR TO (3001)\r\n7. 
Method &amp; conditions: TROWEL - WARM &amp; SUNNY\r\n\r\nSite Code: KEP (12)\r\nContext: 3058\r\nArea\/Section: TRENCH 3\r\nContext type: DEPOSIT\r\n\r\nCUT\r\n1. Shape in plan\r\n2. Corners\r\n3. Dimensions\/Depth\r\n4. Break of slope- top\r\n5. Sides\r\n6. Break of slope- base\r\n7. Base\r\n8. Orientation\r\n9. Inclination of axis\r\n10. Truncated (if known)\r\n11. Fill nos\r\n12. Other comments\r\nDraw profile overleaf\r\n\r\nStratigraphic matrix\r\n3001 \r\nThis context is 3058\r\n3057  3059\r\n\r\nYour interpretation: Internal\r\nYour discussion: Primary fill of 3057. Very similar to (3001) but with more density of finds and small irregular lumps of tufa\r\n\r\nContext same as: (X)\r\nPlan nos: P\r\nInitials &amp; date: KF 26\/6\/12\r\n\r\nFinds (tick)\r\nBone \u2713 Glass \u2713\r\n\r\nLevels on reverse\r\nHighest:\r\nLowest:\r\n\r\nEnvironmental samples\r\nSample nos &amp; type:\r\n\r\nFinds Sleving: on site \u2610\r\n\r\nChecked interpretation:\r\n<\/pre>\n<p>I then swapped the secondary placeholder prompt for &#8216;Return results suitable for CSV&#8217; and lo! It duly did. I also cooked up a prompt asking it just to extract the stratigraphic relationships recorded on the sheet. This took a bit more jiggery-pokery, but it also worked (and for extra fun, the results could be returned as a Cypher query for integrating the data into a graph database).<\/p>\n<p>Anyway, not the most elegant code, and we&#8217;ll want to experiment with other multimodal models since it would be best if all of this could be run on a local machine, but holy moly this is a useful thing.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>by Shawn Graham Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials so that one can work with them is time-consuming and difficult. 
If you&#8217;re an English speaker trained in North American cursive, you&#8217;re probably better off than the average undergrad today who didn&#8217;t [&hellip;]<\/p>\n","protected":false},"author":6,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_relevanssi_hide_post":"","_relevanssi_hide_content":"","_relevanssi_pin_for_all":"","_relevanssi_pin_keywords":"","_relevanssi_unpin_keywords":"","_relevanssi_related_keywords":"","_relevanssi_related_include_ids":"","_relevanssi_related_exclude_ids":"","_relevanssi_related_no_append":"","_relevanssi_related_not_related":"","_relevanssi_related_posts":"","_relevanssi_noindex_reason":"","_mi_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[1],"tags":[94,96,68,95],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Transcribing Handwriting with a Multimodal Model - X-Lab<\/title>\n<meta name=\"description\" content=\"by Shawn Graham Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"shawngraham\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/\",\"url\":\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/\",\"name\":\"Transcribing Handwriting with a Multimodal Model - X-Lab\",\"isPartOf\":{\"@id\":\"https:\/\/carleton.ca\/xlab\/#website\"},\"datePublished\":\"2024-10-18T14:18:15+00:00\",\"dateModified\":\"2024-10-18T14:18:15+00:00\",\"author\":{\"@id\":\"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/e8707158a71e77734ea13346b6e46feb\"},\"description\":\"by Shawn Graham Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials\",\"breadcrumb\":{\"@id\":\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/carleton.ca\/xlab\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"News\",\"item\":\"https:\/\/carleton.ca\/xlab\/category\/news\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Transcribing Handwriting with a Multimodal Model\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/carleton.ca\/xlab\/#website\",\"url\":\"https:\/\/carleton.ca\/xlab\/\",\"name\":\"X-Lab\",\"description\":\"Carleton 
University\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/carleton.ca\/xlab\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/e8707158a71e77734ea13346b6e46feb\",\"name\":\"shawngraham\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/1b4be5c0f305aa12c7a3dd75ae5c731e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/1b4be5c0f305aa12c7a3dd75ae5c731e?s=96&d=mm&r=g\",\"caption\":\"shawngraham\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Transcribing Handwriting with a Multimodal Model - X-Lab","description":"by Shawn Graham Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/","twitter_misc":{"Written by":"shawngraham","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/","url":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/","name":"Transcribing Handwriting with a Multimodal Model - X-Lab","isPartOf":{"@id":"https:\/\/carleton.ca\/xlab\/#website"},"datePublished":"2024-10-18T14:18:15+00:00","dateModified":"2024-10-18T14:18:15+00:00","author":{"@id":"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/e8707158a71e77734ea13346b6e46feb"},"description":"by Shawn Graham Legacy archaeological data is often trapped in handwritten context sheets, site diaries, and scraps of paper. Transcribing such materials","breadcrumb":{"@id":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/carleton.ca\/xlab\/2024\/transcribing-handwriting-with-a-multimodal-model\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/carleton.ca\/xlab\/"},{"@type":"ListItem","position":2,"name":"News","item":"https:\/\/carleton.ca\/xlab\/category\/news\/"},{"@type":"ListItem","position":3,"name":"Transcribing Handwriting with a Multimodal Model"}]},{"@type":"WebSite","@id":"https:\/\/carleton.ca\/xlab\/#website","url":"https:\/\/carleton.ca\/xlab\/","name":"X-Lab","description":"Carleton University","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/carleton.ca\/xlab\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/e8707158a71e77734ea13346b6e46feb","name":"shawngraham","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/carleton.ca\/xlab\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/1b4be5c0f305aa12c7a3dd75ae5c731e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/1b4be5c0f305aa12c7a3dd75ae5c731e?s=96&d=mm&r=g","caption":"shawngraham"}}]}},"acf":{"Post Thumbnail Icon":"","Post Thumbnail":false},"_links":{"self":[{"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/posts\/251"}],"collection":[{"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/comments?post=251"}],"version-history":[{"count":1,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/posts\/251\/revisions"}],"predecessor-version":[{"id":254,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/posts\/251\/revisions\/254"}],"wp:attachment":[{"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/media?parent=251"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/categories?post=251"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/carleton.ca\/xlab\/wp-json\/wp\/v2\/tags?post=251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}