Grad Research: Data-mining Jane Austen

Published on January 31, 2018

Before there was Mr. Big, there was Mr. Darcy.

Tall and handsome, wealthy and witty, aloof and arrogant…Fitzwilliam Darcy made a poor first impression in Jane Austen’s Pride and Prejudice before winning the affections of protagonist Elizabeth Bennet with his gentlemanliness and kindness. Elizabeth marries him for love, and preserves the family fortune in the process.

The End.

Except not really. Austen’s oeuvre has persisted, thriving from the age of aristocratic suitors through modern dating and into the era of Tinder.

Research by Carleton’s Jenna Herdman illustrates this. The second-year English PhD student has used Google Ngram Viewer to track mentions of Austen. The tool scours Google Books and shows that Austen’s work is mentioned more often as time passes – particularly Pride and Prejudice, which has been on a steady upward trend since about 1990.

Herdman also uses text-mining and distant reading to create data visualizations of the content of Austen’s novels. The visualizations help students understand how the novels are structured, and the techniques she uses have helped academics critique entire bodies of literature and move beyond the literary canon.

There may have been as many as 60,000 novels published in 19th-century England. At a rate of one per day, it would take more than 160 years to read them all. But by aggregating data on grammar and language, it’s possible to recognize patterns within the full body of work.

“Thematically, Austen novels generally focus on a female protagonist and a marriage plot,” Herdman says. “The heroine surmounts the financial difficulty of her inheritance position by settling into a marriage that, conveniently, fulfills a domestic ideal of having both romantic love and economic security. The heroine rejects the ‘wrong’ choice of husband – often defined by sexual attractiveness, but which will lead to a ruinous union – in favour of the ‘right’ choice.”

In addition to Pride and Prejudice, Herdman used Voyant – a web-based text analysis tool – to analyze patterns in the romantic rivalries in Sense and Sensibility, Northanger Abbey and Mansfield Park.

Dividing each book into 10 segments, Herdman identifies the number of mentions of each romantic rival. The resulting graph for Pride and Prejudice shows mentions of Darcy, the romantic hero, fluctuating alongside those of Wickham, who falsely accuses Darcy of denying him a lucrative post. Darcy ultimately wins the protagonist’s heart and dominates mentions in the novel’s conclusion.
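For readers curious how such a count might work in practice, here is a minimal sketch in Python. It is not Voyant's implementation or Herdman's actual workflow. The function name `mentions_by_segment`, the word-splitting rule and the sample text are all illustrative assumptions. The idea is simply to split a text into equal slices by word position and tally how often each name appears in each slice.

```python
import re


def mentions_by_segment(text, names, segments=10):
    """Count mentions of each name in each of `segments` equal slices of the text.

    Hypothetical helper for illustration only; not part of Voyant.
    """
    # Lowercase and keep only runs of letters, so "Darcy's" still counts as "darcy".
    words = re.findall(r"[a-z]+", text.lower())
    counts = {name: [0] * segments for name in names}
    lookup = {name.lower(): name for name in names}

    for position, word in enumerate(words):
        name = lookup.get(word)
        if name is not None:
            # Map the word's position onto a segment index from 0 to segments - 1.
            segment = position * segments // len(words)
            counts[name][segment] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample text; a real run would load the full novel instead.
    sample = " ".join(["Darcy"] * 2 + ["and"] * 8 + ["Wickham"] * 3 + ["and"] * 7)
    for name, row in mentions_by_segment(sample, ["Darcy", "Wickham"], segments=2).items():
        print(name, row)
```

Plotting each name's row of counts against the segment number would give a line chart of the kind described above, with one line per rival.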

Read the full story on the Faculty of Graduate Studies and Postdoctoral Affairs page.
