<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Encoding History]]></title><description><![CDATA[I am an aspiring coder and historian excited to explore the world of digital humanities. They say that there is no better way to learn than to teach!]]></description><link>https://encodinghistory.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1636700221444/2kWolOLMn.png</url><title>Encoding History</title><link>https://encodinghistory.com</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 01:02:50 GMT</lastBuildDate><atom:link href="https://encodinghistory.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Using Python to trace threads through history]]></title><description><![CDATA[This blog post has two primary goals:

Consider the merits of leveraging programming as an additional research method for analyzing historical sources and writing object-oriented narratives
Share preliminary findings (code, graphical outputs, and res...]]></description><link>https://encodinghistory.com/using-python-to-trace-threads-through-history</link><guid isPermaLink="true">https://encodinghistory.com/using-python-to-trace-threads-through-history</guid><category><![CDATA[history]]></category><category><![CDATA[#data visualisation]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[python beginner]]></category><dc:creator><![CDATA[Natasha]]></dc:creator><pubDate>Thu, 28 Oct 2021 06:49:43 GMT</pubDate><content:encoded><![CDATA[<p>This blog post has two primary goals:</p>
<ul>
<li>Consider the <strong>merits of leveraging programming as an additional research method</strong> for analyzing historical sources and writing object-oriented narratives</li>
<li>Share <strong>preliminary findings</strong> (code, graphical outputs, and results) that supplement an early modern exploration of the pearl</li>
</ul>
<h3 id="heading-project-background">Project Background</h3>
<p>As a technology consultant, I learned to leverage programming to query, process, and evaluate complex data. This work galvanized my desire to integrate technical, digital, and computational approaches with my historical practice. </p>
<p>While there are multitudinous ways to use coding to supplement humanities, my knowledge of <strong>object- or lens-based histories </strong> provided motivation and direction. I decided to focus on evaluating the use of  <strong>Python to help trace transient threads through history.</strong> Thus, I have built upon an object-oriented piece I had already researched regarding early modern pearls. My understanding of the primary source materials and historical landscape allowed me to quickly consider how programming could augment a historical investigation and examine how I (and others!) could leverage technical research and analysis methods in future works. </p>
<h3 id="heading-findings-overview">Findings Overview</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1635792292833/Uu5QLNYAU.gif" alt="ezgif.com-gif-maker (2).gif" /></p>
<p>For this object-oriented investigation, I <strong>built scalable Python functions</strong> using <code>pandas</code>, <code>nltk</code>, <code>matplotlib</code>, <code>seaborn</code>, and <code>plotly</code> that perform textual analysis, recognize object frequency and placement within an imported file, and produce interactive histograms, scatter plots, and other charts to visualize results. 
I used functions (blocks of instructions that produce a desired outcome) to create repeatable programs for analyzing text. Each function takes a text file or source via parameters, i.e., values that are passed into the function when it is called. For each function, I've <strong>shared the explanation, code, and output below</strong>. <a target="_blank" href="https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research">See this post for details on digital history sources</a> and <a target="_blank" href="https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research">this post for details on building functions</a>. </p>
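<p>To make the idea of a function and its parameters concrete, here is a minimal sketch. The name <code>count_word</code> and the sample sentence are hypothetical and are not part of the project code; the point is only that the same block of instructions can be reused with different inputs.</p>

```python
def count_word(text, word):
    """Return how many times `word` appears in `text`, ignoring case."""
    # Split on whitespace; real source files need more careful clean-up.
    tokens = text.lower().split()
    return tokens.count(word.lower())


# `text` and `word` are the parameters; the values below are the arguments.
sample = "Pearl upon pearl adorned the crown"
print(count_word(sample, "pearl"))  # prints 2
```

<p>Calling <code>count_word</code> again with a different sentence or a different word requires no change to the function itself, which is what makes the approach repeatable.</p>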
<h3 id="heading-functions-intentions-and-code-results">Functions (Intentions and Code Results)</h3>
<p><strong> 1. frequency_all_words_graph(filename, title_of_file, author_date, number_of_words): </strong></p>
<p>Outputs a bar graph depicting the most frequently used words in any inputted text file (saved in the same environment). Stopwords are not counted, as they are removed by the pre-built clean-up function <code>file_function</code>.</p>
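<p>The actual <code>file_function</code> is covered in the post linked in the parameter list. As a rough, hypothetical stand-in for what such a clean-up step might do, the sketch below lowercases a string, keeps only alphabetic tokens, and drops stopwords. The name <code>clean_text</code> and the tiny hard-coded stopword set are assumptions for illustration; the original presumably reads from a file and uses a fuller stopword list.</p>

```python
import re

# Hypothetical, deliberately tiny stopword set used only for this sketch.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "with", "was"}


def clean_text(raw_text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(clean_text("The Pearl was set in a ring of gold."))
# prints ['pearl', 'set', 'ring', 'gold']
```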
<p>Parameters:</p>
<ul>
<li><code>filename</code>: primary source file that the function analyzes for its most frequently used words; saved as a .txt file in the same environment and cleaned up through the pre-built function. <a target="_blank" href="https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research">See this post for details on pre-building a function to open, read, and clean up .txt historical file types</a> </li>
<li><code>title_of_file</code>: enter the title of the primary source file as a string. This string element populates a portion of the <code>title()</code> function from the matplotlib library that sets the heading for the bar-plot</li>
<li><code>author_date</code>: enter bibliographic information (author, editors, translators, dates, etc.) of the primary source file as a string (e.g. <code>'Thomas Coryate, 1611'</code>). This string element populates a portion of the <code>title()</code> function from the matplotlib library that sets the sub-heading for the bar-plot</li>
<li><code>number_of_words</code>: the n in <code>most_common([n])</code>, which returns a list of the top n elements from most common to least common. If n is omitted or None, <code>most_common()</code> returns every element in the counter, so the plot would try to display every distinct word in the text. For this bar-plot function, <code>number_of_words</code> sets how many words are included in the final graph (it dictates whether the plot displays the top 20, 50, or 100 most frequently used words)</li>
</ul>
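<p>The behaviour of <code>most_common([n])</code> can be tried on a small list. This sketch uses <code>collections.Counter</code> from the standard library, which nltk's <code>FreqDist</code> subclasses, so the method behaves the same way; the token list is made up for illustration.</p>

```python
from collections import Counter

# Made-up tokens standing in for a cleaned-up source text.
tokens = ["pearl", "gold", "pearl", "ruby", "gold", "pearl"]
counts = Counter(tokens)

# With n given, only the top n (word, count) pairs are returned.
print(counts.most_common(2))  # prints [('pearl', 3), ('gold', 2)]

# With n omitted, every distinct word is returned, most frequent first.
print(counts.most_common())
```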
<p>The function allows me to start exploring the text, the author's tone, and the primary topics. The updatable parameters and visual end result make this function a quick way to engage with new source materials. After naming and defining the <code>frequency_all_words_graph</code> function (or any function), the first step is to import any necessary libraries/modules. This includes importing other functions that you have built previously. </p>
<p><strong><em>My Code:</em></strong></p>
<pre><code>def frequency_all_words_graph(filename, title_of_file, author_date, number_of_words):
    <span class="hljs-keyword">from</span> clean_up_text_function <span class="hljs-keyword">import</span> <span class="hljs-title">file_function</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>
    <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-title">nltk</span>.<span class="hljs-title">probability</span> <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">FreqDist</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">pandas</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">pd</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">seaborn</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">sns</span>
    <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-title">collections</span> <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">Counter</span>

    <span class="hljs-comment"># call pre-built file_function to open, read, and clean up text file</span>
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> <span class="hljs-title">file_function</span>(<span class="hljs-title">filename</span>)

    <span class="hljs-comment"># create frequency distribution DataFrame with the number of words specified</span>
    <span class="hljs-title">text_cnt</span> <span class="hljs-operator">=</span> <span class="hljs-title">FreqDist</span>(<span class="hljs-title">text_final</span>)
    <span class="hljs-title">common_words</span> <span class="hljs-operator">=</span> <span class="hljs-title">text_cnt</span>.<span class="hljs-title">most_common</span>(<span class="hljs-title">number_of_words</span>)
    <span class="hljs-title">common_words</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span>(<span class="hljs-title">common_words</span>, <span class="hljs-title">columns</span> <span class="hljs-operator">=</span> [<span class="hljs-string">'Words'</span>, <span class="hljs-string">'Counts'</span>])

    <span class="hljs-comment"># format seaborn barplot</span>
    <span class="hljs-title">sns</span>.<span class="hljs-title">set</span>()
    <span class="hljs-title">sns</span>.<span class="hljs-title">color_palette</span>(<span class="hljs-string">"husl"</span>, 8)
    <span class="hljs-title">plt</span>.<span class="hljs-title">figure</span>(<span class="hljs-title">figsize</span><span class="hljs-operator">=</span>(10,8)) 
    <span class="hljs-title">sns</span>.<span class="hljs-title">barplot</span>(<span class="hljs-title">y</span><span class="hljs-operator">=</span> <span class="hljs-string">"Words"</span>, <span class="hljs-title">x</span> <span class="hljs-operator">=</span> <span class="hljs-string">"Counts"</span>, <span class="hljs-title">data</span> <span class="hljs-operator">=</span><span class="hljs-title">common_words</span>)
    <span class="hljs-title">plt</span>.<span class="hljs-title">title</span>(<span class="hljs-string">'Most Frequent '</span> <span class="hljs-operator">+</span> <span class="hljs-title">str</span>(<span class="hljs-title">number_of_words</span>) <span class="hljs-operator">+</span> 
             <span class="hljs-string">' Words in '</span> <span class="hljs-operator">+</span> <span class="hljs-title">title_of_file</span> <span class="hljs-operator">+</span> <span class="hljs-string">'\n'</span> <span class="hljs-operator">+</span> <span class="hljs-string">'by '</span> <span class="hljs-operator">+</span> <span class="hljs-title">author_date</span>, <span class="hljs-title">fontsize</span><span class="hljs-operator">=</span>12)
    <span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()

<span class="hljs-comment"># to call, unindent the line or start a new file, type the function name, and</span>
<span class="hljs-comment"># pass the arguments in parentheses; example INPUT:</span>
<span class="hljs-title">frequency_all_words_graph</span>(<span class="hljs-string">'coryat_crudities.txt'</span>, <span class="hljs-string">'Coryats Crudities'</span>,
 <span class="hljs-string">'Thomas Coryate, 1611'</span>, 20)
</code></pre><p><strong><em>Example Results</em></strong> (with <em>Coryats Crudities</em>, Thomas Coryate, 1611 as the text input):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1635401579931/TqAJ8wuA_.png" alt="Screen Shot 2021-10-27 at 11.12.47 PM.png" /></p>
<p><strong>2. frequency_gem_graph(filename, title_of_file, author_date):</strong></p>
<p>The function produces a graphic representation of how many times different gems (diamonds, sapphires, pearls, etc.) are mentioned in an inputted text.</p>
<p>Parameters: </p>
<ul>
<li><code>filename</code>: primary source file that function analyzes for frequency of gem mentions; file should be saved as a .txt file in the same environment</li>
<li><code>title_of_file</code>: enter the title of the primary source file as a string. This string element populates a portion of the <code>title()</code> function from the matplotlib library that sets the heading for the bar-plot</li>
<li><p><code>author_date</code>: enter bibliographic information (author, editors, translators, dates, etc.) of the primary source file as a string (e.g. <code>'Jean-Baptiste Tavernier, 1678'</code>). This string element populates a portion of the <code>title()</code> function from the matplotlib library that sets the sub-heading for the bar-plot</p>
<p>This function probes into whether pearls hold unique relevance in the text and determines what other gems are discussed and/or mentioned frequently. This program could easily be updated to examine another object (by augmenting the gem-centric  <code>text_final</code> list and DataFrame).</p>
</li>
</ul>
<p><strong><em>My Code:</em></strong></p>
<pre><code>def frequency_gem_graph(filename, title_of_file, author_date):
    <span class="hljs-keyword">from</span> clean_up_text <span class="hljs-keyword">import</span> <span class="hljs-title">file_function</span>
    <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-title">nltk</span>.<span class="hljs-title">probability</span> <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">FreqDist</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">pandas</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">pd</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">seaborn</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">sns</span>

    <span class="hljs-comment"># call pre-built function to clean up the text in filename</span>
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> <span class="hljs-title">file_function</span>(<span class="hljs-title">filename</span>)

    <span class="hljs-comment"># normalize text_final so that "pearl" and "pearls" are both counted as pearl</span>
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> [<span class="hljs-title">i</span>.<span class="hljs-title">replace</span>(<span class="hljs-string">'pearls'</span>, <span class="hljs-string">'pearl'</span>) <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">text_final</span>]
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> [<span class="hljs-title">i</span>.<span class="hljs-title">replace</span>(<span class="hljs-string">'emeralds'</span>, <span class="hljs-string">'emerald'</span>) <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">text_final</span>]
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> [<span class="hljs-title">i</span>.<span class="hljs-title">replace</span>(<span class="hljs-string">'diamonds'</span>, <span class="hljs-string">'diamond'</span>) <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">text_final</span>]
    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> [<span class="hljs-title">i</span>.<span class="hljs-title">replace</span>(<span class="hljs-string">'sapphires'</span>, <span class="hljs-string">'sapphire'</span>) <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">i</span> <span class="hljs-title">in</span> <span class="hljs-title">text_final</span>]
    <span class="hljs-comment"># etc... repeat for the rest of the gems in the gems list</span>

    <span class="hljs-comment"># next, make frequency distribution for gems (note this function could</span>
    <span class="hljs-comment"># be used to analyze other objects by updating the normalization and list types).</span>
    <span class="hljs-title">gems</span> <span class="hljs-operator">=</span> [<span class="hljs-string">'pearl'</span>, <span class="hljs-string">'emerald'</span>, <span class="hljs-string">'diamond'</span>, <span class="hljs-string">'sapphire'</span>, <span class="hljs-string">'ruby'</span>, <span class="hljs-string">'jewel'</span>, <span class="hljs-string">'gem'</span>, <span class="hljs-string">'coral'</span>,
            <span class="hljs-string">'turquoise'</span>, <span class="hljs-string">'jade'</span>, <span class="hljs-string">'amethyst'</span>, <span class="hljs-string">'topaz'</span>, <span class="hljs-string">'opal'</span>, <span class="hljs-string">'ivory'</span>, <span class="hljs-string">'amber'</span>,
            <span class="hljs-string">'catseye'</span>, <span class="hljs-string">'alexandrite'</span>, <span class="hljs-string">'garnet'</span>, <span class="hljs-string">'peridot'</span>, <span class="hljs-string">'mother-of-pearl'</span>]
    <span class="hljs-title">gem_list</span> <span class="hljs-operator">=</span> [<span class="hljs-title">w</span> <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">w</span> <span class="hljs-title">in</span> <span class="hljs-title">text_final</span> <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">w</span> <span class="hljs-title">in</span> <span class="hljs-title">gems</span>]
    <span class="hljs-title">all_fdist</span> <span class="hljs-operator">=</span> <span class="hljs-title">FreqDist</span>(<span class="hljs-title">gem_list</span>)
    <span class="hljs-title">all_fdist</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">Series</span>(<span class="hljs-title">dict</span>(<span class="hljs-title">all_fdist</span>)).<span class="hljs-title">sort_values</span>(<span class="hljs-title">ascending</span><span class="hljs-operator">=</span><span class="hljs-title">False</span>)

    <span class="hljs-comment"># format seaborn barplot (create the figure and axes before drawing on them)</span>
    <span class="hljs-title">sns</span>.<span class="hljs-title">set</span>()
    <span class="hljs-title">sns</span>.<span class="hljs-title">color_palette</span>(<span class="hljs-string">"husl"</span>, 8)
    <span class="hljs-title">fig</span>,<span class="hljs-title">ax</span> <span class="hljs-operator">=</span> <span class="hljs-title">plt</span>.<span class="hljs-title">subplots</span>(<span class="hljs-title">figsize</span><span class="hljs-operator">=</span>(8,8))
    <span class="hljs-title">all_plot</span> <span class="hljs-operator">=</span> <span class="hljs-title">sns</span>.<span class="hljs-title">barplot</span>(<span class="hljs-title">x</span><span class="hljs-operator">=</span><span class="hljs-title">all_fdist</span>.<span class="hljs-title">index</span>, <span class="hljs-title">y</span><span class="hljs-operator">=</span><span class="hljs-title">all_fdist</span>.<span class="hljs-title">values</span>, <span class="hljs-title">ax</span><span class="hljs-operator">=</span><span class="hljs-title">ax</span>)
    <span class="hljs-title"><span class="hljs-keyword">for</span></span> <span class="hljs-title">p</span> <span class="hljs-title">in</span> <span class="hljs-title">all_plot</span>.<span class="hljs-title">patches</span>:
        <span class="hljs-title">all_plot</span>.<span class="hljs-title">annotate</span>(<span class="hljs-title">format</span>(<span class="hljs-title">p</span>.<span class="hljs-title">get_height</span>(), <span class="hljs-string">'.0f'</span>), 
                   (<span class="hljs-title">p</span>.<span class="hljs-title">get_x</span>() <span class="hljs-operator">+</span> <span class="hljs-title">p</span>.<span class="hljs-title">get_width</span>() <span class="hljs-operator">/</span> 2., <span class="hljs-title">p</span>.<span class="hljs-title">get_height</span>()), 
                   <span class="hljs-title">ha</span> <span class="hljs-operator">=</span> <span class="hljs-string">'center'</span>, <span class="hljs-title">va</span> <span class="hljs-operator">=</span> <span class="hljs-string">'center'</span>, <span class="hljs-title">fontsize</span> <span class="hljs-operator">=</span> 8,
                   <span class="hljs-title">xytext</span> <span class="hljs-operator">=</span> (0, 9), 
                   <span class="hljs-title">textcoords</span> <span class="hljs-operator">=</span> <span class="hljs-string">'offset points'</span>)
    <span class="hljs-title">sns</span>.<span class="hljs-title">despine</span>()
    <span class="hljs-title">plt</span>.<span class="hljs-title">xticks</span>(<span class="hljs-title">rotation</span><span class="hljs-operator">=</span>30)
    <span class="hljs-title">plt</span>.<span class="hljs-title">ylabel</span>(<span class="hljs-string">'Frequency (Count)'</span>, <span class="hljs-title">fontsize</span><span class="hljs-operator">=</span>12)
    <span class="hljs-title">plt</span>.<span class="hljs-title">xlabel</span>(<span class="hljs-string">'Gem Type'</span>, <span class="hljs-title">fontsize</span><span class="hljs-operator">=</span>12)
    <span class="hljs-title">plt</span>.<span class="hljs-title">title</span>(<span class="hljs-string">'Count mention of various gemstones in '</span> <span class="hljs-operator">+</span> <span class="hljs-title">title_of_file</span> <span class="hljs-operator">+</span> <span class="hljs-string">'\n'</span> <span class="hljs-operator">+</span> <span class="hljs-string">'by '</span> <span class="hljs-operator">+</span> <span class="hljs-title">author_date</span>, <span class="hljs-title">fontsize</span><span class="hljs-operator">=</span>12)
    <span class="hljs-title">plt</span>.<span class="hljs-title">show</span>()

<span class="hljs-comment"># example INPUT:</span>
<span class="hljs-title">frequency_gem_graph</span>(<span class="hljs-string">'tavernier_text.txt'</span>, <span class="hljs-string">'The Six Voyages of John Baptista Tavernier'</span>, <span class="hljs-string">'Jean-Baptiste Tavernier, 1678'</span>)
</code></pre><p><strong><em>Example Results</em></strong> (with <em>The Six Voyages of John Baptista Tavernier</em>, by Jean-Baptiste Tavernier, 1678 as the text input):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1635403591064/tbsBAJ4f28.png" alt="Screen Shot 2021-10-27 at 11.46.18 PM.png" /></p>
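<p>Because this function could be pointed at other objects, one way to generalize the counting step is to pass the word list in as data. The sketch below is an assumption about how that might look, not the project code: <code>object_frequency</code> and <code>gem_variants</code> are hypothetical names, and the mapping folds each plural into a single label by exact match, which also handles irregular plurals such as "rubies".</p>

```python
from collections import Counter


def object_frequency(tokens, variants):
    """Count mentions of each object, folding spelling variants together.

    `variants` maps every accepted spelling to its canonical label.
    """
    return Counter(variants[t] for t in tokens if t in variants)


# Hypothetical variant map; extend or swap it out to study another object.
gem_variants = {
    "pearl": "pearl", "pearls": "pearl",
    "ruby": "ruby", "rubies": "ruby",
    "diamond": "diamond", "diamonds": "diamond",
}

tokens = ["pearls", "gold", "ruby", "pearl", "rubies", "diamonds", "pearl"]
print(object_frequency(tokens, gem_variants).most_common())
# prints [('pearl', 3), ('ruby', 2), ('diamond', 1)]
```

<p>The resulting counts could then be handed to the same seaborn bar plot; only the mapping would need to change for a different category of objects.</p>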
<p><strong>3. gem_histogram_graph(filename, title_of_file):</strong> </p>
<p>This function produces an interactive histogram and rug plot that discloses where in the text different gems are discussed, if there are notable correlations between gems, and which gems hold an irregular or individual role in the text.</p>
<p> Parameters: </p>
<ul>
<li><code>filename</code>: primary source file that function analyzes for frequency of gem mentions; file should be saved as a .txt file in the same environment</li>
<li><code>title_of_file</code>: enter the title of the primary source file as a string. This string element informs the heading for the graphic output</li>
</ul>
<p>The output <strong>provides a novel way of interacting with and visualizing the historical source</strong> by graphically threading and displaying the location and frequency of gems through the text. The produced plots would be helpful in evaluating new sources and gauging how they refer to gems (or, similar to the frequency chart above, any defined list or category of objects). </p>
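<p>The core of the approach is recording <em>where</em> in the token list each gem appears, so that those positions can later be plotted. A minimal sketch of that step, using a hypothetical helper named <code>object_positions</code> and a made-up token list, might look like this:</p>

```python
def object_positions(tokens, targets):
    """Return the index of every token that matches one of `targets`."""
    wanted = set(targets)
    return [i for i, token in enumerate(tokens) if token in wanted]


# Made-up tokens standing in for a cleaned-up source text.
tokens = ["the", "pearl", "and", "ruby", "lay", "beside", "pearls"]
print(object_positions(tokens, ["pearl", "pearls"]))  # prints [1, 6]
print(object_positions(tokens, ["ruby", "rubies"]))   # prints [3]
```

<p>Each returned index marks how far into the text a mention occurs, which is the value a histogram or rug plot places along its horizontal axis.</p>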
<p><strong><em>My Code:</em></strong></p>
<pre><code>def gem_histogram_graph(filename, title_of_file):
    <span class="hljs-keyword">from</span> clean_up_text <span class="hljs-keyword">import</span> <span class="hljs-title">file_function</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">matplotlib</span>.<span class="hljs-title">pyplot</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">plt</span>
    <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-title">nltk</span>.<span class="hljs-title">probability</span> <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">FreqDist</span>   
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">pandas</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">pd</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">plotly</span>.<span class="hljs-title">express</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">px</span>

    <span class="hljs-title">good_list_lower</span> <span class="hljs-operator">=</span> <span class="hljs-title">file_function</span>(<span class="hljs-title">filename</span>)

    <span class="hljs-comment"># create gem-specific lists of token positions from the tokenized input file</span>
    <span class="hljs-title">pearl</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">good_list_lower</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'pearls'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'pearl'</span>:
            <span class="hljs-title">pearl</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1

    <span class="hljs-title">diamond</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">good_list_lower</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'diamond'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'diamonds'</span>:
            <span class="hljs-title">diamond</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1    

    <span class="hljs-title">ruby</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">good_list_lower</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'ruby'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'rubies'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'spinel'</span>:
            <span class="hljs-title">ruby</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1

    <span class="hljs-title">emerald</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">good_list_lower</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'emerald'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'emeralds'</span>:
            <span class="hljs-title">emerald</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1

    <span class="hljs-title">sapphire</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">good_list_lower</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'sapphire'</span> <span class="hljs-title">or</span> <span class="hljs-title">good_list_lower</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'sapphires'</span>:
            <span class="hljs-title">sapphire</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1

    <span class="hljs-comment">#create and merge the five gem-specific dataframes</span>
    <span class="hljs-title">dfp</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span>({<span class="hljs-string">'pearls'</span>:<span class="hljs-title">pearl</span>})
    <span class="hljs-title">dfr</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span>({<span class="hljs-string">'rubies'</span>:<span class="hljs-title">ruby</span>})
    <span class="hljs-title">dfe</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span>({<span class="hljs-string">'emeralds'</span>:<span class="hljs-title">emerald</span>})
    <span class="hljs-title">dfs</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span> ({<span class="hljs-string">'sapphire'</span>:<span class="hljs-title">sapphire</span>})
    <span class="hljs-title">dfd</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span> ({<span class="hljs-string">'diamond'</span>:<span class="hljs-title">diamond</span>})
    <span class="hljs-title">df1</span> <span class="hljs-operator">=</span> <span class="hljs-title">dfp</span>.<span class="hljs-title">merge</span>(<span class="hljs-title">dfr</span>, <span class="hljs-title">left_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">right_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">how</span><span class="hljs-operator">=</span><span class="hljs-string">'outer'</span>)
    <span class="hljs-title">df2</span> <span class="hljs-operator">=</span> <span class="hljs-title">df1</span>.<span class="hljs-title">merge</span>(<span class="hljs-title">dfe</span>, <span class="hljs-title">left_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">right_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">how</span><span class="hljs-operator">=</span><span class="hljs-string">'outer'</span>)
    <span class="hljs-title">df3</span> <span class="hljs-operator">=</span> <span class="hljs-title">df2</span>.<span class="hljs-title">merge</span>(<span class="hljs-title">dfs</span>, <span class="hljs-title">left_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">right_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">how</span><span class="hljs-operator">=</span><span class="hljs-string">'outer'</span>)
    <span class="hljs-title">df</span> <span class="hljs-operator">=</span> <span class="hljs-title">df3</span>.<span class="hljs-title">merge</span>(<span class="hljs-title">dfd</span>, <span class="hljs-title">left_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">right_index</span><span class="hljs-operator">=</span><span class="hljs-title">True</span>, <span class="hljs-title">how</span><span class="hljs-operator">=</span><span class="hljs-string">'outer'</span>)

    <span class="hljs-comment">#create &amp; format plotly histogram</span>
    <span class="hljs-title">fig</span> <span class="hljs-operator">=</span> <span class="hljs-title">px</span>.<span class="hljs-title">histogram</span>(<span class="hljs-title">df</span>, <span class="hljs-title">opacity</span><span class="hljs-operator">=</span>0.8, <span class="hljs-title">nbins</span><span class="hljs-operator">=</span>35, <span class="hljs-title">marginal</span><span class="hljs-operator">=</span><span class="hljs-string">'rug'</span>, 
        <span class="hljs-title">color_discrete_sequence</span><span class="hljs-operator">=</span>[<span class="hljs-string">"#FFBD00"</span>, <span class="hljs-string">"#FF5768"</span>, <span class="hljs-string">"#4EC29D"</span>, <span class="hljs-string">"#0065a2"</span>, <span class="hljs-string">"#8376AA"</span>])
    <span class="hljs-title">fig</span>.<span class="hljs-title">update_layout</span>(<span class="hljs-title">yaxis_title</span><span class="hljs-operator">=</span><span class="hljs-string">"Count"</span>, <span class="hljs-title">xaxis_title</span><span class="hljs-operator">=</span><span class="hljs-string">'Location in Book Histogram (where in the book do the mentions occur)'</span>) 
    <span class="hljs-title">fig</span>.<span class="hljs-title">update_traces</span>(<span class="hljs-title">opacity</span><span class="hljs-operator">=</span>0.80)
    <span class="hljs-title">fig</span>.<span class="hljs-title">update_xaxes</span>(<span class="hljs-title">showticklabels</span><span class="hljs-operator">=</span><span class="hljs-title">False</span>)
    <span class="hljs-title">fig</span>.<span class="hljs-title">show</span>()  

<span class="hljs-comment">#example INPUT:</span>
<span class="hljs-title">gem_histogram_graph</span>(<span class="hljs-string">'jahangir_wheeler_thackston_translation.txt'</span>, <span class="hljs-string">'Jahangirnama, Wheeler Thackston Translation'</span>)
</code></pre><p><strong><em>Example Results</em></strong> (with <em>The Jahangirnama</em>, by Nur al-Din Jahangir (Jahangir Emperor of Hindustan) and translated by Wheeler M. Thackston as the text input):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1635792377733/lMZHw1CTl.gif" alt="ezgif.com-gif-maker (3).gif" /></p>
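<p>The per-gem <code>while</code> loops in the function above all follow the same pattern, so they could be collapsed into one helper. The following is only a minimal sketch of that idea, not the code that produced the graph: the names <code>gem_positions</code>, <code>gem_position_table</code>, and <code>GEM_VARIANTS</code> are hypothetical, and the variant spellings listed for each gem are assumptions.</p>

```python
def gem_positions(words, variants):
    """Return the index of every word that matches one of the variants."""
    return [i for i, word in enumerate(words) if word in variants]

# hypothetical variant spellings for each column of the histogram (an assumption)
GEM_VARIANTS = {
    'pearls': {'pearl', 'pearls'},
    'rubies': {'ruby', 'rubies'},
    'emeralds': {'emerald', 'emeralds'},
    'sapphire': {'sapphire', 'sapphires'},
    'diamond': {'diamond', 'diamonds'},
}

def gem_position_table(words):
    """Map each gem label to the list of positions where it is mentioned."""
    return {label: gem_positions(words, variants)
            for label, variants in GEM_VARIANTS.items()}

# tiny made-up word list standing in for the cleaned-up source text
sample_words = ['a', 'pearl', 'and', 'two', 'rubies', 'beside', 'pearls']
positions = gem_position_table(sample_words)
print(positions['pearls'])
print(positions['rubies'])
```

<p>Because the position lists have different lengths, one way to turn the result into a single table would be <code>pd.DataFrame({label: pd.Series(found) for label, found in positions.items()})</code>, which pads the shorter columns with missing values much like the chain of outer merges does.</p>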
<p><strong>4. pearl_sentence_rug(filename, title_of_file): </strong> </p>
<p>This function's output, a pearl-mention specific interactive rug plot, reveals how the pearl is described in the inputted text, what its typical context is, and whether there is a specific section of the book that mentions pearls more frequently.</p>
<p> Parameters: </p>
<ul>
<li><code>filename</code>: primary source file that the function searches for pearl mentions; the file should be saved as a .txt file in the same environment</li>
<li><code>title_of_file</code>: enter the title of the primary source file as a string. This string element informs the heading for the graphic output</li>
</ul>
<p>The rug plot could help gauge potential new sources and point to where a deeper reading or investigation might start. It provides an alternative and atypical way to engage with the source's subject, sentences, and syntax.</p>
<p><strong><em>My Code:</em></strong></p>
<pre><code>def pearl_sentence_rug(filename, title_of_file):
    <span class="hljs-keyword">from</span> FUN_clean_up_text <span class="hljs-keyword">import</span> <span class="hljs-title">file_function</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">pandas</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">pd</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">plotly</span>.<span class="hljs-title">express</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">px</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">plotly</span>.<span class="hljs-title">figure_factory</span> <span class="hljs-title"><span class="hljs-keyword">as</span></span> <span class="hljs-title">ff</span>
    <span class="hljs-title"><span class="hljs-keyword">from</span></span> <span class="hljs-title">plotly</span>.<span class="hljs-title">validators</span>.<span class="hljs-title">scatter</span>.<span class="hljs-title">marker</span> <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">SymbolValidator</span>
    <span class="hljs-title"><span class="hljs-keyword">import</span></span> <span class="hljs-title">re</span>

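    <span class="hljs-title">text_final</span> <span class="hljs-operator">=</span> <span class="hljs-title">file_function</span>(<span class="hljs-title">filename</span>)
    <span class="hljs-comment">#assumption: the sentence search below needs the raw, unsplit text, so read the file directly</span>
    with open(filename, 'rt') as source_file:
        text = source_file.read()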

<span class="hljs-comment">#identify index location of pearl mentions within text file</span>
    <span class="hljs-title">pearl_index</span> <span class="hljs-operator">=</span> []
    <span class="hljs-title">i</span> <span class="hljs-operator">=</span> 0
    <span class="hljs-title"><span class="hljs-keyword">while</span></span> <span class="hljs-title">i</span> <span class="hljs-operator">&lt;</span> <span class="hljs-title">len</span>(<span class="hljs-title">text_final</span>):
        <span class="hljs-title"><span class="hljs-keyword">if</span></span> <span class="hljs-title">text_final</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'pearls'</span> <span class="hljs-title">or</span> <span class="hljs-title">text_final</span>[<span class="hljs-title">i</span>] <span class="hljs-operator">=</span><span class="hljs-operator">=</span> <span class="hljs-string">'pearl'</span>:
            <span class="hljs-title">pearl_index</span>.<span class="hljs-title">append</span>(<span class="hljs-title">i</span>)
        <span class="hljs-title">i</span> <span class="hljs-operator">=</span> <span class="hljs-title">i</span> <span class="hljs-operator">+</span> 1

<span class="hljs-comment">#identify pearl sentences (each match runs up to and includes the closing period)</span>
    <span class="hljs-title">sentence_one</span> <span class="hljs-operator">=</span> <span class="hljs-title">re</span>.<span class="hljs-title">findall</span>(<span class="hljs-title">r</span><span class="hljs-string">"([^.]*?pearl[^.]*\.)"</span>,<span class="hljs-title">text</span>)
    <span class="hljs-title">sentence_two</span> <span class="hljs-operator">=</span> <span class="hljs-title">re</span>.<span class="hljs-title">findall</span>(<span class="hljs-title">r</span><span class="hljs-string">"([^.]*?pearls[^.]*\.)"</span>,<span class="hljs-title">text</span>)
    <span class="hljs-title">all_sentences</span> <span class="hljs-operator">=</span> <span class="hljs-title">sentence_one</span> <span class="hljs-operator">+</span> <span class="hljs-title">sentence_two</span>
    <span class="hljs-title">list_tuples</span> <span class="hljs-operator">=</span> <span class="hljs-title">list</span>(<span class="hljs-title">zip</span>(<span class="hljs-title">all_sentences</span>, <span class="hljs-title">pearl_index</span>))

<span class="hljs-comment">#create DataFrame with pearl sentence and index location</span>
    <span class="hljs-title">df</span> <span class="hljs-operator">=</span> <span class="hljs-title">pd</span>.<span class="hljs-title">DataFrame</span>(<span class="hljs-title">list_tuples</span>, <span class="hljs-title">columns</span> <span class="hljs-operator">=</span> [<span class="hljs-string">'Pearl Sentence'</span>, <span class="hljs-string">'Index Location'</span>])
    <span class="hljs-title">df</span>[<span class="hljs-string">"Sentence Key Word"</span>] <span class="hljs-operator">=</span> <span class="hljs-string">'Pearl'</span>

<span class="hljs-comment">#create and format plotly graph</span>
    <span class="hljs-title">fig</span> <span class="hljs-operator">=</span> <span class="hljs-title">px</span>.<span class="hljs-title">scatter</span>(<span class="hljs-title">df</span>, <span class="hljs-title">y</span><span class="hljs-operator">=</span><span class="hljs-string">"Sentence Key Word"</span>, <span class="hljs-title">x</span><span class="hljs-operator">=</span><span class="hljs-string">"Index Location"</span>, 
    <span class="hljs-title">hover_data</span><span class="hljs-operator">=</span>[<span class="hljs-string">'Pearl Sentence'</span>], <span class="hljs-title">color</span><span class="hljs-operator">=</span><span class="hljs-string">"Index Location"</span>, <span class="hljs-title">color_continuous_scale</span><span class="hljs-operator">=</span><span class="hljs-title">px</span>.<span class="hljs-title">colors</span>.<span class="hljs-title">diverging</span>.<span class="hljs-title">Temps</span>)

    <span class="hljs-title">fig</span>.<span class="hljs-title">update_traces</span>(<span class="hljs-title">marker_symbol</span><span class="hljs-operator">=</span><span class="hljs-string">'line-ns-open'</span>,
                            <span class="hljs-title">marker_line_width</span><span class="hljs-operator">=</span>2.5, <span class="hljs-title">marker_size</span><span class="hljs-operator">=</span>70)
    <span class="hljs-title">fig</span>.<span class="hljs-title">update_yaxes</span>(<span class="hljs-title">showticklabels</span><span class="hljs-operator">=</span><span class="hljs-title">False</span>)
    <span class="hljs-title">fig</span>.<span class="hljs-title">update_layout</span>(<span class="hljs-title">yaxis_title</span><span class="hljs-operator">=</span><span class="hljs-string">""</span>, <span class="hljs-title">xaxis_title</span><span class="hljs-operator">=</span><span class="hljs-string">'Index Location(location of mentions of pearl in inputted work)'</span>, <span class="hljs-title">font</span><span class="hljs-operator">=</span><span class="hljs-title">dict</span>(
            <span class="hljs-title">size</span><span class="hljs-operator">=</span>14,
            <span class="hljs-title">color</span><span class="hljs-operator">=</span><span class="hljs-string">"#1B6262"</span>), <span class="hljs-title">title</span><span class="hljs-operator">=</span><span class="hljs-string">'Sentences containing pearl identified in '</span> <span class="hljs-operator">+</span> <span class="hljs-title">title_of_file</span> <span class="hljs-operator">+</span>  
            <span class="hljs-string">'.  Hover over to see the sentences!'</span>)
    <span class="hljs-title">fig</span>.<span class="hljs-title">show</span>()

<span class="hljs-comment">#example INPUT:</span>
<span class="hljs-title">pearl_sentence_rug</span>(<span class="hljs-string">'jahangir_wheeler_thackston_translation.txt'</span>, <span class="hljs-string">'The Jahangirnama'</span>)
</code></pre><p><strong><em>Example Results</em></strong> (with <em>The Jahangirnama</em>, by Nur al-Din Jahangir (Jahangir Emperor of Hindustan) and translated by Wheeler M. Thackston as the text input):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1635455738981/oM1P3KtI2.gif" alt="demo_gif.gif" /></p>
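<p>The rug plot pairs each pearl sentence with a location in the text. One way to keep a sentence and its location together is to find both in a single pass with <code>re.finditer</code>. This is a minimal sketch under stated assumptions rather than the code behind the graphic above: <code>pearl_sentences</code> is a hypothetical helper, a "sentence" is simply any run of characters ending in a period (so abbreviations would split sentences early), and the location is a character offset rather than a word index.</p>

```python
import re

def pearl_sentences(text):
    """Return (sentence, start_offset) pairs for sentences mentioning pearl(s)."""
    pairs = []
    # assumption: a sentence is a run of characters that ends in a period
    for match in re.finditer(r"[^.]*\.", text):
        sentence = match.group().strip()
        # match 'pearl' or 'pearls' as whole words, ignoring capitalization
        if re.search(r"\bpearls?\b", sentence, flags=re.IGNORECASE):
            pairs.append((sentence, match.start()))
    return pairs

# made-up sample text standing in for a primary source
sample = "The king wore pearls. He rode north. A pearl was given to the envoy."
for sentence, offset in pearl_sentences(sample):
    print(offset, sentence)
```

<p>The list of pairs could then be passed to <code>pd.DataFrame(pairs, columns=['Pearl Sentence', 'Index Location'])</code>, so every row holds a sentence next to the place where it was found.</p>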
<h3 id="heading-results-analysis">Results Analysis</h3>
<p>As a historian, I have looked to employ concentrated lenses to craft transregional and transdisciplinary histories. While researching and writing, I have relied upon traditional history research practices and tools at my disposal--primarily in-depth readings of selected documents and detailed analyses of other early modern materials. Returning to past sources and works has allowed me to consider the benefits of using data analysis, visualization, and other computational methods to augment historical research.</p>
<p>The four Python functions shared in this blog only scratch the surface of what could be built to augment an object-oriented or thread-tracing historical research project. My efforts and results are limited because (1) I only explored text files, (2) I used a small sample of sources, (3) my knowledge of available Python libraries and modules is still developing, and (4) my available time is limited!</p>
<p>That being said, using code can offer <strong>benefits</strong> that traditional readings of pre-selected documents cannot provide, such as the ability to:</p>
<ul>
<li><strong>Rapidly analyze many sources</strong> through the same function and evaluate the potential usefulness of sources (While I only explored written texts, the input sources can be other digitized sources such as databases, audio files, or visual files.)</li>
<li>Discover larger <strong>patterns or correlations</strong> between various documents or text corpora </li>
<li><strong>Visually reappraise and engage with</strong> sources in a nontraditional manner</li>
<li>Perform additional, <strong>objective analysis</strong> of a text's central meaning, topics, focus, and sentiment (I have not yet explored the sentiment analysis models of the <code>nltk</code> library, but am excited to learn more about them)</li>
<li>Constantly <strong>augment, iterate, and improve programs</strong> that are either self-authored or available as open source</li>
</ul>
<p>There are seemingly limitless avenues for scalable and repeatable explorations into historical source inputs--especially for object-oriented projects. I have only begun to scratch the surface of possibilities and am eagerly growing my programming skills! </p>
<p>Happy Coding!</p>
<p><em>Please refer to other posts in my blog or external public resources such as GitHub or the Programming Historian for basic information on Python, using programming for text-based analysis, historical source considerations, and other cool articles. </em></p>
<p><strong>Bibliography:</strong></p>
<p>Coryate, Thomas, and George Coryate. <em>Coryat's Crudities, vol I &amp; II.</em> Glasgow: J. MacLehose and Sons, 1905.</p>
<p>Harper, Charlie. "Visualizing Data with Bokeh and Pandas," <em>Programming Historian 7,</em> 2018.  <a target="_blank" href="https://programminghistorian.org/en/lessons/visualizing-with-bokeh">doi.org/10.46430/phen0081</a> </p>
<p>Nur al-Din Jahangir (Jahangir Emperor of Hindustan). <em>The Jahangirnama: Memoirs of Jahangir. Emperor of India</em>. Translated, edited, and annotated by Wheeler M. Thackston. New York: Oxford University Press, 1999.</p>
<p>Tavernier, Jean-Baptiste. <em>The Six Voyages of John Baptista Tavernier</em>. London: Printed for R.L. and M.P., 1678.</p>
]]></content:encoded></item><item><title><![CDATA[Creating scalable and repeatable functions to augment historical research]]></title><description><![CDATA[This blog post has three primary goals:

Walk through how to build a function intended to help examine primary sources
Establish the benefit of building repeatable and iterative functions
Provoke history scholars and researchers to consider using pyt...]]></description><link>https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research</link><guid isPermaLink="true">https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research</guid><category><![CDATA[Python]]></category><category><![CDATA[history]]></category><category><![CDATA[python beginner]]></category><dc:creator><![CDATA[Natasha]]></dc:creator><pubDate>Wed, 20 Oct 2021 21:55:44 GMT</pubDate><content:encoded><![CDATA[<p>This blog post has three primary goals:</p>
<ul>
<li>Walk through how to build a function intended to help examine primary sources</li>
<li>Establish the benefit of <strong>building repeatable and iterative functions</strong></li>
<li>Provoke history scholars and researchers to <strong>consider using Python</strong> as an additional investigatory tool</li>
</ul>
<h3 id="heading-getting-started-with-functions">Getting started with functions</h3>
<p>One major potential benefit of using programming to augment historical research is its highly <strong>scalable, iterative, and repeatable</strong> nature. Building functions is a primary way to write code that can be reused and gradually refined. Python functions enable programmers, researchers, and scholars to engage with a wider range and higher number of sources or inputs than they could consider via traditional research methods.</p>
<p>Functions can be very simple or incredibly complex. Think of <strong>functions as blocks of instructions that produce wanted outcomes. </strong></p>
<ul>
<li>Start by using the <code>def</code> keyword to declare the function name, add parameters in parentheses, and end the line with a colon: <code>def &lt;function_name&gt;(parameter_1, parameter_2, etc.):</code><ul>
<li>A parameter/argument is information that is passed into the function when it is called. You can add any number of parameters in the parentheses, separated by commas. "Parameter" is typically the term used when defining a function; when calling it, the values you enter are "arguments".  </li>
</ul>
</li>
<li>Add indented statements that spell out what the function should execute, including desired outputs (e.g. <code>print()</code>, <code>show()</code>, or a <code>return</code> statement). There can be multiple outputs.</li>
<li>Once all instructions have been written, return to an unindented line and call the function. The function can also be imported and called from other files.</li>
</ul>
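<p>The steps above can be sketched with a very small function. This is only an illustration: <code>count_word</code>, its parameters, and the sample word list are hypothetical names made up for this example.</p>

```python
# declare the function: 'words' and 'target' are the parameters
def count_word(words, target):
    """Count how many times target appears in a list of words."""
    count = 0
    for word in words:
        if word == target:
            count = count + 1
    # hand the result back to whoever called the function
    return count

# call the function on an unindented line: the list and 'pearl' are the arguments
total = count_word(['pearl', 'ruby', 'pearl'], 'pearl')
print(total)
```

<p>Because the function uses <code>return</code> instead of only printing, the result can be stored in a variable such as <code>total</code> and reused later in the program.</p>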
<h3 id="heading-simple-example-function">Simple Example Function</h3>
<ul>
<li><strong>Function</strong>: function_name(filename)</li>
<li><strong>Purpose</strong>: When called, the function opens the file passed in as the parameter <code>filename</code>, reads the text, stores it in the variable <code>text</code>, and then closes the file</li>
<li><strong>Parameters</strong>: <ul>
<li><code>filename</code>: can be any source file saved in the same environment</li>
</ul>
</li>
</ul>
<pre><code><span class="hljs-comment">#declare the function:</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">function_name</span>(<span class="hljs-params">filename</span>):</span>
    file = open(filename, <span class="hljs-string">'rt'</span>)
    text = file.read()
    file.close()

<span class="hljs-comment">#call the function:</span>
function_name(<span class="hljs-string">'file_input_as_parameter.txt'</span>)
</code></pre><h2 id="heading-building-a-function-to-augment-historical-research">Building a function to augment historical research</h2>
<p>To augment my historical research, I wanted to leverage data analysis and visualization functionality to investigate gems in text-based records. The following steps outline how I built a <code>gem_frequency_graph</code> function. The function produces a bar chart of how often each gemstone appears in any inputted file. <a target="_blank" href="https://encodinghistory.com/using-python-to-trace-threads-through-history">See this post for a more complex discussion of the functions that augmented my analysis of gems and pearls in early modern history</a>. </p>
<p><strong>1.  Declare the function and parameters, import any needed libraries/modules</strong></p>
<ul>
<li><strong>Function</strong>:  gem_frequency_graph(filename, title_of_file, author_date)</li>
<li><strong>Purpose</strong>: The function produces a graphic representation of how many times different gems (diamonds, sapphires, pearls, etc.) are mentioned in an inputted text.</li>
<li><strong>Parameters</strong>: <ul>
<li><code>filename</code>: primary source file that the function analyzes for frequency of gem mentions; the file should be saved as a .txt file in the same environment. <a target="_blank" href="https://encodinghistory.com/establishing-and-transforming-your-digital-history-sources">See this post for more details on digital history sources. </a></li>
<li><code>title_of_file</code>: enter the title of the primary source file as a string. This string element populates a portion of the <code>suptitle()</code> function from the matplotlib library that sets the heading for the bar plot</li>
<li><code>author_date</code>: enter bibliographic information (author, editors, translators, dates, etc.) of the primary source file as a string (e.g. <code>'Jean-Baptiste Tavernier, 1678'</code>). This string element populates a portion of the <code>title()</code> function from the matplotlib library that sets the sub-heading for the bar plot</li>
</ul>
</li>
</ul>
<p>After naming and defining the <code>gem_frequency_graph</code> function, it is best practice to import any necessary libraries/modules. This includes importing other functions that you have built previously. For example, I pre-built <code>file_function()</code>, which <a target="_blank" href="https://encodinghistory.com/establishing-and-transforming-your-digital-history-sources">cleans up my historical source input.</a> </p>
<pre><code><span class="hljs-comment">#define your function with clear parameters and import needed libraries</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">gem_frequency_graph</span>(<span class="hljs-params">filename, title_of_file, author_date</span>):</span>
    <span class="hljs-keyword">from</span> clean_up_text <span class="hljs-keyword">import</span> file_function
    <span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
    <span class="hljs-keyword">from</span> nltk.probability <span class="hljs-keyword">import</span> FreqDist
    <span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
    <span class="hljs-keyword">import</span> seaborn <span class="hljs-keyword">as</span> sns
    <span class="hljs-keyword">from</span> nltk.stem <span class="hljs-keyword">import</span> PorterStemmer

<span class="hljs-comment">#instead of continually redoing clean up, call a built function </span>
<span class="hljs-comment">#with the argument from new function of (filename) </span>
    text_final = file_function(filename)
</code></pre><p><strong>2. Add statements that the function should execute</strong></p>
<p>For the <code>gem_frequency_graph</code> function, I wanted to compare how frequently different gems are mentioned in a specific text. I normalized the large (cleaned-up) list <code>text_final</code> so that plural or variant spellings of the different gems would be counted together (e.g. 'pearls' is added to the count for 'pearl'). <em>I did equate spinel to ruby, though there are debates around this topic. This decision could easily be changed or even exposed as its own parameter of the function.</em></p>
<p>From there, I leveraged the <code>FreqDist</code> class from <code>nltk</code> and the <code>pandas</code> library to build a sorted frequency distribution. This changes list data into table form (a pandas <code>Series</code>). If you added the output command <code>print(all_fdist)</code> to the function, the terminal would show each gem from the <code>gems</code> list that appears in the inputted text, alongside its count.</p>
<pre><code><span class="hljs-comment">#augmenting the list to build our frequency dist function</span>
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'pearls'</span>, <span class="hljs-string">'pearl'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'jewels'</span>, <span class="hljs-string">'jewel'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'gemstone'</span>, <span class="hljs-string">'gem'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'gems'</span>, <span class="hljs-string">'gem'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'rubies'</span>, <span class="hljs-string">'ruby'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'spinel'</span>, <span class="hljs-string">'ruby'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'emeralds'</span>, <span class="hljs-string">'emerald'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'corals'</span>, <span class="hljs-string">'coral'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'diamonds'</span>, <span class="hljs-string">'diamond'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'ambers'</span>, <span class="hljs-string">'amber'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'sapphires'</span>, <span class="hljs-string">'sapphire'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'jades'</span>, <span class="hljs-string">'jade'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'turquoises'</span>, <span class="hljs-string">'turquoise'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'ivories'</span>, <span class="hljs-string">'ivory'</span>) for i in text_final]
    <span class="hljs-attr">text_final</span> = [i.replace(<span class="hljs-string">'garnets'</span>, <span class="hljs-string">'garnet'</span>) for i in text_final]

    <span class="hljs-attr">gems</span> = [<span class="hljs-string">'pearl'</span>, <span class="hljs-string">'ruby'</span>, <span class="hljs-string">'jewel'</span>, <span class="hljs-string">'emerald'</span>, <span class="hljs-string">'coral'</span>, <span class="hljs-string">'gem'</span>, <span class="hljs-string">'diamond'</span>, <span class="hljs-string">'sapphire'</span>, <span class="hljs-string">'turquoise'</span>, <span class="hljs-string">'jade'</span>, <span class="hljs-string">'amethyst'</span>, <span class="hljs-string">'topaz'</span>, <span class="hljs-string">'opal'</span>, <span class="hljs-string">'ivory'</span>, <span class="hljs-string">'amber'</span>, <span class="hljs-string">'catseye'</span>, <span class="hljs-string">'alexandrite'</span>, <span class="hljs-string">'garnet'</span>, <span class="hljs-string">'peridot'</span>]
    <span class="hljs-attr">gem_list</span> = [w for w in text_final if w in gems]

<span class="hljs-comment">#build frequency distribution dataframe</span>
    <span class="hljs-attr">all_fdist</span> = FreqDist(gem_list)
    <span class="hljs-attr">all_fdist</span> = pd.Series(dict(all_fdist)).sort_values(ascending=<span class="hljs-literal">False</span>)
</code></pre><p>Then, I used the table from the dataframe <code>all_fdist</code> as an input into the <code>seaborn</code> bar graph (<code>sns.barplot</code>). This transforms the data into a visual representation of the findings!</p>
<pre><code><span class="hljs-comment">#build graph; establish your color, format, and other specifications</span>
    <span class="hljs-attribute">sns</span>.set()
    <span class="hljs-attribute">sns</span>.set_palette(<span class="hljs-string">"husl"</span>, <span class="hljs-number">8</span>)

    <span class="hljs-attribute">fig</span>,ax = plt.subplots(figsize=(<span class="hljs-number">8</span>,<span class="hljs-number">8</span>))
    <span class="hljs-attribute">all_plot</span> = sns.barplot(x=all_fdist.index, y=all_fdist.values, ax=ax)
    <span class="hljs-attribute">sns</span>.despine()
    <span class="hljs-attribute">plt</span>.xticks(rotation=<span class="hljs-number">30</span>)
    <span class="hljs-attribute">plt</span>.ylabel('Frequency (Count)', fontsize=<span class="hljs-number">8</span>)
    <span class="hljs-attribute">plt</span>.xlabel('Gem', fontsize=<span class="hljs-number">8</span>)
    <span class="hljs-attribute">plt</span>.suptitle('Count mentions of various gemstones in ' + title_of_file, fontsize=<span class="hljs-number">14</span>)
    <span class="hljs-attribute">plt</span>.title( 'by ' + author_date, fontsize = <span class="hljs-number">10</span>)

<span class="hljs-comment">#display the graph</span>
    <span class="hljs-attribute">plt</span>.show()
</code></pre><p><strong>3. Call the function</strong></p>
<p>The above code (building the gem count list, the dataframe, the chart, etc.) is indented, so it forms the body of the defined <code>gem_frequency_barchart</code> function. On a <strong>new, unindented line, you can call the function.</strong> Call it by typing the function name, exactly as it appears in the <code>def</code> line, followed by the arguments in parentheses, in the order the parameters were defined.</p>
<pre><code><span class="hljs-comment">#example calling the gem frequency bar chart function</span>
gem_frequency_chart(<span class="hljs-string">'jahangir_wheeler_thackston_translation.txt'</span>, <span class="hljs-string">'&lt;Jahangirnama&gt;'</span>,
 <span class="hljs-string">'Emperor Jahangir - Translated by Wheeler Thackston'</span>)
</code></pre><p>The above call outputs the following graph:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1634878420751/hqHjwbjj8.png" alt="Screen Shot 2021-10-21 at 9.53.05 PM.png" /></p>
<h3 id="heading-research-test-refine-and-iterate">Research, Test, Refine, and Iterate</h3>
<p>For my inputs, I practiced using two translations of the <em>Jahangirnama</em> and Jean-Baptiste Tavernier's <em>The Six Voyages of John Baptista Tavernier.</em> Additional text files can be run through the same code by saving them in the same environment and updating the arguments in the function call. The example function is by no means perfect. As I continue to expand my Python skills, I can alter and improve my functions. With every change, I can run the same or new historical texts through it to see the data and end results.</p>
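<p>As a rough sketch of what running several texts through the same code can look like, the loop below applies one counting step to more than one input. It uses only Python's standard library and short made-up word lists rather than the real source files, and the names <code>count_gems</code> and <code>sample_texts</code> are hypothetical; they are not part of the function above.</p>

```python
from collections import Counter

# A shortened gem list, for illustration only.
GEMS = ['pearl', 'ruby', 'emerald', 'diamond']


def count_gems(words, gems=GEMS):
    """Count how often each gem word appears in a list of lowercase words."""
    return Counter(w for w in words if w in gems)


# Made-up stand-ins for cleaned word lists from different sources.
sample_texts = {
    'text_a': ['a', 'pearl', 'and', 'a', 'ruby', 'and', 'a', 'pearl'],
    'text_b': ['one', 'diamond', 'of', 'great', 'price'],
}

# Run the same function over every text and collect the results by name.
results = {name: count_gems(words) for name, words in sample_texts.items()}

for name, counts in results.items():
    print(name, counts.most_common())
```

<p>Because the counting logic lives in one function, adding a third text only means adding one more entry to the dictionary.</p>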
<p><strong>Functions are easily scalable, repeatable, and permeable.</strong> As many programmers and others in the technology space know, the clearest path to a better end result is through iteration! After researching or thinking about what you may want to code, try to build it. Iterate on your functions, test them often, improve and tweak as necessary, or start fresh if something is not turning out the way you want it to! The possibilities are limitless. </p>
<p>Happy coding!</p>
<p><strong>Bibliography:</strong></p>
<p>Nur al-Din Jahangir (Jahangir Emperor of Hindustan). The Jahangirnama: Memoirs of Jahangir, Emperor of India. Translated, edited, and annotated by Wheeler M. Thackston. New York: Oxford University Press, 1999.</p>
<p>Tavernier, Jean-Baptiste. The Six Voyages of John Baptista Tavernier. London: Printed for R.L. and M.P., 1678.</p>
]]></content:encoded></item><item><title><![CDATA[Establishing and transforming your digital history sources]]></title><description><![CDATA[This blog post has three main goals:

Review potential historical sources that can be used as digital inputs for your programs
Establish how to open a digital text input in your code
Explore how to transform and clean up your text inputs (removing stopw...]]></description><link>https://encodinghistory.com/establishing-and-transforming-your-digital-history-sources</link><guid isPermaLink="true">https://encodinghistory.com/establishing-and-transforming-your-digital-history-sources</guid><category><![CDATA[history]]></category><category><![CDATA[python beginner]]></category><category><![CDATA[Python 3]]></category><category><![CDATA[DigitalOcean]]></category><dc:creator><![CDATA[Natasha]]></dc:creator><pubDate>Sun, 10 Oct 2021 03:47:02 GMT</pubDate><content:encoded><![CDATA[<p>This blog post has three main goals:</p>
<ul>
<li>Review potential historical sources that can be used as digital inputs for your programs</li>
<li>Establish how to open a digital text input in your code</li>
<li>Explore how to <strong>transform and clean up</strong> your text inputs (removing stopwords, cleaning up formatting, and other considerations)</li>
</ul>
<p><em>This post attempts to provide a simple and digestible overview of what to consider when deciding which historical sources could be used in your programs. However, it does not cover the basics of installing <code>python3</code>, setting up a virtual environment, installing libraries, or definitions of common terminology (lists, strings, etc.).</em></p>
<h3 id="heading-historical-sources-as-digital-inputs">Historical Sources as Digital Inputs</h3>
<p>To conduct code-based research analysis, you need to find (or create) digital source inputs. The range of potential inputs is wide: analysis can be performed on text files, data tables, digitized images, audio files, and other media. </p>
<p>The following focuses on cleaning up text files. Many historical primary sources are already available online (via various efforts to digitize and archive cultural works such as Project Gutenberg, Fordham University's Internet History Sourcebooks Project, the National Archives, etc.). For written materials, you can find and download sources as txt, csv (comma-separated values), html (HyperText Markup Language), or other formats. I found it easiest to work with raw text files: they are easy to read into strings and contain fewer markup additions, so less cleanup is required.</p>
<p><em>If you only have access to a certain format (like html), there are steps you can take to clean up your files and ensure that they can be used as an input. Explore other posts and forums to learn more about the numerous file types or inputs you can use in your analysis.</em></p>
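<p>As one possible illustration of cleaning up an html source, the sketch below pulls the plain text out of an html string using <code>html.parser</code> from Python's standard library. The class name <code>TextExtractor</code>, the helper <code>html_to_text</code>, and the sample string are all made up for this example; real pages usually need more care (for instance, this version also keeps any text found inside script or style tags).</p>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # The parser calls this for each run of text outside a tag.
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the pieces, then collapse repeated whitespace into single spaces.
    return ' '.join(' '.join(parser.parts).split())


# A made-up snippet standing in for a downloaded html source.
sample_html = '<html><body><h1>Memoirs</h1><p>A string of <b>pearls</b> was presented.</p></body></html>'
print(html_to_text(sample_html))
```

<p>The resulting string can then be treated like the contents of a raw text file.</p>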
<h3 id="heading-opening-digital-input-in-code">Opening Digital Input in Code</h3>
<p>Download and add your text file to the workspace or virtual environment (<code>venv</code>) you are using and then you can use it in analysis.</p>
<p>For example, I imported the raw text of Wheeler Thackston's translation of the <em>Jahangirnama</em>, made available by the Freer Gallery of Art, Arthur M. Sackler Gallery, Smithsonian Institution (doi:<a target="_blank" href="https://doi.org/10.5479/sil.849796.39088018028456">10.5479/sil.849796.39088018028456</a>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1634796267884/2kKTl0Ufu.png" alt="Screen Shot 2021-10-20 at 11.04.07 PM.png" /></p>
<p>At this point, you can manually do some very high-level cleanup to ensure the text is ready for code-based analysis. This could include manually deleting prefaces, or indices at the end of the text, that you don't want to include in your analysis. After saving this file as <strong><em>jahangir_wheeler_thackston_translation.txt</em></strong> in my virtual environment, I am able to open it in other programs in the same environment. </p>
<pre><code><span class="hljs-comment">#opening a file (located in the same environment) in your program </span>
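filename = <span class="hljs-string">'jahangir_wheeler_thackston_translation.txt'</span>
<span class="hljs-comment">#the with statement closes the file automatically when the block ends</span>
<span class="hljs-keyword">with</span> <span class="hljs-keyword">open</span>(filename, <span class="hljs-string">'rt'</span>) <span class="hljs-keyword">as</span> file:
    jahangirnama_text = file.read()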
</code></pre><h3 id="heading-transforming-and-cleaning-up-your-digital-input">Transforming and Cleaning up your Digital Input</h3>
<p>Next, use Python's built-in functionality and the <code>nltk</code> library (install it first with <code>pip install nltk</code>) to transform your text from a string to a list and to clean up your input file generally. The tokenizer and stopword data also need to be downloaded once with <code>nltk.download()</code>; if a resource is missing, the error message from <code>nltk</code> names it. While there are many different approaches and libraries you can use to transform and clean up your input, the <code>nltk</code> library provides powerful, ready-made functions. </p>
<pre><code><span class="hljs-comment">#best practice is to import all libraries at the top of your program</span>
<span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize
<span class="hljs-keyword">from</span> nltk.stem <span class="hljs-keyword">import</span> PorterStemmer

<span class="hljs-comment">#transform text from a string to a list (lists are easier for analysis)</span>
<span class="hljs-comment">#isalpha() keeps only tokens made entirely of letters, dropping punctuation and numbers</span>
tokens = word_tokenize(jahangirnama_text)
text_turned_into_list = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> tokens <span class="hljs-keyword">if</span> word.isalpha()]
</code></pre><p>After creating a list, you can transform your input further by making the list all lower case, removing stopwords (a predefined list of commonly used words built into the <code>nltk</code> library), fixing punctuation issues, or addressing other issues that could hinder your analysis.</p>
<p>For example, you don't want a frequency function to tell you that "the" or "a" are the most frequently used words or think that 'Pearl' and 'pearl' are different words. Removing stopwords or capitalization allows your programs to dig more deeply into the meaning and data of the text.</p>
<pre><code><span class="hljs-comment">#make the list lowercase to avoid case-sensitivity issues</span>
text_lower = [parts.lower() <span class="hljs-keyword">for</span> parts <span class="hljs-keyword">in</span> text_turned_into_list]

<span class="hljs-comment">#use the stopwords corpus imported above to get the English stopword list</span>
stop_words = stopwords.words(<span class="hljs-string">'english'</span>)

<span class="hljs-comment">#you can also easily extend the stopword list to include other words </span>
<span class="hljs-comment">#you think detract from your text analysis</span>
new_words = [<span class="hljs-string">'one'</span>, <span class="hljs-string">'also'</span>, <span class="hljs-string">'two'</span>, <span class="hljs-string">'would'</span>]
stop_words.extend(new_words)

<span class="hljs-comment">#keep only the words that are not stopwords</span>
final_text = [w <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> text_lower <span class="hljs-keyword">if</span> w <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stop_words]
</code></pre><p>Additionally, you can stem the words. This may or may not be necessary depending on the analysis you want to perform. Stemming, as the word suggests, cuts the endings off words to reduce each word to its root. Several stemmers and lemmatizers are built into the <code>nltk</code> library (<code>PorterStemmer</code>, <code>LancasterStemmer</code>, etc.).</p>
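<pre><code><span class="hljs-comment">#reduce every word in the cleaned list to its stem</span>
ps <span class="hljs-operator">=</span> PorterStemmer()
stemmed_words <span class="hljs-operator">=</span> []
<span class="hljs-keyword">for</span> t <span class="hljs-keyword">in</span> final_text:
    stemmed_words.append(ps.stem(t))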
</code></pre><p>Use the <code>print()</code> function to test the results along the way. Given that we usually use primary sources that are hundreds of pages long (the Jahangirnama is around 500 pages), practice using slices to limit what is shown in your terminal. </p>
<pre><code><span class="hljs-comment">#print the first one-hundred elements in the list in the terminal</span>
print(final_text[:<span class="hljs-number">100</span>])
</code></pre><p>You can turn all this <strong>code into a neat, reusable function</strong>. <a target="_blank" href="https://encodinghistory.com/creating-scalable-and-repeatable-functions-to-augment-historical-research">See this post for more details on building functions.</a></p>
<h3 id="heading-example-file-clean-up-function-no-stemming">Example File Clean-up Function (no stemming):</h3>
<ul>
<li><strong>Function</strong>: file_function(filename)</li>
<li><strong>Purpose</strong>: When called, the function opens the file passed as the parameter <code>filename</code>, performs the clean up tasks (e.g. tokenizes, lowercases, removes stop words), and returns the cleaned list of words.</li>
<li><strong>Parameters</strong>:<ul>
<li><code>filename</code>: can be any source file saved in the same environment</li>
</ul>
</li>
</ul>
<pre><code><span class="hljs-keyword">from</span> nltk.corpus <span class="hljs-keyword">import</span> stopwords
<span class="hljs-keyword">from</span> nltk.tokenize <span class="hljs-keyword">import</span> word_tokenize

<span class="hljs-keyword">def</span> file_function(filename):
    <span class="hljs-comment">#open the file and read it into one string</span>
    <span class="hljs-keyword">with</span> open(filename, <span class="hljs-string">'rt'</span>) <span class="hljs-keyword">as</span> file:
        text = file.read()

    <span class="hljs-comment">#split into tokens, keep the alphabetic ones, and make them lowercase</span>
    tokens = word_tokenize(text)
    text_turned_into_list = [word <span class="hljs-keyword">for</span> word <span class="hljs-keyword">in</span> tokens <span class="hljs-keyword">if</span> word.isalpha()]
    text_lower = [parts.lower() <span class="hljs-keyword">for</span> parts <span class="hljs-keyword">in</span> text_turned_into_list]

    <span class="hljs-comment">#build the stopword list, add custom words, and filter them out</span>
    stop_words = stopwords.words(<span class="hljs-string">'english'</span>)
    new_words = [<span class="hljs-string">'i'</span>, <span class="hljs-string">'also'</span>, <span class="hljs-string">'much'</span>, <span class="hljs-string">'would'</span>, <span class="hljs-string">'by'</span>, <span class="hljs-string">'another'</span>, <span class="hljs-string">'could'</span>, <span class="hljs-string">'thou'</span>, <span class="hljs-string">'do'</span>]
    stop_words.extend(new_words)
    final_text = [w <span class="hljs-keyword">for</span> w <span class="hljs-keyword">in</span> text_lower <span class="hljs-keyword">if</span> w <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> stop_words]

    <span class="hljs-comment">#instead of print(), use a return statement so the caller receives the list</span>
    <span class="hljs-keyword">return</span> final_text
</code></pre><p>There are other commonly used functions that you can leverage to clean up your primary source (or other) inputs. Many blog posts share how to clean up strings and lists, though not many have a history focus! The above steps are the ones I followed to create lists from historical texts that would allow me to perform analysis on textual primary sources. <a target="_blank" href="https://encodinghistory.com/using-python-to-trace-threads-through-history">See this post for a more complex discussion regarding functions that augmented my analysis of the gems and pearls in early modern history.</a> </p>
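<p>As one example of a different approach, here is a minimal sketch of the same clean-up idea using only Python's standard library, in case <code>nltk</code> is not available. It is not the function above: it swaps the <code>nltk</code> tokenizer for a simple regular expression and uses a tiny hand-picked stopword set. The names <code>simple_clean</code> and <code>SMALL_STOPWORDS</code> and the sample sentence are made up for illustration. Note that the pattern keeps only unaccented letters a to z, so it would split accented or hyphenated words.</p>

```python
import re
from collections import Counter

# A tiny hand-picked stopword set; nltk's English list is much longer.
SMALL_STOPWORDS = {'the', 'a', 'of', 'and', 'was', 'to', 'in'}


def simple_clean(text, stop_words=SMALL_STOPWORDS):
    """Lowercase the text, keep runs of letters a-z, and drop stopwords."""
    words = re.findall(r'[a-z]+', text.lower())
    return [w for w in words if w not in stop_words]


# A made-up sentence standing in for a historical text.
sample = 'The Pearl was set beside a ruby, and the pearl shone.'
cleaned = simple_clean(sample)

print(cleaned)
# Count the cleaned words and show the most frequent one.
print(Counter(cleaned).most_common(1))
```

<p>Lowercasing before counting is what lets 'Pearl' and 'pearl' be tallied as the same word.</p>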
<p>Happy coding!</p>
]]></content:encoded></item></channel></rss>