<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Snowpark on stdin</title><link>https://stdin.org/tags/snowpark/</link><description>Recent content in Snowpark on stdin</description><generator>Hugo -- 0.161.1</generator><language>en</language><copyright>Isaac Kunen</copyright><lastBuildDate>Tue, 17 Aug 2021 00:00:00 +0000</lastBuildDate><atom:link href="https://stdin.org/tags/snowpark/index.xml" rel="self" type="application/rss+xml"/><item><title>Iterating Over Metadata With Snowpark</title><link>https://stdin.org/iterating-over-metadata-with-snowpark/</link><pubDate>Tue, 17 Aug 2021 00:00:00 +0000</pubDate><author>Isaac</author><guid>https://stdin.org/iterating-over-metadata-with-snowpark/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p><em>(This was ported from my original <a href="https://medium.com/snowflake/iterating-over-metadata-with-snowpark-aa59598169bf">Medium post</a>.)</em></p>
<p>Hi Folks,</p>
<p><a href="/basic-pii-detection-using-java/">Last time</a>
we saw how to create simple Java functions to detect and mask personally identifying information (PII). For example, we could take a table containing some email messages and mask out the PII in the bodies with a simple query:</p>
<p><img src="/assets/2021/08/iterating_1.png" alt="one masked column"></p>
<p>But let’s say we wanted to mask out all of the PII, and that the table had many more fields, as you might find in something like survey results.</p>
<p>In this case, masking out the PII would be easy, but tedious: we’d have to apply the function manually to each column. And if the schema of our table were to change &ndash; or if we wanted to run this masking routine on a different table &ndash; we’d have to rewrite the query.</p>
<p>What we’ve run into is a pretty fundamental limitation in SQL: the query is very tied to the underlying schema. There’s no way to pass a type parameter to the query or iterate over metadata.
<a href="https://docs.snowflake.com/en/developer-guide/snowpark/index.html">Snowpark</a>
doesn’t have this limitation: we can write code to inspect metadata and dynamically generate queries based on what we find.</p>
<p>To get started with Snowpark, you can follow the instructions on how to get it set up in your existing Scala development environment. Or you can follow the nice directions
<a href="https://medium.com/snowflake/from-zero-to-snowpark-in-5-minutes-72c5f8ec0b55">Zohar Nissare-Houssen has outlined here</a>
to get going using Docker.</p>
<p>Now using Snowpark for Scala, we can write a fully generic PII masking function:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="k">val</span> <span class="n">maskAllPii</span> <span class="k">=</span> <span class="o">(</span><span class="n">df</span><span class="k">:</span> <span class="kt">DataFrame</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">{</span>
</span></span><span class="line"><span class="cl">   <span class="k">val</span> <span class="n">toMask</span> <span class="k">=</span> <span class="n">df</span><span class="o">.</span><span class="n">schema</span>
</span></span><span class="line"><span class="cl">      <span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">dataType</span><span class="o">.</span><span class="n">typeName</span> <span class="o">==</span> <span class="s">&#34;String&#34;</span><span class="o">)</span>
</span></span><span class="line"><span class="cl">      <span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">name</span><span class="o">)</span>
</span></span><span class="line"><span class="cl">   <span class="n">df</span><span class="o">.</span><span class="n">withColumns</span><span class="o">(</span><span class="n">toMask</span><span class="o">,</span> 
</span></span><span class="line"><span class="cl">      <span class="n">toMask</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">c</span> <span class="k">=&gt;</span> <span class="n">callUDF</span><span class="o">(</span><span class="s">&#34;maskpii&#34;</span><span class="o">,</span> <span class="n">df</span><span class="o">.</span><span class="n">col</span><span class="o">(</span><span class="n">c</span><span class="o">))))</span>
</span></span><span class="line"><span class="cl"><span class="o">}</span>
</span></span></code></pre></div><p>This function takes in a DataFrame, inspects the schema, and applies the PII masking function we already have registered in Snowflake to each string column it finds, leaving non-string columns untouched. The result is just another DataFrame.</p>
<p>Now we can very easily run this on our email data…</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">maskAllPii</span><span class="o">(</span><span class="n">sess</span><span class="o">.</span><span class="n">table</span><span class="o">(</span><span class="s">&#34;emails&#34;</span><span class="o">))</span>
</span></span></code></pre></div><p>…and fetch the results:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span><span class="mi">100</span><span class="o">)</span>  <span class="c1">// get the first three lines, format wide
</span></span></span></code></pre></div><p><img src="/assets/2021/08/iterating_2.png" alt="all masked columns"></p>
<p>As you can see, the <code>maskAllPii()</code> call has touched all of the String columns. Under the covers, Snowpark has dynamically generated a plan that corresponds to a SQL query:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="s2">&#34;ID&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;SENDER&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;SENDER&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;SUBJECT&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;SUBJECT&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;BODY&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;BODY&#34;</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="k">SELECT</span><span class="w">  </span><span class="o">*</span><span class="w">  </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="n">emails</span><span class="p">))</span><span class="w">
</span></span></span></code></pre></div><p>When <code>show()</code> runs, it generates and issues the SQL, wrapping it in an outer <code>LIMIT</code> clause, and then pretty-prints the result.</p>
<p>Of course, this query isn’t a hard one to write, though doing so does start to get a bit tedious as the column count goes up. And you have to do it again for each table or query you want to mask. Moreover, writing this yourself means more chances to make a mistake and miss a column.</p>
<p>In contrast, the Snowpark alternative is simple, robust, and reusable. And as a simple exercise, you can retool the example above to take a different function — or better yet, take an arbitrary function as a parameter.</p>
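<p>If you want to see the shape of that exercise without a Snowpark session handy, here’s a rough sketch in plain Java. It is not the Snowpark API: <code>MaskQuerySketch</code>, <code>buildQuery</code>, and the name/type pairs standing in for <code>df.schema</code> are all made up for illustration. It only shows the idea of walking column metadata and generating the query text, with the function name passed in as a parameter.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">class MaskQuerySketch {

    // columns holds {name, typeName} pairs, a hypothetical stand-in for
    // the schema metadata. fn is the name of any one-argument function.
    static String buildQuery(String table, String fn, String[][] columns) {
        StringBuilder sql = new StringBuilder("SELECT ");
        String sep = "";
        for (String[] col : columns) {
            String name = "\"" + col[0] + "\"";
            sql.append(sep);
            if (col[1].equals("String")) {
                // String column: wrap it in the function and keep its name.
                sql.append(fn).append("(").append(name).append(") AS ").append(name);
            } else {
                // Any other type passes through untouched.
                sql.append(name);
            }
            sep = ", ";
        }
        return sql.append(" FROM ").append(table).toString();
    }
}
</code></pre></div>
<p>Calling <code>buildQuery</code> with <code>"emails"</code>, <code>"maskpii"</code>, and the four email columns should give you essentially the query shown earlier, minus the nested subquery.</p>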
<p>Happy hacking!</p>
]]></content></item><item><title>Basic PII Detection and Masking in Snowflake Using Java</title><link>https://stdin.org/basic-pii-detection-using-java/</link><pubDate>Wed, 28 Jul 2021 00:00:00 +0000</pubDate><author>Isaac</author><guid>https://stdin.org/basic-pii-detection-using-java/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p><em>(This was ported from my original <a href="https://medium.com/snowflake/basic-pii-detection-and-masking-in-snowflake-using-java-1689ae63aa69">Medium post</a>.)</em></p>
<p>Hi Folks,</p>
<p>For my first foray into Medium, I wanted to share some code that I’ve used previously in demos. The examples here do basic detection and masking of personally-identifying information (PII) using Java’s built-in regular expression support.</p>
<p>Now, I make no assertion that these routines are good: if you really want to do robust PII detection, you probably want something more sophisticated than a few regexes. Snowflake is even working on
<a href="https://www.snowflake.com/blog/bringing-the-worlds-data-together-announcements-from-snowflake-summit/">data classification</a>
as a built-in feature.</p>
<p>But I like these examples because they do a good job of illustrating the basic pattern of Snowflake’s
<a href="https://docs.snowflake.com/en/developer-guide/udf/java/udf-java.html">Java functions</a>.
And they’re pretty malleable: you should be able to modify these examples to work for any situation where you need to detect or mask based on a set of regexes.</p>
<p>Let’s start with the code and then tear it apart. If you’re running on Snowflake and have Java functions enabled &ndash; any AWS account, for now &ndash; then you can define them right inline using this <code>create function</code>
command:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="nb">boolean</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.hasPII&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">as</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">regex</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">public</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">PIIDetector</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="p">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="o">//</span><span class="w"> </span><span class="n">SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="o">//</span><span class="w"> </span><span class="n">email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="o">//</span><span class="w"> </span><span class="n">phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">patterns</span><span class="p">.</span><span class="k">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="n">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="nb">boolean</span><span class="w"> </span><span class="n">hasPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="n">find</span><span class="p">())</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="k">return</span><span class="w"> </span><span class="k">true</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="k">false</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span></code></pre></div><p>With this in hand, anyone with permissions on the function can issue queries that use it without any knowledge of Java:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">select</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">body</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">from</span><span class="w"> </span><span class="n">emails</span><span class="w">
</span></span></span></code></pre></div><p>So let’s take the definition apart. The first section defines how the function will show up in SQL:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="nb">boolean</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.hasPII&#39;</span><span class="w">
</span></span></span></code></pre></div><p>Most of this is pretty self-explanatory: it’s a function that takes a string and returns a Boolean, and the language is Java. The <code>null on null input</code> bit lets me skip any null handling in my routine: null inputs will be handled without calling into Java at all.</p>
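<p>For contrast, here’s a sketch of the guard a method would need if nulls could reach it. This is plain Java for illustration only, not something the functions in this post require; the class and method names are made up, and it checks just the SSN pattern to stay short.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.util.regex.Pattern;

class NullSafeSketch {
    static final Pattern SSN = Pattern.compile("\\d{3}-\\d{2}-\\d{4}");

    // Boxed Boolean so the method can answer "unknown" for a missing input.
    static Boolean hasSsn(String s) {
        if (s == null) {
            // Without this guard, SSN.matcher(null) throws NullPointerException.
            return null;
        }
        return SSN.matcher(s).find();
    }
}
</code></pre></div>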
<p>The <code>handler</code> directive is new, and specifies where in the Java code to actually make a call. You may have many potential entry points, but in this case, Snowflake is going to call the <code>hasPII</code> method defined on the <code>PIIDetector</code> class.</p>
<p>The actual Java code is contained between the pairs of dollar signs. After a little boilerplate, we see a few regular expressions:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="o">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="c1">// SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="c1">// email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="c1">// phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">};</span><span class="w">
</span></span></span></code></pre></div><p>These (highly USA-centric) expressions match the basic forms of Social Security numbers, email addresses, and phone numbers. You can very easily augment this list with more patterns to match your definition of PII.</p>
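<p>As a sketch of what augmenting the list might look like, here’s the same array with one extra entry and a tiny helper to try it out locally. The ZIP code pattern, the class name, and the <code>matchesAny</code> helper are my own illustrative additions, and the ZIP pattern is deliberately naive: it will flag any standalone five-digit number.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.util.regex.Pattern;

class TargetsSketch {
    static final String[] TARGETS = {
        "\\d{3}-\\d{2}-\\d{4}",                 // SSN
        "[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}",  // email
        "[2-9]\\d{2}-\\d{3}-\\d{4}",            // phone
        "\\b\\d{5}(-\\d{4})?\\b"                // hypothetical addition: US ZIP code
    };

    // Returns true if any target pattern occurs anywhere in s.
    // Compiles on every call, which is fine for a quick local check.
    static boolean matchesAny(String s) {
        for (String t : TARGETS) {
            if (Pattern.compile(t).matcher(s).find()) {
                return true;
            }
        }
        return false;
    }
}
</code></pre></div>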
<p>Next, we see some initialization code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="nf">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="na">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Our handler points to an instance method in the <code>PIIDetector</code> class. When Snowflake runs a query that requires an instance of this class, it will look for a default constructor to use to create that instance. This provides a really easy way to do one-time initialization: in this case we compile the regular expressions once per query rather than on each invocation, which should be much faster.</p>
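<p>To make the compile-once idea concrete, here’s a small local sketch. It isn’t the function from this post: the class name and the <code>compileCount</code> counter are made up so you can watch how often compilation happens, and it uses an array and a single pattern to keep things short. How many instances Snowflake actually creates is up to Snowflake; the sketch only shows that one instance can serve many calls.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java">import java.util.regex.Pattern;

class CompileOnceSketch {
    static final String[] TARGETS = { "\\d{3}-\\d{2}-\\d{4}" };  // SSN only, for brevity

    static int compileCount = 0;  // instrumentation for this sketch only
    final Pattern[] patterns;

    CompileOnceSketch() {
        // One-time setup: every pattern is compiled here, once per instance.
        patterns = new Pattern[TARGETS.length];
        int i = 0;
        for (String t : TARGETS) {
            patterns[i++] = Pattern.compile(t);
            compileCount++;
        }
    }

    // Per-row work: reuses the compiled patterns, never recompiles.
    boolean hasPII(String s) {
        for (Pattern p : patterns) {
            if (p.matcher(s).find()) {
                return true;
            }
        }
        return false;
    }
}
</code></pre></div>
<p>Create one <code>CompileOnceSketch</code>, call <code>hasPII</code> on as many strings as you like, and <code>compileCount</code> only reflects the work done in the constructor.</p>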
<p>Finally, we have the actual method we’re binding to:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">boolean</span><span class="w"> </span><span class="nf">hasPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="na">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="na">find</span><span class="p">())</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This just loops over the patterns and fires if any match. Easy peasy!</p>
<p>And there we have it: a simple PII detection routine that you can customize to your requirements (and local phone-number formats). But really, this is good for any situation where you have a number of regular expressions to match.</p>
<p>And with a little tweaking, you can mask out these matches instead. Here’s the code; I’ll let you dig into the details.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">maskpii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="n">string</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.maskPII&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">as</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">regex</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">public</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">PIIDetector</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="p">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="o">//</span><span class="w"> </span><span class="n">SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="o">//</span><span class="w"> </span><span class="n">email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="o">//</span><span class="w"> </span><span class="n">phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">MASK</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">&#34;###&#34;</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">patterns</span><span class="p">.</span><span class="k">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="n">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">maskPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
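</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="k">null</span><span class="p">)</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">null</span><span class="p">;</span><span class="w">  </span><span class="o">//</span><span class="w"> </span><span class="n">pass</span><span class="w"> </span><span class="n">SQL</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="n">through</span><span class="w"> </span><span class="n">unchanged</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">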
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="n">replaceAll</span><span class="p">(</span><span class="n">MASK</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span></code></pre></div><p>Happy hacking!</p>
]]></content></item></channel></rss>