<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>Snowflake on stdin</title><link>https://stdin.org/tags/snowflake/</link><description>Recent content in Snowflake on stdin</description><generator>Hugo -- 0.161.1</generator><language>en</language><copyright>Isaac Kunen</copyright><lastBuildDate>Tue, 06 Sep 2022 00:00:00 +0000</lastBuildDate><atom:link href="https://stdin.org/tags/snowflake/index.xml" rel="self" type="application/rss+xml"/><item><title>Au Revoir, Snowflake!</title><link>https://stdin.org/au-revoir-snowflake/</link><pubDate>Tue, 06 Sep 2022 00:00:00 +0000</pubDate><author>Isaac</author><guid>https://stdin.org/au-revoir-snowflake/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p>Just reading this blog, you might guess
that all I do is leave jobs. First leaving
<a href="https://stdin.org/a-leopard-cant-change-his-spots-but-he-may-change-jobs/">Tableau</a>,
and now, four years later, departing
<a href="https://www.snowflake.com/">Snowflake</a>.</p>
<p>I&rsquo;m incredibly proud of what we accomplished at Snowflake, particularly with
<a href="https://www.snowflake.com/snowpark/">Snowpark</a>. Snowpark not only expands what
customers and partners can do with the platform, but also
provides a lot of flexibility
for Snowflake itself. I expect this to pay dividends for a long time.</p>
<p>Moreover,
the Snowpark team &ndash; and Snowflake engineering in general &ndash; was absolutely
top notch and a joy to work with.</p>
<p>So why leave?</p>
<p>Certainly not because of the people or for lack of interesting work.
Nor for doubts in the company: Snowflake
is absolutely
<a href="https://www.cnbc.com/2022/08/24/snowflake-shares-soar-following-revenue-beat.html">crushing it</a>.
(And as a stockholder, I look forward to them continuing to crush it.)</p>
<p>This was a much more personal decision. I&rsquo;ve had a longstanding
ambivalence towards the software industry. Software
has provided me with a lot of interesting,
worthwhile problems to solve, and smart, engaging people to solve them with.
And it has paid the bills quite handsomely.</p>
<p>On the other hand, I&rsquo;ve always found myself drawn to the less practical side
of computing, mathematics, and the sciences &ndash; maybe it runs in
<a href="https://en.wikipedia.org/wiki/Kenneth_Kunen">the family</a>.
I was in academia once: a graduate student for all the wrong reasons,
and a poor one as a result. Now I&rsquo;m in a position to explore again, this time
with a bit more perspective.</p>
<p>Exactly how will this exploration play out? I have some ideas, but the
truth is that I&rsquo;m not yet entirely sure.</p>
<p>In the short term, my plans are to take a little time off, get a little
more involved in my kids&rsquo; schools, and start thinking about the future.
I&rsquo;ll also try to write a bit more about non-employment topics here,
as well as get some pictures posted on our new
<a href="https://kunen.net">family blog</a>.</p>
<p>Stay tuned!</p>
]]></content></item><item><title>Iterating Over Metadata With Snowpark</title><link>https://stdin.org/iterating-over-metadata-with-snowpark/</link><pubDate>Tue, 17 Aug 2021 00:00:00 +0000</pubDate><author>Isaac</author><guid>https://stdin.org/iterating-over-metadata-with-snowpark/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p><em>(This was ported from my original <a href="https://medium.com/snowflake/iterating-over-metadata-with-snowpark-aa59598169bf">Medium post</a>.)</em></p>
<p>Hi Folks,</p>
<p><a href="/basic-pii-detection-using-java/">Last time</a>
we saw how to create simple Java functions to detect and mask personally identifying information (PII). For example, we could take a table containing some email messages and mask out the PII in the bodies with a simple query:</p>
<p><img src="/assets/2021/08/iterating_1.png" alt="one masked column"></p>
<p>But let’s say we wanted to mask out all of the PII. And let’s say that we had many more fields like you might find in something like survey results.</p>
<p>In this case, masking out the PII would be easy, but tedious: we’d have to apply the function manually to each column. And if the schema of our table were to change &ndash; or if we wanted to run this masking routine on a different table &ndash; we’d have to rewrite the query.</p>
<p>What we’ve run into is a pretty fundamental limitation in SQL: the query is very tied to the underlying schema. There’s no way to pass a type parameter to the query or iterate over metadata.
<a href="https://docs.snowflake.com/en/developer-guide/snowpark/index.html">Snowpark</a>
doesn’t have this limitation: we can write code to inspect metadata and dynamically generate queries based on what we find.</p>
<p>To get started with Snowpark, you can follow the instructions on how to get it set up in your existing Scala development environment. Or you can follow the nice directions
<a href="https://medium.com/snowflake/from-zero-to-snowpark-in-5-minutes-72c5f8ec0b55">Zohar Nissare-Houssen has outlined here</a>
to get going using Docker.</p>
<p>Now using Snowpark for Scala, we can write a fully generic PII masking function:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="k">val</span> <span class="n">maskAllPii</span> <span class="k">=</span> <span class="o">(</span><span class="n">df</span><span class="k">:</span> <span class="kt">DataFrame</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">{</span>
</span></span><span class="line"><span class="cl">   <span class="k">val</span> <span class="n">toMask</span> <span class="k">=</span> <span class="n">df</span><span class="o">.</span><span class="n">schema</span>
</span></span><span class="line"><span class="cl">      <span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">dataType</span><span class="o">.</span><span class="n">typeName</span> <span class="o">==</span> <span class="s">&#34;String&#34;</span><span class="o">)</span>
</span></span><span class="line"><span class="cl">      <span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">name</span><span class="o">)</span>
</span></span><span class="line"><span class="cl">   <span class="n">df</span><span class="o">.</span><span class="n">withColumns</span><span class="o">(</span><span class="n">toMask</span><span class="o">,</span> 
</span></span><span class="line"><span class="cl">      <span class="n">toMask</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">c</span> <span class="k">=&gt;</span> <span class="n">callUDF</span><span class="o">(</span><span class="s">&#34;maskpii&#34;</span><span class="o">,</span> <span class="n">df</span><span class="o">.</span><span class="n">col</span><span class="o">(</span><span class="n">c</span><span class="o">))))</span>
</span></span><span class="line"><span class="cl"><span class="o">}</span>
</span></span></code></pre></div><p>This function takes in a DataFrame, inspects the schema, and applies the PII masking function we already have registered in Snowflake to each string column it finds, leaving non-string columns untouched. The result is just another DataFrame.</p>
<p>Now we can very easily run this on our email data…</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">maskAllPii</span><span class="o">(</span><span class="n">sess</span><span class="o">.</span><span class="n">table</span><span class="o">(</span><span class="s">&#34;emails&#34;</span><span class="o">))</span>
</span></span></code></pre></div><p>…and fetch the results:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-scala" data-lang="scala"><span class="line"><span class="cl"><span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span><span class="mi">100</span><span class="o">)</span>  <span class="c1">// get the first three lines, format wide
</span></span></span></code></pre></div><p><img src="/assets/2021/08/iterating_2.png" alt="all masked columns"></p>
<p>As you can see, the <code>maskAllPii()</code> call has touched all of the String columns. Under the covers, Snowpark has dynamically generated a plan that corresponds to a SQL query:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="s2">&#34;ID&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;SENDER&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;SENDER&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;SUBJECT&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;SUBJECT&#34;</span><span class="p">,</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="w">       </span><span class="n">maskpii</span><span class="p">(</span><span class="s2">&#34;BODY&#34;</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s2">&#34;BODY&#34;</span><span class="w"> 
</span></span></span><span class="line"><span class="cl"><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="k">SELECT</span><span class="w">  </span><span class="o">*</span><span class="w">  </span><span class="k">FROM</span><span class="w"> </span><span class="p">(</span><span class="n">emails</span><span class="p">))</span><span class="w">
</span></span></span></code></pre></div><p>When <code>show()</code> runs, it generates and issues the SQL,
wrapping this in an outer <code>LIMIT</code> clause and pretty-printing the result &ndash; that’s what <code>show()</code> does.</p>
<p>Of course, this query isn’t a hard one to write, though doing so does start to get a bit tedious as the column count goes up. And you have to do it again for each table or query you want to mask. Moreover, writing this yourself means more chances to make a mistake and miss a column.</p>
<p>In contrast, the Snowpark alternative is simple, robust, and reusable. And as a simple exercise, you can retool the example above to take a different function — or better yet, take an arbitrary function as a parameter.</p>
<p>Happy hacking!</p>
]]></content></item><item><title>Basic PII Detection and Masking in Snowflake Using Java</title><link>https://stdin.org/basic-pii-detection-using-java/</link><pubDate>Wed, 28 Jul 2021 00:00:00 +0000</pubDate><author>Isaac</author><guid>https://stdin.org/basic-pii-detection-using-java/</guid><description>&amp;lt;no value&amp;gt;</description><content type="text/html" mode="escaped"><![CDATA[<p><em>(This was ported from my original <a href="https://medium.com/snowflake/basic-pii-detection-and-masking-in-snowflake-using-java-1689ae63aa69">Medium post</a>.)</em></p>
<p>Hi Folks,</p>
<p>For my first foray into Medium, I wanted to share some code that I’ve used previously in demos. The examples here do basic detection and masking of personally-identifying information (PII) using Java’s built-in regular expression support.</p>
<p>Now, I make no assertion that these routines are good: if you really want to do robust PII detection, you probably want something more sophisticated than a few regexes. Snowflake is even working on
<a href="https://www.snowflake.com/blog/bringing-the-worlds-data-together-announcements-from-snowflake-summit/">data classification</a>
as a built-in feature.</p>
<p>But I like these examples because they do a good job of illustrating the basic pattern of Snowflake’s
<a href="https://docs.snowflake.com/en/developer-guide/udf/java/udf-java.html">Java functions</a>.
And they’re pretty malleable: you should be able to modify these examples to work for any situation where you need to detect or mask based on a set of regexes.</p>
<p>Let’s start with the code and then tear it apart. If you’re running on Snowflake and have Java functions enabled &ndash; any AWS account, for now &ndash; then you can define them right inline using this <code>create function</code>
command:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="nb">boolean</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.hasPII&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">as</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">regex</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">public</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">PIIDetector</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="p">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="o">//</span><span class="w"> </span><span class="n">SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="o">//</span><span class="w"> </span><span class="n">email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="o">//</span><span class="w"> </span><span class="n">phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">patterns</span><span class="p">.</span><span class="k">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="n">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="nb">boolean</span><span class="w"> </span><span class="n">hasPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="n">find</span><span class="p">())</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">                </span><span class="k">return</span><span class="w"> </span><span class="k">true</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="k">false</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span></code></pre></div><p>With this in hand, anyone with permissions on the function can issue queries that use it without any knowledge of Java:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">select</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">body</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">from</span><span class="w"> </span><span class="n">emails</span><span class="w">
</span></span></span></code></pre></div><p>So let’s take the definition apart. The first section defines how the function will show up in SQL:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">haspii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="nb">boolean</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.hasPII&#39;</span><span class="w">
</span></span></span></code></pre></div><p>Most of this is pretty self-explanatory: it’s a function that takes a string and returns a Boolean, and the language is Java. The <code>returns null on null input</code> clause lets me skip any null handling in my routine: null inputs are handled without calling into Java at all.</p>
<p>The <code>handler</code> directive is new, and specifies where in the Java code to actually make a call. You may have many potential entry points, but in this case, Snowflake is going to call the <code>hasPII</code> method defined on the <code>PIIDetector</code> class.</p>
<p>The actual Java code is contained between the pairs of dollar signs. After a little boilerplate, we see a few regular expressions:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="o">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="c1">// SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="c1">// email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="s">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="c1">// phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">};</span><span class="w">
</span></span></span></code></pre></div><p>These (highly USA-centric) expressions match the basic forms of Social Security numbers, email addresses, and phone numbers. You can very easily augment this list with more patterns to match your definition of PII.</p>
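<p>If you want to try out a new pattern before changing the function, a small standalone Java class is enough. The sketch below is only an illustration: the <code>PatternCheck</code> class, its labels, and the ZIP+4 expression are assumptions made for this example and are not part of the function above. The other three expressions are copied from <code>TARGETS</code>.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternCheck {

    // Label -> compiled pattern, kept in insertion order so earlier entries win.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("SSN", Pattern.compile("\\d{3}-\\d{2}-\\d{4}"));
        PATTERNS.put("email", Pattern.compile("[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}"));
        PATTERNS.put("phone", Pattern.compile("[2-9]\\d{2}-\\d{3}-\\d{4}"));
        // Hypothetical addition, not in the original list: US ZIP+4 codes.
        PATTERNS.put("zip+4", Pattern.compile("\\d{5}-\\d{4}"));
    }

    // Returns the label of the first pattern found anywhere in s, or null if none match.
    public static String firstMatch(String s) {
        for (Map.Entry<String, Pattern> e : PATTERNS.entrySet()) {
            if (e.getValue().matcher(s).find()) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = {"123-45-6789", "a@b.com", "206-555-0100", "98101-1234", "hello"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

<p>Because the check uses <code>find()</code>, a pattern fires on any matching substring, so it is worth testing a new expression against strings that should <em>not</em> match as well.</p>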
<p>Next, we see some initialization code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="nf">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="na">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>Our handler points to an instance method in the PIIDetector class. When Snowflake runs a query that requires an instance of this class, Snowflake will look for a default constructor to use to generate this instance. This provides a really easy way to do one-time initialization: in this case we compile the regular expressions once per query so they’re ready to go, rather than recompiling them on each invocation &ndash; it should be much faster.</p>
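<p>To see the compile-once pattern in isolation, here is a rough local analogy in plain Java. It says nothing about how Snowflake actually creates or reuses instances; the <code>InitOnceDemo</code> class, the <code>compileCount</code> counter, and the <code>countFlagged</code> helper are all hypothetical names added for illustration. One instance is built for a batch of rows, and the instance method is then called once per row.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class InitOnceDemo {

    // Same three expressions as TARGETS in the function above.
    static final String[] TARGETS = {
        "\\d{3}-\\d{2}-\\d{4}",                 // SSN
        "[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}",  // email
        "[2-9]\\d{2}-\\d{3}-\\d{4}"             // phone
    };

    // Illustration only: counts how many times Pattern.compile has run.
    static int compileCount = 0;

    final List<Pattern> patterns = new ArrayList<>();

    // One-time initialization: compile every expression when the instance is built.
    public InitOnceDemo() {
        for (String s : TARGETS) {
            patterns.add(Pattern.compile(s));
            compileCount++;
        }
    }

    // Per-row work: reuse the compiled patterns.
    public boolean hasPII(String s) {
        for (Pattern p : patterns) {
            if (p.matcher(s).find()) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical stand-in for a query: one instance, many invocations.
    public static int countFlagged(String[] rows) {
        InitOnceDemo detector = new InitOnceDemo();
        int flagged = 0;
        for (String row : rows) {
            if (detector.hasPII(row)) {
                flagged++;
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        String[] rows = {"call 206-555-0100", "nothing to see", "ssn 123-45-6789"};
        int flagged = countFlagged(rows);
        System.out.println(flagged + " rows flagged, " + compileCount + " compiles");
    }
}
```

<p>However many rows go through <code>countFlagged</code>, the counter only grows by the number of expressions in <code>TARGETS</code>, which is the saving the constructor is there to provide.</p>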
<p>Finally, we have the actual method we’re binding to:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-java" data-lang="java"><span class="line"><span class="cl"><span class="kd">public</span><span class="w"> </span><span class="kt">boolean</span><span class="w"> </span><span class="nf">hasPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="na">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="na">find</span><span class="p">())</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="p">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="p">}</span><span class="w">
</span></span></span></code></pre></div><p>This just loops over the patterns and fires if any match. Easy peasy!</p>
<p>And there we have it: a simple PII detection routine that you can customize to your requirements (and local phone-number formats). But really, this is good for any situation where you have a number of regular expressions to match.</p>
<p>And with a little tweaking, you can mask out these matches instead. Here’s the code; I’ll let you dig into the details.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">create</span><span class="w"> </span><span class="k">function</span><span class="w"> </span><span class="n">maskpii</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="n">string</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">language</span><span class="w"> </span><span class="n">java</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">returns</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="k">null</span><span class="w"> </span><span class="k">input</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;PIIDetector.maskPII&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">as</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="n">regex</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="n">import</span><span class="w"> </span><span class="n">java</span><span class="p">.</span><span class="n">util</span><span class="p">.</span><span class="o">*</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="k">public</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="n">PIIDetector</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="p">[]</span><span class="w"> </span><span class="n">TARGETS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;\\d{3}-\\d{2}-\\d{4}&#34;</span><span class="p">,</span><span class="w">                 </span><span class="o">//</span><span class="w"> </span><span class="n">SSN</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[\\w-\\.]+@([\\w-]+\\.)+[\\w-]{2,4}&#34;</span><span class="p">,</span><span class="w">  </span><span class="o">//</span><span class="w"> </span><span class="n">email</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="s2">&#34;[2-9]\\d{2}-\\d{3}-\\d{4}&#34;</span><span class="w">             </span><span class="o">//</span><span class="w"> </span><span class="n">phone</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">static</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">MASK</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">&#34;###&#34;</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="w"> </span><span class="n">patterns</span><span class="p">;</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">PIIDetector</span><span class="p">()</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="n">patterns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Pattern</span><span class="o">&gt;</span><span class="p">();</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">TARGETS</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">patterns</span><span class="p">.</span><span class="k">add</span><span class="p">(</span><span class="n">Pattern</span><span class="p">.</span><span class="n">compile</span><span class="p">(</span><span class="n">s</span><span class="p">));</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="k">public</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">maskPII</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">for</span><span class="p">(</span><span class="n">Pattern</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">patterns</span><span class="p">)</span><span class="w"> </span><span class="err">{</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">p</span><span class="p">.</span><span class="n">matcher</span><span class="p">(</span><span class="n">s</span><span class="p">).</span><span class="n">replaceAll</span><span class="p">(</span><span class="n">MASK</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="err">$$</span><span class="w">
</span></span></span></code></pre></div><p>Happy hacking!</p>
]]></content></item></channel></rss>