<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ugosan.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ugosan.org/" rel="alternate" type="text/html" /><updated>2026-02-13T01:42:56+00:00</updated><id>https://ugosan.org/feed.xml</id><title type="html">ugosan</title><subtitle>public notes</subtitle><entry><title type="html">Grep search tip</title><link href="https://ugosan.org/Grep-wirn/" rel="alternate" type="text/html" title="Grep search tip" /><published>2024-03-29T00:00:00+00:00</published><updated>2024-03-29T00:00:00+00:00</updated><id>https://ugosan.org/Grep-wirn</id><content type="html" xml:base="https://ugosan.org/Grep-wirn/"><![CDATA[<p>Search for a string in specific files within the current directory and its children, with context lines before and after the term and excluding certain directories.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>grep -wirn ${PWD} \
  --include=\*.md \
  -e 'foo bar'    \
  -C3             \
  --exclude-dir=libraries
</code></pre></div></div>]]></content><author><name></name></author><category term="grep" /><summary type="html"><![CDATA[Search for a string in specific files within a directory, with context and excluding certain directories.]]></summary></entry><entry><title type="html">Using virtualenv on Jupyter Notebook</title><link href="https://ugosan.org/jupyter-virtualenv/" rel="alternate" type="text/html" title="Using virtualenv on Jupyter Notebook" /><published>2024-01-19T00:00:00+00:00</published><updated>2024-01-19T00:00:00+00:00</updated><id>https://ugosan.org/jupyter-virtualenv</id><content type="html" xml:base="https://ugosan.org/jupyter-virtualenv/"><![CDATA[<ol>
  <li>
    <p>Create a virtual environment: <code class="language-plaintext highlighter-rouge">virtualenv venv</code></p>
  </li>
  <li>
    <p>Activate it: <code class="language-plaintext highlighter-rouge">source ./venv/bin/activate</code></p>
  </li>
  <li>
    <p>Install ipykernel: <code class="language-plaintext highlighter-rouge">python -m pip install ipykernel</code></p>
  </li>
  <li>
    <p>Add a Jupyter kernel: <code class="language-plaintext highlighter-rouge">python -m ipykernel install --user --name venv</code></p>
  </li>
  <li>
    <p>Launch Jupyter Notebook: <code class="language-plaintext highlighter-rouge">jupyter notebook</code></p>
  </li>
</ol>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>virtualenv venv
source ./venv/bin/activate
python -m pip install ipykernel
python -m ipykernel install --user --name venv
jupyter notebook

</code></pre></div></div>]]></content><author><name></name></author><category term="virtualenv" /><category term="Jupyter" /><category term="Notebook" /><summary type="html"><![CDATA[Using virtualenv on Jupyter Notebook]]></summary></entry><entry><title type="html">Authenticating to GKE without gcloud CLI</title><link href="https://ugosan.org/Authenticating-to-GKE-without-gcloud/" rel="alternate" type="text/html" title="Authenticating to GKE without gcloud CLI" /><published>2022-11-15T00:00:00+00:00</published><updated>2022-11-15T00:00:00+00:00</updated><id>https://ugosan.org/Authenticating-to-GKE-without-gcloud</id><content type="html" xml:base="https://ugosan.org/Authenticating-to-GKE-without-gcloud/"><![CDATA[<p>How to authenticate to GKE without gcloud CLI</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create serviceaccount k8sadmin -n kube-system

kubectl create clusterrolebinding k8sadmin --clusterrole=cluster-admin --serviceaccount=kube-system:k8sadmin

kubectl create token k8sadmin -n=kube-system

kubectl config set-credentials k8sadmin-token --token=&lt;YOUR-TOKENHERE&gt;

kubectl config set-context gke_elastic-pme-team_us-central1-a_ugo-cluster-arm  --cluster=GKE_CLUSTER_NAME --user=k8sadmin-token

</code></pre></div></div>]]></content><author><name></name></author><category term="kubernetes" /><category term="gcloud" /><summary type="html"><![CDATA[How to authenticate to GKE without gcloud CLI]]></summary></entry><entry><title type="html">Debugging Logstash and Filebeat pipelines with Logshark</title><link href="https://ugosan.org/Logshark-getting-started/" rel="alternate" type="text/html" title="Debugging Logstash and Filebeat pipelines with Logshark" /><published>2022-11-14T00:00:00+00:00</published><updated>2022-11-14T00:00:00+00:00</updated><id>https://ugosan.org/Logshark-getting-started</id><content type="html" xml:base="https://ugosan.org/Logshark-getting-started/"><![CDATA[<p>I worked for many years as a <a href="https://www.elastic.co/consulting">consultant in Elastic</a>, and coding pipelines in Logstash and Filebeat for large Observability use cases that ingested terabytes worth of logs every day was our bread and butter.</p>

<p>Coding pipelines using those tools (and others) is a highly iterative process, especially when dealing with grok patterns to parse unstructured logs. You get some sample data, feed it to an input, and then repeat two steps until the logs are correctly parsed: 1) coding the pipeline logic (<em>filters</em> in Logstash and <em>processors</em> in Filebeat) and 2) inspecting the output.</p>

<p>I always felt this iteration cycle of changing the pipeline and inspecting the output to be somewhat slow. Sure, you have the <code class="language-plaintext highlighter-rouge">console</code> output in both Logstash and Filebeat, but you end up with a mix of those programs’ own output and <em>your</em> output, so you will surely be scrolling a lot. And sure, you also have the <code class="language-plaintext highlighter-rouge">file</code> output in both tools, but it’s easy to get lost when dealing with documents with hundreds of fields, since each document is written on a single line with no pretty-printing.</p>

<p>I needed a way to tell right away whether the output of my pipeline was correct. <mark>Pretty-printed</mark> and <mark>navigable</mark> output were my main requirements; with those, my development iteration would be so much faster! Such a tool didn’t exist, so I created it and called it <a href="https://github.com/ugosan/logshark/">Logshark</a> (inspired by Wireshark, a popular network inspection tool).</p>

<div class="header">
<img src="https://github.com/ugosan/logshark/raw/main/_doc/demo.gif" width="70%" />
</div>

<p>It’s a CLI application with a terminal UI written in Go. It works by starting a small web server that mimics Elasticsearch’s behavior by accepting <code class="language-plaintext highlighter-rouge">_bulk</code> requests, so all you need to do is redirect your Logstash/Filebeat <code class="language-plaintext highlighter-rouge">elasticsearch</code> output to the tool.</p>
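
<p>To make the <code class="language-plaintext highlighter-rouge">_bulk</code> idea concrete, here is a minimal Python sketch of how a newline-delimited bulk body can be split back into documents. This is not Logshark’s actual Go code: the function name <code class="language-plaintext highlighter-rouge">parse_bulk</code> and the sample payload are made up for illustration, and it only assumes the standard Elasticsearch bulk format of an action line followed by a source line.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def parse_bulk(body):
    """Return the source documents found in a newline-delimited _bulk body."""
    lines = iter(ln for ln in body.splitlines() if ln.strip())
    docs = []
    for action_line in lines:
        # Each action line is a one-key object such as {"index": {...}}.
        op = next(iter(json.loads(action_line)))
        if op == "delete":
            continue  # delete actions are not followed by a source line
        docs.append(json.loads(next(lines)))
    return docs


# Hypothetical payload, similar in shape to what an elasticsearch output sends.
sample = (
    '{"index": {"_index": "logs"}}\n'
    '{"message": "hello"}\n'
    '{"create": {"_index": "logs"}}\n'
    '{"message": "world"}\n'
)
for doc in parse_bulk(sample):
    print(json.dumps(doc, indent=2))  # pretty-print each event
</code></pre></div></div>

<p>Because the shipper only needs a successful response to its bulk request, a small server like this can stand in for a real cluster while you inspect the events.</p>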

<p>This tool is particularly handy when changing <em>production</em> pipelines, as you can add a second elasticsearch output to your pipeline just to inspect the events. By default it collects the first 100 events it sees, then accepts but discards the rest; you can inspect the next batch by hitting <code class="language-plaintext highlighter-rouge">r</code> to refresh.</p>

<p>It will also tell you the <em>events per second</em> and the <em>average document size</em>, which are handy when you need to optimize your throughput by adjusting the bulk/batch size. That is really important if, say, you are collecting logs from a machine in the southern hemisphere and sending them to an Elasticsearch cluster in the northern one.</p>
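
<p>As a rough illustration of how those two numbers feed into batch sizing, the Python sketch below does the arithmetic. The function name <code class="language-plaintext highlighter-rouge">bulk_estimate</code> and all the figures are assumptions chosen for the example, not measurements from Logshark.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def bulk_estimate(events_per_second, avg_doc_bytes, batch_size):
    """Estimate ingest throughput and the payload size of one bulk request."""
    return {
        "bytes_per_second": events_per_second * avg_doc_bytes,
        "bulk_payload_bytes": batch_size * avg_doc_bytes,
        "bulk_requests_per_second": events_per_second / batch_size,
    }


# Illustrative numbers only: 5000 events/s, 2048 bytes per document,
# batches of 1000 events.
estimate = bulk_estimate(5000, 2048, 1000)
for name, value in estimate.items():
    print(name, value)
</code></pre></div></div>

<p>On a high-latency link, fewer and larger bulk requests spread the round-trip cost over more events, which is why the average document size matters when picking a batch size.</p>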

<p>You can run it using the <a href="https://github.com/ugosan/logshark/releases">binary directly</a> (&lt;5mb) or on <a href="https://github.com/ugosan/logshark/blob/main/docker-compose.yml">docker</a>. The UI can be used on anything that can emulate a terminal, like your regular Linux terminal, iTerm, tmux, PuTTY and even VSCode.</p>

<h2 id="getting-started">Getting Started</h2>

<h2 id="1-start-the-server">1) Start the server</h2>

<h3 id="binary">binary</h3>

<div class="language-perl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">./</span><span class="nv">logshark</span> <span class="o">--</span><span class="nv">host</span> <span class="mf">0.0.0.0</span> <span class="o">--</span><span class="nv">port</span> <span class="mi">9200</span> <span class="o">--</span><span class="nv">max</span> <span class="mi">1000</span>
</code></pre></div></div>

<h3 id="docker">docker</h3>

<div class="language-perl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">docker</span> <span class="nv">run</span> <span class="o">-</span><span class="nv">p</span> <span class="mi">9200</span><span class="p">:</span><span class="mi">9200</span> <span class="o">-</span><span class="nv">it</span> <span class="nv">ugosan</span><span class="o">/</span><span class="nv">logshark</span> <span class="o">-</span><span class="nv">host</span> <span class="mf">0.0.0.0</span> <span class="o">-</span><span class="nv">port</span> <span class="mi">9200</span>
</code></pre></div></div>

<h3 id="docker-composeyml">docker-compose.yml</h3>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">3.2"</span>
<span class="na">services</span><span class="pi">:</span>

  <span class="na">logshark</span><span class="pi">:</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">ugosan/logshark</span>
    <span class="na">tty</span><span class="pi">:</span> <span class="no">true</span>
    <span class="na">stdin_open</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>
<p><mark>note</mark> you should not use “docker-compose up” but instead “docker-compose run logshark sh”, since docker-compose doesn’t attach to containers with “up”. For example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker-compose run -p 9200:9200 logshark -port 9200
</code></pre></div></div>

<h2 id="2-point-your-logstash-pipelines-output-to-it">2) Point your Logstash pipeline’s output to it</h2>

<p>Just like a normal <code class="language-plaintext highlighter-rouge">elasticsearch</code> output:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span> <span class="p">{}</span>

<span class="n">filter</span> <span class="p">{}</span>

<span class="n">output</span> <span class="p">{</span>
  <span class="n">elasticsearch</span> <span class="p">{</span>
    <span class="n">hosts</span> <span class="o">=&gt;</span> <span class="p">[</span><span class="s2">"http://host.docker.internal:9200"</span><span class="p">]</span>
  <span class="p">}</span>
  
<span class="p">}</span>   
</code></pre></div></div>

<p>When using docker, you can reach the logshark container from another container using <code class="language-plaintext highlighter-rouge">host.docker.internal</code> like <code class="language-plaintext highlighter-rouge">docker run --rm byrnedo/alpine-curl -v -XPOST -d '{"hello":"test"}' http://host.docker.internal:9200</code></p>]]></content><author><name></name></author><category term="logstash" /><category term="filebeat" /><category term="logshark" /><summary type="html"><![CDATA[Debugging Logstash and Filebeat pipelines with Logshark]]></summary></entry><entry><title type="html">Test all fonts in Figlet</title><link href="https://ugosan.org/Figlet-and-lolcat/" rel="alternate" type="text/html" title="Test all fonts in Figlet" /><published>2022-10-12T00:00:00+00:00</published><updated>2022-10-12T00:00:00+00:00</updated><id>https://ugosan.org/Figlet-and-lolcat</id><content type="html" xml:base="https://ugosan.org/Figlet-and-lolcat/"><![CDATA[<div class="header">
<img src="/images/figlet.gif" width="100%" />
</div>

<p>On Mac, using Homebrew</p>

<p>Install <a href="http://www.figlet.org/">figlet</a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew install figlet
</code></pre></div></div>

<p>Install <a href="https://github.com/busyloop/lolcat">lolcat</a> because why not?</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew install lolcat
</code></pre></div></div>

<p>Find out your font directory using <code class="language-plaintext highlighter-rouge">brew list figlet</code>, then based on the installed version (mine was 2.2.5) do:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for file in /opt/homebrew/Cellar/figlet/2.2.5/share/figlet/fonts/*.flf;\
do echo "\n\n$file"; figlet -c -w 150 -f "$file" Hello World |\
lolcat; done
</code></pre></div></div>]]></content><author><name></name></author><category term="figlet" /><category term="lolcat" /><summary type="html"><![CDATA[Using Figlet and Lolcat]]></summary></entry><entry><title type="html">A Short Emergency Response Guide for Elasticsearch</title><link href="https://ugosan.org/Short-Elasticsearch-emergency-guide/" rel="alternate" type="text/html" title="A Short Emergency Response Guide for Elasticsearch" /><published>2022-09-27T00:00:00+00:00</published><updated>2022-09-27T00:00:00+00:00</updated><id>https://ugosan.org/Short-Elasticsearch-emergency-guide</id><content type="html" xml:base="https://ugosan.org/Short-Elasticsearch-emergency-guide/"><![CDATA[<h2 id="help-production-is-on-fire">Help! Production is on fire!</h2>

<p>Our job is first to identify what exactly is on fire, and for that we need to get as much <mark>context</mark> as possible, as fast as possible.</p>

<h4 id="step-1-check-_catnodes-for-resource-usage"><mark-soft>Step 1:</mark-soft> Check <code class="language-plaintext highlighter-rouge">_cat/nodes</code> for resource usage</h4>

<p>Let’s first check which nodes are actually in trouble.</p>

<p>Run:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GET</span> <span class="nx">_cat</span><span class="o">/</span><span class="nx">nodes</span><span class="p">?</span><span class="nx">v</span><span class="o">&amp;</span><span class="nx">h</span><span class="o">=</span><span class="nx">ip</span><span class="p">,</span><span class="nx">name</span><span class="p">,</span><span class="nx">cpu</span><span class="p">,</span><span class="nx">ram</span><span class="p">.</span><span class="nx">max</span><span class="p">,</span><span class="nx">heap</span><span class="p">.</span><span class="nx">max</span><span class="p">,</span><span class="nx">heap</span><span class="p">.</span><span class="nx">percent</span><span class="p">,</span><span class="nx">node</span><span class="p">.</span><span class="nx">role</span><span class="p">,</span><span class="nx">diskAvail</span><span class="p">,</span><span class="nx">master</span>
</code></pre></div></div>

<p>The output looks like this.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  master ip           name                cpu ram.max heap.max heap.percent node.role diskAvail
  -      10.47.48.170 instance-0000000009  26     2gb    844mb           60 himrst       48.7gb
  -      10.47.48.127 instance-0000000008  16     2gb    844mb           64 himrst       52.7gb
  *      10.47.48.118 instance-0000000003   7     2gb    844mb           69 himrst         50gb
  -      10.47.48.61  instance-0000000005   0     1gb    268mb           41 lr            1.8gb
</code></pre></div></div>

<p>If you notice high CPU usage on a group of nodes, check the <code class="language-plaintext highlighter-rouge">node.role</code> of those nodes. If you have data tiers (a hot-warm architecture), it might be that only <code class="language-plaintext highlighter-rouge">warm</code> nodes are busy and <code class="language-plaintext highlighter-rouge">hot</code> nodes are unaffected.</p>
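
<p>When the cluster has many nodes, it can help to rank them instead of eyeballing the table. The following Python sketch parses the plain-text output shown above and lists the busiest nodes with their roles. The helper names <code class="language-plaintext highlighter-rouge">parse_cat_nodes</code> and <code class="language-plaintext highlighter-rouge">busiest</code> are hypothetical, and the sketch assumes every column has a value with no spaces in it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def parse_cat_nodes(text):
    """Turn the whitespace-aligned _cat/nodes?v table into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def busiest(rows, top=2):
    """Return (name, node.role, cpu) for the nodes with the highest CPU."""
    ranked = sorted(rows, key=lambda r: int(r["cpu"]), reverse=True)
    return [(r["name"], r["node.role"], int(r["cpu"])) for r in ranked[:top]]


# The sample output from above, pasted as-is.
sample = """
master ip           name                cpu ram.max heap.max heap.percent node.role diskAvail
-      10.47.48.170 instance-0000000009  26     2gb    844mb           60 himrst       48.7gb
-      10.47.48.127 instance-0000000008  16     2gb    844mb           64 himrst       52.7gb
*      10.47.48.118 instance-0000000003   7     2gb    844mb           69 himrst         50gb
-      10.47.48.61  instance-0000000005   0     1gb    268mb           41 lr            1.8gb
"""
for name, role, cpu in busiest(parse_cat_nodes(sample)):
    print(name, role, cpu)
</code></pre></div></div>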

<p>There are more columns you can check, like <code class="language-plaintext highlighter-rouge">search.query_current</code> and <code class="language-plaintext highlighter-rouge">search.fetch_current</code>, which show the number of search operations currently running in the query and fetch phases respectively. <code class="language-plaintext highlighter-rouge">GET _cat/nodes?help</code> is your friend.</p>

<h4 id="step-2-check-the-hot-threads"><mark>Step 2:</mark> Check the hot threads</h4>

<p>This API yields a breakdown of the hot threads on each selected node in the cluster. The output is plain text with a breakdown of each node’s top hot threads.</p>

<p>Let’s sample them over a 1-second interval:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GET</span> <span class="nx">_nodes</span><span class="o">/</span><span class="nx">hot_threads</span><span class="p">?</span><span class="nx">interval</span><span class="o">=</span><span class="mi">1</span><span class="nx">s</span><span class="o">&amp;</span><span class="nx">ignore_idle_threads</span>
</code></pre></div></div>

<p>The output will be something like this, with only the important parts:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>::: {warm-xxx}{XXXXXXX}{YYYYYYYY}{10.10.10.10}{10.10.10.10:9300}{aws_availability_zone=us-west-2b, data_type=warm, ml.machine_memory=64388997120, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
   Hot threads at 2019-12-30T23:22:24.304Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   44.0% (440.2ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][management][T#1]'
      ... omitted ...

   42.7% (413.4ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#2]'                    
     ... omitted ...

   41.8% (408.9ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#7]'
     ... omitted ...

</code></pre></div></div>
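
<p>With many nodes the hot threads output gets long, so a quick tally of which thread pools show up can point you in the right direction. Here is a small Python sketch that counts hot threads per pool from the plain-text response. The regular expression is an assumption based on the thread name format seen in the sample above, and the function name <code class="language-plaintext highlighter-rouge">pools</code> is made up for the example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re
from collections import Counter

# Matches thread names like 'elasticsearch[node-name][pool][T#1]'
# and captures the pool part.
THREAD_LINE = re.compile(
    r"cpu usage by thread 'elasticsearch\[[^\]]+\]\[([^\]]+)\]"
)


def pools(hot_threads_text):
    """Count hot threads per thread pool (search, management, ...)."""
    return Counter(THREAD_LINE.findall(hot_threads_text))


sample = """
   44.0% (440.2ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][management][T#1]'
   42.7% (413.4ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#2]'
   41.8% (408.9ms out of 1000ms) cpu usage by thread 'elasticsearch[warm-xxx][search][T#7]'
"""
for pool, count in pools(sample).most_common():
    print(pool, count)
</code></pre></div></div>

<p>In the sample, most of the hot threads belong to the <code class="language-plaintext highlighter-rouge">search</code> pool, which is a hint to look at running searches next.</p>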

<h4 id="step-3-check-tasks"><mark>Step 3:</mark> Check Tasks</h4>

<p>The <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html">Task Management API</a> can give you a lot of information about the operations being executed on the cluster at any given time.</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GET</span> <span class="nx">_tasks</span><span class="p">?</span><span class="nx">human</span><span class="o">&amp;</span><span class="nx">detailed</span>
</code></pre></div></div>

<p>They can be filtered to include only read (search) operations</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">GET</span> <span class="nx">_tasks</span><span class="p">?</span><span class="nx">human</span><span class="o">&amp;</span><span class="nx">detailed</span><span class="o">&amp;</span><span class="nx">actions</span><span class="o">=</span><span class="nx">indices</span><span class="p">:</span><span class="nx">data</span><span class="o">/</span><span class="nx">read</span><span class="cm">/*
</span></code></pre></div></div>

<p>The output will show us the <code class="language-plaintext highlighter-rouge">running_time</code> and, in case of searches, the content of the specific search request being executed:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      "0l67b6iLSzmp7v3TNtkjbQ:138659665" : {
          "node" : "0l67b6iLSzmp7v3TNtkjbQ",
          "id" : 138659665,
          "type" : "transport",
          "action" : "indices:data/read/search",
          "description" : "indices[tasks], types[], search_type[QUERY_THEN_FETCH], source[{\"size\":0,\"query\":{\"bool\":{\"must\":[{\"terms\":{\"User.Actions.ActionId\":[\"f6583dbd-4079-4efd-80c4-28e3f0606c1f\",\"2f80c480-18a4-4079-4efd-d3bdf9361164\",
          ...
          "start_time" : "2022-06-28T15:27:20.766Z",
          "start_time_in_millis" : 1656430040766,
          "running_time" : "36.6s",
          "running_time_in_nanos" : 36665783627,
</code></pre></div></div>

<p>The task <code class="language-plaintext highlighter-rouge">0l67b6iLSzmp7v3TNtkjbQ:138659665</code> is a <strong>search</strong> task that has been running for 36 seconds. Its <code class="language-plaintext highlighter-rouge">source</code> is also there (inside the <code class="language-plaintext highlighter-rouge">description</code> field), which might help identify the culprit.</p>
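
<p>If the response is large, sorting by running time makes the worst offenders obvious. The Python sketch below does that on an already-parsed <code class="language-plaintext highlighter-rouge">_tasks</code> response. The function name <code class="language-plaintext highlighter-rouge">slowest_tasks</code> is hypothetical, and the second task in the sample is invented so there is something to sort.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def slowest_tasks(tasks_response, top=5):
    """Flatten a _tasks response and return the longest-running tasks first."""
    flat = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task["running_time_in_nanos"] / 1e9
            flat.append((seconds, task_id, task["action"]))
    return sorted(flat, reverse=True)[:top]


# Trimmed-down response; the short-lived second task is made up.
sample = {
    "nodes": {
        "0l67b6iLSzmp7v3TNtkjbQ": {
            "tasks": {
                "0l67b6iLSzmp7v3TNtkjbQ:138659665": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 36665783627,
                },
                "0l67b6iLSzmp7v3TNtkjbQ:1": {
                    "action": "cluster:monitor/tasks/lists",
                    "running_time_in_nanos": 1200000,
                },
            }
        }
    }
}
for seconds, task_id, action in slowest_tasks(sample):
    print(round(seconds, 1), task_id, action)
</code></pre></div></div>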

<h4 id="step-4-cancel-long-running-tasks"><mark>Step 4:</mark> Cancel long running tasks</h4>

<p>If you see queries that have been impacting performance for too long and you want to cancel them, you can do so with <code class="language-plaintext highlighter-rouge">POST _tasks/&lt;id&gt;/_cancel</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>POST _tasks/0l67b6iLSzmp7v3TNtkjbQ:138659665/_cancel
</code></pre></div></div>

<p>However, if you find yourself cancelling tasks every day, your team should really rethink their data structure, query design or cluster size.</p>
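
<p>If you need to cancel tasks from a script rather than from the Kibana console, the request is a plain HTTP POST. This Python sketch only builds the request so it can be inspected before sending. The base URL <code class="language-plaintext highlighter-rouge">http://localhost:9200</code> and the function name <code class="language-plaintext highlighter-rouge">cancel_request</code> are assumptions for the example.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import urllib.request


def cancel_request(base_url, task_id):
    """Build (but do not send) the POST request that cancels a task."""
    url = base_url.rstrip("/") + "/_tasks/" + task_id + "/_cancel"
    return urllib.request.Request(url, method="POST")


req = cancel_request("http://localhost:9200", "0l67b6iLSzmp7v3TNtkjbQ:138659665")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), plus whatever
# authentication and TLS settings your cluster requires.
</code></pre></div></div>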

<h3 id="problem-solved-lets-get-a-little-bit-more-proactive-by">Problem solved? Let’s get a little bit more proactive by:</h3>

<p>1) having a dedicated <a href="https://www.elastic.co/guide/en/elasticsearch/reference/8.3/monitor-elasticsearch-cluster.html">Monitoring Cluster</a></p>

<ul>
  <li>In Elastic Cloud that is as easy as creating a secondary (small) cluster <a href="https://www.elastic.co/blog/monitoring-elastic-cloud-deployment-logs-and-metrics">and pointing Logs and Metrics over there</a>.</li>
  <li>In on-premises clusters you need to start an instance of Metricbeat and <a href="https://www.elastic.co/guide/en/elasticsearch/reference/8.3/configuring-metricbeat.html">point it to your production cluster</a>:
    <ul>
      <li>
        <p>The elasticsearch module will fetch monitoring info from your <code class="language-plaintext highlighter-rouge">host</code> with the following configuration (metricbeat.yml):</p>

        <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="na">metricbeat.modules</span><span class="pi">:</span>
 <span class="pi">-</span> <span class="na">module</span><span class="pi">:</span> <span class="s">elasticsearch</span>
   <span class="na">xpack.enabled</span><span class="pi">:</span> <span class="no">true</span>
   <span class="na">period</span><span class="pi">:</span> <span class="s">10s</span>
   <span class="na">hosts</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">https://prod-cluster:9200"</span><span class="pi">]</span> 
   <span class="na">scope</span><span class="pi">:</span> <span class="s">cluster</span>
   <span class="na">username</span><span class="pi">:</span> <span class="s2">"</span><span class="s">user"</span>
   <span class="na">password</span><span class="pi">:</span> <span class="s2">"</span><span class="s">secret"</span>
   <span class="na">ssl.enabled</span><span class="pi">:</span> <span class="no">true</span>
   <span class="na">ssl.certificate_authorities</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">/etc/pki/root/ca.pem"</span><span class="pi">]</span>
   <span class="na">ssl.verification_mode</span><span class="pi">:</span> <span class="s2">"</span><span class="s">certificate"</span>

 <span class="na">output.elasticsearch</span><span class="pi">:</span>
   <span class="na">hosts</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">https://my-monitoring-cluster:9200"</span><span class="pi">]</span>
   <span class="na">username</span><span class="pi">:</span> <span class="s2">"</span><span class="s">metricbeat_writer"</span>
   <span class="na">password</span><span class="pi">:</span> <span class="s2">"</span><span class="s">secret"</span>
</code></pre></div>        </div>
      </li>
    </ul>
  </li>
</ul>

<p>2) Enabling <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html">Slow Logs</a></p>]]></content><author><name></name></author><category term="elasticsearch" /><summary type="html"><![CDATA[Help! Production is on fire!]]></summary></entry><entry><title type="html">Using Regex groups in Logstash’s Gsub</title><link href="https://ugosan.org/Using-Regex-groups-in-Logstash-Gsub-regex/" rel="alternate" type="text/html" title="Using Regex groups in Logstash’s Gsub" /><published>2022-09-26T00:00:00+00:00</published><updated>2022-09-26T00:00:00+00:00</updated><id>https://ugosan.org/Using-Regex-groups-in-Logstash-Gsub-regex</id><content type="html" xml:base="https://ugosan.org/Using-Regex-groups-in-Logstash-Gsub-regex/"><![CDATA[<p><mark>Problem</mark></p>

<p><code>
Exception caught in json filter {JSON} :exception=&gt;#&lt;RuntimeError: Invalid FieldReference: proc.aname[2]&gt;}
</code></p>

<p>Original <a href="https://elasticstack.slack.com/archives/CNKF2D325/p1664183139854179">Slack thread</a></p>

<p>When you have <code class="language-plaintext highlighter-rouge">proc.aname[2]</code> and want to have <code class="language-plaintext highlighter-rouge">proc_aname2</code> - you can use regex groups to automatically change all occurrences of that string:</p>

<p><mark> Solution </mark></p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mutate</span> <span class="p">{</span>
    <span class="nb">gsub</span> <span class="o">=&gt;</span> <span class="p">[</span> <span class="s2">"message"</span><span class="p">,</span> <span class="s2">"proc</span><span class="se">\.</span><span class="s2">aname</span><span class="se">\[</span><span class="s2">([0-9]+)</span><span class="se">\]</span><span class="s2">"</span><span class="p">,</span> <span class="s2">"proc_aname</span><span class="se">\1</span><span class="s2">"</span><span class="p">]</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Basically, parentheses <code class="language-plaintext highlighter-rouge">()</code> create groups that can later be referenced by their number (<code class="language-plaintext highlighter-rouge">\1</code> for the first group, <code class="language-plaintext highlighter-rouge">\2</code> for the second and so on).</p>
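
<p>The same substitution can be tried outside Logstash to check the pattern. Here is a quick Python sketch using <code class="language-plaintext highlighter-rouge">re.sub</code> with the same regex and backreference. The sample line is invented for the example; only the field name comes from the error above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Hypothetical message containing the problematic field names.
line = '{"proc.aname[2]": "bash", "proc.aname[3]": "sshd"}'

# ([0-9]+) captures the index; \1 puts it back into the new name.
fixed = re.sub(r"proc\.aname\[([0-9]+)\]", r"proc_aname\1", line)
print(fixed)  # {"proc_aname2": "bash", "proc_aname3": "sshd"}
</code></pre></div></div>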

<p><kbd><img src="/images/2022-09-27-10-22-27.png" /></kbd></p>

<p>Used <a href="https://github.com/ugosan/logshark">Logshark</a> for debugging</p>]]></content><author><name></name></author><category term="logstash" /><category term="mutate" /><category term="gsub" /><summary type="html"><![CDATA[A simple Logstash Gsub trick]]></summary></entry><entry><title type="html">Deploying a headless Logstash on Kubernetes</title><link href="https://ugosan.org/Deploying-a-headless-logstash-kubernetes/" rel="alternate" type="text/html" title="Deploying a headless Logstash on Kubernetes" /><published>2022-08-02T00:00:00+00:00</published><updated>2022-08-02T00:00:00+00:00</updated><id>https://ugosan.org/Deploying-a-headless-logstash-kubernetes</id><content type="html" xml:base="https://ugosan.org/Deploying-a-headless-logstash-kubernetes/"><![CDATA[<div class="header">

<!-- The script tag should live in the head of your page if at all possible -->
<script type="text/javascript" async="" src="https://play.vidyard.com/embed/v4.js"></script>

<!-- Put this wherever you would like your player to appear -->
<img style="width: 100%; margin: auto; display: block;" class="vidyard-player-embed" src="https://play.vidyard.com/b9RX6iApHo2jphdtsRrnii.jpg" data-uuid="b9RX6iApHo2jphdtsRrnii" data-v="4" data-type="inline" />

</div>

<p>Top 2 reasons to deploy Logstash on Kubernetes:</p>
<ul>
  <li>You can easily scale up and down to deal with throughput</li>
  <li>You can easily deploy ETL pipelines to hundreds of instances with a single click</li>
</ul>

<p>A “headless” Logstash is a Logstash that doesn’t contain any pipeline logic itself. Instead, it fetches the pipeline definitions from a <mark> centralized location</mark> (Elasticsearch itself), so we can create our ETL pipelines through Kibana and deploy them automatically to several Logstash instances. This feature is called <a href="https://www.elastic.co/guide/en/logstash/current/logstash-centralized-pipeline-management.html">Centralized Pipeline Management</a>.</p>

<div class="premonition info">
  <i class="premonition pn-info"></i>
  <div class="content">
    <p class="header">TLDR</p><p>Jump to the full Kubernetes <a href="#all-in-one-manifest">manifest file here</a></p>
  </div>
</div>

<p>We are then going to create a Secret to store credentials, a ConfigMap to configure Logstash, a Deployment that will expose some container ports and finally a Service so Logstash is reachable from the outside.</p>

<h2 id="-elasticsearch-api-key-"><mark> Elasticsearch API Key </mark></h2>
<p>We first need to create an <a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html">API Key</a> so Logstash can communicate with Elasticsearch to fetch pipeline definitions.</p>

<p>Let’s define two roles: <code class="language-plaintext highlighter-rouge">my_role</code> which allows Logstash to fetch the pipeline definitions from Elasticsearch and <code class="language-plaintext highlighter-rouge">my_write_role</code> which will allow us to write to a datastream from our output. Those could be two different API Keys, but we are using just one to keep it simple.</p>

<p>Open up Kibana and then run the following command:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">POST</span> <span class="o">/</span><span class="nx">_security</span><span class="o">/</span><span class="nx">api_key</span>
<span class="p">{</span>
  <span class="dl">"</span><span class="s2">name</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">logstash</span><span class="dl">"</span><span class="p">,</span>   
  <span class="dl">"</span><span class="s2">role_descriptors</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span> 
    <span class="dl">"</span><span class="s2">my_role</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
      <span class="dl">"</span><span class="s2">cluster</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">monitor</span><span class="dl">"</span> <span class="p">,</span><span class="dl">"</span><span class="s2">manage_logstash_pipelines</span><span class="dl">"</span><span class="p">]</span>
    <span class="p">},</span>
    <span class="dl">"</span><span class="s2">my_write_role</span><span class="dl">"</span><span class="p">:</span> <span class="p">{</span>
      <span class="dl">"</span><span class="s2">index</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span>
        <span class="p">{</span>
          <span class="dl">"</span><span class="s2">names</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">logs-*</span><span class="dl">"</span><span class="p">],</span>
          <span class="dl">"</span><span class="s2">privileges</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span><span class="dl">"</span><span class="s2">all</span><span class="dl">"</span><span class="p">]</span>
        <span class="p">}</span>
      <span class="p">]</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The response includes an <code class="language-plaintext highlighter-rouge">encoded</code> field; copy its value:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  ...
  "encoded": "RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw=="
}
</code></pre></div></div>

<h2 id="-secret-"><mark> Secret </mark></h2>

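<p>Next, we need to create a Kubernetes Secret with our API key. Paste the value of the <code class="language-plaintext highlighter-rouge">encoded</code> field into a key we are calling <code class="language-plaintext highlighter-rouge">ELASTICSEARCH_API_KEY</code>. Values under <code class="language-plaintext highlighter-rouge">data</code> must be base64-encoded, and Kubernetes decodes them before exposing them to the container, so Logstash receives the key in its <code class="language-plaintext highlighter-rouge">id:api_key</code> form:</p>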
<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Secret</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-secrets</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">Opaque</span>
<span class="na">data</span><span class="pi">:</span>
  <span class="na">ELASTICSEARCH_API_KEY</span><span class="pi">:</span> <span class="s">RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw==</span>
</code></pre></div></div>
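
<p>To see what the container actually receives, here is a minimal Python sketch that decodes the sample value the same way Kubernetes decodes entries under <code class="language-plaintext highlighter-rouge">data</code>. The variable names are only illustrative, and the key is the sample one from this post, not a real credential.</p>

```python
import base64

# Sample "encoded" value from the API key response shown above.
encoded = "RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw=="

# Kubernetes applies this same base64 decoding to values stored under `data`.
decoded = base64.b64decode(encoded).decode("ascii")

# The decoded value has the form "id:api_key".
key_id, separator, key_secret = decoded.partition(":")
print("id:", key_id)
print("api_key length:", len(key_secret))
```

<p>If you would rather paste a plain-text value, Kubernetes also accepts it under <code class="language-plaintext highlighter-rouge">stringData</code> instead of <code class="language-plaintext highlighter-rouge">data</code>.</p>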

<h2 id="-configmap-"><mark> ConfigMap </mark></h2>

<p>The <code class="language-plaintext highlighter-rouge">ELASTICSEARCH_API_KEY</code> variable will be used in a ConfigMap that will represent our <code class="language-plaintext highlighter-rouge">logstash.yml</code> with a very simple configuration that tells Logstash:</p>

<p>1) Where to fetch the pipeline configs from (<code class="language-plaintext highlighter-rouge">xpack.management.elasticsearch.hosts</code>)</p>

<p>2) Which pipeline names to fetch (<code class="language-plaintext highlighter-rouge">xpack.management.pipeline.id</code>); in our case, anything matching <code class="language-plaintext highlighter-rouge">my-pipeline-*</code>, that is, any name starting with <code class="language-plaintext highlighter-rouge">my-pipeline-</code></p>

<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-configmap</span>
<span class="na">data</span><span class="pi">:</span>
  <span class="na">logstash.yml</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">http.host: "0.0.0.0"</span>
    <span class="s">log.level: info</span>
    <span class="s">xpack.management.enabled: true</span>
    <span class="s">xpack.management.elasticsearch.hosts: ["https://my-cluster.es.us-east-1.aws.found.io:443"]  </span>
    <span class="s">xpack.management.elasticsearch.api_key: "${ELASTICSEARCH_API_KEY}"</span>
    <span class="s">xpack.management.logstash.poll_interval: 5s</span>
    <span class="s">xpack.management.pipeline.id: ["my-pipeline-*"]</span>
</code></pre></div></div>
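
<p>As a rough illustration of which pipelines the wildcard would pick up, the Python sketch below filters a list of pipeline ids with a shell-style glob. The ids are made up for the example, and treating the pattern as a plain glob is an assumption about how Logstash matches it, not a description of its implementation.</p>

```python
from fnmatch import fnmatchcase

# Pattern from xpack.management.pipeline.id in the ConfigMap above.
pattern = "my-pipeline-*"

# Hypothetical pipeline ids, made up for this example.
pipeline_ids = ["my-pipeline-1", "my-pipeline-2", "other-pipeline", "my-pipeline"]

# Assumption: the wildcard behaves like a shell-style glob.
matched = [pid for pid in pipeline_ids if fnmatchcase(pid, pattern)]
print(matched)
```

<p>Note that <code class="language-plaintext highlighter-rouge">my-pipeline</code> without the trailing hyphen does not match in this sketch.</p>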

<h2 id="-deployment-"><mark> Deployment </mark></h2>

<p>Next, we create a Deployment that exposes a couple of <code class="language-plaintext highlighter-rouge">containerPort</code> entries, injects <code class="language-plaintext highlighter-rouge">ELASTICSEARCH_API_KEY</code> from the Secret <code class="language-plaintext highlighter-rouge">logstash-secrets</code> as an environment variable, and mounts our ConfigMap.</p>
<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-deployment</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">revisionHistoryLimit</span><span class="pi">:</span> <span class="m">0</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">logstash</span>
        <span class="na">image</span><span class="pi">:</span> <span class="s">docker.elastic.co/logstash/logstash:8.3.3</span>
        <span class="na">ports</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">5044</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">5045</span>
        <span class="na">resources</span><span class="pi">:</span>
            <span class="na">limits</span><span class="pi">:</span>
              <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2Gi"</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2500m"</span>
            <span class="na">requests</span><span class="pi">:</span> 
              <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1Gi"</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">300m"</span>
        <span class="na">env</span><span class="pi">:</span> 
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">ELASTICSEARCH_API_KEY</span>
            <span class="na">valueFrom</span><span class="pi">:</span>
              <span class="na">secretKeyRef</span><span class="pi">:</span>
                <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-secrets</span>
                <span class="na">key</span><span class="pi">:</span> <span class="s">ELASTICSEARCH_API_KEY</span>
        <span class="na">volumeMounts</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
            <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/usr/share/logstash/config</span>
      <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
        <span class="na">configMap</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-configmap</span>
          <span class="na">items</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s">logstash.yml</span>
              <span class="na">path</span><span class="pi">:</span> <span class="s">logstash.yml</span>
</code></pre></div></div>
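
<p>The environment variable set by the Deployment is what fills the <code class="language-plaintext highlighter-rouge">${ELASTICSEARCH_API_KEY}</code> placeholder in <code class="language-plaintext highlighter-rouge">logstash.yml</code>. The Python sketch below only approximates that substitution with <code class="language-plaintext highlighter-rouge">string.Template</code>; it is not how Logstash implements it, and the fallback value is a placeholder.</p>

```python
import os
from string import Template

# Line taken from the logstash.yml in the ConfigMap.
setting = 'xpack.management.elasticsearch.api_key: "${ELASTICSEARCH_API_KEY}"'

# Placeholder value for this sketch; in the pod it comes from the Secret.
os.environ.setdefault("ELASTICSEARCH_API_KEY", "example-id:example-key")

# Replace ${NAME} with the value of the environment variable NAME.
resolved = Template(setting).substitute(os.environ)
print(resolved)
```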

<h2 id="-service-"><mark> Service </mark></h2>

<p>The Service exposes ports 5044 and 5045 to the outside through a LoadBalancer. Depending on the provider, that is enough: in GKE you get a public external IP assigned automatically.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind: Service
apiVersion: v1
metadata:
  name: logstash-service
  labels: 
    app: logstash
spec:
  type: LoadBalancer
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
    name: my-pipeline-1
  - protocol: TCP
    port: 5045
    targetPort: 5045
    name: my-pipeline-2
</code></pre></div></div>
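
<p>Once the load balancer has an address (for example from <code class="language-plaintext highlighter-rouge">kubectl get service logstash-service</code>), a quick way to check that the ports answer is a plain TCP connection attempt. This is a minimal Python sketch; the helper name <code class="language-plaintext highlighter-rouge">port_is_open</code> is made up for the example.</p>

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

<p>For instance, <code class="language-plaintext highlighter-rouge">port_is_open("203.0.113.10", 5044)</code>, where the address is a placeholder for your external IP. A port only answers once a running pipeline has opened an input on it.</p>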

<h2 id="-logstash-pipeline-"><mark> Logstash Pipeline </mark></h2>

<p>Our pipeline can open an HTTP input, a Beats input or even a Syslog input on the available ports, for instance:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input</span> <span class="p">{</span>
  <span class="n">http</span> <span class="p">{</span>
    <span class="n">port</span> <span class="o">=&gt;</span> <span class="mi">5045</span>
    <span class="n">codec</span> <span class="o">=&gt;</span> <span class="n">json</span>
    <span class="n">user</span> <span class="o">=&gt;</span> <span class="s2">"webhook_admin"</span>
    <span class="n">password</span> <span class="o">=&gt;</span> <span class="s2">"verysecretpassword"</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="n">filter</span> <span class="p">{</span> <span class="p">}</span>

<span class="n">output</span> <span class="p">{</span>
    <span class="n">elasticsearch</span> <span class="p">{</span>
        <span class="n">hosts</span> <span class="o">=&gt;</span> <span class="s2">"https://my-cluster.es.us-east-1.aws.found.io:443"</span>
        <span class="n">ssl</span> <span class="o">=&gt;</span> <span class="kp">true</span>
        <span class="n">api_key</span> <span class="o">=&gt;</span> <span class="s2">"${ELASTICSEARCH_API_KEY}"</span>
        <span class="n">data_stream</span> <span class="o">=&gt;</span> <span class="kp">true</span>
        <span class="n">data_stream_type</span> <span class="o">=&gt;</span> <span class="s2">"logs"</span>
        <span class="n">data_stream_dataset</span> <span class="o">=&gt;</span> <span class="s2">"my-datastream"</span>
        <span class="n">data_stream_namespace</span> <span class="o">=&gt;</span> <span class="s2">"dev"</span>
    <span class="p">}</span>
<span class="p">}</span>

</code></pre></div></div>
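
<p>To try the HTTP input from the outside, we can post a JSON event with the basic auth credentials configured above. The Python sketch below only builds the request; the address is a placeholder for the external IP of <code class="language-plaintext highlighter-rouge">logstash-service</code>, and the event body and the helper name are made up for the example.</p>

```python
import base64
import json
import urllib.request


def build_event_request(url, user, password, event):
    """Build a POST request carrying one JSON event with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder address; use the external IP of logstash-service instead.
request = build_event_request(
    "http://203.0.113.10:5045",
    "webhook_admin",
    "verysecretpassword",
    {"message": "hello from the webhook"},
)

# Uncomment to actually send the event:
# urllib.request.urlopen(request, timeout=5)
```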

<h2 id="all-in-one-manifest">All-in-one manifest</h2>

<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Secret</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-secrets</span>
<span class="na">type</span><span class="pi">:</span> <span class="s">Opaque</span>
<span class="na">data</span><span class="pi">:</span>
  <span class="na">ELASTICSEARCH_API_KEY</span><span class="pi">:</span> <span class="s">RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw==</span>

<span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-deployment</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">revisionHistoryLimit</span><span class="pi">:</span> <span class="m">0</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">logstash</span>
        <span class="na">image</span><span class="pi">:</span> <span class="s">docker.elastic.co/logstash/logstash:8.4.2</span>
        <span class="na">ports</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">5044</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">5045</span>
        <span class="na">resources</span><span class="pi">:</span>
            <span class="na">limits</span><span class="pi">:</span>
              <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2Gi"</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2500m"</span>
            <span class="na">requests</span><span class="pi">:</span> 
              <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">1Gi"</span>
              <span class="na">cpu</span><span class="pi">:</span> <span class="s2">"</span><span class="s">300m"</span>
        <span class="na">env</span><span class="pi">:</span> 
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">ELASTICSEARCH_API_KEY</span>
            <span class="na">valueFrom</span><span class="pi">:</span>
              <span class="na">secretKeyRef</span><span class="pi">:</span>
                <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-secrets</span>
                <span class="na">key</span><span class="pi">:</span> <span class="s">ELASTICSEARCH_API_KEY</span>
        <span class="na">volumeMounts</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
            <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/usr/share/logstash/config</span>
      <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
        <span class="na">configMap</span><span class="pi">:</span>
          <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-configmap</span>
          <span class="na">items</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">key</span><span class="pi">:</span> <span class="s">logstash.yml</span>
              <span class="na">path</span><span class="pi">:</span> <span class="s">logstash.yml</span>
<span class="nn">---</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ConfigMap</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-configmap</span>
<span class="na">data</span><span class="pi">:</span>
  <span class="na">logstash.yml</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">http.host: "0.0.0.0"</span>
    <span class="s">log.level: info</span>
    <span class="s">xpack.management.enabled: true</span>
    <span class="s">xpack.management.elasticsearch.hosts: ["https://my-cluster.es.us-east-1.aws.found.io:443"]  </span>
    <span class="s">xpack.management.elasticsearch.api_key: "${ELASTICSEARCH_API_KEY}"</span>
    <span class="s">xpack.management.logstash.poll_interval: 5s</span>
    <span class="s">xpack.management.pipeline.id: ["my-pipeline-*"]</span>

<span class="s">---</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Service</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">logstash-service</span>
  <span class="na">labels</span><span class="pi">:</span> 
    <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">LoadBalancer</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">logstash</span>
  <span class="na">ports</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">protocol</span><span class="pi">:</span> <span class="s">TCP</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">5044</span>
    <span class="na">targetPort</span><span class="pi">:</span> <span class="m">5044</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">my-pipeline-1</span>
  <span class="pi">-</span> <span class="na">protocol</span><span class="pi">:</span> <span class="s">TCP</span>
    <span class="na">port</span><span class="pi">:</span> <span class="m">5045</span>
    <span class="na">targetPort</span><span class="pi">:</span> <span class="m">5045</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">my-pipeline-2</span>
</code></pre></div></div>]]></content><author><name></name></author><category term="logstash" /><category term="kubernetes" /><category term="datastreams" /><summary type="html"><![CDATA[Easily scale up and down to deal with throughput and deploy ETL pipelines to hundreds of instances with a single click]]></summary></entry><entry><title type="html">Logstash 8.x - Deploying, Ingesting and Testing the right way</title><link href="https://ugosan.org/Logstash-best-practices/" rel="alternate" type="text/html" title="Logstash 8.x - Deploying, Ingesting and Testing the right way" /><published>2022-07-01T00:00:00+00:00</published><updated>2022-07-01T00:00:00+00:00</updated><id>https://ugosan.org/Logstash-best-practices</id><content type="html" xml:base="https://ugosan.org/Logstash-best-practices/"><![CDATA[<div class="header">

<img src="https://images.pexels.com/photos/2569842/pexels-photo-2569842.jpeg?auto=compress&amp;cs=tinysrgb&amp;h=450&amp;dpr=2&amp;fit=crop" />
</div>

<p>The <strong>L</strong> in ELK, Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite “stash”. More than 50 outputs are currently supported, from Elasticsearch to Syslog.</p>

<p>Logstash is also currently the only option within the Elastic Stack if you want to fetch data that lives in a <mark>relational database</mark>, or if you want to write the same event to <mark>multiple outputs</mark>. But all that power comes at a cost: a Logstash instance requires a lot more resources than, for instance, an Elastic Agent with Filebeat.</p>

<p>Here are some best practices for deploying Logstash 8.x in production, grouped into three areas: Deployment, Data Ingestion and Testing.</p>

<h2 id="deploying-logstash">Deploying Logstash</h2>

<p>We have several options for deploying Logstash:</p>

<h3 id="as-standalone">As standalone</h3>

<p>Always deploy Logstash as a <mark>system service</mark>, using the official packages for YUM or APT based distributions:</p>

<p><a href="https://www.elastic.co/guide/en/logstash/current/installing-logstash.html#package-repositories">Installing Logstash from Package Repositories</a></p>

<h3 id="in-containers">In containers</h3>

<p>The main advantage here is that you can run several Logstash instances at the same time, even with different versions. You also get better process isolation at the system level.</p>

<ul>
  <li>Bind-mounted config files</li>
</ul>

<h3 id="in-kubernetes">In Kubernetes</h3>

<h2 id="data-ingestion">Data Ingestion</h2>

<ul>
  <li>Use data streams instead of plain indices
    <ul>
      <li>A special type of index, data streams are well suited for logs, events, metrics, and other continuously generated data that is rarely, if ever, updated.</li>
      <li>Standardized names</li>
      <li>A common set of best practices for mappings
        <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">elasticsearch</span> <span class="p">{</span>
  <span class="n">hosts</span> <span class="o">=&gt;</span> <span class="s2">"elasticsearch:9200"</span>
  <span class="n">user</span> <span class="o">=&gt;</span> <span class="s2">"elastic"</span>
  <span class="n">password</span> <span class="o">=&gt;</span> <span class="s2">"..."</span>
  <span class="n">data_stream</span> <span class="o">=&gt;</span> <span class="kp">true</span>
  <span class="n">data_stream_type</span> <span class="o">=&gt;</span> <span class="s2">"logs"</span>
  <span class="n">data_stream_dataset</span> <span class="o">=&gt;</span> <span class="s2">"hasura"</span>
  <span class="n">data_stream_namespace</span> <span class="o">=&gt;</span> <span class="s2">"%{log_type}"</span>
<span class="p">}</span>
</code></pre></div>        </div>
      </li>
    </ul>
  </li>
  <li>Pipelines
    <ul>
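      <li>Do you need to maintain event order? If order is not important, disable ordering (the <code class="language-plaintext highlighter-rouge">pipeline.ordered</code> setting)</li>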
      <li>Dissect instead of Grok whenever possible</li>
    </ul>
  </li>
</ul>
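
<p>The "standardized names" mentioned above follow the <code class="language-plaintext highlighter-rouge">type-dataset-namespace</code> scheme, so the three <code class="language-plaintext highlighter-rouge">data_stream_*</code> options decide where an event lands. A minimal Python sketch of that composition, assuming an event whose <code class="language-plaintext highlighter-rouge">log_type</code> field is <code class="language-plaintext highlighter-rouge">prod</code> (a made-up value):</p>

```python
def data_stream_name(stream_type, dataset, namespace):
    """Compose a data stream name following the type-dataset-namespace scheme."""
    return f"{stream_type}-{dataset}-{namespace}"


# "prod" is a made-up value for the log_type field of an event.
print(data_stream_name("logs", "hasura", "prod"))  # logs-hasura-prod
```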

<h2 id="testing">Testing</h2>

<ul>
  <li>Testing
    <ul>
      <li>Use a Load Generator for testing</li>
    </ul>
  </li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[Some best practices for deploying Logstash 8.x in production, grouped into Deployment, Data Ingestion and Testing]]></summary></entry><entry><title type="html">How to get and use the Root CA Certificate Fingerprint in the Elastic Stack</title><link href="https://ugosan.org/using-ca-trusted-fingerprint/" rel="alternate" type="text/html" title="How to get and use the Root CA Certificate Fingerprint in the Elastic Stack" /><published>2022-07-01T00:00:00+00:00</published><updated>2022-07-01T00:00:00+00:00</updated><id>https://ugosan.org/using-ca-trusted-fingerprint</id><content type="html" xml:base="https://ugosan.org/using-ca-trusted-fingerprint/"><![CDATA[<p>Recent versions of Beats, Elastic Agent and Logstash include a new parameter, <code class="language-plaintext highlighter-rouge">ca_trusted_fingerprint</code>, that makes it easier to trust a self-signed certificate: we just need <mark>a HEX encoded SHA-256 of a CA certificate</mark>.</p>

<h2 id="getting-the-fingerprint-from-a-server">Getting the fingerprint from a server</h2>

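<p>We can get this fingerprint with <code class="language-plaintext highlighter-rouge">openssl</code> by connecting to the server whose certificate we want to fingerprint. Note that <code class="language-plaintext highlighter-rouge">openssl x509</code> only reads the first certificate in its input, so check that the fingerprint printed belongs to the CA certificate you intend to trust:</p>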

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>openssl s_client \
  -connect demos.es.us-east1.gcp.elastic-cloud.com:9243 \
  -servername demos.es.us-east1.gcp.elastic-cloud.com \
  -showcerts &lt; /dev/null 2&gt;/dev/null | \
  openssl x509 -in /dev/stdin -sha256 -noout -fingerprint | \
  sed 's/://g'  
</code></pre></div></div>

<h2 id="getting-the-fingerprint-from-a-file">Getting the fingerprint from a file</h2>

<p>If you have an actual certificate file for the CA, you can just load it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> openssl x509 -in ca_file.pem -sha256 -noout -fingerprint | sed 's/://g'
</code></pre></div></div>

<p>The output will look something like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SHA256 Fingerprint=65C080BF18DBFA8F57606DBA0ED11D32DF42CF63B55CC07C7A764AA9597A9403
</code></pre></div></div>
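
<p>The same value can be computed without <code class="language-plaintext highlighter-rouge">openssl</code>: the fingerprint is the SHA-256 of the certificate's DER bytes, written as uppercase hex. Below is a minimal Python sketch using only the standard library. The function names are made up for the example, and it assumes the PEM text contains a single certificate that starts right at the <code class="language-plaintext highlighter-rouge">BEGIN CERTIFICATE</code> line.</p>

```python
import hashlib
import ssl


def sha256_fingerprint(pem_certificate):
    """Return the uppercase hex SHA-256 of a PEM certificate's DER bytes."""
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_certificate)
    return hashlib.sha256(der_bytes).hexdigest().upper()


def fingerprint_from_file(path):
    """Read a single-certificate PEM file and return its fingerprint."""
    with open(path, encoding="ascii") as pem_file:
        return sha256_fingerprint(pem_file.read())
```

<p>Calling <code class="language-plaintext highlighter-rouge">fingerprint_from_file("ca_file.pem")</code> should give the same hex string as the command above, without the <code class="language-plaintext highlighter-rouge">Fingerprint=</code> prefix.</p>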

<p>You can then use this fingerprint in <code class="language-plaintext highlighter-rouge">output.elasticsearch</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">output.elasticsearch</span><span class="pi">:</span>
  <span class="na">hosts</span><span class="pi">:</span> <span class="s">...</span>
  <span class="na">api_key</span><span class="pi">:</span> <span class="s">...</span>
  <span class="na">index</span><span class="pi">:</span> <span class="s">...</span>
  <span class="na">ssl.ca_trusted_fingerprint</span><span class="pi">:</span> <span class="s2">"</span><span class="s">65C080BF18DBFA8F57606DBA0ED11D32DF42CF63B55CC07C7A764AA9597A9403"</span>
  <span class="na">ssl.verification_mode</span><span class="pi">:</span> <span class="s">full</span>
</code></pre></div></div>]]></content><author><name></name></author><summary type="html"><![CDATA[We need a HEX encoded SHA-256 of a CA certificate to use `ca_trusted_fingerprint`]]></summary></entry></feed>