<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Elasticsearch on Blog GoHugo de Fredô : Linux, Proxmox, IA, Trail, Course, Randonnée, Gravel, Ski de Randonnée</title>
    <link>https://move.cyber-neurones.org/tags/elasticsearch/</link>
    <description>Recent content in Elasticsearch on Blog GoHugo de Fredô : Linux, Proxmox, IA, Trail, Course, Randonnée, Gravel, Ski de Randonnée</description>
    <generator>Hugo</generator>
    <language>fr</language>
    <lastBuildDate>Mon, 25 Nov 2019 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://move.cyber-neurones.org/tags/elasticsearch/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>macOS: Python: Removing duplicate emails with the Elasticsearch/Kibana Python API (version V3)</title>
      <link>https://move.cyber-neurones.org/post/2019/11/2019-11-25-macos-python-suppression-des-doublons-demails-avec-lapi-python-elasticsearch-kibana-version-v3/</link>
      <pubDate>Mon, 25 Nov 2019 00:00:00 +0000</pubDate>
      <guid>https://move.cyber-neurones.org/post/2019/11/2019-11-25-macos-python-suppression-des-doublons-demails-avec-lapi-python-elasticsearch-kibana-version-v3/</guid>
      <description>&lt;p&gt;In the end, I think there are duplicates among the 200,000 emails&amp;hellip; so I will take advantage of the export to Elasticsearch/Kibana to check for them. Any email that has the same size and the same MD5 checksum as another is considered a duplicate.&lt;/p&gt;&#xA;&lt;p&gt;Here is version V3 (without the file deletion: &lt;strong&gt;os.unlink(path)&lt;/strong&gt;):&lt;/p&gt;&#xA;&#xA;&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;#!/usr/bin/env python3&#xA;&#xA;import email&#xA;import hashlib&#xA;import os&#xA;import plistlib&#xA;import re&#xA;from datetime import datetime&#xA;from email.header import decode_header, make_header&#xA;from email.utils import parsedate_to_datetime&#xA;from elasticsearch import Elasticsearch&#xA;&#xA;class Emlx(object):&#xA;    def __init__(self):&#xA;        super(Emlx, self).__init__()&#xA;        self.bytecount = 0&#xA;        self.msg_data = None&#xA;        self.msg_plist = None&#xA;&#xA;    def parse(self, filename_path):&#xA;        # An .emlx file holds a byte count, then the message, then a plist.&#xA;        with open(filename_path, &amp;#34;rb&amp;#34;) as f:&#xA;            self.bytecount = int(f.readline().strip())&#xA;            self.msg_data = email.message_from_bytes(f.read(self.bytecount))&#xA;            self.msg_plist = plistlib.loads(f.read())&#xA;        return self.msg_data, self.msg_plist&#xA;&#xA;def md5(fname):&#xA;    # Hash the file in 4 KiB chunks to keep memory usage flat.&#xA;    hash_md5 = hashlib.md5()&#xA;    with open(fname, &amp;#34;rb&amp;#34;) as f:&#xA;        for chunk in iter(lambda: f.read(4096), b&amp;#34;&amp;#34;):&#xA;            hash_md5.update(chunk)&#xA;    return hash_md5.hexdigest()&#xA;&#xA;if __name__ == &amp;#39;__main__&amp;#39;:&#xA;    msg = Emlx()&#xA;    nb_parse = 0&#xA;    nb_error = 0&#xA;    save_space = 0&#xA;    # checksum: number of files already seen with that checksum.&#xA;    # The same MD5 means the same content, hence the same size.&#xA;    seen = {}&#xA;    path_mail = &amp;#34;/Users/MonLogin/Library/Mail/V6/&amp;#34;&#xA;    es_keys = &amp;#34;mail&amp;#34;&#xA;    es = Elasticsearch([{&amp;#39;host&amp;#39;: &amp;#39;localhost&amp;#39;, &amp;#39;port&amp;#39;: 9200}])&#xA;    for root, dirs, files in os.walk(path_mail):&#xA;        for file in files:&#xA;            if not file.endswith(&amp;#34;.emlx&amp;#34;):&#xA;                continue&#xA;            file_full = os.path.join(root, file)&#xA;            my_check = md5(file_full)&#xA;            my_count = seen.get(my_check, 0)&#xA;            seen[my_check] = my_count + 1&#xA;            message, plist = msg.parse(file_full)&#xA;            statinfo = os.stat(file_full)&#xA;            if my_count &amp;gt; 0:&#xA;                # Duplicate: count the space that deleting it would free.&#xA;                save_space += statinfo.st_size&#xA;                #os.unlink(file_full)&#xA;            my_date = message[&amp;#39;Date&amp;#39;]&#xA;            my_id = message[&amp;#39;Message-ID&amp;#39;]&#xA;            my_server = message[&amp;#39;Received&amp;#39;]&#xA;            my_date_str = &amp;#34;&amp;#34;&#xA;            if my_date is not None:&#xA;                try:&#xA;                    my_date_str = datetime.fromtimestamp(parsedate_to_datetime(str(my_date)).timestamp()).strftime(&amp;#39;%Y-%m-%dT%H:%M:%S&amp;#39;)&#xA;                except Exception:&#xA;                    my_date_str = &amp;#34;&amp;#34;&#xA;            my_email = str(make_header(decode_header(str(message[&amp;#39;From&amp;#39;]))))&#xA;            my_domain = re.search(r&amp;#34;@[\w.\-]+&amp;#34;, my_email)&#xA;            my_name = re.search(r&amp;#34;[\w.\-]+@&amp;#34;, my_email)&#xA;            # Build the document as a dict so the client handles JSON escaping.&#xA;            doc = {&amp;#34;checksum&amp;#34;: my_check, &amp;#34;count&amp;#34;: str(my_count), &amp;#34;size&amp;#34;: statinfo.st_size}&#xA;            if my_domain is not None and my_name is not None:&#xA;                doc[&amp;#34;name&amp;#34;] = my_name.group().lower()&#xA;                doc[&amp;#34;domain&amp;#34;] = my_domain.group().lower()&#xA;            else:&#xA;                # No usable address: index a cleaned-up From header instead.&#xA;                my_email = my_email.replace(&amp;#34;,&amp;#34;, &amp;#34;&amp;#34;).replace(&amp;#39;&amp;#34;&amp;#39;, &amp;#39;&amp;#39;)&#xA;                my_email = re.sub(r&amp;#39;[^\x00-\x7f]&amp;#39;, &amp;#39;&amp;#39;, my_email).lower()&#xA;                doc[&amp;#34;name&amp;#34;] = my_email&#xA;                doc[&amp;#34;domain&amp;#34;] = &amp;#34;None&amp;#34;&#xA;            if len(my_date_str) &amp;gt; 1:&#xA;                doc[&amp;#34;date&amp;#34;] = my_date_str&#xA;            doc[&amp;#34;id&amp;#34;] = nb_parse&#xA;            if my_server is not None:&#xA;                # Keep the first IPv4-looking address of the first Received header.&#xA;                ip = re.search(r&amp;#39;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}&amp;#39;, str(my_server))&#xA;                if ip is not None:&#xA;                    doc[&amp;#34;ip&amp;#34;] = ip.group()&#xA;            if my_id is not None:&#xA;                doc[&amp;#34;Message-ID&amp;#34;] = str(my_id).strip()&#xA;            doc[&amp;#34;file&amp;#34;] = file&#xA;            print(doc)&#xA;            try:&#xA;                res = es.index(index=es_keys, doc_type=&amp;#39;emlx&amp;#39;, id=nb_parse, body=doc)&#xA;            except Exception:&#xA;                nb_error += 1&#xA;            nb_parse += 1&#xA;            #print(plist)&#xA;    print(nb_parse)&lt;/code&gt;&lt;/pre&gt;&#xA;&lt;p&gt;To be continued in V4!&lt;/p&gt;</description>
    </item>
    <item>
      <title>OSMC/Raspberry: Adding ELK (Elasticsearch / Logstash / Kibana / Beats / Nginx)</title>
      <link>https://move.cyber-neurones.org/post/2018/09/2018-09-19-osmc-raspberry-ajout-de-elk-elasticsearch-logstash-kibana-beats-nginx/</link>
      <pubDate>Wed, 19 Sep 2018 00:00:00 +0000</pubDate>
      <guid>https://move.cyber-neurones.org/post/2018/09/2018-09-19-osmc-raspberry-ajout-de-elk-elasticsearch-logstash-kibana-beats-nginx/</guid>
      <description>&lt;p&gt;The goal is to install ELK on an OSMC/Raspberry Pi setup that is already up and running, to avoid buying a new Raspberry Pi.&lt;/p&gt;&#xA;&lt;p&gt;For the OSMC installation, see &lt;a href=&#34;https://www.cyber-neurones.org/2016/09/installation-un-media-center-avec-osmc-sur-un-raspberry-pi-3-model-b/&#34;&gt;https://www.cyber-neurones.org/2016/09/installation-un-media-center-avec-osmc-sur-un-raspberry-pi-3-model-b/&lt;/a&gt;: &amp;ldquo;Installation un media-center avec OSMC sur un Raspberry Pi 3 Model B&amp;rdquo; (done on &lt;a href=&#34;https://www.cyber-neurones.org/2016/09/installation-un-media-center-avec-osmc-sur-un-raspberry-pi-3-model-b/&#34; title=&#34;16:31&#34;&gt;30/09/2016&lt;/a&gt;).&lt;/p&gt;&#xA;&lt;p&gt;The first step is to open a console and connect over SSH to the device&amp;rsquo;s IP address with the login &lt;strong&gt;osmc&lt;/strong&gt; and the password &lt;strong&gt;osmc&lt;/strong&gt; (unless it has been changed):&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
