Indexing with ElasticSearch

Slides:



Advertisements
Similar presentations
Implementing Tableau Server in an Enterprise Environment
Advertisements

Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Single Page Apps with Breeze and Ruby.
Amaze business, make your devs happy
© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
1 Vic Hargrave |
Information Retrieval in Practice
Log Monitoring, Management and Analysis with Nagios
Graph databases …the other end of the NoSQL spectrum. Material taken from NoSQL Distilled and Seven Databases in Seven Weeks.
Overview of Search Engines
Google App Engine and Java Application: Clustering Internet search results for a person Aleksandar Kartelj Faculty of Mathematics,
Platform as a Service (PaaS)
Google App Engine Danail Alexiev Technical Trainer SoftAcad.bg.
Implementing search with free software An introduction to Solr By Mick England.
Elasticsearch in Dashboard Data Management Applications David Tuckett IT/SDC 30 August 2013 (Appendix 11 November 2013)
Nutch in a Nutshell (part I) Presented by Liew Guo Min Zhao Jin.
Clemens Düpmeier (KIT / IAI)
COLD FUSION Deepak Sethi. What is it…. Cold fusion is a complete web application server mainly used for developing e-business applications. It allows.
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
Revolutionizing enterprise web development Searching with Solr.
NOSQL DATABASES Please remember to read the NOSQL Distilled book and the Seven Databases book.
Grid Computing at Yahoo! Sameer Paranjpye Mahadev Konar Yahoo!
Iccha Sethi Serdar Aslan Team 1 Virginia Tech Information Storage and Retrieval CS 5604 Instructor: Dr. Edward Fox 10/11/2010.
Realtime insight in your application usage with NodeJs, ElasticSearch and Kibana Onno de Haan.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF.
807 - TEXT ANALYTICS Massimo Poesio Lab 2: (Quick intro to) SOLR Document clustering with MAHOUT.
2 Floor, , Sunnae-Dong,Kangdong-Gu Seoul, Korea T | F | SEOJINDSA CO. LTD Enterprise LDAP Team LDAP.
Apache Solr Dima Ionut Daniel. Contents What is Apache Solr? Architecture Features Core Solr Concepts Configuration Conclusions Bibliography.
#SummitNow Inspecting Alfresco – Tools and Techniques Nathan McMinn Technical Consultant - Alfresco.
A presentation on ElasticSearch
Information Retrieval in Practice
Wataru Takase, Tomoaki Nakamura, Yoshiyuki Watase, Takashi Sasaki
Platform as a Service (PaaS)
Introduction to YouSeer
Agenda:- DevOps Tools Chef Jenkins Puppet Apache Ant Apache Maven Logstash Docker New Relic Gradle Git.
Platform as a Service (PaaS)
Information Retrieval in Practice
Backdooring enemies with a Proxy …..
WinCC-OA Log Analysis SCADA Application Service - Reporting
Searching and Indexing
Open Source distributed document DB for an enterprise
Spark Presentation.
Microsoft Build /26/2018 2:16 PM © 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY,
Combining Metrics and Logs for Holistic System/Application Analysis
Microsoft Ignite /22/2018 3:27 PM BRK2121
Getting started with Alfresco Development
New Mexico State University
Gen-Tao Chiang Data and Analytic Engineer
Google App Engine Danail Alexiev
CS6604 Digital Libraries IDEAL Webpages Presented by
Indexing with Elasticsearch
Overview of big data tools
another noSql customization for the HDB++ archiving system
Get your ETL flow under statistical process control
Database Software.
Introduction to Elasticsearch with basics of Lucene May 2014 Meetup
REST APIs Maxwell Furman Department of MIS Fox School of Business
Getting Started With Solr
Rafał Kuć – Sematext sematext.com
Academic Map Report And Exhibition Group24.
NoSQL Overview + Elasticsearch Quick Dive
Python and REST Kevin Hibma.
Query Interface using Django
Factual Claim Validation Models
Lecture 34: Testing II April 24, 2017 Selenium testing script 7/7/2019
Intro to Azure Search Julie Smith 2019.
Web Application Development Using PHP
The real Benefits of IBM - C exam. IBM - C : Cloud Solutions Certification Provider:IBM Exam Code:C Exam Name:IBM Cloud Private.
Intro to Azure Search Julie Smith 2019.
Presentation transcript:

Indexing with ElasticSearch IST 441 - Spring 2019 Shaurya Rohatgi IST, PhD Student Credits and Thanks to Ruslan Zavacky

Recap Crawled a demo URL (Dr. Jian Wu’s webpage) Dumped the data crawled into data.jl

Next Steps Crawled a demo URL (Dr. Jian Wu’s webpage) Dumped the data crawled into data.jl Index this data using ElasticSearch

Overview Advertise ElasticSearch Look at the tools and features available Index Demo using Kibana and ES’s Python API

Why ElasticSearch ? Not Solr ? Elasticsearch or Solr, both of which are steadily ranked in the top two spots among open source and commercial search engines, according to DB-Engines. Link to Google Trends ElasticSearch is the only open source search- engine with a official Python API http://solr-vs-elasticsearch.com/ Source : https://db-engines.com/en/ranking/search+engine https://db-engines.com/en/system/Elasticsearch

real time analytics high availability Search isn’t just free text search anymore - it’s about exploring your data. Understanding it. Gaining insights that will make your business better or improve your product. high availability Elasticsearch clusters are resilient - they will detect and remove failed nodes, and reorganise themselves to ensure that your data is safe and accessible.

document oriented Store complex real world entities in Elasticsearch as structured JSON documents. All fields are indexed by default, and all the indices can be used in a single query, to return results at breath taking speed.

per-operation persistence restful api per-operation persistence Elasticsearch puts your data safety first. Document changes are recorded in transaction logs on multiple nodes in the cluster to minimise the chance of any data loss. Elasticsearch is API driven. Almost any action can be performed using a simple RESTful API using JSON over HTTP. An API already exists in the language of your choice. apache 2 open source license Elasticsearch can be downloaded, used and modified free of charge. It is available under the Apache 2 license, one of the most flexible open source licenses available.

Other ElasticSearch Benefits Features are based on what their have asked for in the past - not random Fast version releases Great community and online resources - Actual developers answer your questions Actual support engineers are hired by the company to help It is fast and scalable Tools like Elastic Cloud Enterprise - Manage and replicate multiple ElasticSearch clusters Caching for frequent queries

Elastic Success Stories Source: https://www.elastic.co/use-cases/

An Exhibition of ElasticSearch capabilities GITHUB - Search An Exhibition of ElasticSearch capabilities

Unstructured search

Structured search

Enrichment

Sorting

Pagination

Aggregation

Suggestions

Capabilities Store schema less data Or create a schema for your data Manipulate your data record by record Or use Multi-document APIs to do Bulk ops Perform Queries/Filters on your data for insights Or if you are DevOps person, use APIs to monitor Do not forget about built-in Full-Text search and analysis

Autocompletion in ES Multiple Inputs Single Unified Output Scoring Payloads Synonyms Ignoring stopwords Going fuzzy Statistics

ES Software Stack - Development Tools – the ELK stack ElasticSearch - Core indexing and Searching API Should be thought of as a database with querying and searching capabilities inherited from Lucene - the word search is deceiving Logstash - Useful in importing data from sources to ElasticSearch “database” Prebuilt importers for Databases - SQL, Server logs, metrics etc. Kibana - GUI Dashboard and Console - Visual Tool Discovering insights

ES Software Stack - Development Tools – the ELK stack ElasticSearch - Core indexing and Searching API Should be thought of as a database with querying and searching capabilities inherited from Lucene - the word search is deceiving Logstash - Useful in importing data from sources to ElasticSearch “database” Prebuilt importers for Databases - SQL, Server logs, metrics etc. Kibana - GUI Dashboard and Console - Visual Tool Discovering insights

Kibana “Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack, so you can do anything from learning why you're getting paged at 2:00 a.m. to understanding the impact rain might have on your quarterly numbers.” Demo

Time to Index !

Prerequisites - HTTP Methods GET retrieve data from a server at the specified resource PUT send data to the API sever to create or update a resource POST create or update a resource. The difference is that PUT requests are idempotent.

What all has already been done for you ! Elastic Requires Java. It has been installed on the server Elastic’s Python API has been installed Elastic is running as a service for every team on their ports – 9201, 9202, 9203 . . 9209 Kibana is also running as a service for each team – 5601, 5602 . . . 5609 Kibana has been made accessible on IST network at from ist441.ist.psu.edu:<team_port_number> Code is also available in indexing directory in respective team folders

Accessing Kibana Open Browser in Vlabs ist441.ist.psu.edu:56<team_number> Team 1 - ist441.ist.psu.edu:5601 , Team 2 - ist441.ist.psu.edu:5602 . . and So on List all index names in ES GET /_cat/indices?v Get all ! Just to check if kibana can communicate with ES

Ssh into ist441.ist.psu.edu Using Putty

Python Code for Indexing Commands to run – 1. Move to folder cd /data/team<team_number>/indexing/src Code to run index_jl.py Change this

Python Code for Indexing Commands to run – 1. Move to folder cd /data/team<team_number>/indexing/src 2. Run index_jl.py python3 index_jl.py or python index_jl.py Change this

Querying Using Kibana Json request – (copy please) GET jian_urls/_search { "query": { "match": { "body": "Penn state university" } }

Next Steps Optimizing ES – after 2 weeks Trying other analysers/tokenizers – link Set it up on your local machine and try working with it - http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html