Elasticsearch
Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Since then it has become one of the most popular search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases.
Elasticsearch is developed in Java and is dual-licensed under the source-available Server Side Public License (SSPL) and the Elastic License. Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of free and open tools for data ingestion, enrichment, storage, analysis, and visualization. Elasticsearch is generally used as the underlying engine that powers applications with complex search features and requirements. It provides a distributed system on top of Lucene, uses Lucene's StandardAnalyzer by default for indexing, guesses field types automatically, and exposes Lucene features through a JSON-based REST API.
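As a rough illustration of the JSON-based REST API mentioned above, the sketch below builds two HTTP requests using only the Python standard library and prints them without sending anything. The host and port (a local node on the default port 9200), the index name "products", and the helper build_request are assumptions made for this example, not part of the original text.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port 9200 and a
# hypothetical index named "products".
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Index one JSON document under id 1.
index_req = build_request(
    "PUT", "/products/_doc/1", {"name": "red shoe", "price": 49.9}
)

# Full-text search on the "name" field with a match query.
search_req = build_request(
    "POST", "/products/_search", {"query": {"match": {"name": "shoe"}}}
)

for req in (index_req, search_req):
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to urllib.request.urlopen would send it to a running node. A secured cluster would additionally need HTTPS and credentials, which this sketch leaves out.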
Elasticsearch is one of the most popular database systems available today and is considered the heart of one of the most popular log analytics platforms: the ELK Stack (Elasticsearch, Logstash, and Kibana).
What Does Elasticsearch Do?
Elasticsearch can be used to search all kinds of data. It provides a scalable search solution with near real-time search and support for multi-tenancy. Elasticsearch takes in unstructured data from different locations, stores and indexes it according to a user-specified mapping (which can also be derived automatically from the data), and makes it searchable. Its distributed architecture makes it possible to search and analyze huge volumes of data in near real time. It allows us to start with one machine and scale to hundreds. Elasticsearch makes it easy to run a full-featured search cluster, though running it at scale still requires a substantial level of expertise.
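To make the idea of a mapping derived automatically from data more concrete, here is a deliberately simplified imitation of that type guessing. It is a sketch under stated assumptions, not Elasticsearch's actual logic: the functions guess_field_type and guess_mapping, the sample document, and the single date pattern are all made up for illustration, and real dynamic mapping applies more rules (for example, strings normally also get a keyword sub-field).

```python
import re

# Simplified imitation of dynamic mapping. Real Elasticsearch applies more
# rules (configurable date formats, numeric detection, dynamic templates).
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}([T ]\d{2}:\d{2}:\d{2})?")


def guess_field_type(value):
    """Guess an Elasticsearch-style field type for one JSON scalar."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if ISO_DATE.fullmatch(value) else "text"
    raise TypeError(f"unsupported value: {value!r}")


def guess_mapping(document):
    """Derive a mapping-like dict from a parsed JSON document."""
    properties = {}
    for field, value in document.items():
        if isinstance(value, list):
            # Arrays take the type of their first element; skip empty ones.
            if not value:
                continue
            value = value[0]
        if value is None:
            continue  # null carries no type information
        if isinstance(value, dict):
            properties[field] = guess_mapping(value)  # nested object
        else:
            properties[field] = {"type": guess_field_type(value)}
    return {"properties": properties}


# Hypothetical log event used only for this example.
log_event = {
    "message": "disk full",
    "status": 507,
    "ok": False,
    "took": 1.5,
    "@timestamp": "2024-01-01T10:00:00",
    "host": {"name": "web-1"},
    "tags": ["storage", "alert"],
}

mapping = guess_mapping(log_event)
for field, spec in mapping["properties"].items():
    print(field, "->", spec)
```

An explicit, user-specified mapping is the alternative when the guessed types are not what the application needs, for instance when a numeric-looking identifier should be treated as an exact-match string.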
Some of Elasticsearch’s primary use cases are:
- Application search - For applications that rely heavily on a search platform for the access, retrieval, and reporting of data.
- Website search - Websites which store a lot of content find Elasticsearch a very useful tool for effective and accurate searches.
- Logging and log analytics - Elasticsearch is commonly used for ingesting and analyzing log data in near-real-time and in a scalable manner.
- Security analytics - Another major analytics application of Elasticsearch is security analysis. Access logs and similar logs concerning system security can be analyzed with the ELK stack, providing a more complete picture of what’s going on across your systems in real-time.
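For the logging and security use cases above, a search request typically combines a full-text condition with a time filter and an aggregation. The sketch below only builds such a request body as a Python dict. The field names message, @timestamp, and host.keyword, and the function recent_errors_query, are assumptions about how the log data might be indexed.

```python
import json


def recent_errors_query(minutes=15, size=20):
    """Build a search body: recent log lines mentioning "error", per host."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Scored full-text condition on the log message.
                "must": [{"match": {"message": "error"}}],
                # Unscored time filter using date math relative to now.
                "filter": [
                    {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}
                ],
            }
        },
        # Count matching log lines per host (hypothetical keyword field).
        "aggs": {"errors_per_host": {"terms": {"field": "host.keyword"}}},
    }


body = recent_errors_query(minutes=30)
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request against the index that holds the logs.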
Why use Elasticsearch?
- Elasticsearch is fast - As Elasticsearch is built on top of `Lucene`, it excels at full-text search. Elasticsearch is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short — typically one second.
- Elasticsearch comes with a wide set of features - In addition to its speed, scalability, and resiliency, Elasticsearch has a number of powerful built-in features that make storing and searching data even more efficient, such as data rollups and index lifecycle management.
- The Elastic Stack simplifies data ingest, visualization, and reporting - Integration of Elasticsearch with Beats and Logstash makes it easy to process data before indexing into Elasticsearch. And Kibana provides real-time visualization of Elasticsearch data as well as UIs for quickly accessing application performance monitoring (APM), logs, and infrastructure metrics data.
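The first point above credits Lucene for fast full-text search. The data structure usually associated with that speed is the inverted index, which maps each term to the documents containing it. The toy version below is only an illustration of the idea, not Lucene's implementation. The class TinyInvertedIndex, the tokenizer, and the sample documents are all invented for this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyInvertedIndex:
    """Toy inverted index: term -> set of ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = TinyInvertedIndex()
index.add(1, "Disk full on web-1")
index.add(2, "Disk latency high on db-2")
index.add(3, "User login failed")

print(sorted(index.search("disk")))       # documents 1 and 2
print(sorted(index.search("disk full")))  # document 1 only
print(sorted(index.search("timeout")))    # no match
```

A lookup touches only the postings of the query terms rather than scanning every document, which is why this layout suits full-text search.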
Elasticsearch - Backend components:
To better understand Elasticsearch and its usage, it is good to have a general understanding of the main backend components:
- Node - A node is a single server that is part of a cluster, stores our data, and participates in the cluster’s indexing and search capabilities. Just like a cluster, a node is identified by a name. In older versions the default name was a random Universally Unique Identifier (UUID) assigned to the node at startup; recent versions default to the machine's hostname.
- Cluster - A cluster is a collection of one or more nodes that together hold your entire data and provide federated indexing and search capabilities. There can be N nodes with the same cluster name. Elasticsearch operates in a distributed environment: with cross-cluster replication, a secondary cluster can spring into action as a hot backup.
- Index - An index is a collection of documents that have similar characteristics. For example, we can have an index for customer data, another for product information, and another for a different type of data. An index is identified by a unique name that is used to refer to it when performing indexing, search, update, and delete operations.
- Document - A document is a basic unit of information that can be indexed. For example, we can have an index of customers and then a document for a single customer. This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.
- Shards and Replicas - Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When we create an index, we can simply define the number of shards that we want. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster. A replica is a copy of a shard kept on a different node, which provides failover if a node is lost and extra capacity for searches.
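The relationship between an index, its shards, and its replicas can be sketched in a few lines. The settings names number_of_shards and number_of_replicas are real index settings. Everything else here is an assumption for illustration: the helper functions are hypothetical, and the routing uses CRC32 from the standard library as a stand-in hash, whereas Elasticsearch uses a Murmur3 hash and a slightly more involved formula.

```python
import zlib


def create_index_body(primary_shards, replicas_per_shard):
    """Request body for creating an index with explicit shard settings."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


def pick_shard(routing_value, primary_shards):
    """Map a routing value (by default the document id) to a shard number.

    Stand-in hash: CRC32 from the standard library. Elasticsearch itself
    uses a Murmur3 hash, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % primary_shards


body = create_index_body(primary_shards=3, replicas_per_shard=1)
settings = body["settings"]

# Every primary shard plus each of its replica copies lives somewhere
# in the cluster.
total_shard_copies = settings["number_of_shards"] * (
    1 + settings["number_of_replicas"]
)
print("shard copies in the cluster:", total_shard_copies)

# Show which primary shard each (hypothetical) document id would land on.
placement = {
    doc_id: pick_shard(doc_id, settings["number_of_shards"])
    for doc_id in ("1", "2", "3", "4", "5", "6")
}
print(placement)
```

Because the shard number depends on the count of primary shards, that count is chosen when the index is created and is not something to change casually afterwards, while the number of replicas can be adjusted later.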
Get started with Elasticsearch:
The simplest way to set up Elasticsearch is to create a managed deployment with Elasticsearch Service on Elastic Cloud. If you prefer to install and manage Elasticsearch yourself, you can download the latest version from the Elastic website. For more installation options, see the Elasticsearch installation documentation.
Build from source
Elasticsearch uses Gradle for its build system.
To build a distribution for our local OS and print its output location upon completion, run:
./gradlew localDistro
and to build distributions for all supported platforms, run:
./gradlew assemble
Distributions are output to distributions/archives.
To learn how to run the test suite, see TESTING in the repository.
Contributing:
Elasticsearch is a free and open project managed by Elastic. The code base includes contributions from developers both inside and outside of Elastic. Anyone can submit a pull request in the Elasticsearch GitHub repository. Elastic conducts a transparent review of all pull requests before merging them into the code base.
If you are interested in becoming a contributor and want to get involved in developing Elasticsearch, see CONTRIBUTING for details.
Communication:
- To report a bug or request a feature, create a GitHub issue. As per the contributing guidelines, first make sure that someone else hasn’t already created an issue for the same topic.
- Need help using Elasticsearch? Reach out on the Elastic Forum or Slack. A fellow community member or Elastic engineer will be happy to help you out.
References:
- Elasticsearch Docs: https://www.elastic.co/what-is/elasticsearch
- An overview on Elasticsearch: https://towardsdatascience.com/an-overview-on-elasticsearch-and-its-usage-e26df1d1d24a
- Knowi Blogs: https://www.knowi.com/blog/what-is-elastic-search/
- Elasticsearch Github: https://github.com/elastic/elasticsearch