Storing Large-Scale Data

“…we see an emerging data storage mechanism for storing large scale of data. These storage solution differs quite significantly with the RDBMS model and is also known as the NOSQL…”
NOSQL Patterns and NoSQL GraphDB (both articles are from Ricky Ho’s Blog)
System Properties Comparison Cassandra vs. HBase vs. MongoDB
http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB
NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
http://nosql-database.org/
Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Couchbase vs OrientDB vs Aerospike vs Neo4j vs Hypertable vs ElasticSearch vs Accumulo vs VoltDB vs Scalaris comparison
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Why use a NoSQL Database, and why not?
https://adamfowlerml.wordpress.com/2013/01/04/why-use-a-nosql-database-and-why-not/
A deep dive into NoSQL: A complete list of NoSQL databases
http://bigdata-madesimple.com/a-deep-dive-into-nosql-a-complete-list-of-nosql-databases/

Elasticsearch

 

Why do I need a broker for my production ELK stack + machine specs?
“As for Redis, it acts as a buffer in case logstash and/or elasticsearch are down or slow. If you’re using the full logstash or logstash-forwarder as a shipper, it will detect when logstash is unavailable and stop sending logs (remembering where it left off, at least for a while).

So, in a pure logstash/logstash-forwarder environment, I see little reason to use a broker like redis.

When it becomes important is for sources that don’t care about logstash’s status and don’t buffer in their side. syslog, snmptrap, and others fall into this category. Since your sources include syslog, I would bring up brokers in your setup.”
http://stackoverflow.com/questions/30361568/why-do-i-need-a-broker-for-my-production-elk-stack-machine-specs
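The buffering role described in the answer above can be illustrated with a toy sketch. This is only an in-memory model under stated assumptions: the names Broker, Indexer and drain are hypothetical, the "broker" is a plain Python deque rather than a real Redis instance, and nothing here uses the Logstash or Redis APIs. It shows a source that does not buffer on its own side (like syslog) continuing to send while the downstream indexer is unavailable, with the backlog delivered in order once it recovers.

```python
from collections import deque


class Broker:
    """Toy stand-in for a broker such as Redis: a bounded FIFO buffer."""

    def __init__(self, max_events=1000):
        self.queue = deque()
        self.max_events = max_events

    def push(self, event):
        # Assumption: drop the oldest event when full.
        # A real broker's overflow policy may differ.
        if len(self.queue) >= self.max_events:
            self.queue.popleft()
        self.queue.append(event)


class Indexer:
    """Toy stand-in for logstash/elasticsearch; can be switched off."""

    def __init__(self):
        self.available = True
        self.indexed = []

    def index(self, event):
        if not self.available:
            raise ConnectionError("indexer is down")
        self.indexed.append(event)


def drain(broker, indexer):
    """Move buffered events to the indexer; stop at the first failure."""
    moved = 0
    while broker.queue:
        try:
            indexer.index(broker.queue[0])
        except ConnectionError:
            break  # leave the event in the buffer for a later retry
        broker.queue.popleft()
        moved += 1
    return moved


broker = Broker()
indexer = Indexer()

# A syslog-like source keeps sending while the indexer is down.
indexer.available = False
for i in range(3):
    broker.push(f"syslog line {i}")
drain(broker, indexer)  # nothing is delivered, nothing is lost

# Once the indexer is back, the backlog is delivered in order.
indexer.available = True
drain(broker, indexer)
print(indexer.indexed)
```

The bounded queue is a deliberate simplification: a buffer only helps for as long as it has room, which is why sizing the broker matters for sources that never retry.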

 

How to Deploy the ELK Stack in Production
http://logz.io/blog/deploy-elk-production/

 

Get Started with Elasticsearch
https://www.elastic.co/webinars/get-started-with-elasticsearch?baymax=rtp&elektra=downloads&iesrc=ctr

Getting Started with ElasticSearch
https://dzone.com/articles/elasticsearch-getting-started

ElasticSearch in 5 minutes
http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html

ElasticSearch 101 – a getting started tutorial
http://joelabrahamsson.com/elasticsearch-101/

 

AWS Tutorial Series – Elasticsearch – Logstash – Kibana 4 (ELK Stack) Setup Tutorial
https://www.youtube.com/watch?v=ge8uHdmtb1M&app=desktop

 

Introducing Apache Spark 2.0
https://databricks.com/blog/2016/07/26/introducing-apache-spark-2-0.html

 

Using Spark and Elasticsearch for Real-time Data Analysis - Costin Leau (Elasticsearch)
https://www.youtube.com/watch?v=afy4PkSJuzk

 

Realtor Search: Elasticsearch and Python in Practice
https://www.youtube.com/watch?v=emT4nHd49cc#t=0.87474
http://www.pyvideo.org/video/3545/realtor-search-elasticsearch-and-python-practice

 

Adding Spark (and Security) to Elasticsearch for Hadoop
https://www.elastic.co/webinars/adding-spark-and-security-to-elasticsearch-for-hadoop?baymax=rtp&elektra=docs&iesrc=ctr

 

Spark Streaming

Spark Streaming Programming Guide
http://spark.apache.org/docs/latest/streaming-programming-guide.html

Big Data Processing with Apache Spark – Part 3: Spark Streaming
https://www.infoq.com/articles/apache-spark-streaming

Spark Streaming – A Simple Example
http://henning.kropponline.de/2015/03/22/spark-streaming-simple-example/

How-to: Do Near-Real Time Sessionization with Spark Streaming and Apache Hadoop
http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/

 

Spark Streaming + Kafka Integration Guide
http://spark.apache.org/docs/latest/streaming-kafka-integration.html

 

Kafka

“Apache Kafka and the Next 700 Stream Processing Systems” by Jay Kreps
https://www.youtube.com/watch?v=9RMOc0SwRro#t=168.272675

 

Setting Up and Running Apache Kafka on Windows OS
https://dzone.com/articles/running-apache-kafka-on-windows-os

 

Getting started with Esper in 5 minutes
https://coffeeonesugar.wordpress.com/2009/07/21/getting-started-with-esper-in-5-minutes/

 

KIP-28 – Add a processor client
https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client