Sunday, August 3, 2014

Apache Storm WordCount Example by Hortonworks

Please follow these simple and flawless guidelines for setting up Apache Storm and running the basic WordCount example.

http://hortonworks.com/hadoop-tutorial/processing-streaming-data-near-real-time-apache-storm/

The only missing point: stopping the job, i.e. the topology.
Turn off your job after a few minutes. Because Storm does stream processing, the topology keeps running until you stop it, and your worker logs (/usr/lib/storm/logs/worker*.log) will keep growing.

To do so:
1) Go to the Storm UI: http://localhost:8744/
2) Under Topology Summary, click on WordCount.
3) On the topology page that opens, under Topology Actions, click "Deactivate" (pauses the spouts) or "Kill" (shuts the topology down).
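The same steps can also be scripted. The Python sketch below assumes the `storm` command-line client is on your PATH and calls its `kill` and `deactivate` commands; the helper names (`worker_log_bytes`, `storm_stop_command`, `stop_topology`) and the 30-second wait are hypothetical choices for illustration, not part of the Hortonworks tutorial.

```python
import glob
import os
import shutil
import subprocess

# Log location from the post; adjust if your Storm install logs elsewhere.
WORKER_LOG_GLOB = "/usr/lib/storm/logs/worker*.log"


def worker_log_bytes(pattern: str = WORKER_LOG_GLOB) -> int:
    """Total size in bytes of all worker logs matching the glob pattern (0 if none)."""
    return sum(os.path.getsize(path) for path in glob.glob(pattern))


def storm_stop_command(topology: str, kill: bool = True, wait_secs: int = 30) -> list[str]:
    """Build the `storm` CLI command that stops a running topology."""
    if kill:
        # `storm kill`: deactivates the spouts, waits `wait_secs` (-w), then shuts the workers down.
        return ["storm", "kill", topology, "-w", str(wait_secs)]
    # `storm deactivate`: only pauses the spouts; the topology stays deployed.
    return ["storm", "deactivate", topology]


def stop_topology(topology: str, kill: bool = True, wait_secs: int = 30) -> int:
    """Run the stop command and return the CLI's exit code."""
    if shutil.which("storm") is None:
        raise FileNotFoundError("the `storm` command is not on PATH")
    result = subprocess.run(storm_stop_command(topology, kill, wait_secs))
    return result.returncode
```

For example, `stop_topology("WordCount")` kills the topology from the tutorial, and comparing `worker_log_bytes()` before and after a short pause shows whether the worker logs have stopped growing.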






Saturday, August 2, 2014

YARN: The Complete Picture of the Apache Hadoop Ecosystem



The schematic above gives a complete overview of the Apache Hadoop ecosystem, using YARN for the following kinds of operations:
- Batch
- Interactive
- Real-time
- Search
- In-memory


The following image gives a broad view of data ingestion, operations, and management across the whole process.



Source: http://hortonworks.com/blog/pivotal-hortonworks-shared-vision-operations-enterprise-hadoop/

Thursday, June 19, 2014

Apache YARN: A Hadoop 2.x Concept

The following is an excellent explanation of the idea behind YARN by Arun Murthy and Rohit Bakshi from Hortonworks. It makes clear why the new architecture was needed: it supports not only MapReduce applications but also other applications, such as Tez and Storm, on the same cluster and the same HDFS storage.

A must watch!


Tuesday, February 4, 2014

Hive: Add A Partition Only If It Does Not Exist


Alter Partition

  Add Partitions

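  ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'];

  partition_spec:
    : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

With IF NOT EXISTS, the statement leaves an already existing partition untouched instead of failing; without it, Hive reports an error when the partition already exists.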