Friday, February 3, 2017

TRAINING - BigData Administrator

  • What is Data (& Consequently BigData...)
    • RDBMS, NoSQL, TimeSeries, Graph, FileSystem, Stream, Sensor, Spatial.
  • Hadoop 2.0
    • TimeLine & History.
    • What - Fundamentals, Core Components, Rack Awareness, Node & Cluster Concept.
    • Why - Solution Types, Distributions & Specialties.
    • When - (And When Not) Challenges & Complexity.
    • Use cases.
  • Linux, FileSystems & Other Terminology
    • RedHat, CentOS, Ubuntu - VM, Server, AWS & Other Cloud Options.
    • Ext3, Ext4, SAN, NAS, NFS, RAID, S3, ZFS, Alluxio, QuantcastFS, XtreemFS, BeeGFS, MooseFS, OrangeFS, LizardFS, Lucene.
    • OpenLDAP, DNS, DHCP, NTP, Kerberos, CA, SSH, PuTTY.
  • Role Expectation, Job Description, Responsibilities & Growth Plan
    • Data Modeling, Design, ETL (Development / Process) Management, Capacity Planning, Proposals, POCs & Deployment for New or Expanded Hardware and Software Environments, working with Systems Engineering, Infrastructure, Network, Database, Application, Data Delivery and Business Intelligence teams to ensure business applications are highly available and performing within agreed SLAs.
    • Installation, Implementation, Administration, Configuration, Connectivity, Scaling, Backup, Recovery, Updates, Upgrades, Security, (OS {Primarily Linux} / Memory / Network / Disk / File / User / Node / Volume) Management, Performance Monitoring, Tuning, Task Automation {Bash Scripting}, Maintenance, Support, CI Integration, Log Review (Data Exhaust), Quality Audit, and (Developing / Documenting) Best Practices & Benchmarks for New, Ongoing & Existing Enterprise Clusters, based on a specific or generic Distro or Cloud Provider, and on Apache Hadoop.
    • Primary Point Of Contact for Vendor Selection, Management & Escalation.
  • Stack Insight
    • HDFS In Depth (Block Size, Replication Factor, Daemons, HA, Fault Tolerance, Federation, Quotas, Erasure Coding etc...).
    • YARN “The Hadoop OS” In Depth (HA, RM, Scheduler, Queues, Node Labels etc...).
    • Hortonworks Local Cluster “First Look”
      • Ambari
      • HDFS Web UI & CLI
      • Hue, Oozie, Other Components
  • Ensuring All Lab & Participant System Prerequisites Are Fulfilled Before Proceeding.
  • Hortonworks “OnPremise” Multi Node Cluster SetUp (HDP 2.3)
    • Capacity Planning, Hardware / Virtualization Options, Network SetUp & Nodes Enlisting.
    • OS Modifications & MySQL Installation.
    • Stack Installation & Configuration
  • Rack Aware HDFS 2.0 with HA Enabled Through ZooKeeper, JournalNodes & NFS.
  • YARN RM HA with Capacity Scheduler & Multiple Queues.
  • Spark & Scala.
  • Security & Audit through Kerberos, Ranger, Knox, (Atlas, Eagle & Metron Discussion).
  • Manage Storage, Files, Directories, ACLs, Ports, Users, Edge Node & Services.
  • Space Quotas, TDE / Crypto & Snappy / LZO Compression.
  • Node Commission / DeCommission.
  • Trash & Automated FileCopy.
  • Cluster Balancing, Node Labeling, Erasure Coding, Metrics, Snapshots & Archive Storage.
  • File Formats Avro, JSON, ORC & Parquet.
  • Hive, HCatalog, Tez Engine, Pig, (HAWQ Discussion).
  • HBase with Phoenix, Trafodion.
  • ETL through Talend Open Studio / Pentaho Kettle into Hive / HDFS.
  • Oozie Scheduling for Hive / Pig Batch Process through Eclipse Plugin, Falcon.
  • Hue, Kafka, Flume, Flink.
  • Cluster BenchMarking.
  • Ganglia, Nagios, Netdata Monitoring Tools.
  • Solr, Zeppelin, Drill, Mahout, Giraph, NiFi.
  • Hortonworks “AWS” Multi Node Cluster SetUp (HDP 2.3)
    • AWS Node SetUp & Remote Access Management.
    • Following the Same Process as the OnPremise Installation Above.
  • Cloudera “OnPremise” & “AWS” Multi Node Cluster SetUp (CDH 5.1)
    • Install & Configure major components as in Hortonworks stack.
    • Kudu, Impala, Kite, Sentry, RecordService, Navigator as Replacements for the Corresponding Hortonworks Components.
  • Copying Data Between Clusters Using DistCp.
  • Apache Hadoop “OnPremise” Multi Node Cluster SetUp (Hadoop 2.7.3)
    • Manual Installation through tar files and automated scripts.
    • Flexible Choice Matrix of tools from both above stacks.
  • Sharing VMs, Software, Installation Guides & Other Learning / Facilitation Materials as applicable.
  • Knowledge Transfer of Architecture, Based on Real Time Projects.
  • Guidance for Resume Preparation & Interview Questions.
  • POC Suggestions for further practice and pursuit.
  • Mock Interviews for Interested Candidates.
  • Discussing Current Market Standards & New / Incubating Tech.
  • Other Q & A, Doubt Clearance.
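
As a rough illustration of the Capacity Planning and Replication Factor topics listed above, the sketch below estimates the raw disk a cluster needs for a given amount of logical data. A replication factor of 3 is the HDFS default; the 25% headroom for temporary and intermediate data, the function names, and the sample figures are assumptions for illustration only, not values taken from the course.

```python
import math


def raw_storage_tb(logical_tb, replication=3, headroom=0.25):
    """Estimate the raw disk (in TB) needed to store logical_tb of data in HDFS.

    replication: HDFS replication factor (3 is the HDFS default).
    headroom:    fraction of raw disk kept free for temp/intermediate data
                 (0.25 is an assumed figure, tune per workload).
    """
    if logical_tb < 0 or replication < 1 or not 0 <= headroom < 1:
        raise ValueError("invalid capacity-planning inputs")
    # Every block is stored `replication` times; only (1 - headroom) of the
    # raw disk is treated as usable for HDFS data.
    return logical_tb * replication / (1 - headroom)


def datanodes_needed(logical_tb, disk_per_node_tb, replication=3, headroom=0.25):
    """Round the raw requirement up to a whole number of DataNodes."""
    if disk_per_node_tb <= 0:
        raise ValueError("disk_per_node_tb must be positive")
    raw = raw_storage_tb(logical_tb, replication, headroom)
    return math.ceil(raw / disk_per_node_tb)


if __name__ == "__main__":
    # Hypothetical example: 100 TB of data on nodes with 48 TB of disk each.
    print(raw_storage_tb(100))
    print(datanodes_needed(100, 48))
```

This covers storage only; a real sizing exercise would also weigh compression ratios, data growth, memory, and CPU per node.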