Einext Blog
CUDA, OpenCL and OpenGL