Einext Blog

Appendix 02 Machine Learning Resources

Machine Learning Sources of Information

http://archive.ics.uci.edu/ml/datasets.html

http://www.datasciencecentral.com/

https://www.r-bloggers.com/

https://docs.databricks.com/spark/latest/mllib/index.html

https://databricks.com/blog/category/engineering/machine-learning

Google Alerts on "machine learning"

https://techcrunch.com/

https://www.kaggle.com/competitions

https://www.datacamp.com/

https://www.tensorflow.org/

http://www.kdnuggets.com/

scikit-learn tutorial series

MOOCs:

Analytics Edge - MIT https://www.edx.org/course/analytics-edge-mitx-15-071x-2

Machine Learning - Stanford https://www.coursera.org/learn/machine-learning

Recommender Systems - https://www.coursera.org/learn/recommender-systems-introduction

Certification:

https://education.emc.com/guest/campaign/data_science.aspx
