Source: Machine Learning & Data Science Skills You Need To Get Hired In Fortune 500 Companies
===
- Proficient in querying and manipulating large data sets for analytical purposes using SQL-like languages (Hive/Impala)
- Apache ecosystem: Hadoop, Hadoop Distributed File System (HDFS), MapReduce/YARN (Yet Another Resource Negotiator), Hive (data warehouse infrastructure), HBase (distributed column-oriented NoSQL database), Oozie (workflow scheduling), Sqoop (data ingestion), ZooKeeper, Pig (scripting), Ambari (Hadoop cluster management platform), Spark (big data processing engine), Flink (streaming dataflow/analytics engine), Storm (real-time data processing), Flume (log data collection), Avro (data serialization)
- Machine learning techniques such as neural networks, Hidden Markov Models (HMM), maximum entropy models, and other popular algorithms
- Feature engineering and statistical modeling methods such as Conditional Random Fields (CRF), HMM, Support Vector Machines (SVM), and Gradient Boosting Decision Trees (GBDT)
- Statistical methods such as categorical data analysis, multivariate analysis, regression analysis, survey sampling design, survival/reliability analysis, design of experiments, and analysis of variance
- Building machine learning systems for modern parallel-computing environments (GPU, multicore Symmetric Multiprocessing (SMP), distributed clusters); CUDA kernels
- Machine learning frameworks such as Caffe, Theano, Torch, TensorFlow, MXNet, Apache Mahout, Spark MLlib; scikit-learn, SciPy, NumPy; Amazon Machine Learning
- Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), supervised and unsupervised learning, and optimization techniques
- Traditional and modern statistical techniques, including SVM, regularization, boosting, random forests, and other ensemble methods
- Natural language processing (NLP) problems, including predictive typing, input method conversion, tokenization, tagging, language modeling, language identification, sentiment analysis, named entity recognition, lemmatization, and summarization
- Building solutions for spelling correction, related searches, synonym/acronym expansion, query rewriting, metrics accumulation, spam prevention, ranking, and recommendations
- Proficiency in predictive modeling and data mining tools such as SQL, R, SAS, JMP, Python, Watson, and Aster
- Experience with data visualization tools such as D3.js, Tableau, and QlikView
- Familiarity with commercial ETL platforms such as Informatica, SSIS, and Talend
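
To make one of the listed skills concrete, here is a minimal sketch of spelling correction, one of the search-related solutions named above. It assumes a tiny in-memory vocabulary with made-up frequency counts and uses plain Levenshtein edit distance. The names `edit_distance`, `correct`, and `VOCAB` are hypothetical and do not come from any particular library. Production systems typically rely on query logs, language models, and far larger dictionaries.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: dict[str, int], max_distance: int = 2) -> str:
    """Return the closest vocabulary word; ties go to the more frequent word.

    If nothing is within max_distance edits, the input is returned unchanged.
    """
    if word in vocabulary or not vocabulary:
        return word
    candidates = [(edit_distance(word, v), -freq, v) for v, freq in vocabulary.items()]
    distance, _, best = min(candidates)
    return best if distance <= max_distance else word


# Illustrative vocabulary; the counts are invented for this example.
VOCAB = {"hadoop": 50, "spark": 80, "tableau": 20, "python": 90}

print(correct("sprak", VOCAB))
print(correct("pyhton", VOCAB))
```

Scanning the whole vocabulary for every query is linear in its size, which is acceptable for a sketch but is usually replaced by candidate generation or an index at scale.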