Saturday, 28 October 2017

MACHINE LEARNING AND ITS LIFE-CYCLE

MACHINE LEARNING–WHAT, WHY & WHERE

Before going into the details of Machine Learning, it is important to understand the need for it, the approaches that were used earlier to address that need but fell short, and how Machine Learning now fulfils it.
An algorithm is a process or set of rules followed in calculations or problem-solving operations that takes an input and produces the desired output. Many problems, easy or complex, can be solved with the help of an algorithm. But there are scenarios where the input data is so large, complex or unstructured that it becomes very difficult to write explicit algorithms for them. Some common examples are filtering spam emails, extracting information from images and videos, predicting customer behaviour and diagnosing diseases, and the list goes on. This is where Machine Learning comes to the rescue. It takes past experiences or datasets as input, extracts useful information from them and then applies a set of complex rules and different models so that a usable output can be achieved.
For example, in the case of the stock market, we can use historical datasets to try to predict whether a particular share is going to make a profit or a loss. Similarly, it becomes easier to diagnose a disease that would otherwise have been very difficult to diagnose by trial and error. Some other examples where Machine Learning is used are:
  • Image Recognition
  • Face Recognition
  • Anomaly Detection
  • Medical Diagnosis
  • Search Engines
  • Handwriting Recognition
  • Financial Services
  • Crypto-currency


In short, Machine Learning helps us solve many complex real-world problems by taking past experiences as input and using them to make good predictions about the future.


Life-Cycle of Machine Learning

The process of Machine Learning can be discussed with the help of the figure below.

1. Problem Definition: The first step is always to define the problem and then determine whether Machine Learning can be used to solve it.

2. Data Preparation: Once the problem is defined, data is collected from the source systems and transformed into consumable files. This is the most important part of Machine Learning, as it converts the raw data collected from different sources into useful information.


Steps for Data Preparation:
The raw data might contain data that is not useful and that hinders the final performance. The more accurate the data, the more accurate the prediction will be.
  • Firstly, the raw data should be put into a structured format, such as CSV files or database tables with headers, so that a Python library like ‘Pandas’ can load it as a data frame, which makes it easy to operate on rows and columns. The Pandas and sklearn libraries in Python are very useful for preprocessing the data.
  • Next, we can clean these data frames of anything that hinders performance: removing unwanted records from the dataset, replacing or removing missing data, encoding categorical data, and feature extraction (using only specific data from the dataset).
  • Then, the processed data is divided into testing and training datasets, generally in a ratio of 40:60 or 30:70.
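
The steps above can be sketched in a few lines of Python. This is only a minimal illustration that uses the standard library so it runs anywhere; the column names, the sample records and the helper name prepare are made up for the example. In a real project the same steps would more likely be done with Pandas and sklearn, as mentioned above.

```python
import csv
import io
import random

# Hypothetical raw data; in practice this would come from a CSV file or a database.
RAW_CSV = """age,city,income,bought
25,Pune,30000,no
32,Delhi,,yes
47,Pune,82000,yes
,Mumbai,51000,no
38,Delhi,64000,
29,Mumbai,45000,no
51,Pune,90000,yes
"""

def prepare(raw_text, test_ratio=0.3, seed=42):
    """Clean the raw records and split them into training and testing sets."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # 1. Remove unwanted records: here, rows that have no target label.
    rows = [r for r in rows if r["bought"]]

    # 2. Replace missing numeric values with the mean of the known values.
    for col in ("age", "income"):
        known = [float(r[col]) for r in rows if r[col]]
        mean = sum(known) / len(known)
        for r in rows:
            r[col] = float(r[col]) if r[col] else mean

    # 3. Encode the categorical column as one 0/1 column per category.
    cities = sorted({r["city"] for r in rows})
    for r in rows:
        city = r.pop("city")
        for c in cities:
            r["city_" + c] = 1 if city == c else 0

    # 4. Shuffle, then split into testing and training sets (30:70 here).
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]

train, test = prepare(RAW_CSV)
print(len(train), "training rows,", len(test), "testing rows")
```

The fixed seed is only there to make the split repeatable from one run to the next.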

3. Data Modelling: Different algorithms that can help in solving the problem are identified and applied to the training dataset to prepare a predictive model with the highest accuracy possible. Once the model is ready, it is applied to the testing dataset to evaluate its performance.
Data modelling is usually an iterative process. There is a high chance that the model will not produce the expected results in the first run. Hence, if the results on the testing dataset are not as desired, the model is re-evaluated. Sometimes performance is improved by tuning the model parameters, and in some cases the model is rebuilt. This process continues until the desired outcome is achieved.
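
To make the train, test and tune loop concrete, here is a small sketch with a hand-written k-NN (k-nearest neighbours) classifier. The data points, the labels and the function names are all invented for illustration, and one training point is deliberately mislabelled so that the first parameter value does not give the best result. The parameter k is tuned against the testing dataset, as described above; many practitioners would keep a separate validation set for this.

```python
from collections import Counter

def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], point)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of testing points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

# Hypothetical, already-prepared data: (features, label) pairs.
train = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.5), "low"), ((2.5, 2.5), "low"),
    ((5.0, 5.0), "high"), ((5.5, 4.5), "high"), ((4.5, 5.5), "high"), ((6.0, 6.0), "high"),
    ((4.0, 4.0), "low"),  # deliberately mislabelled point (noise)
]
test = [
    ((1.2, 1.4), "low"), ((2.2, 2.0), "low"),
    ((4.2, 4.3), "high"), ((5.2, 5.1), "high"),
]

# Iterate: try a parameter value, evaluate, keep the best one seen so far.
best_k, best_acc = None, -1.0
for k in (1, 3, 5):
    acc = accuracy(train, test, k)
    print(f"k={k}: accuracy={acc:.2f}")
    if acc > best_acc:
        best_k, best_acc = k, acc

print("chosen k:", best_k)
```

With k=1 the noisy point misleads the model on one testing example; a larger k lets the neighbouring points outvote it.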

TYPES OF MACHINE LEARNING

There are three types of Machine Learning: supervised, unsupervised and reinforcement learning. Let’s briefly discuss the first two.

Supervised Learning

In this type of learning, predictions are made from the given attributes or features of a dataset in which the desired output is already known. The goal of a supervised algorithm is to analyse the training dataset and generate a function that maps the input values to the desired output values. This training process continues with the given input until the model or algorithm starts demonstrating good performance.

Supervised Learning can be divided into two types on the basis of the desired output variable.

1. Classification: To predict and label the output as a category. For example:
  • Knowledge Extraction: To divide the customers of a bank into ‘high risk’ and ‘low risk’ categories for the purpose of determining loan or credit card eligibility.
  • Message Filtering: To filter out malicious emails and flag them as ‘spam’.
  • Medical Diagnosis: To diagnose patients with the help of their symptoms.
  • Recognising a person on the basis of their handwriting, face or speech.
  • Weather Prediction
2. Regression: To predict a continuous, numerical value as the output. For example:
  • To find out the number of mangoes in a basket of different fruits.
  • To predict the approximate salary of an employee in an organisation based on their qualification and number of years of experience.
  • To predict the age of a person on the basis of their height and weight.
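
As a rough sketch of the regression case, the salary example can be modelled with a straight line fitted by ordinary least squares. The numbers below are made up, only years of experience is used as the input, and the function names fit_line and predict are chosen just for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: years of experience -> salary (in thousands).
years = [1, 2, 3, 4, 5]
salary = [32, 39, 41, 50, 53]

slope, intercept = fit_line(years, salary)

def predict(x):
    """Apply the learned function to a new input value."""
    return slope * x + intercept

print(f"salary = {slope:.2f} * years + {intercept:.2f}")
print(f"predicted salary for 6 years: {predict(6):.1f}")
```

The fitted line is the learned function that maps an input (years) to a continuous output (salary), which is what separates regression from classification.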
The different supervised algorithms used in Machine Learning are:
  • Linear Regression
  • Polynomial Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
  • Naïve Bayes
  • Support Vector Machines
  • Kernel SVM
  • K-NN

Unsupervised Learning

In this type of learning, we only have input data available, without any corresponding output variable to predict. There is little or no prior idea of what the result will be in unsupervised learning.

The categories of the data are unknown in unsupervised learning. The aim is to identify regularities, structures or patterns in the data and to divide the data into groups of similar items in order to learn more about it.

Unsupervised learning can be achieved through ‘clustering’. The goal of clustering is to find similar patterns or clusters in a given input. For example, it can be used to study customer behaviour towards different products launched in the market. This helps to analyse the market value of a product and to maintain customer relationships by providing good service.
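
One common clustering method is k-means. The sketch below is a bare-bones version written only to show the idea: the customer data (visits per month, average spend) is invented, the starting centroids are simply the first k points, and the number of iterations is fixed. Note that no labels are given to the algorithm; it finds the two groups by itself.

```python
def kmeans(points, k, iterations=10):
    """Minimal k-means: returns (centroids, cluster index of each point)."""
    centroids = [points[i] for i in range(k)]  # naive start: the first k points
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared distance).
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical customers: (visits per month, average spend per visit).
customers = [(2, 20), (25, 310), (3, 25), (1, 15), (28, 290), (30, 330)]

centroids, assignment = kmeans(customers, k=2)
for customer, cluster in zip(customers, assignment):
    print(customer, "-> cluster", cluster)
```

Real datasets usually need a better choice of starting centroids and features on comparable scales, which this sketch leaves out.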


Unsupervised learning also helps in recognising faces in images based on pixels, colour, size, etc.


Thanks all for reading this blog!!

You can find more information on these topics in the reference below, which helped me gain knowledge before writing this blog.

