
SSAS – Data Mining

Data mining is one of the areas of the BI space I find most interesting. Data mining predicts information based on existing data, much like estimating task completion with the SCRUM agile methodology in a project. Data mining involves a lot of data analysis and specialised skills, besides business process knowledge and a well-defined business goal.
Last month, a friend who is new to the BI space asked me to walk them through data mining, so I decided to document the data mining development process.
 
Development Process
 
Step 1: Create a new Analysis Services project using BIDS.

Analysis Services Project



 
Step 2: Create the data source for the AdventureWorksDW2008R2 source database.
Create the Data Source View with the ProspectiveBuyer table and the vTargetMail view, as shown below
Data Source View



 
Step 3: Create a new mining structure:
Right-click Mining Structures in Solution Explorer -> select New Mining Structure
Data mining structure



Notes:
1. The data mining structure acts as the source for the data mining model
2. One mining structure can be the source for more than one mining model
 
Step 4: Create the mining structure from the existing relational database (i.e. the DSV); the alternative option is to use an existing cube. Here the structure is based on the existing database.
Mining structure using existing Database



 
Step 5: Data mining model development depends on the mining algorithm, and the choice of algorithm depends on the required analysis, since the algorithm is what performs the actual data analysis.
Here choose Microsoft Decision Trees, as the primary goal is to make a decision based on existing data
Decision algorithm
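To give a feel for what "making a decision based on existing data" means, here is a minimal Python sketch of how a decision tree picks a split: it chooses the input column that best separates the values of the predictable column. It uses plain entropy-based information gain on a few made-up cases. Microsoft Decision Trees supports several scoring methods (entropy and Bayesian scores, chosen with the SCORE_METHOD parameter), so treat this as an illustration of the idea rather than the server's implementation. The function names and sample rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in the entropy of `target` after splitting rows on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


def best_split(rows, inputs, target):
    """Return the input column with the highest information gain."""
    return max(inputs, key=lambda column: information_gain(rows, column, target))


# Hypothetical cases loosely modelled on vTargetMail columns.
cases = [
    {"CommuteDistance": "0-1 Miles", "NumberCarsOwned": 0, "BikeBuyer": 1},
    {"CommuteDistance": "0-1 Miles", "NumberCarsOwned": 1, "BikeBuyer": 1},
    {"CommuteDistance": "0-1 Miles", "NumberCarsOwned": 2, "BikeBuyer": 1},
    {"CommuteDistance": "10+ Miles", "NumberCarsOwned": 0, "BikeBuyer": 0},
    {"CommuteDistance": "10+ Miles", "NumberCarsOwned": 1, "BikeBuyer": 0},
    {"CommuteDistance": "10+ Miles", "NumberCarsOwned": 2, "BikeBuyer": 0},
]

print(best_split(cases, ["CommuteDistance", "NumberCarsOwned"], "BikeBuyer"))
# prints CommuteDistance
```

In this toy data CommuteDistance separates buyers from non-buyers perfectly (gain of 1 bit), while NumberCarsOwned tells us nothing (gain of 0), so the tree would split on CommuteDistance first.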



 
Step 6: Select the source tables for the mining structure from the DSV
Mining structure source tables



 
Step 7: Select the source table for analysis; the table marked as “Case” is the table whose data will be used for data mining
Data mining source tables



 
Step 8: Select the source table columns for data mining:
Key: The column that uniquely identifies each case, usually the primary key of the source table
Input: The column is used as an input when making predictions
Predictable: The column is predicted by the data mining model based on the input columns' data
Data mining source columns
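The three usages can be modelled as a small, hypothetical Python structure. In SSAS terms, a column that is both input and predictable has the usage Predict, while a predictable-only column is PredictOnly. The class, the two validation rules (exactly one key, at least one predictable column) and the sample columns below are illustrative assumptions, not the wizard's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningColumn:
    """One column of the case table and the wizard checkboxes ticked for it."""
    name: str
    is_key: bool = False
    is_input: bool = False
    is_predictable: bool = False

    def usage(self):
        """Translate the checkboxes into an SSAS-style usage label."""
        if self.is_key:
            return "Key"
        if self.is_input and self.is_predictable:
            return "Predict"  # used as an input and also predicted
        if self.is_predictable:
            return "PredictOnly"  # predicted but never used as an input
        if self.is_input:
            return "Input"
        return "Ignore"


def validate(columns):
    """Check the assumed rules: one key column and at least one predictable column."""
    keys = [c.name for c in columns if c.is_key]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key column, got {keys}")
    if not any(c.is_predictable for c in columns):
        raise ValueError("at least one predictable column is required")
    return {c.name: c.usage() for c in columns}


# Hypothetical selection based on vTargetMail columns.
columns = [
    MiningColumn("CustomerKey", is_key=True),
    MiningColumn("Age", is_input=True),
    MiningColumn("YearlyIncome", is_input=True),
    MiningColumn("BikeBuyer", is_input=True, is_predictable=True),
    MiningColumn("Region", is_predictable=True),
]
print(validate(columns))
```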



 
Step 9: Select BikeBuyer and Region as the predictable columns, and select the input columns as below
Data mining columns



 
Step 10: Select the content type of each source column for analysis; the wizard can auto-detect the content type for most columns
Columns' content and data type
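The wizard's own detection logic is not described here, so the Python sketch below is only a hypothetical rule of thumb that shows the difference between the two most common content types: Discrete columns have a small set of states (Region, BikeBuyer), while Continuous columns are numeric with many values (Age, YearlyIncome). The function name and the threshold of 10 distinct values are arbitrary assumptions.

```python
def guess_content_type(values, max_discrete_states=10):
    """Hypothetical heuristic: numeric columns with many distinct values are
    treated as Continuous; everything else (text, flags, few states) as Discrete."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric and len(set(values)) > max_discrete_states:
        return "Continuous"
    return "Discrete"


print(guess_content_type([25, 31, 47, 52, 38, 29, 61, 44, 33, 58, 40]))  # Continuous
print(guess_content_type(["North America", "Europe", "Pacific"]))        # Discrete
print(guess_content_type([0, 1, 1, 0, 1]))                                # Discrete
```

Whatever the wizard detects, it is worth reviewing each column, because a numeric code column (for example a flag stored as 0/1) should usually stay Discrete.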



 
Step 11: Specify how much of the source data to set aside for testing. The wizard's description gives more information about the options; here 30% of the source data is held out, with a maximum of 1000 cases in the testing data set.
The right selection depends on the requirement scenario, but including a large amount of data in the analysis will take more time to process
Cases for data mining
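The testing-set options can be sketched as a random holdout: reserve a percentage of the cases for testing, capped at a maximum number of cases. This minimal Python sketch assumes that when both limits are set the smaller one applies (my understanding of the wizard's behaviour; check the documentation for your version). The function name, the fixed seed and the 5000 case keys are hypothetical.

```python
import random


def holdout_split(cases, test_percent=30, max_test_cases=1000, seed=0):
    """Randomly split cases into (training, testing).

    The testing set is test_percent of the cases, capped at max_test_cases
    (assumption: when both limits are set, the smaller one applies).
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = min(len(shuffled) * test_percent // 100, max_test_cases)
    return shuffled[n_test:], shuffled[:n_test]


training, testing = holdout_split(range(5000))  # 5000 hypothetical case keys
print(len(training), len(testing))  # 4000 1000: 30% would be 1500, capped at 1000
```

With 2000 cases the same settings would hold out 600 cases, because 30% is then below the 1000-case cap.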



 
Step 12: Choose the name of the model; here the default name (the table name) is kept. The column icons represent the usage of each column in the data mining model
Mining model



 
Step 13: The mining structure is complete, and the structure view looks like this
Mining structure view



 
Step 14: Select the mining model built on the mining structure with the target mining algorithm; here Microsoft_Decision_Trees is the algorithm used for data analysis
Mining model view



 
Step 15: The solution needs to be deployed before the model can be viewed. During deployment, the process shows the SQL statements it runs against the server, as below
Deployment - 1



Deployment - 2



Process deployment



 
Step 16: After deployment, the model view looks as below; here the model predicts bike buyers based on the input values.
There is no particular analysis scenario here; the analysis depends on the business requirements and varies from scenario to scenario.
Mining model analysis - Bike Buyer



Predicting Region based on the input values
Mining model analysis - Region



Mining model linear analysis
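Reading a prediction off the tree amounts to following the branches that match a case's input values down to a leaf and taking the most likely state of the predictable column there. A minimal Python sketch with a hand-built, hypothetical tree and made-up counts; it ignores unseen or missing input values (they raise a KeyError here), which a real SSAS model has to handle.

```python
# Hypothetical tree: a split node names the input column and maps each of its
# values to a child; a leaf holds training-case counts per BikeBuyer state.
tree = {
    "split": "CommuteDistance",
    "branches": {
        "0-1 Miles": {"leaf": {1: 70, 0: 30}},
        "10+ Miles": {"leaf": {1: 20, 0: 80}},
    },
}


def predict(node, case):
    """Follow the branches matching the case's input values down to a leaf and
    return the most frequent state there with its share of the leaf's cases."""
    while "split" in node:
        node = node["branches"][case[node["split"]]]
    counts = node["leaf"]
    state = max(counts, key=counts.get)
    return state, counts[state] / sum(counts.values())


print(predict(tree, {"CommuteDistance": "0-1 Miles"}))  # (1, 0.7)
print(predict(tree, {"CommuteDistance": "10+ Miles"}))  # (0, 0.8)
```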



 
Step 17: The viewer has two options for viewing the predicted data: besides the tree viewer, the Microsoft Generic Content Tree Viewer shows all the analysis data in tabular form.
Mining model viewer options
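The tabular view can be pictured as flattening the tree into one row per node. The Python sketch below borrows four column names from the SSAS mining model content schema (NODE_UNIQUE_NAME, PARENT_UNIQUE_NAME, NODE_CAPTION, NODE_SUPPORT); the tree, the dotted node names and the counts are hypothetical, and the real content rowset has many more columns, such as NODE_DISTRIBUTION.

```python
# Hypothetical tree as a nested dict: split nodes map input values to children,
# leaves hold training-case counts per state of the predictable column.
tree = {
    "split": "CommuteDistance",
    "branches": {
        "0-1 Miles": {"leaf": {1: 70, 0: 30}},
        "10+ Miles": {"leaf": {1: 20, 0: 80}},
    },
}


def support(node):
    """Number of training cases that reach this node."""
    if "leaf" in node:
        return sum(node["leaf"].values())
    return sum(support(child) for child in node["branches"].values())


def flatten(node, name="0", parent="", caption="All", rows=None):
    """Depth-first walk that emits one row per node, like a content table."""
    if rows is None:
        rows = []
    rows.append({
        "NODE_UNIQUE_NAME": name,
        "PARENT_UNIQUE_NAME": parent,
        "NODE_CAPTION": caption,
        "NODE_SUPPORT": support(node),
    })
    for i, (value, child) in enumerate(node.get("branches", {}).items()):
        flatten(child, f"{name}.{i}", name, f"{node['split']} = {value}", rows)
    return rows


for row in flatten(tree):
    print(row)
```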



 
Step 18: The mining model can also be browsed using SSMS, as below
Model browsing using SSMS



 
Step 19: The dependency network shows the dependencies between the input columns and the predictable columns of the mining model
Model dependency tracker



 
Points to consider:
 
1. We have not analysed the source table data from the AdventureWorksDW2008R2 database; in practice, analysing the source data is one of the main requirements before developing any data mining model
2. Source data quality has not been taken into consideration; data mining results depend on the quality of the source data
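A first pass at the source data analysis and data quality checks in the two points above can be as simple as counting missing and distinct values per column before building the model. A minimal Python sketch over hypothetical in-memory rows; in practice the rows would come from the AdventureWorksDW2008R2 tables, and which checks matter depends on the business goal.

```python
def profile(rows):
    """Count missing (None or absent) and distinct non-missing values per column."""
    columns = sorted({name for row in rows for name in row})
    report = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        report[name] = {"missing": len(values) - len(present),
                        "distinct": len(set(present))}
    return report


# Hypothetical extract of a few source rows; None marks a missing value.
rows = [
    {"Age": 34, "Region": "Europe", "YearlyIncome": 60000},
    {"Age": None, "Region": "Pacific", "YearlyIncome": 40000},
    {"Age": 51, "Region": None, "YearlyIncome": 60000},
    {"Age": 34, "Region": "Europe"},
]
for column, stats in profile(rows).items():
    print(column, stats)
```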

 
