The Data Mining Blog : Data Mining : Business Intelligence : Analytics : Marketing : Finance:

Predictive Modeling 101

Posted in Data Mining, Predictive Modeling by Pankaj Gudimella on May 22, 2008

I read a very good article from MarketingSherpa that explains the basics of predictive modeling. It is a good read for anyone looking for an introduction to the art and science of predictive modeling. Enjoy the article!

How to Create a Predictive Model
A predictive model estimates the probability of a certain outcome, called the target (what you want to predict). To build one, you use data-mining software to sift through your customer database.

Every category of customer information — age or favorite color or buying frequency or how many times a customer visited your store in the past year — is a variable collected as a predictor of future behavior. A predictor is your model’s central building block.

For example, you want to predict which customers will visit your store at least five times in the next 12 months. Here’s a simplified version of what you need to do:

->Step #1. Prepare your data

Preparing data is the most difficult and complicated step in the process. We’ll talk about why and what you can do about it later.

“It’s estimated that 70% to 80% of the time devoted to an analytical project is devoted to data preparation. It’s just getting the data in the one place in the right form to actually start building models,” says Richard Hren, Director Product Marketing, SPSS.
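To make "one place, right form" concrete, here is a minimal Python sketch of the kind of work data preparation involves. The field names (`customer_id`, `age`, `visits_past_year`) and the sample records are hypothetical, invented only for illustration; real preparation usually involves many more sources and cleaning rules.

```python
from collections import Counter

# Hypothetical raw inputs: a profile list (ages stored as text) and a
# visit log with one customer_id per recorded store visit.
profiles = [
    {"customer_id": 1, "age": "34"},
    {"customer_id": 2, "age": ""},   # age is missing for this customer
    {"customer_id": 3, "age": "51"},
]
visit_log = [1, 1, 3, 1, 3, 1, 1]


def prepare(profiles, visit_log):
    """Return one cleaned row per customer with a visit count attached."""
    visits = Counter(visit_log)
    rows = []
    for p in profiles:
        rows.append({
            "customer_id": p["customer_id"],
            # Convert text to a number; keep None where the value is missing.
            "age": int(p["age"]) if p["age"] else None,
            # Customers absent from the log get a count of zero.
            "visits_past_year": visits.get(p["customer_id"], 0),
        })
    return rows


table = prepare(profiles, visit_log)
for row in table:
    print(row)
```

The result is a single table with one row per customer, which is the shape most modeling tools expect as input.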

->Step #2. Set your target

Your target is the customers who will visit your store at least five times in the next year. For this example, the target mirrors one of the variables you already have: whether a customer visited the store at least five times in the past year.
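In code, setting a target usually means turning it into a yes/no label for each customer. The sketch below assumes a hypothetical `visits_past_year` field and treats "five times" as "at least five", which matches the example stated earlier; both are assumptions made for illustration.

```python
def label_target(visits_past_year, threshold=5):
    """Return 1 if the customer met the visit threshold, otherwise 0."""
    return 1 if visits_past_year >= threshold else 0


# Hypothetical customers, invented for this example.
customers = [
    {"customer_id": 1, "visits_past_year": 7},
    {"customer_id": 2, "visits_past_year": 5},
    {"customer_id": 3, "visits_past_year": 2},
]

targets = [label_target(c["visits_past_year"]) for c in customers]
print(targets)
```

Customers 1 and 2 are labeled 1 (they reached the threshold) and customer 3 is labeled 0.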

->Step #3. Determine the most important variables

Determine which variables are most relevant to your target. Some types of data mining software will dig through data and tell you. Other packages depend on your judgment to determine which variables matter most. Some software will do both: tell you what it likes and allow a statistician to tweak it.
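One simple way to get a first read on which variables matter is to rank them by how strongly each one correlates with the target. The sketch below uses plain Pearson correlation on made-up numbers; it is only an illustration of the idea, not the method any particular data mining package uses, and the variable names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no spread; treat it as uncorrelated.
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


# Hypothetical data: one list per variable, one entry per customer.
variables = {
    "age": [30, 42, 51, 30, 42, 51],
    "visits_past_year": [6, 1, 5, 0, 7, 2],
}
target = [1, 0, 1, 0, 1, 0]  # 1 = customer hit the visit goal

# Strongest relationship (positive or negative) first.
ranking = sorted(
    variables,
    key=lambda name: abs(pearson(variables[name], target)),
    reverse=True,
)
print(ranking)
```

In this toy data, past visit counts track the target closely while age does not, so `visits_past_year` comes out on top. A statistician would still review a ranking like this, since correlation misses non-linear effects and interactions between variables.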

->Step #4. Run the program to get a model

The software weighs the importance of each variable and creates a model; think of it as an equation. You fill in each variable for a customer, and the model calculates a score. Customers with the greatest probability of visiting your store at least five times in the next year get the highest scores.
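To show what "think of it as an equation" can look like, here is a small sketch that assumes a logistic-style model: each variable is multiplied by a weight, the results are added up, and the sum is squashed into a probability between 0 and 1. The weights, the intercept and the variable names are made up for illustration; in practice the software would estimate them from your data, and it may use a different kind of model entirely.

```python
from math import exp

# Hypothetical weights, standing in for what modeling software would learn.
WEIGHTS = {"visits_past_year": 0.8, "is_loyalty_member": 1.2}
INTERCEPT = -4.0


def score(customer):
    """Fill the customer's values into the equation and return a probability."""
    z = INTERCEPT + sum(w * customer[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + exp(-z))


frequent = {"visits_past_year": 8, "is_loyalty_member": 1}
rare = {"visits_past_year": 1, "is_loyalty_member": 0}

print(round(score(frequent), 3), round(score(rare), 3))
```

The frequent visitor gets a score close to 1 and the rare visitor a score close to 0, which is the ordering you would expect from these weights.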

Usually, you don’t have to score one customer at a time. You can use the model to automatically score your whole customer database and pick out the higher-probability customers.
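Scoring a whole database in one pass might look like the sketch below. The `toy_model` function is a stand-in invented for this example (it simply scales past visits into a 0 to 1 score); any function that takes a customer record and returns a score could be passed in its place.

```python
def score_database(customers, model):
    """Score every customer and return them sorted from highest to lowest score."""
    scored = [{**c, "score": model(c)} for c in customers]
    return sorted(scored, key=lambda c: c["score"], reverse=True)


def toy_model(customer):
    """Hypothetical stand-in model: more past visits means a higher score."""
    return min(customer["visits_past_year"] / 10, 1.0)


# Hypothetical customer records.
customers = [
    {"customer_id": 1, "visits_past_year": 2},
    {"customer_id": 2, "visits_past_year": 9},
    {"customer_id": 3, "visits_past_year": 5},
]

ranked = score_database(customers, toy_model)
top_ids = [c["customer_id"] for c in ranked]
print(top_ids)
```

Sorting by score puts the most likely repeat visitors at the top of the list, so a campaign can start there and work down as far as the budget allows.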