The Data Mining Blog : Data Mining : Business Intelligence : Analytics : Marketing : Finance:

User Generated Content

Posted in Amazon, Business Intelligence, Data, Data Mining by Pankaj Gudimella on March 27, 2009

Dave Winer says

The thing I like best about shopping at Amazon are the user comments. They really are good. And I often base purchasing decisions on what the other users say. It got so bad that when I went shopping at Fry’s for some sound equipment I fumbled around until I realized what I was missing was the advice of other shoppers. I did the unfair thing, listened to a bunch of stuff and then went home and bought what I liked and what the others liked, from Amazon.

The gold mine of data Amazon collects from its users through their reviews has been boosting its bottom line for years. Amazon is very prudent in how it uses this data and presents it to the customer.

Facebook is the other company sitting on such a gold mine, and it will unleash its true potential soon. Here is a piece from Scoble about Facebook, Zuckerberg, and the phase the business is in.

Data sets from Amazon

Posted in Amazon, Business Intelligence, Data by Pankaj Gudimella on February 25, 2009

Amazon announced four new data sets available to the public yesterday. You can find more on this here at the Amazon Web Services Blog.

It would be interesting to see what findings and insights developers draw from these data sets.


Posted in Analytics, Business Intelligence, Data Mining by Pankaj Gudimella on February 12, 2009

An algorithm is a set of instructions that allows you to solve a problem.

Each instruction is simple and repeatable. It’s important to understand that the instructions work on all similar problems, not just one.

Here’s an algorithm for sorting any set of numbers, to get them into order. Start with 4,3,5,6,2 for example.

The bubble sort algorithm is simple. Compare two numbers. If the first number is higher than the second, switch them. So now it’s 3,4,5,6,2. Next step is to compare positions two and three. If the second is higher than the third (it’s not) switch them. Repeat for the whole string. Then start over. Do it over and over again until you can go the whole way with no switching. Done.

More here from Seth.
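For anyone who wants to try it, here is a minimal Python sketch of the bubble sort described in the quote above. The function name `bubble_sort` is my own choice, not something from Seth's post, and this is just one straightforward way to write it.

```python
def bubble_sort(numbers):
    """Return a sorted copy of numbers using bubble sort."""
    items = list(numbers)  # work on a copy so the input is left alone
    swapped = True
    while swapped:  # keep making passes until one finishes with no switching
        swapped = False
        for i in range(len(items) - 1):
            # Compare neighbours; if the first is higher, switch them.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items


print(bubble_sort([4, 3, 5, 6, 2]))  # [2, 3, 4, 5, 6]
```

The same instructions work on any list of comparable values, which is the point of the quote: the algorithm solves all similar problems, not just one.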


Posted in Analytics, Book, Business Intelligence by Pankaj Gudimella on February 2, 2009

I recently finished reading The Numerati by Stephen Baker.

It is a good read for anyone looking for an introduction to analytics and how it is being used in various industries today.

Data mining in the credit crisis

Posted in Analytics, Business Intelligence, Data Mining, NYTimes by Pankaj Gudimella on February 2, 2009

In recent months, American Express has gone far beyond simply checking your credit score and making sure you pay on time. The company has been looking at home prices in your area, the type of mortgage lender you’re using and whether small-business card customers work in an industry under siege. It has also been looking at how you spend your money, searching for patterns or similarities to other customers who have trouble paying their bills.

More here

Your Mobile Phone Data —> Your Habits

Posted in Data Mining by Pankaj Gudimella on June 4, 2008

The whereabouts of more than 100,000 mobile phone users have been tracked in an attempt to build a comprehensive picture of human movements.

The study concludes that humans are creatures of habit, mostly visiting the same few spots time and time again.

Most people also move less than 10km on a regular basis, according to the study published in the journal Nature.

The results could be used to help prevent outbreaks of disease or forecast traffic, the scientists said.

“It would be wonderful if every [mobile] carrier could give universities access to their data because it’s so rich,” said Dr Marta Gonzalez of Northeastern University, Boston, US, and one of the authors of the paper.

Dr William Webb, head of research and development at the UK telecoms regulator, Ofcom, agreed that mobile phone data was still underexploited.

“This is just the tip of the iceberg,” he told BBC News.

More here from BBC

Predictive Modeling 101

Posted in Data Mining, Predictive Modeling by Pankaj Gudimella on May 22, 2008

I read a very good article from MarketingSherpa that explains the basics of predictive modeling. It is a good read for anyone looking for an introduction to the art and science of predictive modeling. Enjoy the article!

How to Create a Predictive Model
A predictive model determines the probability of a certain outcome based on a target — what you want to predict. You use data-mining software to sift through your customer database.

Every category of customer information — age or favorite color or buying frequency or how many times a customer visited your store in the past year — is a variable collected as a predictor of future behavior. A predictor is your model’s central building block.

For example, you want to predict which customers will visit your store at least five times in the next 12 months. Here’s a simplified version of what you need to do:

-> Step #1. Prepare your data

Preparing data is the most difficult and complicated step in the process. We’ll talk about why and what you can do about it later.

“It’s estimated that 70% to 80% of the time devoted to an analytical project is devoted to data preparation. It’s just getting the data in the one place in the right form to actually start building models,” says Richard Hren, Director Product Marketing, SPSS.

-> Step #2. Set your target

Your target is the customers who will visit your store five times in the next year. For this example, the target is the same as one of the variables — customers who visited the store five times in the past year.

-> Step #3. Determine the most important variables

Determine which variables are most relevant to your target. Some types of data mining software will dig through data and tell you. Other packages depend on your judgment to determine which variables matter most. Some software will do both: tell you what it likes and allow a statistician to tweak it.

-> Step #4. Run program to get a model

The software weighs the importance of each variable and creates a model — think of it as an equation. You fill in each variable in the equation and then the model calculates and gives higher scores to customers with the greatest probability of visiting your store more times in the next year.

Usually, you don’t have to score one customer at a time. You can build a model to automatically score a database of these higher probability customers.
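To make the "model as an equation" idea concrete, here is a rough Python sketch of the four steps. It is not the article's method or any vendor's software: I am assuming logistic regression as the model, fitting it with plain gradient descent, and using invented customer data. The names `HISTORY`, `CUSTOMERS`, `fit`, and `predict` are all hypothetical.

```python
import math

# Hypothetical history: (visits in year 1, purchases in year 1, label), where
# label is 1 if the customer went on to visit 5+ times in year 2. Invented data.
HISTORY = [
    (1, 0, 0), (2, 1, 0), (3, 1, 0), (4, 2, 0),
    (5, 3, 1), (6, 2, 1), (8, 5, 1), (10, 6, 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, features):
    """Score one customer: estimated probability of hitting the target."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit(rows, steps=5000, learning_rate=0.05):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0]) - 1
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for *features, label in rows:
            error = predict((weights, bias), features) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
    return weights, bias


model = fit(HISTORY)

# Score a whole (hypothetical) customer database in one pass, highest first.
CUSTOMERS = {"c101": (9, 5), "c102": (1, 0), "c103": (4, 3)}
scores = {cid: predict(model, feats) for cid, feats in CUSTOMERS.items()}
for cid in sorted(scores, key=scores.get, reverse=True):
    print(cid, round(scores[cid], 3))
```

The fitted weights play the role of the software "weighing the importance of each variable", and the final loop is the batch scoring the article mentions. With real data, most of the effort would go into Step #1, which this sketch skips entirely.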

The next generation web and data mining

Posted in Data Mining, Web Analytics by Pankaj Gudimella on May 22, 2008

I truly believe the next battleground will be based on scaling the back end and more importantly mining all of that clickstream data to offer a better service to users. Those that can do it cheaply and effectively will win. The tools are getting more sophisticated, the data sizes are growing exponentially, and companies don’t want to break the bank nor wait for Godot to deliver results.

More here from Ed Sim from BeyondVC.

Guessing the Online Customer’s Next Want

Posted in Data Mining, NYTimes, Online Marketing by Pankaj Gudimella on May 19, 2008

Marketers have always tried to predict what people want, and then get them to buy it.

Among online retailers, pushing customers toward other products they might want is a common practice. Both Amazon and Netflix, two of the best-known practitioners of targeted upselling, have long recommended products or movie titles to their clientele. They do so using a technique called collaborative filtering, basing suggestions on customers’ previous purchases and on how they rate products compared to other consumers.

Figuring that out is not so easy. For one thing, people do not always buy what they like. Someone may buy a sweater for their grandmother even though they dislike it and would never get it again. Similarly, a person who rents a movie may actually detest it but knows her child likes it. Or a film that was seen on a small airplane screen may garner a lower rating than if it were seen at a large multiplex.

More here from NYTimes.
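As a rough illustration of the collaborative filtering idea mentioned in the excerpt, here is a small Python sketch. It is not how Amazon or Netflix actually do it. I am assuming a simple user-based approach with cosine similarity over shared titles, and the users, titles, and ratings in `RATINGS` are invented.

```python
import math

# Hypothetical ratings (1-5) by user and title; all names and values are invented.
RATINGS = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "ben": {"Alien": 4, "Brazil": 5, "Dune": 4},
    "cara": {"Alien": 1, "Casablanca": 5, "Emma": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings over the titles both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank titles the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(seen, other_ratings)
        for title, rating in other_ratings.items():
            if title not in seen:
                # Ratings from more similar users count for more.
                scores[title] = scores.get(title, 0.0) + similarity * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


print(recommend("ann", RATINGS))  # ['Dune', 'Emma']
```

Ann's tastes line up with Ben's far more than with Cara's, so Ben's pick ranks first. The sketch also shows the weakness the article describes: it trusts every rating and purchase as a true signal of taste, so the sweater bought for a grandmother would skew the results.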

Increased Usability gives an edge to SAS

Posted in Business Intelligence, Data Mining, SAS by Pankaj Gudimella on May 14, 2008

It’s been a busy spring for SAS Institute Inc., which recently unveiled the version 9.2 release of its flagship business intelligence (BI) platform, picked up additional text mining and analytic technology (by acquiring Teragram), and announced an expansion of its relationship with data warehousing (DW) powerhouse Teradata Corp.

Despite a year of unprecedented consolidation in the BI market by a trio of BI giants (IBM Corp., Oracle Corp., and SAP AG), it’s business as usual at SAS, the Cary, N.C.-based BI, DW, and statistical analysis player, according to Ken Hausman, the company’s product marketing manager for data integration.

If anything, Hausman argues, rampant BI consolidation has only helped SAS refine its message. “There’s a certain part of our sales pitch that says SAS is a stable company, we’ve been around for 31 years, and we’ve had a fairly consistent focus over those years,” he comments. “With all that’s happened [with consolidation], that’s [a pitch that is] resonating with customers.” There’s also SAS’ focus on R&D, Hausman stresses. The company reinvests about one-fifth of its annual revenues into additional research and development activities.

Read more here.