Some of the world’s biggest tech companies, from Google to Facebook, are data-driven, but few startup founders have any idea what a data scientist does, never mind whether they should hire one. Here is VentureBeat’s guide to data science for startups.
What does a data scientist do?
DJ Patil led LinkedIn’s data science team and is now the Data Scientist in residence at Greylock Partners. His free ebook “Building Data Science Teams” provides an excellent introduction to the basic areas of data science and how to build a team.
For startups, the most relevant applications of data science are probably decision science and product and marketing analytics. Decision science, as the name implies, allows you to identify and monitor key metrics for your business and answer strategic questions like “Which country should we expand into next?” or “What is the impact on the business if we lose this client?” Google’s data science team even drives its HR policies.
Product analytics covers anything from how users are reacting to new features to developing standalone data products. LinkedIn’s “People you may know” feature and Amazon’s recommendation system are data-driven features that attempt to keep users on the site longer or drive more sales.
Using data to showcase or market a product is the domain of marketing analytics. One of the best-known examples is OkCupid’s OkTrends blog, which features posts like “The case for an older woman” or “The 4 big myths of profile photos”. The blog drives massive traffic to the site and is regularly covered in the media.
Who are the data scientists?
Since data science is a new area, practitioners often migrate from other fields. You may see maths, statistics, machine learning or computer science on their resumes, or a data-intensive field like meteorology. Data scientists want to be of central importance to a business, especially when it’s a startup. The best data scientists are both intensely curious and great communicators. They answer important questions and tell good stories using data.
What is data infrastructure?
Data scientists need specialized tools to manage and process large amounts of data. The minimum you need to get started is simple data access, usually via a database. Larger-scale or less uniform data may require a tool like Hadoop, an open-source platform for distributed processing of large datasets across clusters of computers, as well as someone with the technical expertise to use it. Data stores like Cassandra are designed to perform well on very large datasets. These are some of the most commonly used tools, but there are many others for tasks such as streaming data collection, querying non-relational databases and job scheduling.
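To make "simple data access, usually via a database" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns and the sample rows are all hypothetical, invented for illustration only. The two queries mirror the kind of question mentioned earlier: revenue by country, and the share of revenue that depends on one client.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("acme", "US", 120.0),
        ("acme", "US", 80.0),
        ("globex", "DE", 50.0),
        ("initech", "DE", 150.0),
        ("umbrella", "FR", 100.0),
    ],
)


def revenue_by_country(connection):
    """Return (country, total revenue) rows, largest revenue first."""
    query = """
        SELECT country, SUM(amount) AS revenue
        FROM orders
        GROUP BY country
        ORDER BY revenue DESC, country ASC
    """
    return connection.execute(query).fetchall()


def revenue_share(connection, customer):
    """Return the fraction of total revenue that comes from one customer."""
    total = connection.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    mine = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()[0]
    return mine / total


by_country = revenue_by_country(conn)
acme_share = revenue_share(conn, "acme")
print(by_country)
print(f"acme share of revenue: {acme_share:.0%}")
```

A single-file database like this is usually enough for a first look at key metrics. The heavier tools described above only become relevant once the data no longer fits comfortably on one machine.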
When do you need to hire a data scientist?
VentureBeat talked to data scientist Cathy O’Neil, who herself works for a startup (Intent Media), about when you need to hire a data scientist. If your data volume is growing, if you don’t know whether you are seeing noise or information in your data or, more generally, if you are not running your business quantitatively enough, you may need to consider hiring one.
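One rough way to ask whether a change in a metric is noise or information is a two-proportion z-test on conversion counts. The sketch below is an illustration, not a method taken from the interview. It uses only Python's standard library, and the function name, the sample counts and the choice of test are all assumptions.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates.

    Uses the pooled two-proportion z-test with a normal approximation,
    which is only reasonable when both samples are fairly large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups converted at 0% or 100%: no evidence of a difference.
        return 1.0
    z = (p_b - p_a) / se
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: the same 10% vs 13% rates at two sample sizes.
p_small = two_proportion_p_value(10, 100, 13, 100)
p_large = two_proportion_p_value(1000, 10000, 1300, 10000)
print(f"small sample p-value: {p_small:.3f}")
print(f"large sample p-value: {p_large:.2e}")
```

The same three-point difference in conversion rate could easily be chance with 100 visitors per group, but is very unlikely to be chance with 10,000 per group. Telling those two cases apart is part of what a data scientist is hired to do.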