Introduction to Big Data

Video transcript

  • Big data is a lot of data.
  • Big data is an amount of data that you cannot deal with using traditional methods, and it's very relative, because what was big data five years ago is not big data today. So it's constantly evolving.
  • I think it's when you have a great amount of data. Not only can you not store it, but you can’t process this data with your home computers.
  • Big data is any amount of data that can't fit in memory. It's an evolving definition that changes with the times.
  • Whenever your data set exceeds the capacity of your systems to store and manipulate it, it's big data. If you have a laptop and your data doesn't fit on your laptop, that's big data for you. But if you are a very large firm with large storage clusters and your data still exceeds their capacity, then that's big data for you. So you can't say that 50 gigabytes or 50 petabytes is what makes it big data. Whenever the amount of data that an individual or a firm has exceeds their capacity to store or analyze it, that becomes big data for them.
  • We can play with big data. If we have a lot of data we can do visualization, we can do analytics.
  • We are generating more knowledge nowadays, and this is difficult for humans to get used to.
  • More data has been generated over the course of this interview than was probably generated in human history before my birth. That's remarkable; really, really remarkable.
  • Data science is relevant today because we have tons of data available. We used to worry about a lack of data; now we have a data deluge. In the past we didn't have the algorithms; now we do. In the past software was expensive; now it's open source and free. In the past we couldn't store large amounts of data; now, for a fraction of the cost, we can store gazillions of data sets. So the tools to work with data, the availability of data, and the ability to store and analyze it are all cheap, available, and ubiquitous. They're here, and there's never been a better time to be a data scientist.