When I load a new dataset, one of the first things I check is how many missing values it has. Seeing all those NaN values is disappointing and discouraging. It is not unusual to have columns with more than 50% missing values. We all get excited about new datasets, mainly because new datasets mean more data and more improvement. But like every previous dataset, the new ones are full of USELESS MISSING VALUES!!!!
But wait! Are missing values really worthless? Can I get some extra information or insight from the missing values?
In this short article, I…
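The first check described above, how much of each column is missing, can be sketched in a few lines. This is a minimal illustration using only the standard library; the helper names (`is_missing`, `missing_fraction`) and the toy records are made up for this example and are not taken from the article.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(rows):
    """Return {column: fraction of rows in which that column is missing}."""
    columns = {key for row in rows for key in row}
    total = len(rows)
    return {
        col: sum(is_missing(row.get(col)) for row in rows) / total
        for col in columns
    }


# Hypothetical toy records, not real data.
rows = [
    {"age": 34, "income": float("nan"), "city": "Austin"},
    {"age": None, "income": 52000.0, "city": "Houston"},
    {"age": 29, "income": float("nan"), "city": None},
    {"age": 41, "income": float("nan"), "city": "Dallas"},
]

fractions = missing_fraction(rows)
for col in sorted(fractions):
    print(f"{col}: {fractions[col]:.0%} missing")
```

If the data is already in a pandas DataFrame `df`, `df.isna().mean()` gives the same per-column fraction.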
Logging is a popular solution for tracking events in code and for debugging. Many of us (Python programmers and data scientists) have the bad habit of using print() to debug and track events in our code.
Why is using print() for logging and debugging not a good practice?
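One commonly cited difference is that the standard `logging` module attaches a severity level and a timestamp to every message and lets you silence or redirect messages without touching the calling code. Here is a minimal sketch; the logger name `pipeline` and the function `safe_divide` are hypothetical and exist only for this illustration.

```python
import logging

# Configure once, near the program's entry point.
logging.basicConfig(
    level=logging.INFO,  # DEBUG messages are filtered out at this level
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")  # hypothetical logger name


def safe_divide(a, b):
    """Divide a by b, logging instead of printing."""
    logger.debug("dividing %s by %s", a, b)  # hidden unless level is DEBUG
    try:
        return a / b
    except ZeroDivisionError:
        logger.error("division by zero for a=%s, returning None", a)
        return None


result_ok = safe_divide(10, 2)
result_bad = safe_divide(1, 0)
```

Changing `level=logging.INFO` to `logging.DEBUG` turns the debug messages on everywhere at once, which a scattering of print() calls cannot do.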
Building Python applications that have graphical user interfaces and are doing sophisticated tasks might look difficult. In a recently published article (see the link below), I mentioned how only 7 Python libraries are needed to start building applications.
This article will show you how to build a simple translation application in a few lines. I only use two Python libraries: requests and ipywidgets.
I chose a translator application as an example to show you how easy it is to build applications in Python. This application takes an English text and shows its Spanish translation. Very easy and straightforward.
…
Applying artificial intelligence (AI) to personal and business decision-making problems depends on how AI can perceive and handle uncertainty and risk. To understand the role of uncertainty in a decision-making problem, let’s have a quick review of decision making first. The most naïve anatomy of decision making is shown in the figure below.
I have taught different data science topics to different groups of scientists (mostly non-data scientists). I used to open the class with a simple question to break the ice. Interestingly, most non-data scientists, unlike data scientists, found it a hard question. I am going to share the question with you. Also, I’ll show you why many non-data scientists could not answer it. I hope it helps you change your mindset and become a better data scientist.
Here is the question that I call “Crazy Boss Puzzle.”
Imagine a situation like this. I am working on a data…
In the era of digital transformation, companies are collecting almost every type of data. Sensor, financial, and logistics data are just a few examples. However, at least one important type of data that most businesses miss is internal “Decision-Making Data.”
“Decision-Making Data” is probably the most important, expensive, and complex data that every organization produces internally but rarely collects. To understand this type of data, let’s start with a straightforward example. Many of you have played chess. Some of you are familiar with PGN or Portable Game Notation. PGN records chess games in a computer-processable format. PGN records chess…
What do I need to learn to make my first data science application? What about web deployment? Do I need to learn Flask or Django for web applications? Do I need to learn TensorFlow to make a deep learning application? How should I make my user interface? Do I need to learn HTML, CSS, and JS too?
When I started my journey to learn data science, those were the questions that I always had in my mind. My intention to learn data science was not only to develop models or clean data. I wanted to make applications that people can…
In data science and programming, boolean (True and False) data are considered simple and boring. I believe booleans are fun and sometimes unpredictable if you don’t know their logic. I will try to convince you that boolean objects have some interesting features that you have probably overlooked.
Let me start with an easy one.
This code should be able to recognize int and boolean objects. If the input object is not boolean or integer, it tells you that the object is “Something Else!”. In the end, I tested the function with four examples. What is your guess about…
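The snippet the paragraph refers to is not reproduced here. As a hedged illustration only, a function of that kind might look like the sketch below; the function names, the labels "Integer!" and "Boolean!", and the four test inputs are assumptions, not the author's code. The behavior it relies on is standard Python: `bool` is a subclass of `int`, so the order of the `isinstance` checks matters.

```python
def describe(obj):
    """Naive version: checks int first (hypothetical reconstruction)."""
    if isinstance(obj, int):
        return "Integer!"
    elif isinstance(obj, bool):
        return "Boolean!"  # never reached: every bool is also an int
    else:
        return "Something Else!"


def describe_fixed(obj):
    """Checks bool before int, so True/False are reported as booleans."""
    if isinstance(obj, bool):
        return "Boolean!"
    elif isinstance(obj, int):
        return "Integer!"
    else:
        return "Something Else!"


# Four hypothetical inputs.
examples = (0, False, 1.0, "True")
naive_results = [describe(x) for x in examples]
fixed_results = [describe_fixed(x) for x in examples]

for x, naive, fixed in zip(examples, naive_results, fixed_results):
    print(f"{x!r}: naive={naive} fixed={fixed}")
```

In the naive version, `False` is reported as an integer because `isinstance(False, int)` is `True`.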
I was working on a large dictionary in Python for a data science project. The Resource Monitor (a Windows utility that displays information about hardware usage) showed memory usage climbing enormously in a short amount of time. I knew that my draft code was not optimal, but the rate of memory growth did not make sense given how fast my dictionary was growing in length. It seemed that my dictionary length did not have a linear relationship with the dictionary object's size in memory. I decided to check the size of my dictionary in memory. I was…
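One way to observe this non-linear relationship is `sys.getsizeof`, sketched below. This is an assumption about how one might check it, not necessarily what the author did. Note that `sys.getsizeof` reports only the dictionary container itself, not the keys and values it refers to, and the exact byte counts depend on the Python version and platform.

```python
import sys

d = {}
sizes = []  # container size in bytes after each insertion
for i in range(1000):
    d[i] = i
    sizes.append(sys.getsizeof(d))

# Report only the insertions at which the reported size changed.
previous = None
for n_items, size in enumerate(sizes, start=1):
    if size != previous:
        print(f"{n_items:>5} items -> {size} bytes")
        previous = size

distinct_sizes = sorted(set(sizes))
```

On CPython the size stays flat over long runs of insertions and then jumps when the dictionary resizes its internal table, so 1000 insertions produce only a handful of distinct sizes.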
It took me 14 years to finish my bachelor’s, master’s, and Ph.D. degrees. I could not imagine anything that could replace my 14 years of academic education until I took a number of online courses. I was wrong about the quality and job impact of online courses and certificates. My personal experience, and working with many brilliant minds who took different education paths, convinced me that the future of education is in online courses and certificates. Let me make it clear at the beginning. Although academic education and degrees will be valuable for a long time, online certificates will replace…
I am a data scientist working for Shell. My mission is to teach my readers the things that I learned the hard way.