Or what are the Dimensional Design principles to make a good data warehouse?

For data scientists, machine learning models and visualization packages are essential tools. But more importantly, data scientists rely on data and data infrastructure to do their analytics and modeling. Without data and databases, all developed analytics tools and techniques are useless. Many data scientists get their data in raw formats from several sources of information. But for many data scientists as well as business decision-makers, especially in large enterprises, the main sources of information are corporate data warehouses. A data warehouse is a structured organization of (ideally) all available data in the company. …


Running and managing MySQL databases remotely

Running a database on a local computer is easy and often sufficient during development. Deploying most applications, however, requires a database running on a remote server. There are many ways to deploy databases remotely. This article shows you how to create a simple database on AWS EC2 and manage it remotely.

This article is written for beginners who have no cloud database deployment experience. As mentioned, there are many cloud-based and non-cloud-based solutions for deploying databases. For example, AWS has a dedicated service, AWS RDS, for deploying databases in the cloud. We will discuss some of…


Getting valuable information from missing values.

When I load a new dataset, one of the first things I check is how many missing values it has. Seeing all those NaN values is disappointing and discouraging. It is not unusual to have columns with more than 50% missing values. We are all so excited about new datasets, mainly because new datasets mean more data and more improvement. But like any previous dataset, the new datasets are full of USELESS MISSING VALUES!!!!

But wait! Are missing values really worthless? Can I get some extra information or insight from the missing values?

In this short article, I…


How to use Python Logging library tools for tracking code events and debugging

Logging is a popular solution for tracking events in code and for debugging. Many of us (Python programmers and data scientists) have the bad habit of using print() to debug and track events in our code.

Why is using print() for logging and debugging not good practice?

  1. The print() function fails if your code does not have access to the console.
  2. Meeting even basic logging needs with print() takes several lines of code.
  3. Including additional logging information is not easy.
  4. The print() function only displays messages on the console. …
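
As a rough illustration of the alternative, here is a minimal sketch using Python's standard logging module. It writes events to a file instead of the console and adds a timestamp and severity level to each message. The logger name, the log file location, and the safe_divide function are assumptions made up for this example; they do not come from the original article.

```python
import logging
import os
import tempfile

# Hypothetical log file location, chosen only for this sketch.
log_path = os.path.join(tempfile.gettempdir(), "app_events.log")

# A dedicated logger; "demo_app" is an illustrative name.
logger = logging.getLogger("demo_app")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep messages out of the root logger / console

# Send messages to a file and attach a timestamp, level, and logger name.
handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)
logger.addHandler(handler)


def safe_divide(a, b):
    """Divide a by b, logging events instead of printing them."""
    logger.debug("dividing %s by %s", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception records the message at ERROR level plus the traceback.
        logger.exception("division failed")
        return None


result_ok = safe_divide(10, 2)
result_bad = safe_divide(1, 0)
handler.flush()
```

After running this, the file at log_path should contain one DEBUG line per call and an ERROR entry with the traceback for the failed division, none of which depends on a console being available.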


Build MVPs with a few lines of code

Building Python applications that have graphical user interfaces and perform sophisticated tasks might look difficult. In a recently published article (see the link below), I mentioned how only 7 Python libraries are needed to start building applications.

This article will show you how to build a simple translation application in a few lines. I only use two Python libraries: requests and ipywidgets.


I chose to write a translator application as an example to show you how easy it is to build applications in Python. This application takes an English text and shows its Spanish translation. Very easy and straightforward.


Types of uncertainties in AI decision-making problems.

Applying artificial intelligence (AI) to personal and business decision-making problems depends on how AI can perceive and handle uncertainty and risk. To understand the role of uncertainty in a decision-making problem, let’s have a quick review of decision making first. The simplest anatomy of decision making is shown in the figure below.


An example of changing the mindset to solve data science problems better.

I have taught different data science topics to different groups of scientists (mostly non-data scientists). I used to ask a simple question at the beginning of the class to break the ice. Interestingly, most non-data scientists, unlike data scientists, found it a hard question. I am going to share the question with you. I’ll also show you why many non-data scientists could not answer it. I hope it helps you change your mindset and become a better data scientist.

The Question

Here is the question that I call “Crazy Boss Puzzle.”

Imagine a situation like this. I am working on a data…


Why collecting decision-making data inside a business is necessary

In the era of digital transformation, companies are collecting almost every type of data. Sensor, financial, and logistics data are just a few examples. However, at least one important type of data that most businesses miss is internal “Decision-Making Data.”

Decision-Making Data

“Decision-Making Data” is probably the most important, expensive, and complex data that every organization produces internally but rarely collects. To understand this type of data, let’s start with a straightforward example. Many of you have played chess. Some of you are familiar with PGN, or Portable Game Notation. PGN records chess games in a computer-processable format. PGN records chess…

Seven Python libraries to make your first data science MVP application

How to build data science applications.
What do I need to learn to make my first data science application? What about web deployment? Do I need to learn Flask or Django for web applications? Do I need to learn TensorFlow to make a deep learning application? How should I make my user interface? Do I need to learn HTML, CSS, and JS too?

When I started my journey to learn data science, those were the questions that I always had in my mind. My intention to learn data science was not only to develop models or clean data. I wanted to make applications that people can…


Some fun puzzles with answers about the boolean data type.

In data science and programming, boolean (True and False) data are considered simple and boring. I believe booleans are fun and sometimes unpredictable if you don’t know their logic. I will try to convince you that boolean objects have some interesting features that you have probably overlooked.

Easy Puzzle

Let me start with an easy one.

This code should be able to recognize int and boolean objects. If the input object is not boolean or integer, it tells you that the object is “Something Else!”. In the end, I tested the function with four examples. What is your guess about…
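
The original snippet is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of a function that matches the description, assuming it checks for int before bool. The function name, the “Integer!” and “Boolean!” labels, and the four test inputs are hypothetical; only the “Something Else!” message comes from the text above.

```python
def describe(obj):
    """Classify obj as an integer, a boolean, or something else.

    Hypothetical reconstruction: the int check is assumed to come first.
    """
    if isinstance(obj, int):
        return "Integer!"
    elif isinstance(obj, bool):
        return "Boolean!"
    else:
        return "Something Else!"


# Four illustrative inputs (not the author's original examples).
examples = [5, True, 3.0, "True"]
results = [describe(x) for x in examples]

for value, label in zip(examples, results):
    print(repr(value), "->", label)
```

One fact worth keeping in mind before guessing the output: in Python, bool is a subclass of int, so the order of the isinstance checks matters.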

Naser Tamimi

I am a data scientist working for Shell. My mission is to teach my readers the things that I learned the hard way.

