## The bare minimum you should be able to cover in a technical interview (II)

This constitutes the second part of a post. Find the first part here.

## 5-Solid understanding of databases.

It might seem that the “old” database systems (MySQL, PostgreSQL, Oracle…) are not as cool as the new NoSQL systems or other big-data solutions. The truth, however, is that they are at the core of the vast majority of systems.

Understanding the benefits, limitations and trade-offs of database systems is key. In an interview you should be able to talk clearly and confidently about:

1. The SQL query language: specifically, you should be able to write queries with a number of joins and filters.
2. Table indexes and their limitations: Why is adding an index (normally) the way to fix a slow query? What happens when that is not the case? What is the penalty for having an index? Which kind of data structure does a database use internally to represent an index? A strong candidate must be able to answer all of those questions with confidence.
3. Query improvement: Are you able to analyze a query? You should be familiar with different techniques and tools to figure out where the bottleneck of a given query is.
4. Connection leaks and connection pools: What does each of those terms mean? Why use a connection pool? What is the effect of connection leaks? It is the job of any decent developer to be aware of those elements.
5. Security: Databases end up storing very sensitive information. There are different techniques to protect such information and there are, likewise, a bunch of bugs and malpractices that can expose it to the wrong people. Yes, I am talking about SQL injection (among other problems). One would think that by this time developers would have learned their lesson, right?
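Points 2 and 5 can be made concrete with a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database; the `users` table, the `idx_users_email` index and the `query_plan` helper are hypothetical names chosen for illustration, and other databases expose their query plans through different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("ana@example.com", "Ana"), ("bob@example.com", "Bob")],
)

# Input an attacker might type into a form field.
malicious = "nobody@example.com' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query itself.
unsafe_sql = "SELECT name FROM users WHERE email = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input travels as a bound parameter and is only ever treated as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE email = ?", (malicious,)
).fetchall()


def query_plan(connection, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
plan_before = query_plan(conn, lookup, ("ana@example.com",))
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, lookup, ("ana@example.com",))

print(len(unsafe_rows), len(safe_rows))
print(plan_before)
print(plan_after)
```

The concatenated query returns every row because the injected `OR '1'='1'` is always true, while the parameterized version returns nothing. The two plans should show a full table scan turning into an index search once the index exists.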

## 6-Talk about your own history and your … mistakes.

No spoiler alert here: we have all made mistakes. I have quite a bunch in my own history… Fortunately I have learned from all of them, and I have not repeated them.

If you think that is not your case, think again. Unless you have been in the industry for a very short time, chances are you have been involved in some sort of mess-up: maybe it was not your code, maybe you just approved the pull request, or maybe you designed the software in a way that turned it into a total mess. It is all OK, really it is.

Even NASA engineers have made memorable mistakes, so why would you be free of them? The only good thing about mistakes is what you learn from them. Try to revisit your own failures: What happened? Why? Did you take responsibility? It is true that sometimes software developers are put in a position where they are very likely to make a mistake: projects with terrible architecture, managers who are just pushing for deadlines… the list goes on. But it is still YOUR responsibility to make things work or to raise your voice to warn about the consequences.

Whenever I interview an engineer who claims to make no mistakes, I always think that the person is either a genius or arrogant. It takes quite a level of maturity to recognize your mistakes, and I certainly respect that.

## 7-Talk about your own history and your … successes.

Finally: be ready to talk about that project where you nailed it, or that bug that was so hard to track down, the one everybody else had given up on, that you ended up fixing.

One of the most satisfying parts of being a software engineer is the professional reward that we can experience almost every week, by either delivering a new feature or improving an existing one.

You should have 2-3 cases where you were particularly brilliant. Even if you are just a junior developer with 1-2 years of experience, there must be something that you are proud of, or something that went better than expected. Hell, you might even have delivered a project ahead of schedule (a rare circumstance… but absolutely epic whenever it happens).

## 8-You should be able to convince your interviewer that you are a nice person.

This might sound too obvious or maybe out of place, but it is fundamental.

A technical interview is a process that tries to answer two simple questions:

1. Is the candidate technically sound?
2. Would I have a beer/tea/coffee with the candidate?

We have covered part (1) well, but you should really be ready to cover part (2). After all, building software is a team effort, and developer teams go through a lot: good and bad times, moments of joy and tension… You really need to be a nice person to fit in well.

If you think this is not so important, think again: a junior developer can improve over time to become a senior one, but a senior developer who is a jerk cannot be fixed. Who would you be willing to work with?

This concludes the minimum you should be able to cover… I really hope this advice will help you in your interview processes.

Happy coding.

## The bare minimum you should be able to cover in a technical interview (I)

I have been doing technical interviews for around 8 years, and it is not easy to find great candidates, probably because great candidates do not stay on the market for long. However, time after time I deal with candidates who, unfortunately, are not able to cover certain basics. Let's go through them.

## 1-Big-oh notation, time and space complexity.

Being able to determine when it is worth optimizing an algorithm is absolutely fundamental. I have seen code where collections of 100 elements were sorted just because a search would be performed once or twice… that says a lot about the developers.

Also, it is important to understand how badly or how well our algorithm will do as a function of the size of the input. An image is worth more than words, so here is the fundamental picture to keep in mind.

Anybody doing an interview should have that picture really clear in their head. The reason? Let's go through a simple question: say you have 1 million unsorted numbers, and you need to check whether or not a given number is among them. The operation is to be repeated 100 times. Is it worth sorting your 1 million numbers or not?

It is a simple question, and it should have a simple and definite answer. Without the help of big-oh notation and some knowledge about the running time of sorting and searching algorithms, though, it is impossible to answer it.

With the help of big-oh, however, one can answer this question very effectively.

Doing a sequential search takes linear time, that is $$O(n)$$.
Doing a binary search takes logarithmic time, that is $$O(\log n)$$, but in order to perform a binary search we first need to sort. Sorting takes $$O(n \log n)$$. In our case we are specifically dealing with $$\log_2(10^6)$$, which is approximately 20, as $$2^{20} \simeq 10^6$$.

This means that

1. Using a binary search will take around 20 operations (worst case) to find a number
2. Using a sequential search will take 1 million operations (worst case) to find a number
3. The cost of preparing our numbers to be able to perform binary searches is around 20 million operations ($$10^6 \times 20$$)

Now, let's ask ourselves the question again. Is it worth sorting 1 million numbers if we are going to search 100 times within them? The answer is yes: sorting costs around 20 million operations up front, after which each binary search costs almost nothing (about 20 operations), for a total of roughly 20 million. A sequential search takes up to 1 million operations each time, so 100 searches cost up to 100 million.
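The same reasoning can be written down as a rough cost model. This is a sketch under the assumptions used above (worst-case operation counts only, log2 rounded up, constant factors ignored); the function names are made up for illustration.

```python
import math
from bisect import bisect_left


def sequential_cost(n, searches):
    """Worst-case operations for `searches` linear scans over n items."""
    return n * searches


def sort_then_binary_cost(n, searches):
    """Rough cost of one sort (n * log2 n) plus `searches` binary searches."""
    log_n = math.ceil(math.log2(n))
    return n * log_n + searches * log_n


def contains_sorted(sorted_numbers, target):
    """Binary search on an already sorted list using the standard library."""
    i = bisect_left(sorted_numbers, target)
    return i < len(sorted_numbers) and sorted_numbers[i] == target


n = 1_000_000
for searches in (1, 2, 100):
    print(searches, sequential_cost(n, searches), sort_then_binary_cost(n, searches))
```

Under this model sorting only pays off once you go beyond roughly 20 searches, which is why sorting a collection in order to search it once or twice is wasted work.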

## 2-Basic data structures.

The simplest data structure is an array, but everyone should be familiar with stacks, queues, linked lists, hash tables, trees, graphs and possibly also sets and heaps.

Furthermore, I expect people to be able to implement any of the structures mentioned above. Sure, you will NEVER implement those in real life, but I think it is an interesting exercise to be able to implement them by yourself. I tend to ask people how to implement a simple stack, and some people really struggle to do so. I would never ask anybody to implement a full hash table from scratch, but I certainly expect them to be able to do so if a hash function is provided to them.
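As an illustration, here is a minimal sketch of a stack backed by a Python list. It is one possible answer, not the only acceptable one; the method names follow a common convention rather than anything mandated.

```python
class Stack:
    """Last-in, first-out container backed by a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """Place an item on top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item; fail loudly on an empty stack."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def __len__(self):
        return len(self._items)


stack = Stack()
for value in (1, 2, 3):
    stack.push(value)
top = stack.pop()  # the last value pushed comes out first
```

Both `push` and `pop` work on the end of the list, so each operation takes (amortized) constant time.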

Being familiar with data structures is key to being able to solve problems. Here is an example: imagine you are given all the correct words of the English dictionary. How could you implement a spell checker that will not only tell the user whether the word they typed is correct, but also provide suggestions in case they wrote an incorrect word?

Most people will initially resort to a hash table. However, there is a problem with hash tables: they only help you with exact searches. If you type “helo”, there is no way to let the user know that maybe they wanted to type “hello” or “hell”. How would you do that? The solution is a trie, a type of tree where each node can have one child per letter of the alphabet, so every path from the root spells out a prefix. If a particular node completes a word, it contains a flag indicating so. Again, let's illustrate this with a picture.

This is a simple yet interesting example of how a problem that might seem very complex at first can be solved effectively using the right structure. Most people operate with linked lists, sets and hash tables pretty much every day. Trees, graphs and other structures might not be used so often, but they are incredibly powerful in certain situations. As a software engineer it is your job to know your tools, and data structures are pretty much tools 101 here.
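The trie described above can be sketched in a few lines. This is a minimal illustration with a deliberately naive suggestion strategy (follow the typed word as far as the trie allows, then list the words below that point); the class and method names are hypothetical, and a real spell checker would more likely rank candidates by edit distance.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False  # flag: the path from the root to here spells a word


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def _longest_prefix_node(self, word):
        """Follow `word` as far as the trie allows; return (node, matched prefix)."""
        node = self.root
        matched = []
        for letter in word:
            if letter not in node.children:
                break
            node = node.children[letter]
            matched.append(letter)
        return node, "".join(matched)

    def contains(self, word):
        node, matched = self._longest_prefix_node(word)
        return matched == word and node.is_word

    def suggest(self, word, limit=5):
        """List up to `limit` known words sharing the longest matching prefix."""
        node, prefix = self._longest_prefix_node(word)
        suggestions = []
        stack = [(node, prefix)]
        while stack and len(suggestions) < limit:
            current, text = stack.pop()
            if current.is_word:
                suggestions.append(text)
            # Push children in reverse order so they are visited alphabetically.
            for letter in sorted(current.children, reverse=True):
                stack.append((current.children[letter], text + letter))
        return suggestions


trie = Trie(["hell", "hello", "help", "world"])
suggestions = trie.suggest("helo")
```

For the misspelled “helo”, the walk stops at the prefix “hel”, and the words stored below that node become the suggestions.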

## 3-Basic data types.

What should you know about floating point numbers and monetary calculations? Together they are a no-no, and for good reasons. Because of how floating point numbers are stored internally, they cannot represent most decimal fractions exactly. The loss of precision is not too terrible for scientific calculations, but it can be really bad for monetary calculations (or any other calculations where precision is critical). As an example, take a simple piece of Python code (the same problem will occur in Java).

Now, the expected output of that calculation should be 1, right? Well, not really: the actual result is not exactly 1.

So, if that happens after only 1000 operations, imagine what floating point numbers could do to multiplications and divisions. And then imagine what would happen if those multiplications and divisions involve money (currency conversion, tax calculations, margins…). Again, it is your job as an IT professional to know this and use the proper tools. Most languages provide mechanisms to deal with this problem. Do you know them?
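The snippet this section originally showed is not reproduced here, so what follows is a reconstruction under an assumption: that the calculation added 0.001 one thousand times. It contrasts plain floats with Python's standard `decimal` module, which is one of the mechanisms languages offer for exact decimal arithmetic.

```python
from decimal import Decimal

# Accumulate 0.001 a thousand times using binary floating point.
float_total = 0.0
for _ in range(1000):
    float_total += 0.001

# The same accumulation with exact decimal arithmetic.
decimal_total = Decimal("0")
for _ in range(1000):
    decimal_total += Decimal("0.001")

print(float_total)                # close to 1, but not guaranteed to be exactly 1.0
print(decimal_total == 1)         # the decimal total is exactly 1
print(0.1 + 0.2 == 0.3)           # False: the classic floating point surprise
```

Note that the `Decimal` values are built from strings. Building them from floats (`Decimal(0.001)`) would carry the floating point error along.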

## 4-Basic object orientation understanding.

Becoming an expert on object oriented programming takes years, and even I sometimes struggle to find the right abstractions. It is a never-ending process but, when done correctly, object oriented programming can solve many problems, as it allows us to work with abstractions rather than low level details.

Can you clearly explain what polymorphism is, without resorting to an example? Could you explain it to your mum? That is the level you should be able to provide in an interview.

Do you fully understand inheritance and composition? Have you realized that inheritance can actually break encapsulation (if done wrong)?
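One way to see how inheritance can break encapsulation is the following sketch. The classes `Bag`, `CountingBag` and `CountingBagByComposition` are hypothetical; the point is that the subclass silently depends on an implementation detail of its parent (`add_all` happens to call `add`).

```python
class Bag:
    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def add_all(self, items):
        # Implementation detail: add_all happens to call add for each item.
        for item in items:
            self.add(item)


class CountingBag(Bag):
    """Inheritance: overrides both methods to count how many items were added."""

    def __init__(self):
        super().__init__()
        self.added = 0

    def add(self, item):
        self.added += 1
        super().add(item)

    def add_all(self, items):
        items = list(items)
        self.added += len(items)
        super().add_all(items)  # calls the overridden add, so items get counted twice


class CountingBagByComposition:
    """Composition: wraps a Bag and relies only on its public behaviour."""

    def __init__(self):
        self._bag = Bag()
        self.added = 0

    def add(self, item):
        self.added += 1
        self._bag.add(item)

    def add_all(self, items):
        items = list(items)
        self.added += len(items)
        self._bag.add_all(items)


inherited = CountingBag()
inherited.add_all(["a", "b", "c"])

composed = CountingBagByComposition()
composed.add_all(["a", "b", "c"])

print(inherited.added, composed.added)
```

The inherited version reports twice as many additions as actually happened, and it would quietly change behaviour again if `Bag.add_all` were rewritten to stop calling `add`. The composed version does not care how `Bag` works inside.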

I am surprised quite often: when I ask candidates about information hiding, most of them are able to explain the mechanisms that languages like Java provide for it (private, protected, public…). However, when asked “why would you ever want to make something private? Can you give me an example?” many people actually struggle. Furthermore, many will simply answer that you make attributes private and then provide getters and setters.

You need to understand how information hiding allows you to keep the internal state of the object separated from the outside world, you should be aware of the consequences of not doing so, and you should be able to clearly and confidently explain that.
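A small sketch of why hiding state matters: the hypothetical `BankAccount` below keeps its balance internal so that every change goes through methods that enforce the rule "the balance never goes negative". Python only hides attributes by convention (the leading underscore), whereas Java would enforce this with `private`; the idea is the same.

```python
class BankAccount:
    """Keeps the balance internal so the 'never negative' rule cannot be bypassed."""

    def __init__(self, opening_balance_cents=0):
        if opening_balance_cents < 0:
            raise ValueError("opening balance cannot be negative")
        self._balance_cents = opening_balance_cents

    @property
    def balance_cents(self):
        """Read-only view of the balance; there is deliberately no setter."""
        return self._balance_cents

    def deposit(self, amount_cents):
        if amount_cents <= 0:
            raise ValueError("deposit must be positive")
        self._balance_cents += amount_cents

    def withdraw(self, amount_cents):
        if amount_cents <= 0:
            raise ValueError("withdrawal must be positive")
        if amount_cents > self._balance_cents:
            raise ValueError("insufficient funds")
        self._balance_cents -= amount_cents


account = BankAccount(1_000)
account.withdraw(300)

try:
    account.withdraw(5_000)  # would overdraw the account
    overdraft_rejected = False
except ValueError:
    overdraft_rejected = True
```

A public attribute with a plain getter and setter would let any caller write a negative balance directly. Here the only ways to change the state are the ones that check the rule first.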

This concludes the first part of this story.

Happy coding.

## What I have learned after 12 years as a software developer (III).

This is the final post about my experience as a developer. You might want to check part 1 and part 2.

#### 15-Fear-based management is the lowest level of management.

Nothing surprising here: if a company is managed based on fear, people will tend to hide errors. Sure, development errors can be easily traced back, but do you really want to create an atmosphere where people make that hard?

The reality is that errors will happen. It is not a matter of if, it is a matter of when. More important than the errors is the reaction to them.

Once I worked for a company where a bug was discovered. It was not too bad, but I felt that I had to inform the owner. An incredible number of people almost begged me not to do so, just to avoid the rage… It certainly got me thinking: what kind of place was that? Ultimately I communicated the problem anyway, and the reaction was actually not as bad as people had told me.

In short: do not use fear. People will not respect you, they will only avoid you. A leader will be followed to the battlefield if required; a tyrant will not.

#### 16-A single branching strategy is fundamental.

It does not matter what version control system you use (git, mercurial, svn…), but whatever you do, you must have a clear and standard branching strategy. I have personally found that gitflow works very well, but I recognize that it might be too complex for simple organizations.

That said, the days of committing directly to the master branch (or trunk in svn) should be in the past. With continuous delivery one cannot afford the main branch getting polluted with bugs, and the extra time required to merge two branches is minimal nowadays.

I have personally experienced a company not having a single branching strategy, the pain of getting everyone onto the same standard, and the benefits of doing so. The larger the company, the harder it gets… but the reward will also be bigger.

A branching strategy should allow you to:

• Deploy your code easily and safely at any time.
• Roll back to a previous version easily and quickly.
• Make hotfixes easily (although hopefully this is something you will rarely do).
• Keep flexibility in development.

My advice is to use an already tested strategy such as gitflow. If you think you do not need a branching strategy, think again: it can prevent a lot of headaches.

#### 17-Compiler warnings must be addressed immediately.

Not all languages benefit from this, but many do (Java, C, C++, .NET). The compiler is, along with the debugger, the best friend of any developer: it constantly saves us from ourselves.

So, if the compiler is giving you a warning, trust me, you need to address it. Sure, the code will most likely work, but it is a ticking bomb. Also, the fact that you did not write the code generating the warning is no excuse not to fix it.

#### Conclusion.

Most likely, these 17 points will need revision with time, but they truly represent the most relevant lessons I have learned in 12 years.

Happy coding.

## What I have learned after 12 years as a software developer (II).

12 years as a coder (be it as a junior developer, senior, team lead or principal) have taught me a lot. These are the rest of the points that I consider the most important. This is the second part of the article, find the first part here.

#### 7-Having a proper build pipeline is fundamental.

There are different ways to do this, and it will certainly depend on the circumstances of each company, but as a bare minimum you should have an integration server where you can have repeatable builds.

Of course this also means that your build process should be as simple as possible. I find it acceptable to have to execute a couple of commands to get a full build (in order to get test coverage and so on), but that should all be clearly encapsulated in your build server.

It goes without saying that if your builds fail with a certain frequency, you need to stop whatever you are doing and make them stable. After all, what is the point of deploying stuff that does not always build correctly?

It is also very advisable to keep build times under control. I have seen projects that take 45 minutes to compile. No matter what you do, if your build process takes 45 minutes you are not agile (and I do not care whether you do standups or not).

#### 8-Deadlines are just wrong.

To a certain extent I understand that the business wants to get an idea of when something is going to be done (even when that “something” is not clearly defined) and I think that having a certain horizon can help to keep a bit of tension and motivation, but having deadlines is, in the vast majority of cases, simply wrong.

Let's face it: we have been doing software engineering for less than 40-50 years and we are still learning. Believe it or not, the vast majority of software projects are delayed. Getting obsessed with deadlines will only lead to poor quality code, which will then lead to bugs, and that will cost the company.

Do not get me wrong, I am not saying that timing is not important. Of course it is. What I am saying is that it should not drive the company's behavior. There are interesting alternatives to plain time estimation, such as story points, t-shirt sizes, planning poker… They do work incredibly well, but it will take time until your team gets good at them.

#### 9-Big bang deployments do not work, stop doing them.

It is 2018 and I still have conversations with developers telling me that in their companies there is a deployment to production every month (or even every 3 months). That is simply a recipe for chaos.

It is much simpler to deploy small and often than to make big deployments, and for good reasons: first, any process that is done often will get automated quickly; second, if you make mistakes you will have a much quicker feedback cycle to fix them; and third, the business will actually be happier, as the mean time to deploy a feature will be greatly reduced.

Saying that you do not have the capacity to automate deployments so they can happen more often is like saying you cannot go and grab a fire extinguisher to put out a fire because you are too busy using little glasses of water instead.

In short: automate deployments, make them common, small and non-scary. Everyone will be happier.

#### 10-Developers need to understand the business.

This seems obvious, but unfortunately it is quite common that, when a dev is asked “why are you doing this?”, it takes a few questions until you actually get a satisfactory answer. For example:

-Why are you adding a table?
-So we can store product types.
-And why do we need to store product types?
-Because we want to display them in a dashboard.
-And why do we want to display products in a dashboard by type?
-So the business can decide which type needs more marketing.

There we go!! This is surprisingly common. It is not due to lack of interest or lack of communication, it is just that we developers sometimes get too focused on the technical part. I have found it incredibly beneficial to understand WHY things were being done and what the ultimate goal of the feature I was building was.

This is especially important in startups, where things are a bit crazy (or fast-paced, whatever term you prefer). In those environments there is always a fight between developers wanting to rewrite parts of the system and the business desperately trying to close the next financing round.

#### 11-Everything else being equal, consistency wins.

We all have, as developers, our own preferences, and some of them can quickly turn into religious wars (for example, where should we place curly braces: on the same line or on the next line?).

I am a strong advocate of consistency. More often than not I have agreed to follow naming conventions that I did not share but that were well established. Of course, if you really see something that is wrong, by all means change it, but there are certain areas that are more preference than science. Take method names as an example:

• User getUserById(int userId)
• User getById(int userId)
• User findById(int userId)
• User loadById(int userId)

Which method is the best one? What is the best naming convention? I do have my preference, but if everyone else has already agreed to use, say, “loadById”, then why should I start changing the names for other entities? What value am I adding? Apart from satisfying my own preferences, am I making the code any better?

Coding software is, largely, a team activity. There will be clashes and there will be disagreements. Be ready to bend sometimes and to accept certain things; choose what is really worth a fight and what is not.

#### 12-It is much easier to code things correctly from the beginning.

OK, we have all cut corners sometimes, but honestly, looking back now, was it worth it? There are scenarios in which it is OK to do so, but learning which ones takes quite a few years.

The reality is that YOU will be the one who has to deal with the crappy code you are writing today to satisfy the business people, so when required, put up a fight. Know which battles to choose, as it is not always easy to win them. And sometimes put up a fight even if you know you cannot win it: it sends a message.

But to be fair, in many cases the fault lies not with the business but with us. After all, it is the job of the business to keep the money coming, and it is YOUR job to deliver high quality code. This means writing tests constantly, as you are developing your features, not at the end “if you have time”. Tests are part of the code I expect developers to deliver, plus they have saved my ass so many times that I have learned (sometimes the hard way) to write them and give them the importance they deserve.

Also, every time you write bad code, you are sending a message to the next developer that it is fine to do so. This is closely related to the broken windows theory, which essentially states that if your code has dirty areas, the rest of the code will not be respected and will get dirty too.

#### 13-It is incredibly inefficient to use the same technology for everything.

So, yes, believe it or not, there is no silver bullet. This is one of those recurrent mistakes I see, especially in developers. I have heard people say “I am a Java developer” (or PHP, or Ruby, or Python… whatever you prefer) so many times. It is a terrible mistake, as absurd as saying “I am a Toyota taxi driver”.

Why do we get obsessed with a single programming language? Sure, I love Java, but if you want to build a single page with a form, PHP will do better, and if you need to analyze a CSV file, Python or R will be fantastic.

The only thing a developer should never stop doing is learning. There is an incredibly large number of technologies out there. Sure, you cannot learn all of them, but before you start writing a complex data structure in Java to handle tables, you might want to check out pandas or R first.
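To give an idea of how little code a CSV task can take in a language suited to it, here is a minimal sketch that uses only Python's standard library. The column names and sample data are invented for illustration; with pandas the same aggregation would be shorter still.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample data; in practice this would come from open("sales.csv").
CSV_TEXT = """product_type,amount
books,12.50
games,30.00
books,7.50
"""


def total_by_type(csv_file):
    """Sum the 'amount' column for each value of 'product_type'."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(csv_file):
        totals[row["product_type"]] += Decimal(row["amount"])
    return dict(totals)


totals = total_by_type(io.StringIO(CSV_TEXT))
for product_type, total in sorted(totals.items()):
    print(product_type, total)
```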

Being a developer is not so different from being a handyman: you need to know your tools and, from time to time, TRY other tools, as they can help you succeed. In the end, it does not matter which technology you used, as long as the code works and can be maintained. Do not be afraid of trying that other programming language that has such a bad name at your office. You might be surprised.

#### 14-When (not if) an error occurs, do a post-mortem and learn from it.

There is only one thing worse than introducing a terrible bug into a production system, and that is not learning from it.

If you have been working in the software industry for a few years, I am fairly sure you have made mistakes. Some of them might not be too bad (maybe some test emails were sent to users), while others are more serious (pretty much anything related to payments). In every single case, it is fundamental to hold a post-mortem so the situation can be analyzed to determine what was done well, what was done wrong and what could have been done better.

Post-mortems are one of the most effective ways of learning from our mistakes. Hopefully it is not a process we go through a lot, but it will happen. This is especially important for junior developers: you WILL make mistakes, and chances are they are not as terrible as you think, but if you do not learn from them and improve, then we have a problem. Do not be scared of acknowledging your errors: in most companies it is actually seen as the right attitude (and if you are in a company where there is a witch hunt after each mistake, you might want to look for another job anyway).

## What I have learned after 12 years as a software developer (I).

I started coding professionally in 2006. At that time I was fresh out of uni and certainly had a lot to learn. Over the years I have accumulated quite a lot of knowledge, part of it by simply continuing to study and part of it out of pure experience.

Please be aware that this reflects only my own experiences and may or may not reflect other people’s. We developers tend to have quite different opinions when it comes to code. That said, here is what I have learned.

#### 1-Most of the things will not be needed, apply the YAGNI principle.

By far the most common mistake that I see is developers trying to account for stuff that simply is not required. It does not cease to amaze me how often this happens: time after time I see complex code and architecture that is there because “we might need flexibility in the future”. And what happens? That future rarely shows up.

There are many examples of this. Some of them are relatively benign, like declaring an interface that has a single implementation because “we might have other implementations in the future”. I see this quite often with DAO objects. The reality is that I rarely see more than one implementation of a given DAO interface, and even if that happens, modern IDEs have refactoring utilities that can extract an interface.

Worse than that, I have seen code become very complicated to account for stuff that was not even asked for. Once I had to deal with a monstrosity in Java whose only job was to read a CSV file. The developer (possibly in good faith) prepared it so it could also read XML, JSON and a number of other formats. Unfortunately, in the process the code got quite convoluted (no, it was not simple polymorphism), the final project ended up with many dependencies, and the worst part is that it was a memory-hungry beast… Of course, the code has never processed anything other than CSV files, so what was the value of it?

#### 2-Complex architectures are evil, evil I say!

Of course there are exceptions: in some scenarios one might need a complex or non-trivial architecture to solve a complex problem. But in the vast majority of cases there is a tendency to overarchitect things, especially in the Java world.

Most of the software that most developers write and maintain is quite boring: in most cases just web pages that read from and write to a data source (a database, a web service…). Do we really need a complex architecture for that? In most cases what will happen is that junior developers not familiar with the architecture will start to cut corners in order to get stuff done, and in the end the code tends to evolve into a big ball of mud.

Don’t get me wrong, I like being faced with a problem that requires a non-trivial solution, and I love to design big architectures (we all do, I think). But more often than not I have seen complex architectures that mainly just satisfy the ego of the developer(s) who came up with the idea. All else being equal, the simpler the better.

#### 3-If you want to generate chaos, do not use a ticket system.

As a golden rule, nothing, absolutely nothing, should EVER be done without a ticket describing what is being done and why. In my current job it is simply not possible to do otherwise: you need a ticket to be able to create a branch.

This rule is especially important if the ticket is very urgent or a blocker. Precisely because of that, one needs to follow the procedure: the procedure was designed when people had a cool head, so when the shit hits the fan, you had better follow what your past (and calm) self decided to do.

Also, it gives you control over changes and brings some discipline to developers and non-technical people alike.

#### 4-Not retaining good developers.

This is an important one.

Finding good developers is not easy: first, they are expensive; second, they do not stay on the market for long, as companies hire them quite quickly.

It is not always possible to retain every good developer. After all, we live in a free market world and people change companies for many reasons. However, there is a big difference between natural turnover and simply not managing to keep people happy.

Every single time a good developer leaves, two things happen: first, the company loses a lot of knowledge that goes away with that developer, and second, morale takes a hit. Workmates talk mainly… well, about work. When a good developer leaves for non-natural causes (that is, not because he/she is moving to a new city or about to start their own business), the conversations during the few weeks that developer remains in the company will be mainly about how happy he/she is to be leaving and how everyone else should do the same.

Notice that keeping people happy rests on three main pillars, and they are all equally important:

1. Salary.
2. Work environment (relations with other devs, managers…etc)
3. Career development (how interesting the projects are, what the future looks like…etc)

It is not always possible to keep all of those three pillars up, but at the very least two of them should ALWAYS be solid.

#### 5-Lack of system monitoring.

Software is complex, too complex. On top of that, we have been doing software engineering for maybe about 50 years, so we are all still learning (I am sure that 100 years from now, people will consider what we do as primitive as we now consider the car production systems of 1920).

Not monitoring is simply too expensive: once things stop working in production, it is already too late. I have seen a bit of everything, from companies with fantastic monitoring to companies with no monitoring, and everything in between.

Once I saw a company’s website go down simply because they had run out of disk space, which is pretty much the equivalent of not being able to serve food at a restaurant because someone forgot to buy it.

Along with lack of monitoring comes lack of alerts. A mistake I have seen way too often is systems constantly triggering red alerts, so that people become so used to false positives that they simply ignore them… until one “false positive” turns out to be a true positive. Having a good set of reliable alerts is key to keeping a system healthy. If there are false positives, then more work has to be done and better alerts need to be defined. Ignoring the problem is not going to help.
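Even a very small check beats no monitoring at all. The sketch below uses `shutil.disk_usage` from Python's standard library to flag low disk space; the 10% threshold and the function names are assumptions made for illustration, and a real setup would normally rely on a dedicated monitoring and alerting tool rather than a script like this.

```python
import shutil


def free_space_alert(total_bytes, free_bytes, min_free_ratio=0.10):
    """Return an alert message when free space drops below the threshold, else None."""
    if total_bytes <= 0:
        return None  # nothing sensible to report
    free_ratio = free_bytes / total_bytes
    if free_ratio < min_free_ratio:
        return f"ALERT: only {free_ratio:.1%} of disk space is free"
    return None


def check_disk(path="."):
    """Read the real usage figures for `path` and apply the threshold."""
    usage = shutil.disk_usage(path)
    return free_space_alert(usage.total, usage.free)


message = check_disk(".")
print(message or "disk space OK")
```

Keeping the threshold logic in a pure function makes it easy to test, and to tune until the alert only fires when it really matters.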

#### 6-Ignoring technical debt.

Technical debt is pretty much like financial debt: it is not always bad, and sometimes it is perfectly fine to take out a loan so the company can make other investments, but leave debt unpaid and the compound interest will kill you.

There are many tools out there to control technical debt (I have been using Sonar for quite a while with good results). There is absolutely no excuse to leave technical debt unpaid, and it is your responsibility as a developer to push for it, because non-technical people will never, ever ask you to pay it down. You are the expert and you are paid for it, so put up a fight. Sometimes you even have to put up a fight knowing you will lose, but it is still worth making the point (I have found, however, that asking people to send me an email saying they will assume all responsibility if something goes wrong works miracles).

Along with ignoring technical debt, there is the bad practice of not doing code reviews, every code to be merged must go through a pull request, with the possible exception of hotfixes (which, by the way you should rarely do anyway). I personally like the code reviews and consider them a safety net, it has not been uncommon for me to find myself saved by another coworker after reviewing my code.

#### Conclusion.

This is just the first part of the post, but I really hope someone finds it useful, as it represents years of my own experience.

Happy coding.

## ReLU, sigmoid and tanh, how activation functions affect your machine learning algorithms.

If you have been working with neural networks for a bit, you already know that we need to use activation functions in the hidden layers (well, and also in the output layer), in order to achieve non-linearity.

However, I really enjoy understanding why we should use some activation functions instead of others. Furthermore, I like to know how different activation functions affect a given model.

In this post I will focus on classification problems, more specifically binary classification problems. Let's dive in.

### Binary classification outputs.

Say you have a neural network that will classify elements into two different categories. For example, given an image, it could determine if it is a cat or a dog; in this case our output will be either 1 (cat) or 0 (dog). That’s where the sigmoid function comes in handy.

The formula for the sigmoid function is

$$sigmoid(x) = \frac{1}{1+e^{-x}}\\$$

And the way it looks if we plot it is this

What makes the sigmoid function good for classification problems is that it outputs a value between 0 and 1 that changes smoothly with the input. This makes the sigmoid a great function for the output of a classification model, as we can then simply perform predictions such as

$$\hat{y} = 1 \text{ if } sigmoid(x) >= 0.5; \hat{y} = 0 \text{ if } sigmoid(x) < 0.5\\$$
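As a quick sketch of the two formulas above in plain Python (the function names and the sample inputs are mine, chosen only for illustration):

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, threshold=0.5):
    """Turn the sigmoid output into a class label: 1 (cat) or 0 (dog)."""
    return 1 if sigmoid(x) >= threshold else 0

# A few sample inputs: strongly negative, neutral and strongly positive.
for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid(x), 3), predict(x))
```

Notice that an input of exactly 0 gives 0.5, which the rule above assigns to class 1.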

So, why not use the sigmoid also as an activation function in the hidden layers of a neural network? That takes us to the next stage.

### Backward propagation and weight updates.

Ultimately the way we update our weights and biases is this

$$W = W - \alpha \frac{\partial{Cost}}{\partial{W}} \\ b = b - \alpha \frac{\partial{Cost}}{\partial{b}} \\$$

This of course assumes that $$\alpha$$ represents the learning rate. What is important to notice is that if our derivatives are too small, then our updates to $$W$$ and $$b$$ will also be small. The derivative of the sigmoid function is

$$\frac{1}{1+e^{-x}} * (1 - \frac{1}{1+e^{-x}}) \\$$

which turns out to be simply

$$sigmoid(x) * (1-sigmoid(x)) \\$$

if we plot it, this is what we get.

Here we have our first problem: the maximum value we will ever get is 0.25, already quite low, but things get much worse as our $$x$$ gets away from 0, as then the derivatives get smaller and smaller. This ultimately means our updates to the $$W$$ and $$b$$ will be also small, thus making the learning process slow.
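If you prefer numbers to plots, this small sketch evaluates the derivative at a few points. The sample points are arbitrary, I just picked values moving away from 0.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """sigmoid(x) * (1 - sigmoid(x)), which peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_derivative(0.0))  # 0.25, the maximum

# The further x moves away from 0, the smaller the derivative gets.
for x in (2.0, 5.0, 10.0):
    print(x, sigmoid_derivative(x))
```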

The solution? Using a different function.

### tanh function to the rescue.

Instead of the sigmoid, we will have a look at the hyperbolic tangent, or simply $$tanh(x)$$ which is defined as

$$tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \\$$

when plotted, looks like this

This one gives us an output between -1 and 1, but more interestingly, the derivative value is

$$1 - (\frac{e^x - e^{-x}}{e^x + e^{-x}})^2 \\$$

or in simpler terms

$$1 - tanh^2(x) \\$$

if we plot it, we get

Pay close attention to the vertical axis! Unlike the derivative of the sigmoid function, this one reaches a value of 1, which is so much better than the maximum of 0.25 we got with the sigmoid. This means that we will be updating our $$W$$ and $$b$$ values at a much quicker pace.
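Here is a small sketch comparing both derivatives side by side. The function names and the sample points are my own choice.

```python
import math

def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_derivative(x):
    """1 - tanh(x)^2, which peaks at x = 0."""
    return 1.0 - math.tanh(x) ** 2

# At x = 0 the tanh derivative is 1.0 while the sigmoid derivative is 0.25.
print(tanh_derivative(0.0), sigmoid_derivative(0.0))

# A few more points close to 0, where the difference is most visible.
for x in (0.25, 0.5, 1.0):
    print(x, tanh_derivative(x), sigmoid_derivative(x))
```

Keep in mind that both derivatives shrink as $$x$$ moves away from 0, so the advantage of tanh shows mostly for inputs close to 0.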

### Simpler is better: the ReLU function.

But we still have another interesting candidate: the ReLU function. ReLU stands for Rectified Linear Unit, and it is defined simply as

$$relu(x) = max(0, x) \\$$

you can also define it as

$$relu(x) = \begin{cases} 0 \text{ if } x <0 \\ x \text{ if } x >=0 \end{cases} \\$$

In any case, the relu function looks like this.

What I like about it is its simplicity. I have not done calculus in a while, but I still remember that

$$f(x)=x^a \\ f'(x) = a*x^{a-1} \\$$

With $$a=1$$ this gives $$f'(x)=1$$, so for positive $$x$$ the derivative of the ReLU function is a constant 1, as large as the maximum of the tanh derivative. Let's plot it.

Notice that I have not plotted the value when $$x<0$$: in that case the derivative is $$0$$. In the rest of the cases the derivative is 1, quite a large value compared to the sigmoid.
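In code, both the function and its derivative are about as short as it gets. This is a sketch with one assumption worth calling out: strictly speaking the derivative is not defined at exactly $$x=0$$, and here I simply follow the case split used above and return 1 there.

```python
def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)

def relu_derivative(x):
    """0 for negative inputs, 1 otherwise.
    Returning 1 at exactly x = 0 is a convention (an assumption of this sketch)."""
    return 1.0 if x >= 0 else 0.0

for x in (-3.0, -0.5, 0.5, 3.0):
    print(x, relu(x), relu_derivative(x))
```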

### So what?

How does this all affect the learning then? Remember that our equations for updating $$W$$ and $$b$$ are

$$W = W - \alpha \frac{\partial{Cost}}{\partial{W}} \\ b = b - \alpha \frac{\partial{Cost}}{\partial{b}} \\$$

And also keep in mind that the derivatives of the functions are different, in particular, the derivative of the sigmoid is quite small compared to the other two functions.
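To see the effect on a single update, here is a rough sketch. It assumes one hidden unit whose gradient is the gradient arriving from the later layers multiplied by the derivative of the activation (the chain rule); the values of `w`, `alpha`, `z` and `upstream` are hypothetical numbers I picked for illustration, not taken from any real network.

```python
import math

def gradient_step(param, grad, alpha):
    """One update: param = param - alpha * grad."""
    return param - alpha * grad

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_derivative(z):
    return 1.0 - math.tanh(z) ** 2

def relu_derivative(z):
    return 1.0 if z >= 0 else 0.0

# Hypothetical values, chosen only for illustration.
w = 0.5          # current weight
alpha = 0.1      # learning rate
z = 0.5          # input to the activation function
upstream = 1.0   # gradient arriving from the layers after this unit

derivatives = {
    "sigmoid": sigmoid_derivative,
    "tanh": tanh_derivative,
    "relu": relu_derivative,
}

new_w = {}
for name, derivative in derivatives.items():
    grad = upstream * derivative(z)  # chain rule through the activation
    new_w[name] = gradient_step(w, grad, alpha)
    print(name, new_w[name])
```

With everything else equal, the weight moves the least with the sigmoid and the most with ReLU, at least for this input close to 0.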

To see how this will impact the learning, I wrote a python notebook, which you can check at my kaggle account. There I used the well known MNIST dataset, but only to classify digits 0 and 1. I ran a simple neural network with 256 hidden units, using the different activation functions mentioned here. The results are pretty obvious:

Notice that in all the cases the learning rate and the number of hidden units were the same, and the initial values of $$W$$ and $$b$$ were the same too.

It is fairly impressive how the tanh and ReLU functions are much better candidates as activation functions in the hidden layers.

In conclusion:

1. The sigmoid function is the function you should use as the output function for binary classification problems, as its output, a value between 0 and 1, matches exactly what a binary classification problem needs.
2. The sigmoid function will also work as an activation function for the hidden layers, but it will not be as quick.
3. The tanh function is pretty much always better to use in hidden layers than the sigmoid function.
4. The relu function is also a good candidate for hidden layers as an activation function.

Happy coding.

## How to fail a technical interview

As a developer, part of my job consists of interviewing new candidates, and I have been doing so for about 7 years now. It does not cease to surprise me how many candidates make the same mistakes over and over in interviews, so I decided to share my experience.

Please understand that this reflects only my experiences, it is not necessarily representative of how things work generally in the software industry, also understand that I work in Sydney, it would not surprise me if there are some minor differences in other countries, or even in other cities within Australia.

Also, I would like to note something: whenever I interview a candidate, I am on his/her side, I just want that person to be the good developer we can hire, for two reasons: it is exactly what the company needs and I can go back to coding, which is what I love.

Let's begin.

#### Do not prepare the phone screening.

The very first contact you will have with your potential new employer will be most likely a phone screening interview, I have to remark that normally those are relatively easy interviews, after all, you can only verify so much in a phone conversation, however there are a couple of things to prepare.

You should feel comfortable with the basics of the technology you are interviewing for, as an example, if you are interviewing for a Java related position, you should feel comfortable with basic data structures such as List, Set and Map. I will also expect people to be able to explain the usages of equals() and hashCode().

Additionally, basic data structures are a must: Arrays, Linked lists, Hashtables, Stacks, Trees and Graphs are things I expect any candidate to understand well enough to explain how to implement them. I think it is also reasonable to expect candidates to be very familiar with big-O notation.

Basic algorithms are the next must-have: I do not expect people to implement a quick-sort for me, but I do expect them to know how it works at a high level, and I also expect people to be familiar with its running time. The same goes for Tree and Graph traversal, basic recursion and so on. If you do not feel familiar with all of those terms, just review them at home; it should not take more than a couple of hours.

#### Do not expect any coding.

So you are interviewing for a developer position and you are surprised you are asked to code during the interview? Why are people scared when they are asked to write code on a white board?

The way I see it, there are two things to be aware of here. First, be wary of any company hiring developers without asking them to develop during the interview; this is as absurd as hiring a cook without seeing them cook first. Second, we normally do not code on paper or white boards, so practice at home. If you think it will be “easy”, think again: as a small exercise, try to code a binary search on paper and feel the pain of the lack of auto completion. The good news is that once you have coded a couple of algorithms on paper, coding them on a white board becomes simple enough.
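For reference, this is roughly what the binary search exercise looks like once it is done. It is a minimal iterative sketch in Python (my choice of language, the exercise works in any); try it on paper first and only then compare.

```python
def binary_search(items, target):
    """Return the index of `target` in the sorted list `items`, or -1 if it is absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1

numbers = [1, 3, 5, 7, 9, 11]
print(binary_search(numbers, 7))  # 3
print(binary_search(numbers, 4))  # -1
```

The off-by-one details (`low <= high`, `mid + 1`, `mid - 1`) are exactly the kind of thing that is easy to get wrong without an IDE, which is the point of practicing.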

#### When stuck, just stare at the white board, remain silent.

So, here is what is going to happen during a coding interview: you will get stuck, now the question is, how do you react? I have seen far too many candidates just staring at the white board for 5-10 minutes, saying nothing. The problem with this approach is that you are just wasting time that you might be using to explain what is going on in your head.

Sure, sometimes you need a few minutes to organize the problem in your head, but at some point you need to provide feedback to the people interviewing you, maybe you have just a brute force solution, or maybe you have only a partial solution (that does not work in some cases), well, say it, explain the problem, because if you don’t, all I know is that you are staring at a white board.

Showing how you react when faced with a complex problem is important. Even if you do not manage to solve the coding problem, you still have a chance if you can show what is going on in your head. It also allows the interviewer to work with you, maybe give you a hint and so on.

#### Do not prepare questions for the interviewer.

Almost every interview that I have conducted has always finished with “well, we have asked you a lot of questions, do you have any questions for us?”, now this is a great opportunity for you to know how the company really works.

I get asked a lot of questions like “what’s the tech stack?”. While there is nothing wrong with that question, and I am more than happy to answer it, it does not really give the candidate any extra knowledge: in most cases the tech stack can be derived from the requirements in the job description (I mean, if they ask for java, spring and hibernate, I do not think they will be coding in python, will they?).

There are other questions that, while useful, are too generic, for example: do you write tests? Almost 100% of companies will answer yes, but we all know that it does not mean that they REALLY write tests. Here are some examples of questions that I would ask, and why.

• What is your current test coverage? In order to answer this question, the company you interview for must not only write tests but also measure their coverage, there are plenty of tools to do so, if they are serious about testing, they should be able to provide a good approximation without much hesitation.
• When was the last time you wrote a unit test? Normally that should have happened within a week (assuming of course the interviewer spends more than 50% of his/her time coding).
• Which kind of machines do developers get? This is absolutely fundamental. I have rejected jobs because everybody had to work under windows, and although there is nothing wrong with windows as a desktop system, I really prefer Linux. This also includes which IDEs developers use and whether or not they are allowed to use their preferred one. After all, as a coder you will spend most of your time at your computer with your IDE, so it is worth knowing.
• What is your branching strategy?: Again, fundamental, you want to know if they have a solid pipeline or if it is something like “oh, we all code against the main branch”.
• How often do you deploy? And as follow up questions: when was the last time you applied a rollback? When was the last time you applied a hotfix? This will give you an idea of how serious a company is about CI/CD. If you ask them, they will always say they do CI/CD; well, if that is the case they should be able to answer these questions easily.
• What is the latest you have ever left the office? Follow up with: what is the latest you have left the office this week? As you can imagine, this is related to overtime. I have never interviewed for a company that admitted to doing overtime on a regular basis, yet I have seen it happen more often than not. By asking these questions you are being very explicit: the answer has to be a concrete number, not some fuzzy construction.
• If you could change one thing in your architecture, what would it be? I love this question, because it actually allows you to peek into the current problems of the company. Let's face it, EVERY company has areas where it can improve, and in fact, if they are hiring it is because they need help. This question is a good way of forcing the interviewer to give you a quick tour of the technical mistakes the company might have made. Also, I think it shows that the candidate is interested in improving and learning from mistakes.

This is all, as I mentioned at the beginning of the post, this reflects only my personal experiences, and I share it with the hope of it being useful to other people, nothing more.

Happy coding.

## Machine learning basics: the cost function

Machine learning is ultimately a way to make a program perform a task and to get that task done better over time. Cost functions define how good or bad a program is at performing such a task; pretty much every problem consists of getting the value of the cost function to be as small as possible.

For our example, we will use a very simple dataset which consists of two variables: car speed and distance to stop. Our ultimate goal will be, given a speed we have never seen before, to predict the distance to stop.

Let's define some common vocabulary:

• $$X$$ : These will be the observations, in our case it will represent the car speed.
• $$y$$: The correct answers to our observations, in this case the distance.
• $$\hat{y}$$ : Our own predictions given an $$X$$

Notice that all of the values above are actually vectors, or if you prefer, lists (possibly a more friendly term for a developer), this means that each of them can be accessed by indexes, such as

$$y_i$$

This takes us to define another element

• $$n$$: The total number of observations, in this case this means how many elements we have in $$X$$ and $$y$$

Now, this is the data we are going to work with

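| Speed (X) | Distance (y) |
|---|---|
| 4 | 2 |
| 7 | 4 |
| 8 | 16 |
| 9 | 10 |
| 10 | 18 |
| 11 | 17 |
| 12 | 14 |
| 13 | 26 |
| 14 | 26 |
| 15 | 20 |
| 16 | 32 |
| 17 | 32 |
| 18 | 42 |
| 19 | 36 |
| 20 | 32 |
| 22 | 66 |
| 23 | 54 |
| 24 | 70 |
| 25 | 85 |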

We have a total of 19 observations here, now let's plot them

With all this in our hands, we can start defining our cost function. A good intuition would be to say that our cost function is simply the difference between our predictions and the actual value.

For example at speed $$15$$km/h we need $$20$$ meters to stop. Imagine that we have

• $$ModelA$$ that predicts that $$25$$ meters are needed, the error would be $$25 - 20 = 5$$.
• $$ModelB$$ that predicts that we need $$22$$ meters to stop, then the error would be $$22 - 20 = 2$$, which is already a smaller error than the previous one.
• $$ModelC$$ that predicts that we need $$19$$ meters to stop, then the error would be $$19 - 20 = -1$$.

This last one is a bit weird, as we want our error to be close to 0, not negative. The solution is to use a squared error instead, so the result is never negative. Let's recalculate the errors using squares.

• $$ModelA = (25 - 20)^2 = 25$$
• $$ModelB = (22 - 20)^2 = 4$$
• $$ModelC = (19 - 20)^2 = 1$$

More generally we can simply say $$Error = (\hat{y_i} - y_i)^2$$

With this we can quickly conclude that the best model is the one with the smallest value for the cost function, in this case, that would be $$ModelC$$.

The next step is to apply this to every point in the problem: our model should be able to predict the distance required to stop for any given speed, and we should be able to calculate the error of such a prediction. The solution? Apply exactly the same logic but to the whole set of data. As we mentioned before, distance and speed are both vectors, so we can simply do

$$Error = (\hat{y_1} - y_1)^2 + (\hat{y_2} - y_2)^2 + \dots + (\hat{y_n} - y_n)^2$$

Or if we want to use a better mathematical term

$$\begin{equation*} Error = \sum_{i=1}^n (\hat{y}_i - y_i)^2 \end{equation*}$$

Do not let the math intimidate you, the term $$\sum_{i=1}^n$$ is just a loop over the elements of the vectors.
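To make the "it is just a loop" point explicit, here is a minimal sketch in Python. The observed values are the first three distances of the dataset, while the predictions are made-up numbers used only to show the mechanics.

```python
def sum_of_squared_errors(y_hat, y):
    """Add up (prediction - actual)^2 over every observation."""
    total = 0.0
    for i in range(len(y)):  # this loop is the sum symbol
        total += (y_hat[i] - y[i]) ** 2
    return total

y = [2, 4, 16]       # first three distances from the dataset
y_hat = [3, 4, 14]   # made-up predictions, for illustration only

print(sum_of_squared_errors(y_hat, y))  # 1 + 0 + 4 = 5.0
```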

We cannot simply keep adding the terms. Think about it: our error would grow just because we have more observations. The solution for that is to use the mean error instead, so let's add that to our formula.

$$\begin{equation*} Error = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2 \end{equation*}$$

So now what we have is the mean of all the squared errors. This function is, unsurprisingly, called “Mean Squared Error” or simply $$MSE$$, and it will be an important concept for the rest of this post

$$\begin{equation*} MSE = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2 \end{equation*}$$

### Making predictions

Now, we have been talking a lot about $$\hat{y}$$ but how can we calculate it? In linear regression this is done by applying a simple formula

$$\hat{y_i} = wX_i + b$$

if we want to generalize it we can simply say

$$\hat{y} = wX + b$$

This introduces 2 new values $$w$$ and $$b$$

• $$w$$ : Represents the weight that we need to calculate, this is the value by which we will multiply $$X$$.
• $$b$$: Represents the bias, we will simply add this term, and we will NOT relate it to $$X$$
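In code, the prediction formula is a one-liner applied to every element of $$X$$. This is only a sketch: the values of `w` and `b` below are arbitrary numbers I picked to show the mechanics, not values learned from the data.

```python
def predict(X, w, b):
    """Apply y_hat = w * x + b to every observation in X."""
    return [w * x + b for x in X]

speeds = [4, 7, 8, 9, 10]  # first five speeds from the dataset

# Arbitrary weight and bias, for illustration only.
print(predict(speeds, w=2, b=1))  # [9, 15, 17, 19, 21]
```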

An example will make this clearer, let's say $$w=-1, b=10$$

This shows a terrible prediction: our red line (that is, our model) does not align at all with our actual observations. The interesting part here is to quantify how bad it is. In order to do so, let's just have a look at the first $$5$$ datapoints so we can do all the calculations by hand.

| Speed | Distance | Prediction |
|---|---|---|
| 4 | 2 | 6 |
| 7 | 4 | 3 |
| 8 | 16 | 2 |
| 9 | 10 | 1 |
| 10 | 18 | 0 |

We will take the data points where $$i=1$$ that is, the first row. So

$$X_1=4; y_1=2; \hat{y_1}=6$$ so $$Error_1 = (\hat{y_1} - y_1)^2 = 16$$

If we apply $$MSE = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)^2$$ to these five rows we get $$MSE = \frac{1}{5} \left[ (6-2)^2 + (3-4)^2 + (2-16)^2 + (1-10)^2 + (0-18)^2 \right] = 123.6$$
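If you would rather let the computer check the arithmetic, here is a small sketch that reproduces the calculation for these five rows. The function name `mse` is my own; the numbers come straight from the table above.

```python
def mse(y_hat, y):
    """Mean Squared Error: the average of (prediction - actual)^2."""
    n = len(y)
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / n

distance = [2, 4, 16, 10, 18]   # actual distances for the first five rows
prediction = [6, 3, 2, 1, 0]    # predictions of the model with w = -1, b = 10

print(mse(prediction, distance))  # (16 + 1 + 196 + 81 + 324) / 5 = 123.6
```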

Now, let's consider another model where $$w=3; b=-12$$, then we get this

It is already obvious that this model is much better at predicting the distance, however the question is: how much better? Again the answer lies in $$MSE$$, the values are

| Speed | Distance | Prediction |
|---|---|---|
| 4 | 2 | 0 |
| 7 | 4 | 9 |
| 8 | 16 | 12 |
| 9 | 10 | 15 |
| 10 | 18 | 18 |

So we can again calculate $$MSE = \frac{1}{5} \left[ (0-2)^2 + (9-4)^2 + (12-16)^2 + (15-10)^2 + (18-18)^2 \right] = 14$$

This gives us critical information: not only can we figure out which model is better, we can also quantify how much better it is. That becomes very important; imagine for example how relevant this could be for autonomous driving.

### Cost functions for other problems.

$$MSE$$ is a good cost function, but it only helps us with regression problems, that is, problems where our output is a number, for example predicting how warm a day will be based on some variables, or predicting the value of a security in the stock market.

However there are many problems where we want to classify values, an example would be to know whether or not a car can stop completely or if it would have an accident, in this scenario $$MSE$$ does not help us, we need another cost function.

#### The logistic function.

For binary classification problems, where our output can only take two possible values, we want to use this little function: $$logistic = \frac{1}{1+e^{-\hat{y}}}$$. It does not look very intuitive, but if we actually plot it, we get

The interesting thing about this function is that it takes values between 0 and 1, so we can apply a similar measure to the error by simply comparing $$y$$ with our $$\hat{y}$$ which will take values from 0 to 1 while $$y$$ will either be 0 or 1.
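As a rough sketch of that comparison, the snippet below squashes some raw model outputs through the logistic function and measures how far each result is from the label. The raw scores and labels are made-up values for illustration, and using a squared difference here is just my reading of "a similar measure", not the only option.

```python
import math

def logistic(score):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

def squared_error(probability, label):
    """How far the squashed prediction is from the 0/1 label."""
    return (probability - label) ** 2

# Made-up raw outputs and labels (1 = the car stops, 0 = it does not).
scores = [-2.0, 0.0, 3.0]
labels = [0, 1, 1]

for score, label in zip(scores, labels):
    p = logistic(score)
    print(score, label, round(p, 3), round(squared_error(p, label), 4))
```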

### Conclusion

Cost functions are at the core of understanding machine learning as they ultimately provide a measure of success for a given model, they are also at the center of fundamental algorithms such as gradient descent.

It really helped me in the early days to calculate some of the functions by hand to fully understand their meaning.

There are many other cost functions that one needs to be aware of, but these two are the core ones to start with. I strongly recommend going through a couple of examples with $$MSE$$.

Happy coding.