What I have learned after 12 years as a software developer (III).

This is the final post about my experience as a developer; you might want to check part 1 and part 2 first.

15-Fear-based management is the lowest level of management.

Nothing surprising here: if a company is managed through fear, people will tend to hide their errors. Sure, development errors can easily be traced back, but do you really want to create an atmosphere where people make that harder?

The reality is that errors will happen; it is not a matter of if, it is a matter of when. More important than the errors themselves is the reaction to them.

I once worked for a company where a bug was discovered. It was not too bad, but I felt I had to inform the owner. An incredible number of people almost begged me not to do so, just to avoid the rage… It certainly got me thinking: what kind of place was that? Ultimately I communicated the problem anyway, and the reaction was actually not as bad as people had told me.

In short: do not use fear. People will not respect you, they will only avoid you. A leader will be followed to the battlefield if required; a tyrant will not.

16-A single branching strategy is fundamental.

It does not matter which version control system you use (git, mercurial, svn…), but whatever you do, you must have a clear and standard branching strategy. I have personally found that gitflow works very well, but I recognize that it might be too complex for simple organizations.

That said, the days of committing directly to the master branch (or trunk in svn) should be in the past. With continuous delivery one cannot afford to have the main branch polluted with bugs, and the extra time required to merge two branches is minimal nowadays.

I have personally experienced a company not having a single branching strategy, the pain of getting everyone onto the same standard, and the benefits of doing so. The larger the company, the harder it gets… but the reward will also be bigger.

A branching strategy should allow you to:

  • Deploy your code easily and safely at any time.
  • Roll back to a previous version easily and quickly.
  • Make hotfixes easily (although hopefully this is something you will rarely do).
  • Stay flexible in development.

My advice is to use an already tested strategy such as gitflow. If you think you do not need a branching strategy, think again; it can prevent a lot of headaches.

17-Compiler warnings must be addressed immediately.

Not all languages benefit from this, but many do (Java, C, C++, .NET). The compiler is, along with the debugger, the best friend of any developer; it constantly saves us from ourselves.

So, if the compiler is giving you a warning, trust me, you need to address it. Sure, the code will most likely work, but it is a ticking bomb. Also, the fact that you did not write the code generating the warning is no excuse not to fix it.


Most likely, these 17 points will need revision with time, but they truly represent the most relevant lessons I have learned in 12 years.

Happy coding.

What I have learned after 12 years as a software developer (II).

Twelve years as a coder (be it as junior developer, senior, team lead or principal) have taught me a lot. These are the rest of the points that I consider the most important ones. This is the second part of the article; find the first part here.

7-Having a proper build pipeline is fundamental.

There are different ways to do this, and it will certainly depend on the circumstances of each company, but as a bare minimum you should have an integration server where you can have repeatable builds.

Of course this also means that your build process should be as simple as possible. I find it acceptable to have to execute a couple of commands to get a full build (in order to get test coverage and so on), but that should all be clearly encapsulated in your build server.

It goes without saying that if your builds fail with some frequency then you need to stop whatever you are doing and make them stable. After all, what’s the point of deploying stuff that does not always build correctly?

It is also very advisable to keep build times under control. I have seen projects that take 45 minutes to compile; no matter what you do, if your build process takes 45 minutes you are not agile (and I do not care whether you do standups or not).

8-Deadlines are just wrong.

To a certain extent I understand that the business wants to get an idea of when something is going to be done (even when that “something” is not clearly defined) and I think that having a certain horizon can help to keep a bit of tension and motivation, but having deadlines is, in the vast majority of cases, simply wrong.

Let’s face it, we have been doing software engineering for barely 40-50 years and we are still learning. Believe it or not, the vast majority of software projects are delayed, and getting obsessed with deadlines will only lead to poor quality code, which will then lead to bugs, which will cost the company.

Do not get me wrong, I am not saying that timing is not important. Of course it is; what I am saying is that it should not drive the company's behavior. There are interesting alternatives to plain time estimation, such as story points, t-shirt sizes, planning poker… They do work incredibly well, but it will take time until your team gets good at them.

9-Big bang deployments do not work, stop doing them.

It is 2018 and I still have conversations with developers telling me that in their own companies there is a deployment to production every month (or even every 3 months), that is simply a recipe for chaos.

It is much simpler to deploy small and often than to make big deployments, and for good reasons: first, any process that is done often will get automated quickly; second, if you make mistakes you will have a much quicker feedback cycle to improve it; and third, the business will actually be happier, as the mean time to deploy a feature will be greatly reduced.

Saying that you do not have the capacity to automate deployments so they can happen more often is like saying you cannot go and grab a fire extinguisher to put out a fire because you are too busy using little glasses of water instead.

In short: automate deployments, make them common, small and non-scary, everyone will be happier.

10-Developers need to understand the business.

This seems obvious, but unfortunately it is quite common that when a dev is asked “why are you doing this?” it takes a few questions until you actually get a satisfactory answer. For example:

-Why are you adding a table?
-So we can store product types.
-And why do we need to store product types?
-Because we want to display them in a dashboard.
-And why do we want to display products in a dashboard by type?
-So the business can decide which type needs more marketing.

There we go! This is surprisingly common. It is not due to lack of interest or lack of communication, it is just that we developers sometimes get too focused on the technical part. I have found it incredibly beneficial to understand WHY things are being done and what the ultimate goal of the feature I am building is.

This is especially important in startups where things are a bit crazy (or fast-paced, whatever term you prefer). In those environments there is always a fight between developers wanting to rewrite part of the systems and the business desperately trying to close the next financing round.

11-Everything else being equal, consistency wins.

We all have, as developers, our own preferences, and some of them can quickly turn into religious wars (for example, where should we place curly braces? On the same line or on the next line?).

I am a strong advocate of consistency. More often than not I have agreed to follow naming conventions that I might not share, but that were well established. Of course, if you really see something that is wrong, by all means change it, but there are certain areas that are more preference than science. As an example, method names:

  • User getUserById(int userId)
  • User getById(int userId)
  • User findById(int userId)
  • User loadById(int userId)

Which method is the best one? What is the best naming convention? I do have my preference, but if everyone else has already agreed to use, say, “loadById”, then why should I start changing the names for other entities? What value am I adding? Apart from satisfying my own preferences, am I making the code any better?

Coding software is, largely, a team activity. There will be clashes and there will be disagreements, so be ready to bend sometimes and to accept certain things. Choose what is really worth a fight and what is not.

12-It is much easier to code things correctly from the beginning.

Ok, we have all cut corners sometimes, but honestly, looking back now, was it worth it? There are scenarios in which it is ok to do so, but learning which ones takes quite a few years.

The reality is that YOU will be the one who has to deal with the crappy code you are writing today to satisfy the business people, so when required, put up a fight. Choose your battles, as they are not always easy to win, and sometimes put up a fight even if you know you cannot win it: it sends a message.

But to be fair, in many cases the fault is not on the business, it is on us. After all, it is the job of the business to keep the money coming and it is YOUR job to deliver high quality code. This means writing tests constantly, as you are developing your features, not at the end “if you have time”. Tests are part of the code I expect developers to deliver, plus they have saved my ass so many times that I have learned (sometimes the hard way) to write them and give them the importance they deserve.

Also, every time you write bad code, you are sending a message to the next developer that it is fine to do so. This is closely related to the broken windows theory, which essentially establishes that if your code has dirty areas, the rest of the code will not be respected and will get dirty too.

13-It is incredibly inefficient to use the same technology for everything.

So, yes, believe it or not, there is no silver bullet. This is one of those recurrent mistakes I see especially in developers. I have heard people say “I am a Java developer” (or PHP, or Ruby, or Python… whatever you prefer) so many times; it is a terrible mistake, and it is as absurd as saying “I am a Toyota taxi driver”.

Why do we get obsessed with a single programming language? Sure, I love Java, but if you want to build a single page with a form, PHP will do better, and if you need to analyze a CSV file, Python or R will be fantastic.
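As a rough illustration of the point about CSV files, here is a minimal sketch that uses only Python's standard library. The sample data, the column names (`product_type`, `price`) and the helper `average_by` are all made up for the example; a real file would be opened with `open(...)` instead of `io.StringIO`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = """product_type,price
book,12.50
book,7.50
toy,20.00
"""


def average_by(rows, key, value):
    """Group rows by `key` and return the mean of `value` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {name: mean(values) for name, values in groups.items()}


# DictReader uses the first line as column names.
rows = csv.DictReader(io.StringIO(SAMPLE))
averages = average_by(rows, "product_type", "price")
print(averages)  # {'book': 10.0, 'toy': 20.0}
```

The whole analysis fits in a handful of lines, which is the kind of thing that tends to need noticeably more ceremony in a language not designed for it.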

The only thing a developer should never stop doing is learning. There is an incredibly large number of technologies out there; sure, you cannot learn all of them, but before you start writing a complex data structure in Java to handle tables, you might want to check pandas or R first.

Being a developer is not so different from being a handyman: you need to know your tools and, from time to time, TRY other tools, as they can help you succeed. In the end, it does not matter which technology you used, as long as the code works and can be maintained. Do not be afraid of going into that other programming language that has such a bad name at your office; you would be surprised.

14-When (not if) an error occurs, do a post-mortem and learn from it.

There is only one thing worse than introducing a terrible bug into a production system, and that is not learning from it.

If you have been working in the software industry for a few years, I am fairly sure you have made mistakes. Some of them might not be too bad (maybe some test emails were sent to the users), others are more serious (pretty much anything related to payments). In every single case, it is fundamental to have a post-mortem so the situation can be analyzed to determine what was done well, what was done wrong and what could have been done better.

Post-mortems are one of the most effective ways of learning from our mistakes. Hopefully it is not a process we go through a lot, but it will happen. This is especially important for junior developers: you WILL make mistakes, and chances are they are not as terrible as you think, but if you do not learn from those mistakes and you do not improve, then we have a problem. Do not be scared of acknowledging your errors; in most companies it is actually seen as the right attitude (and if you are in a company where there is a witch hunt after each mistake, you might want to look for another job anyway).


What I have learned after 12 years as a software developer (I).

I started coding professionally in 2006. At that time I was fresh out of uni and certainly had a lot to learn ahead of me. Over the years I have accumulated quite a lot of knowledge, part of it by simply continuing to study and part of it out of pure experience.

Please be aware that this reflects only my own experiences and may or may not reflect other people’s. We developers tend to have quite different opinions when it comes to code. That said, here is what I have learned.

1-Most things will not be needed, apply the YAGNI principle.

The most common mistake that I see, by far, is developers trying to account for stuff that simply is not required. It does not cease to amaze me how often this happens: time after time I see complex code and architecture that is there because “we might need flexibility in the future”, and what happens? That future rarely shows itself.

There are many examples of this. Some of them are relatively benign, like declaring an interface that has a single implementation because “we might have other implementations in the future”. I see this quite often with DAO objects; the reality is that I rarely see more than one implementation of a given DAO interface, and even if that happens, modern IDEs have refactoring utilities that let you extract an interface.

Worse than that, I have seen code become very complicated to account for stuff that was not even asked for. Once I had to deal with a monstrosity in Java whose only job was to read a CSV file. The developer (possibly in good faith) prepared it so it could read XML, JSON and a number of other formats. Unfortunately, in the process of doing so, he made the code quite convoluted (no, it was not simple polymorphism), the final project ended up with many dependencies, and the worst part is that it was a memory consumption beast… Of course the code has never processed anything other than CSV files, so what was the value of it?

2-Complex architectures are evil, evil I say!

Of course there are exceptions. In some scenarios one might need a complex or non-trivial architecture to solve a complex problem, but in the vast majority of cases there is a tendency to overarchitect things, especially in the Java world.

Most of the software that most developers write and maintain is quite boring: in most cases just webpages that read from and write to a datasource (a database, a webservice…). Do we really need a complex architecture for that? In most cases what will occur is that junior developers not familiar with the architecture will start to cut corners in order to get stuff done, and in the end the code tends to evolve into a big ball of mud.

Don’t get me wrong, I like being faced with a problem that requires a non-trivial solution, and I love to design big architectures (we all do, I think), but more often than not I have seen complex architectures which mainly just satisfy the ego of the developer(s) who came up with the idea. All else being equal, the simpler the better.

3-If you want to generate chaos, do not use a ticket system.

As a golden rule, nothing, absolutely nothing, should EVER be done without a ticket describing what is being done and why. Furthermore, in my current job working without one is simply not possible: you need a ticket to be able to create a branch.

This rule is especially important if the ticket is very urgent or a blocker; precisely because of that, one needs to follow the procedure. The reason is that the procedure was designed when people could think with a cool head, so when the shit hits the fan, you had better follow what your past (and calm) self decided to do.

Also, it gives you control over changes and brings some discipline to both developers and non-technical people.

4-Not retaining good developers.

This is an important one.

Finding good developers is not easy: first, they are expensive, and second, they do not stay on the market for long, as companies hire them quite quickly.

It is not always possible to retain every good developer. After all, we live in a free market world and people change companies for many reasons. However, there is a big difference between natural turnover and simply not managing to keep people happy.

Every single time a good developer leaves, two things happen: first, the company loses a lot of knowledge that goes away with that developer, and second, morale takes a hit. Workmates talk mainly… well, about work. When a good developer leaves for non-natural causes (that is, not because he/she is moving to a new city or about to start their own business), the conversations during the few weeks that developer remains in the company will be mainly about how happy he/she is to be leaving and how everyone else should do the same.

Notice that to keep people happy there are three main pillars, and they are all equally important:

  1. Salary.
  2. Work environment (relations with other devs, managers… etc.)
  3. Career development (how interesting the projects are, what the future looks like… etc.)

It is not always possible to keep all of those three pillars up, but at the very least two of them should ALWAYS be solid.

5-Lack of system monitoring.

Software is complex, too complex. On top of that, we have been doing software engineering for maybe about 50 years, so we are all still learning (I am sure that 100 years from now, people will consider what we do as primitive as we now consider the car production systems of 1920).

Not monitoring is simply too expensive: once things stop working in production it is already too late. I have seen a bit of everything, from companies with fantastic monitoring to companies with no monitoring and everything in between.

Once I saw a company’s website go down simply because they had run out of disk space, which is pretty much the equivalent of not being able to serve food at a restaurant because someone forgot to buy it.

Along with lack of monitoring comes lack of alerts. A mistake I have seen way too often is systems constantly triggering red alerts, so people become so used to false positives that the alerts are simply ignored… until one “false positive” turns out to be a true positive. Having a good set of reliable alerts is key to keeping a system healthy. If there are false positives then more work has to be done and better alerts need to be defined; ignoring the problem is not going to help.
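To make the disk space example concrete, here is a minimal sketch of such a check in Python. The thresholds and the function names (`used_ratio`, `classify`) are assumptions made for the illustration, not a recommendation for any particular monitoring tool; the only library call used is `shutil.disk_usage` from the standard library.

```python
import shutil

# Hypothetical thresholds: tune them so that alerts stay rare and meaningful.
WARN_AT = 0.80
CRITICAL_AT = 0.95


def used_ratio(path: str = ".") -> float:
    """Fraction of the disk holding `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total if usage.total else 0.0


def classify(ratio: float, warn_at: float = WARN_AT,
             critical_at: float = CRITICAL_AT) -> str:
    """Map a usage ratio to an alert level."""
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    ratio = used_ratio(".")
    print(f"disk usage {ratio:.0%}: {classify(ratio)}")
```

Having two levels instead of a single red alert is one simple way to keep the "critical" signal rare enough that people still react to it.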

6-Ignoring technical debt.

Technical debt is pretty much like financial debt: It is not always bad, sometimes it is perfectly fine to ask for a credit so the company can perform other investments, but leave debt unpaid and the compound interest will kill you.

There are many tools out there to control technical debt (I have been using sonar for quite a while with good results), so there is absolutely no excuse to leave technical debt unpaid. It is your responsibility as a developer to push for it, because the non-technical people will never, ever ask you to pay the technical debt. You are the expert and you are paid for it, so put up a fight; sometimes you even have to put up a fight knowing you will lose it, but it is still worth making a point (I have found, however, that asking people to send me an email saying they will assume any responsibility if something goes wrong works miracles).

Along with ignoring technical debt, there is the bad practice of not doing code reviews. All code to be merged must go through a pull request, with the possible exception of hotfixes (which, by the way, you should rarely do anyway). I personally like code reviews and consider them a safety net; it has not been uncommon for me to be saved by a coworker reviewing my code.


This is just the first part of the post, but I really hope someone finds it useful, as it represents years of my own experience.

Happy coding.

ReLU, sigmoid and tanh, how activation functions affect your machine learning algorithms.

If you have been working with neural networks for a bit, you already know that we need to use activation functions in the hidden layers (well, and also in the output layer), in order to achieve non-linearity.

However, I really enjoy understanding WHY we should use some activation functions instead of others. Furthermore, I like to know how different activation functions affect a given model.

In this post I will focus on classification problems; more specifically, I will just consider binary classification problems. Let’s dive in.

Binary classification outputs.

Say you have a neural network that will classify elements into two different categories, for example given an image, it could determine if it is a cat or a dog, in this case our output will be either 1 (cat) or 0 (dog). That’s where the sigmoid function comes in handy.

The formula for the sigmoid function is

\(sigmoid(x) = \frac{1}{1+e^{-x}}\\\)

And the way it looks if we plot it is this

What makes the sigmoid function good for classification problems is that it outputs a value between 0 and 1 that changes smoothly and always increases with \(x\). This makes the sigmoid a great output function for a classification model, as we can then simply perform predictions such as

\(\hat{y} = 1 \text{ if } sigmoid(x) >= 0.5; \hat{y} = 0 \text{ if } sigmoid(x) < 0.5\\\)
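The prediction rule above can be sketched in a few lines of Python. The function names and the `threshold` parameter are just illustrative choices; the two-branch form of `sigmoid` is a common trick to avoid overflow in `math.exp` for very negative inputs and computes the same value as the formula.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^x / (1 + e^x) is algebraically the same value.
    z = math.exp(x)
    return z / (1.0 + z)


def predict(x: float, threshold: float = 0.5) -> int:
    """Return 1 if sigmoid(x) >= threshold, otherwise 0."""
    return 1 if sigmoid(x) >= threshold else 0


print(sigmoid(0.0))   # 0.5
print(predict(2.0))   # 1
print(predict(-2.0))  # 0
```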

So, why not use the sigmoid also as an activation function in the hidden layers of a neural network? that takes us to the next stage.

Backward propagation and weight updates.

Ultimately the way we update our weights and biases is this

\(W = W - \alpha \frac{\partial{Cost}}{\partial{W}} \\\)
\(b = b - \alpha \frac{\partial{Cost}}{\partial{b}} \\\)

This of course assumes that \(\alpha\) represents the learning rate. What is important to notice is that if our derivatives are too small, then our updates to \(W\) and \(b\) will also be small. The derivative of the sigmoid function is

\( \frac{1}{1+e^{-x}} * (1 - \frac{1}{1+e^{-x}}) \\\)

which turns out to be simply

\(sigmoid(x) * (1-sigmoid(x)) \\\)

if we plot it, this is what we get.

Here we have our first problem: the maximum value we will ever get is 0.25, already quite low, but things get much worse as our \(x\) gets away from 0, as then the derivatives get smaller and smaller. This ultimately means our updates to the \(W\) and \(b\) will be also small, thus making the learning process slow.
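A quick numerical check of this, as a minimal sketch (the sample points are arbitrary and the function names are my own):

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; fine for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0 and shrinks quickly on both sides.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:4.1f}   derivative = {sigmoid_derivative(x):.6f}")
```

At \(x = 0\) this prints the maximum of 0.25; by \(x = 5\) the derivative is already below 0.01.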

The solution? Using a different function.

tanh function to the rescue.

Instead of the sigmoid, we will have a look at the hyperbolic tangent, or simply \(tanh(x)\) which is defined as

\(tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \\\)

when plotted, looks like this

This one gives us an output between -1 and 1, but more interestingly, the derivative value is

\(1 - (\frac{e^x - e^{-x}}{e^x + e^{-x}})^ 2 \\\)

or in simpler terms

\(1 - tanh^2(x) \\\)

if we plot it, we get

Pay close attention to the vertical axis! Unlike the derivative of the sigmoid function, this one reaches a value of up to 1, which is so much better than the maximum of 0.25 we got with the sigmoid. This means that we will be updating our \(W\) and \(b\) values at a much quicker pace.
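The same kind of check for tanh, again only as a sketch with arbitrary sample points:

```python
import math


def tanh_derivative(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# The peak at x = 0 is 1.0, four times the sigmoid's 0.25.
for x in (0.0, 0.5, 2.0, 5.0):
    print(f"x = {x:4.1f}   derivative = {tanh_derivative(x):.6f}")
```

Note that the advantage is largest near \(x = 0\): the tanh derivative also shrinks as \(x\) moves away from 0, so it is the higher peak, not the absence of saturation, that helps here.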

Simpler is better: the ReLU function.

But we still have another interesting candidate: the ReLU function. ReLU stands for Rectified Linear Unit, and it is defined simply as

\(relu(x) = max(0, x) \\\)

you can also define it as

\(relu(x) = \begin{cases}
0  \text{ if }  x < 0 \\
x  \text{ if } x >= 0
\end{cases} \\\)

In any case, the relu function looks like this.

What I like about it is the simplicity. I have not done calculus in a while, but I still remember that

\(f(x)=x^a \\\)
\(f'(x) = a*x^{a-1} \\\)

For \(x>0\) the ReLU is just \(x\) (that is, \(a=1\)), so its derivative there is exactly \(1\), which is quite large compared to the sigmoid. Let’s plot it.

Notice that I have not plotted the value when \(x<0\); in that case, the value of the derivative will be \(0\). In the rest of the cases the derivative is \(1\), which is quite a large value.
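In code, ReLU and its derivative are about as simple as it gets. This is a minimal sketch; the ReLU is not differentiable at exactly \(x = 0\), and returning 0 there is just a convention I picked for the example.

```python
def relu(x: float) -> float:
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)


def relu_derivative(x: float) -> float:
    """1 for positive inputs, 0 otherwise (0 at x = 0 by convention here)."""
    return 1.0 if x > 0 else 0.0


for x in (-3.0, -0.5, 0.5, 3.0):
    print(f"x = {x:4.1f}   relu = {relu(x):.1f}   derivative = {relu_derivative(x):.1f}")
```

Unlike the sigmoid and tanh, the derivative does not shrink as \(x\) grows: it stays at 1 for every positive input.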

So what?

How does this all affect the learning then? Remember that our equations for updating \(W\) and \(b\) are

\(W = W - \alpha \frac{\partial{Cost}}{\partial{W}} \\\)
\(b = b - \alpha \frac{\partial{Cost}}{\partial{b}} \\\)

And also keep in mind that the derivatives of the functions are different, in particular, the derivative of the sigmoid is quite small compared to the other two functions.
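To see how the size of the activation's derivative feeds into the update, here is a minimal sketch of a single weight update for one neuron. Every number in it (learning rate, weight, bias, input and the upstream gradient `upstream`) is a made-up value for illustration; by the chain rule, the gradient with respect to the weight is the upstream gradient times the activation's derivative times the input.

```python
import math


def gradient_step(param: float, grad: float, alpha: float) -> float:
    """One gradient descent update: param = param - alpha * grad."""
    return param - alpha * grad


def activation_derivatives(z: float) -> dict:
    """Derivative of each activation at pre-activation z (moderate z only)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return {
        "sigmoid": s * (1.0 - s),
        "tanh": 1.0 - math.tanh(z) ** 2,
        "relu": 1.0 if z > 0 else 0.0,
    }


# Hypothetical values, chosen only for illustration.
alpha = 0.1        # learning rate
w, b = 0.5, -0.25  # current weight and bias
x_in = 1.5         # input to the neuron
upstream = 0.8     # dCost/d(activation), assumed given by later layers

z = w * x_in + b   # pre-activation, 0.5 with these values

step_sizes = {}
for name, local in activation_derivatives(z).items():
    grad_w = upstream * local * x_in  # chain rule
    step_sizes[name] = abs(w - gradient_step(w, grad_w, alpha))
    print(f"{name:8s} step on w = {step_sizes[name]:.4f}")
```

With these particular values the sigmoid produces the smallest step, tanh a larger one and ReLU the largest, which is the effect described above. Different values of `z` would change the exact numbers.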

To see how this impacts the learning, I wrote a Python notebook, which you can check on my Kaggle account. There I used the well-known MNIST dataset, but only to classify digits 0 and 1. I ran a simple neural network with 256 hidden units, using the different activation functions mentioned here. The results are pretty obvious:

Notice that in all cases the learning rate and the number of hidden units were the same, and the initial values of \(W\) and \(b\) were also the same.

It is fairly impressive how the tanh and ReLU functions are much better candidates as activation functions in the hidden layers.

In conclusion:

  1. The sigmoid function is the function you should use as the output function for binary classification problems, as its output range \((0, 1)\) can be read directly as the probability of the positive class.
  2. The sigmoid function will also work as an activation function for the hidden layers, but learning will not be as quick.
  3. The tanh function is pretty much always better to use in hidden layers than the sigmoid function.
  4. The ReLU function is also a good candidate as an activation function for hidden layers.

Happy coding.