Sunday, December 27, 2015

Sci-Fi nonsense

One of the most irritating things when watching Sci-Fi is the many mistakes related to real technologies. Here's my list of them.
  1. Pew pew pew
     You know what I mean: lasers! Well, not only lasers, plenty of other laser-like guns too. With all the progress they have made, you would certainly expect silent guns to exist. After all, we have them today! As a bonus, many of those large laser-type guns in spaceship turrets also have quite a powerful recoil.
  2. We live in the middle of nowhere
     If you have watched a lot of sci-fi, you must have seen many cases where a spaceship suddenly loses its ability to fly faster than light. What happens next? Well, they just fly to a nearby planet, which is hours or, at worst, days away. Meanwhile, our own planet is over 4 light-years away from the nearest other star. This means that if a ship breaks down halfway, it is over 2 years away from any help (and that's only if it can fly close to the speed of light). So that's why no aliens have visited us yet: we live in the middle of nowhere!
  3. Ships fly like planes
     There is a reason why planes fly the way they do: aerodynamics. They are constrained by our atmosphere. Spaceships, however, fly in a vacuum. There is no reason why they shouldn't be able to turn in place. And with all that fancy technology, it should be possible for them to make a 90-degree turn at high speed. Yet they still prefer to fly as if aerodynamics applied. Why?
  4. Wars are fought with the weakest weapons there are
     Many TV shows have very powerful weapons available. A single shot can completely disintegrate multiple humans, a single missile can make an entire planet uninhabitable, and so on. Yet when it comes to interplanetary war, everyone plays it nice and safe: weapons only cause non-lethal injuries, and planetary bombardments do no more damage than World War II bombs. Everyone is so civilized, with no intention of annihilating their foes even when losing.
  5. Computers are still stuck somewhere in the 70s
     This is where sci-fi creators struggle the most to see the future. Apart from human-like AI and the ability to talk in a human voice, their computers are more often than not behind reality. Just think about it: counting smartphones as computers, many of us actually own more than one, while spaceships a hundred years from now still have a single computer on board. And often quite a slow one, too.
  6. Ships come to a halt on engine failure
     What happens when a ship's engine fails? Usually it comes to a full stop. And I'm not talking about faster-than-light travel; sublight spaceships also often stop when their engine fails. Meanwhile, our primitive spacecraft fly through our solar system with their engines off most of the time. There is no resistance in a vacuum, you know, so you don't lose speed; engines are only needed to gain speed, to maneuver and to fight gravity.
  7. Aliens have no diversity and are sooo predictable
     Even in my city of a bit over half a million people, I can't predict how someone will react to certain actions. And we look quite different. We dress differently. We have different religions. We even have different native languages, though one dominates. Yet the aliens we encounter in sci-fi are all the same, speak one language, have one culture and are very, very predictable. How many times have you heard "these guys will act aggressively", "they never withdraw, that's their weakness", etc.? Come on!
  8. We're the only species capable of creativity in extraordinary situations
     When something very unusual happens, our alien friends, despite being significantly more advanced than us, seem unable to find any clever solution. They just do the usual thing they are used to, and only we humans are able to think of something smart and unconventional that saves everyone. How did those aliens manage to advance so far in the first place?
Got anything to add?

Thursday, May 28, 2015

C++ inheritance explained (Part III)

This is the third part in the series. If you haven't already, have a look at Part I and Part II.

This part covers a single, but rather complicated, inheritance feature in C++.

Virtual inheritance

Now we've come to one of the most complicated features: virtual inheritance.

Let's start as usual, with the most trivial example:

class Base
{
public:
  int m_base_member;
  void set_base_member(int x)
    {
      m_base_member = x;
    }
};

class Derived : public virtual Base
{
public:
  int m_derived_member;
  void set_derived_member(int x)
    {
      m_derived_member = x;
    }
};
This time I've added method implementations, as they matter now. The Base class translates to this (nothing special here):

struct Base
{
  int m_base_member;
};

void Base_set_base_member(Base *_this, int x)
{
  _this->m_base_member = x;
}
Now let's see the layout for Derived:

struct Derived_SubObject
{
  int m_derived_member;
};
struct Derived
{
  void *_vtable;
  Base _parent;
  Derived_SubObject _derived_part;
};
The first thing to note is that the Derived class has a VTable, even though it declares no virtual methods. You might have guessed that. As you can see, I've split the Derived-specific members into a separate struct, which I placed directly inside Derived rather than inlining the members. I did this to explain the code of the set_derived_member() method, which looks like this:

void Derived_set_derived_member(Derived *_this, int x)
{
  Derived_SubObject *_derived_part = _get_sub_object(_this, Derived_part);
  _derived_part->m_derived_member = x;
}
The important part here is that the implementation of set_derived_member() makes no assumptions about the layout of the Derived object passed in, except for the VTable, which is expected to be at the start. The Derived-specific part can be located anywhere in the object; the method always looks it up via the VTable. This has two implications:
  1. a negative performance impact, as a VTable lookup is performed instead of accessing the variable at a fixed offset in the object
  2. in the case of multiple inheritance, duplication can be avoided for the diamond problem (as shown below)
class Base
{
public:
  int m_base_int;
};

class Derived1 : public virtual Base
{
public:
  int m_derived1_int;
  void foo(int x) { m_base_int = x; }
};

class Derived2 : public virtual Base
{
public:
  bool m_derived2_bool;
  void bar(int y) { m_base_int = y; }
};

class DerivedMultiple : public Derived1, public Derived2
{
public:
  bool m_multiple_bool;
};


/* DerivedMultiple object; */
Derived1 *d1 = &object;
d1->m_base_int = 3;
d1->foo(6);
Derived2 *d2 = &object;
d2->m_base_int = 4;
d2->bar(2);
Base *b = d2;
b->m_base_int = 0;
The resulting structs and functions are:

struct Base
{
  int m_base_int;
};

struct Derived1_SubObject
{
  int m_derived1_int;
};
struct Derived1
{
  void *_vtable;
  Base _parent;
  Derived1_SubObject _derived1_part;
};

void Derived1_foo(Derived1 *_this, int x)
{
  Base *base = _get_sub_object(_this, Base_part);
  base->m_base_int = x;
}

struct Derived2_SubObject
{
  bool m_derived2_bool;
};
struct Derived2
{
  void *_vtable;
  Base _parent;
  Derived2_SubObject _derived2_part;
};

void Derived2_bar(Derived2 *_this, int y)
{
  Base *base = _get_sub_object(_this, Base_part);
  base->m_base_int = y;
}

struct DerivedMultiple_SubObject
{
  bool m_multiple_bool;
};
struct DerivedMultiple
{
  void *_vtable;
  Base _parent;
  Derived1_SubObject _derived1_part;
  Derived2_SubObject _derived2_part;
  DerivedMultiple_SubObject _derivedmultiple_part;
};
As you can see, the final layout of DerivedMultiple contains only one Base part. The methods Derived1::foo() and Derived2::bar() have identical code; all they require is a VTable at the start of _this. Because DerivedMultiple satisfies this requirement, it can be passed directly to either of the two.
Let's analyze the code step by step:

/* DerivedMultiple object; */
Derived1 *d1 = &object;
_get_sub_object(d1, Base_part)->m_base_int = 3;   // d1->m_base_int = 3;
Derived1_foo(d1, 6);           // d1->foo(6);
For the pointer assignment, the compiler doesn't need to adjust the pointer, since all that is needed is a VTable at the start. Accessing anything from Base involves a VTable lookup to obtain a pointer to the Base part inside the object. The foo() call is trivial.
The code where we use Derived2 is pretty much the same:

Derived2 *d2 = &object;
_get_sub_object(d2, Base_part)->m_base_int = 4;    //d2->m_base_int = 4;
Derived2_bar(d2, 2);   //d2->bar(2);
Last, let's cast to Base and use that:
Base *b = _get_sub_object(d2, Base_part);
b->m_base_int = 0;
The Base part is obtained the same way as inside the implementations of foo() and bar(). The important thing to note here is that making Base a polymorphic class (one with virtual methods) does not change that, except that its VTable pointer could then be reused.
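The step-by-step example can be checked with a real compiler. Below is a minimal sketch that assumes any standard C++11 compiler. The helper names single_base() and run_example() are hypothetical and were added only for this illustration. The sketch shows that both inheritance paths reach the same Base part, so every write lands in one place. Real compilers that follow the Itanium C++ ABI (GCC and Clang, for example) store the offset of the virtual base in the VTable. That is close to the _get_sub_object() lookup described above.

```cpp
#include <cassert>

class Base
{
public:
  int m_base_int;
};

class Derived1 : public virtual Base
{
public:
  int m_derived1_int;
  void foo(int x) { m_base_int = x; }
};

class Derived2 : public virtual Base
{
public:
  bool m_derived2_bool;
  void bar(int y) { m_base_int = y; }
};

class DerivedMultiple : public Derived1, public Derived2
{
public:
  bool m_multiple_bool;
};

// Hypothetical helper: true if both inheritance paths lead to the same
// (single, shared) Base sub-object.
bool single_base(DerivedMultiple &object)
{
  Base *via1 = static_cast<Derived1 *>(&object);
  Base *via2 = static_cast<Derived2 *>(&object);
  return via1 == via2;
}

// Hypothetical helper mirroring the post's example; returns the final value
// of the shared m_base_int.
int run_example(DerivedMultiple &object)
{
  Derived1 *d1 = &object;
  d1->m_base_int = 3;
  d1->foo(6);
  Derived2 *d2 = &object;
  d2->m_base_int = 4;
  d2->bar(2);
  Base *b = d2;
  b->m_base_int = 0;
  // Note: static_cast<Derived2 *>(b) would not compile, because Base is a
  // virtual base and its position inside Derived2 is not fixed.
  return d1->m_base_int;  // every write above went to the same Base
}
```

Calling run_example() on a fresh DerivedMultiple should return 0, the value written last through b.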

This leads to a few conclusions:

  • Virtual inheritance has a negative performance impact, because sub-objects have to be looked up on access
  • In the case of multiple inheritance and a diamond hierarchy, two copies of the common base class can be avoided
  • The internal layout of a class is unpredictable; the compiler can rearrange sub-objects
  • Virtual inheritance may not solve the problem of a duplicated base class; this can happen if there is a complex mix of classes where some classes do not use virtual inheritance, so their layout has to be preserved
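The last point can be reproduced with a small hierarchy. In this minimal sketch, one path inherits Base virtually and the other does not. The class names VirtualPath, PlainPath and Mixed, and the helper has_two_bases(), are hypothetical and invented for this example. The result is that Mixed still ends up with two Base parts.

```cpp
#include <cassert>

class Base
{
public:
  int m_base_int;
};

class VirtualPath : public virtual Base {};  // shares Base
class PlainPath : public Base {};            // no `virtual`: keeps its own Base
class Mixed : public VirtualPath, public PlainPath {};

// Hypothetical helper: writes through each path and reports whether the two
// writes landed in two different Base sub-objects.
bool has_two_bases(Mixed &object)
{
  VirtualPath *v = &object;
  PlainPath *p = &object;
  v->m_base_int = 1;
  p->m_base_int = 2;
  // object.m_base_int or `Base *b = &object;` would be ambiguous here and
  // would not compile, for the same reason.
  return static_cast<Base *>(v) != static_cast<Base *>(p) &&
         v->m_base_int == 1 && p->m_base_int == 2;
}
```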

Friday, April 3, 2015

C++ inheritance explained (Part II)

This is the second part, explaining how inheritance works in C++ under the hood.
If you haven't read the first part, I recommend having at least a quick look, as it clarifies the approach I'm taking. You can find the first part here.
In this part I'll explain probably the most feared feature in C++ - multiple inheritance.

Multiple inheritance with simple base classes

Let's start with the simple, yet famous, diamond problem:

class CommonBase
{
public:
  int m_some_int;
  void set_int(int x) { m_some_int = x; }
};
class DerivedOne : public CommonBase
{
public:
  float m_some_float;
  void foo(){}
};
class DerivedTwo : public CommonBase
{
public:
  bool m_some_bool;
  void bar() {}
};
class DerivedMultiple : public DerivedOne, public DerivedTwo
{
public:
  void baz(){}
  void set_int(int x) { DerivedOne::set_int(x); }
};

This one is tricky. Let's list only the resulting structs first:

struct CommonBase
{
  int m_some_int;
};
struct DerivedOne
{
  CommonBase _parent;
  float m_some_float;
};
struct DerivedTwo
{
  CommonBase _parent;
  bool m_some_bool;
};
struct DerivedMultiple
{
  DerivedOne _parent1;
  DerivedTwo _parent2;
};
Pay attention to the DerivedMultiple struct. To make it clear, let's expand its parents inline:

struct DerivedMultiple
{
  /* DerivedOne _parent1; */
  CommonBase _parent1_parent;
  float m_some_float;

  /* DerivedTwo _parent2; */
  CommonBase _parent2_parent;
  bool m_some_bool;
};
As you can see, DerivedMultiple has two copies of CommonBase! Another thing to note is that DerivedTwo does not start at offset 0!
That immediately raises two questions:
  1. Which CommonBase is used, when needed?
  2. How do we call DerivedTwo::bar() on a DerivedMultiple object?
The answers become clear when we translate the calling code:

/* DerivedMultiple object; */
object.foo();
object.baz();
object.bar();

results in:

/* DerivedMultiple object; */
DerivedOne_foo(&object._parent1);  /* DerivedOne part is at offset 0 */
DerivedMultiple_baz(&object);
DerivedTwo_bar(&object._parent2);  /* <--- PAY ATTENTION */

As you can see, when we call a method that comes from DerivedTwo, we don't pass a pointer to our object as the first argument! Instead, we pass a pointer to the subobject where the DerivedTwo part is located!
But now we have another question: what if bar() calls set_int(), which comes from CommonBase? How is CommonBase resolved when we have a pointer pointing somewhere inside a DerivedMultiple object?

Let's demystify it with this example:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object;
DerivedTwo *derived2 = &object;
derived1->set_int(1);
derived2->set_int(2);

This code translates into the following:

/* DerivedMultiple object; */
DerivedOne *derived1 = &object._parent1;
DerivedTwo *derived2 = &object._parent2;
CommonBase_set_int(&derived1->_parent, 1);
CommonBase_set_int(&derived2->_parent, 2);  /* but, THEY LOOK THE SAME? */
OK, so this gives us two puzzles. First, when a pointer to DerivedMultiple is assigned to a pointer to DerivedTwo, the pointer is automatically shifted to the subobject part! Second, and most important, THERE IS NOTHING SPECIAL DONE TO RESOLVE CommonBase!
Yes, that's right: the two calls access different CommonBase subobjects inside DerivedMultiple!

Let's illustrate it with numbers:

/* DerivedMultiple object; */
object.set_int(5);
DerivedTwo *two = &object;
two->set_int(6);
int five = object.DerivedOne::m_some_int;
int six = object.DerivedTwo::m_some_int;

Calling set_int() on the main object and on DerivedTwo sets different m_some_int fields.
Now you know that multiple inheritance is hated for a reason!
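Both effects can be observed with a real compiler: the pointer shift and the two independent copies of m_some_int. This is a minimal sketch using the classes from the diamond example above. The helper names has_two_common_bases() and set_both() are hypothetical and exist only for this illustration. Which part comes first inside DerivedMultiple is up to the compiler, so the sketch only checks that the two parts are distinct.

```cpp
#include <cassert>

class CommonBase
{
public:
  int m_some_int;
  void set_int(int x) { m_some_int = x; }
};
class DerivedOne : public CommonBase
{
public:
  float m_some_float;
  void foo() {}
};
class DerivedTwo : public CommonBase
{
public:
  bool m_some_bool;
  void bar() {}
};
class DerivedMultiple : public DerivedOne, public DerivedTwo
{
public:
  void baz() {}
  void set_int(int x) { DerivedOne::set_int(x); }
};

// Hypothetical helper: true if the DerivedOne and DerivedTwo pointers differ
// (one of them was shifted) and each leads to its own CommonBase.
bool has_two_common_bases(DerivedMultiple &object)
{
  DerivedOne *one = &object;
  DerivedTwo *two = &object;
  CommonBase *via_one = one;
  CommonBase *via_two = two;
  return static_cast<void *>(one) != static_cast<void *>(two) && via_one != via_two;
}

// Hypothetical helper running the "numbers" example; each call writes to
// its own copy of m_some_int.
void set_both(DerivedMultiple &object, int &five, int &six)
{
  object.set_int(5);         // DerivedMultiple::set_int -> DerivedOne's copy
  DerivedTwo *two = &object;
  two->set_int(6);           // CommonBase::set_int on DerivedTwo's copy
  five = object.DerivedOne::m_some_int;
  six = object.DerivedTwo::m_some_int;
}
```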

Multiple inheritance involving virtual base class

Take this hierarchy:

class SimpleBase
{
public:
  int m_some_int;
};
class VirtualBase
{
public:
  float m_some_float;
  virtual void foo() {}
};
class Derived : public SimpleBase, public VirtualBase
{
};

The resulting structs are:

struct SimpleBase
{
  int m_some_int;
};
struct VirtualBase
{
  void *_vtable;
  float m_some_float;
};
struct Derived
{
  void *_vtable;
  SimpleBase _parent1;
  VirtualBase _parent2;
};

To make it clear, let's expand the parents inline:

struct Derived
{
  void *_vtable;
  /* SimpleBase _parent1; */
  int m_some_int;

  /* VirtualBase _parent2; */
  void *_parent2_vtable;
  float m_some_float;
};

As you can see, the only changes are related to the VTable. There are a few possible variations:

  • if the first base class is polymorphic, its VTable pointer can be reused (no need to add such a field at the beginning)
  • there might be several pointers to VTables inside a class (some can be inherited)
  • each pointer to a VTable can point to the same or a different place (this is entirely up to the compiler)
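This is a minimal sketch for checking where a real compiler puts the two base parts of Derived. The offset helpers are hypothetical names added for this illustration, and the exact offsets are implementation-defined. With GCC and Clang (Itanium C++ ABI), the polymorphic VirtualBase typically ends up at offset 0, ahead of SimpleBase, and its VTable pointer is reused for Derived. That matches the first variation in the list. Other compilers may lay things out differently.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class SimpleBase
{
public:
  int m_some_int;
};
class VirtualBase
{
public:
  float m_some_float;
  virtual void foo() {}
};
class Derived : public SimpleBase, public VirtualBase
{
};

// Derived inherits a virtual function, so it is polymorphic; SimpleBase is not.
static_assert(std::is_polymorphic<Derived>::value, "inherits virtual foo()");
static_assert(!std::is_polymorphic<SimpleBase>::value, "no virtual functions");

// Hypothetical helper: byte offset of a base-class part inside a Derived object.
std::ptrdiff_t offset_of_part(const Derived &d, const void *part)
{
  return static_cast<const char *>(part) - reinterpret_cast<const char *>(&d);
}

std::ptrdiff_t simple_base_offset(const Derived &d)
{
  return offset_of_part(d, static_cast<const SimpleBase *>(&d));
}

std::ptrdiff_t virtual_base_offset(const Derived &d)
{
  return offset_of_part(d, static_cast<const VirtualBase *>(&d));
}
```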

Method overrides in multiple inheritance

Things get even more complicated when we override methods in a class with more than one base.
Let's take this example:

class Base
{
public:
  int m_base_int;
};
class SimpleDerived : public Base
{
public:
  int m_simple_derived_int;
};
class VirtualDerived : public Base
{
public:
  int m_virtual_derived_int;
  virtual void set_int(int x) { m_virtual_derived_int = x; }
};
class Multiple : public SimpleDerived, public VirtualDerived
{
public:
  virtual void set_int(int x) override { m_simple_derived_int = x; }
};

You already have an idea of what the resulting structs look like, so I won't bother pasting them here. Let's execute this code:

/* Multiple object; */
object.set_int(5);
VirtualDerived *vd = &object;
vd->set_int(8);   /* <--- HOW DOES THIS ONE WORK? */

When translated to C it looks like this:

/* Multiple object; */
_get_method_address(object._vtable, set_int)(&object, 5);   /* no magic here */

VirtualDerived *vd = &object._parent2;   /* this one is familiar too */
_get_method_address(vd->_vtable, set_int)(_get_object_address_for_func(vd, set_int), 8);

Looks like I've been lying to you a bit when explaining how method overrides work :)
What you see happening here is:

  • the method address is obtained from the VTable as usual
  • because the object we point to can be involved in multiple inheritance, we cannot simply pass that pointer to the function: what if we have a pointer to some subobject inside, while the method expects a pointer to the actual object?
  • before the pointer is passed to the function, it goes through a compiler function that consults the VTable and returns a valid address to pass to the function (not necessarily the beginning of the real object)
  • this adjustment is applied every time an object address is passed to a virtual method, because we never know which types derive from a given class; the tree can be a very complicated mixture of single and multiple inheritance
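Here is a minimal sketch of the same call with a real compiler. The helper set_through_base() is a hypothetical name added for this illustration. Real compilers typically implement the pointer adjustment with small "thunk" functions or offsets stored in the VTable. Whether vd actually differs from &object is up to the compiler: GCC and Clang may place the polymorphic VirtualDerived part first, in which case no adjustment is needed. Either way, the call must land in Multiple's override.

```cpp
#include <cassert>

class Base
{
public:
  int m_base_int;
};
class SimpleDerived : public Base
{
public:
  int m_simple_derived_int;
};
class VirtualDerived : public Base
{
public:
  int m_virtual_derived_int;
  virtual void set_int(int x) { m_virtual_derived_int = x; }
};
class Multiple : public SimpleDerived, public VirtualDerived
{
public:
  void set_int(int x) override { m_simple_derived_int = x; }
};

// Hypothetical helper: calls set_int() through a pointer to the VirtualDerived
// part. The override writes m_simple_derived_int, which lives in another
// part of Multiple, so `this` must point at the whole object when it runs.
void set_through_base(VirtualDerived *vd, int x)
{
  vd->set_int(x);
}
```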

Hints for safe use of multiple-inheritance

  • try to use only single inheritance and interfaces; in C++ an interface would be a class that has nothing but static and pure virtual methods
  • the biggest problems come from classes with fields and non-virtual methods; try to make sure that base classes other than the first have none
  • make non-primary base classes as trivial as possible (ideally interfaces), preferably top-level classes (not derived from anything)
  • avoid diamonds; once you notice one, use virtual inheritance
  • be very very careful

Stay tuned for part III, which will have another complicated aspect - virtual inheritance!

Thursday, March 26, 2015

C++ inheritance explained (Part I)

Inheritance in C++ is one of the most complex forms of inheritance there is. Understanding how it works and which hidden mechanics are involved is useful (if not required) to avoid messing things up.
I'll try to explain it all in detail by example.
Before we start, there are a few things to note:
  • Visibility (both member and inheritance) has no effect on what is described here, so everything in all examples is public
  • The C++ code will be translated to C code to reveal what the compiler does automatically
  • The "C++ compiler" is an imaginary one, in an attempt to keep things simple and clear
  • Namespaces and name mangling are ignored for simplicity (they have no impact on inheritance)

The simple inheritance

Let's start with the most trivial example:

class SimpleBase
{
public:
  int m_some_int;
  void foo(int x) {}
};
class SimpleDerived : public SimpleBase
{
public:
  float m_some_float;
  void bar() {}
};

When compiling, the compiler turns it into something like this:

struct SimpleBase
{
  int m_some_int;
};
void SimpleBase_foo(SimpleBase *_this, int x) {}
struct SimpleDerived
{
  SimpleBase _parent;
  float m_some_float;
};
void SimpleDerived_bar(SimpleDerived *_this) {}

How it works:

  • a top-level class becomes a struct with member variables matching those of the class
  • methods become functions that take a pointer to the corresponding struct (the hidden this pointer) as the first argument, followed by the arguments of the original method
  • inheritance places the parent struct as the first member; it has offset 0, so a derived class can be cast to a base one (the compiler does this automatically)
  • special methods (like constructors) and overloaded operators are also methods and are turned into similar functions
  • static methods are just functions; in this case the class simply serves as a namespace, plus it has some visibility-related features
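This minimal sketch puts the real C++ classes next to a hand-written C-style equivalent of the translation above. The _C suffix and the derived_foo_C() helper are hypothetical names made up for this illustration. foo() has been given a body so that it has an observable effect.

```cpp
#include <cassert>
#include <cstddef>

// The classes from the example, with foo() given a body.
class SimpleBase
{
public:
  int m_some_int;
  void foo(int x) { m_some_int = x; }
};
class SimpleDerived : public SimpleBase
{
public:
  float m_some_float;
  void bar() {}
};

// Hand-written equivalent of the translation (hypothetical names).
struct SimpleBase_C
{
  int m_some_int;
};
void SimpleBase_foo(SimpleBase_C *_this, int x) { _this->m_some_int = x; }

struct SimpleDerived_C
{
  SimpleBase_C _parent;
  float m_some_float;
};

// The parent struct is the first member, so it sits at offset 0.
static_assert(offsetof(SimpleDerived_C, _parent) == 0, "parent comes first");

// What `derived.foo(x)` amounts to in the translated form.
void derived_foo_C(SimpleDerived_C *derived, int x)
{
  SimpleBase_foo(&derived->_parent, x);
}
```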

Simple inheritance with polymorphic base class

Let's change the example so that base class is polymorphic:

class PolyBase
{
public:
  virtual void foo() {}
  void bar() {}
  int m_some_int;
};
class SimpleDerivedFromPoly : public PolyBase
{
public:
  virtual void foo() override {}
  void bar() {}
  float m_some_float;
};

In this case the compiler turns the base class into something like this:

struct PolyBase
{
  void *_vtable;
  int m_some_int;
};
void PolyBase_foo(PolyBase *_this) {}
void PolyBase_bar(PolyBase *_this) {}

What differs from simple inheritance is the member called _vtable, placed first (the compiler is free to place it anywhere, but placing it first is the usual choice).
Another thing that changes significantly is how methods are called. Let's take C++ code:

/* PolyBase object; */
object.bar();
object.foo();

Compiler translates it to something like this:

/* PolyBase object; */
PolyBase_bar(&object);
_get_method_address(object._vtable, foo)(&object);

The differences you see here are:

  • a non-virtual method call is a simple function call
  • a virtual method call involves a so-called vtable lookup: the address of method foo is found in the vtable (how exactly is up to the compiler), and that address is a pointer to the function that gets called
The derived class is compiled into:

struct SimpleDerivedFromPoly
{
  PolyBase _parent;
  float m_some_float;
};
void SimpleDerivedFromPoly_foo(SimpleDerivedFromPoly *_this) {}
void SimpleDerivedFromPoly_bar(SimpleDerivedFromPoly *_this) {}

Nothing special here. Let's see what method calls look like:

/* SimpleDerivedFromPoly object */
object.bar();
object.foo();

becomes:

/* SimpleDerivedFromPoly object */
SimpleDerivedFromPoly_bar(&object);
_get_method_address(object._parent._vtable, foo)(&object);

Notes:

  • the non-virtual call invokes the method from the child class
  • the virtual call is no different at all (except that _vtable is inside _parent)
  • the actual value of _vtable differs for objects of each class; that's how the correct method is found
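To make _get_method_address() less abstract, here is a minimal sketch of a hand-rolled VTable: a table of function pointers, one slot per virtual method. The layout is an assumption, and the PolyBase_VTable struct and *_construct() functions are hypothetical. Real compilers use their own ABI-defined layout. foo() returns an int here, unlike the void in the post, so that the dispatched call can be observed.

```cpp
#include <cassert>

struct PolyBase;

// Hypothetical VTable layout: one function-pointer slot per virtual method.
struct PolyBase_VTable
{
  int (*foo)(PolyBase *_this);
};

struct PolyBase
{
  const PolyBase_VTable *_vtable;
  int m_some_int;
};

int PolyBase_foo(PolyBase *) { return 1; }
const PolyBase_VTable PolyBase_vtable = { PolyBase_foo };

struct SimpleDerivedFromPoly
{
  PolyBase _parent;
  float m_some_float;
};

int SimpleDerivedFromPoly_foo(PolyBase *) { return 2; }
const PolyBase_VTable SimpleDerivedFromPoly_vtable = { SimpleDerivedFromPoly_foo };

// What constructors do behind the scenes: store the class's VTable address.
void PolyBase_construct(PolyBase *_this)
{
  _this->_vtable = &PolyBase_vtable;
  _this->m_some_int = 0;
}

void SimpleDerivedFromPoly_construct(SimpleDerivedFromPoly *_this)
{
  PolyBase_construct(&_this->_parent);
  _this->_parent._vtable = &SimpleDerivedFromPoly_vtable;  // replace the base's VTable
  _this->m_some_float = 0.0f;
}

// A virtual call: look up the slot in the VTable, then call through it.
int call_foo(PolyBase *object)
{
  return object->_vtable->foo(object);
}
```

The same call_foo() returns 1 for a PolyBase and 2 for the _parent of a SimpleDerivedFromPoly. The only difference between them is the stored _vtable value.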

Simple inheritance with polymorphism added in derived class

Unlike in previous example, this time let's have simple base class and polymorphic derived:

class SimpleBase
{
public:
  int m_some_int;
  void foo() {}
};
class PolymorphicDerived : public SimpleBase
{
public:
  bool m_some_bool;
  virtual void bar() {}
};

This results in the following structs:

struct SimpleBase
{
  int m_some_int;
};
struct PolymorphicDerived
{
  void *_vtable;
  SimpleBase _parent;
  bool m_some_bool;
};

As you can see, things now get a bit more complicated, because a pointer to the VTable is placed before the parent. It could be placed after it too, but I'm placing it first because that will make multiple inheritance easier to understand later! Let's see how it works:

/* PolymorphicDerived object; */
SimpleBase *base = &object;
base->foo();
object.foo();

When translated to C, it results in the following. I'll split it up to explain the individual parts.

/* PolymorphicDerived object; */
SimpleBase *base = &object._parent;
SimpleBase_foo(base);

So, as you can see, when assigning to base, the compiler automatically shifts the pointer to point to the parent! A pointer to the base class actually points not to the object, but inside it. This allows a simple method call, just as if we had an object of SimpleBase. Calling an inherited method on the real object is also different:

SimpleBase_foo(&object._parent);

In this call, it is not the object that is passed as the parameter, but the subobject of the relevant type.

Last thing to note is downcasting:

/* SimpleBase *base; */
PolymorphicDerived *derived = static_cast<PolymorphicDerived*>(base);
Results in something like this:

/* SimpleBase *base; */
PolymorphicDerived *derived = (PolymorphicDerived *) ((char *) base - sizeof(void *));

The exact opposite must happen: the pointer must be moved back by the size of a pointer so that it points at the beginning of the object! (A real compiler also checks for a null pointer first, so that null stays null.)
This has the following consequences behind the scenes:

  • Casting PolymorphicDerived* to void* and then to SimpleBase* results in an invalid pointer (one that does not point to the subobject part)!
  • Comparing a PolymorphicDerived* with a SimpleBase* using the == or != operator shifts one of them by the pointer size before the comparison
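The shift and its consequences can be checked with a real compiler. In this minimal sketch, the helpers base_shift(), round_trip() and compare_equal() are hypothetical names made up for this illustration. The exact shift is up to the compiler. With GCC and Clang (Itanium C++ ABI), the VTable pointer of PolymorphicDerived goes at offset 0, so the SimpleBase part typically starts sizeof(void *) bytes in, as in the picture above.

```cpp
#include <cassert>
#include <cstddef>

class SimpleBase
{
public:
  int m_some_int;
  void foo() {}
};
class PolymorphicDerived : public SimpleBase
{
public:
  bool m_some_bool;
  virtual void bar() {}
};

// Hypothetical helper: byte distance from the start of `object` to its
// SimpleBase part, i.e. how far the implicit upcast shifts the pointer.
std::ptrdiff_t base_shift(PolymorphicDerived &object)
{
  SimpleBase *base = &object;
  return reinterpret_cast<char *>(base) - reinterpret_cast<char *>(&object);
}

// static_cast undoes whatever shift the upcast applied.
bool round_trip(PolymorphicDerived &object)
{
  SimpleBase *base = &object;
  return static_cast<PolymorphicDerived *>(base) == &object;
}

// == converts the derived pointer before comparing, so this is true even
// though the raw addresses differ.
bool compare_equal(PolymorphicDerived &object)
{
  SimpleBase *base = &object;
  return base == &object;
}
```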

End of Part I. Stay tuned for the next part which will involve one of the scary parts in C++: multiple inheritance!

Tuesday, January 13, 2015

Basic rules of automated software testing

There are a lot of posts about what should or shouldn't be done when writing automated tests for software. Below is my list.

Tests are written for others first, only then for yourself

There are only a few cases where tests are written to check whether the code works. In most cases, writing a test is not the most efficient way to check that. Instead, tests are written primarily to catch regressions when unrelated changes are made. Since it's quite easy to break something you don't know about, tests are written to prevent others from breaking something they possibly don't even consider.
As such, claims "I don't need tests" are pretty much void if there is more than one developer on the project.

Test that hasn't failed at least once has not been proven to test anything

Should be obvious, but if the test has never been red, how do you know that it actually tests something? Maybe the test is simply always green and will not catch any bug!
Tests are just like any other code, they have to be verified. The simple way to do this is to introduce a bug in the code and run the test, to see if it fails.
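This is a minimal sketch of that check in C++. The clamp_percent() function is a hypothetical example of code under test. The test takes the implementation as a parameter, so it can be run against a deliberately broken copy to confirm it turns red. Tools that automate this idea are known as mutation testing tools.

```cpp
#include <cassert>

// Hypothetical code under test: clamps a value into the 0..100 range.
int clamp_percent(int value)
{
  if (value < 0) return 0;
  if (value > 100) return 100;
  return value;
}

// The same function with a deliberately introduced bug (upper bound removed),
// used only to check that the test below can actually fail.
int clamp_percent_with_bug(int value)
{
  if (value < 0) return 0;
  return value;
}

// The test, parameterized on the implementation so it can run against both.
bool test_clamp_percent(int (*clamp)(int))
{
  return clamp(-5) == 0 && clamp(42) == 42 && clamp(150) == 100;
}
```

The test should pass for clamp_percent and fail for clamp_percent_with_bug. If it passed for both, it would not be testing the upper bound at all.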

When code evolves, test suite should evolve with it

Refactoring is the only case where code changes don't necessarily require test changes. In all other cases, if the code changes but the tests don't, the new features are not covered, so the test suite degrades.
A test suite is relevant only if it is kept in sync with the code it tests.

Tests should test the smallest possible feature set

This is easier said than done. Isolating different features from one another can be difficult. Testing features is only one part; ideally, it should also be easy to find what broke when a test fails. If a test depends on multiple features at the same time, its failure shows that one of these features is broken, but does not always tell you which one.
There are two solutions here:
  • Make tests depend only on one feature, so that test failure indicates a problem with that feature
  • Order tests accordingly. If test depends on 3 different features, but 2 of them have been thoroughly tested before, a failure is likely caused by the third
The first of these two options is preferred.

Testing against mocks is inadequate

Unit-testing proponents often suggest mocking everything to achieve single-feature isolation.
There is a pitfall in doing so: mocks are never the real thing! Testing against mocks only proves that the code works with these mocks! There is almost no software in the world that is 100% compliant with the standards it supports, or even with its own documentation (assuming it has been around for a while). Yet, for some reason, people think it is possible to keep mocks 100% identical in behavior to the real thing they imitate.
The real implementations evolve over time, and the mocks have to be kept in sync. Sooner or later they diverge. For this reason, integration and end-to-end tests are required to make sure the code works with the actual implementations.

Unit testing is inadequate

Should be obvious: end users don't care, whether your tests are green or not. They care, whether software works or not. Unit tests can prove that individual components work, but they say nothing about the behavior, when those components are put together.

It's not important, whether code or test is written first

TDD zealots claim otherwise, and they are wrong. While TDD can increase the coverage and quality of tests, it does not guarantee them!
What matters in the end is the quality of the code and the quality of the tests. When both are good, no one cares in what order they were produced.

Only testing your own code is risky

In part, this is another argument against TDD...
One of the reasons for doing code review is that it is hard to spot problems in your own code. The same applies to tests: when someone other than you exercises your code, the chances of catching bugs increase.

Only testing the correct behavior is inadequate

One common mistake made in automated tests is only verifying that code behaves correctly under correct conditions. For complete testing the opposite should also be tested: code should give errors under incorrect conditions.
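This minimal sketch tests both sides. The parse_port() function and the two test functions are hypothetical examples written for this illustration. One test checks valid input; the other checks that malformed and out-of-range input is rejected.

```cpp
#include <cassert>
#include <string>

// Hypothetical function under test: parses a TCP port number (1..65535).
// Returns false on malformed or out-of-range input; `port` is set only on success.
bool parse_port(const std::string &text, int &port)
{
  if (text.empty() || text.size() > 5) return false;  // also avoids overflow
  int value = 0;
  for (char c : text) {
    if (c < '0' || c > '9') return false;
    value = value * 10 + (c - '0');
  }
  if (value < 1 || value > 65535) return false;
  port = value;
  return true;
}

// Correct behavior under correct conditions...
bool test_valid_ports()
{
  int p = 0;
  return parse_port("80", p) && p == 80 && parse_port("65535", p) && p == 65535;
}

// ...and errors under incorrect conditions.
bool test_invalid_ports()
{
  int p = 0;
  return !parse_port("", p) && !parse_port("0", p) && !parse_port("65536", p) &&
         !parse_port("8o", p) && !parse_port("-1", p) && !parse_port("123456", p);
}
```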

Do you have any additional rules to add?