Software developer Interview Q&A

interview questions

This page covers a list of interview questions that every software developer should know, starting from language specific, up to data structures, operating systems, and memory related knowledge.
Please let me know if you notice any mistakes.
Other than that, enjoy your reading!

C++ interview questions

  1. What does the const keyword specify, and where it can be used?

    • In variables, to specify that the data cannot be modified.
      • Constant value / reference
      • Pointer to constant data (left side of the pointer)
        • The pointer CAN point to another data.
        • The pointer CANNOT modify the data it points to.

      • Constant pointers (right side of the pointer)
        • The pointer CANNOT point to another data.
        • The pointer CAN modify the data it points to.
      • Constant pointer to a constant data
        • The pointer CANNOT change the data it points to.
        • The pointer CANNOT points to a different object.
    • after a class function declaration
      • We can modify a member while being inside a function with const if we add the mutable modifier
    • at return values, to assure that the data returned is read-only
      • if the return value is a variable, it cannot be modified
      • if the return value is a class instance, the const this pointer will be returned. Only const functions can be called further.
  2. What does the static keyword specify, and where it can be used?

    • Inside functions, to retain the value in subsequent calls
    • In classes
      • Static functions and objects are part of the class itself, and not of a single instance
        • They can be called by both using an instance and by using the class name
        • Static members needs to be declared outside of the class definition
    • If a global variable is static, its visibility is limited to the same source code.
    • Starting from C++11, static variable initialization is thread-safe (“such a variable is initialized the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization. […] If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.”)
  3. What is the inline keyword?

    • It is a hint to the compiler to inline a function.
      • Inlining a function would reduce the cost of the call by copying the code from that function in place instead of calling the actual function.
    • Can we have a recursive inline?
      • No, the compiler don’t know about the level of recursion at compile time
  4. What is the difference between a struct and a class?

    • The default access modifiers: struct default is public, while the default for classes is private
  5. Can we inherit a class from a struct?

    • Yes we can.
    • If we inherit privately from the struct, all of the data from it will also be private inside of the class
  6. What access modifiers do we have?

    • Private: we have access to that member/function only from inside the class
    • Protected: we have access to the member/functions from inside of the class and from derived classes
    • Public: we have access from everywhere
  7. Where do we use the friend keyword?

    • A class can add another class or function as its friend, meaning that it has access on the private members of that class.
    • If class A have class B as a friend, it means that:
      • class B can access private members/functions of class A.
      • class A cannot access private members/functions of class B
      • friends of class B cannot access private members/functions of class A
  8. What types of inheritance we can have?

    • Given class A stated above, the functions from it will have the following access in the derived classes:
    • Public inheritance: is-a relation
      • This is the usual derivation. B instances have, by default, the same functionality as A instances have. The developer can now decide which functions to overwrite, overload, or rewrite if the function was a virtual, while keeping all the implementation of the A class (and can call them too – directly (if there’s no ambiguity), by doing an upcasting (converting a derived-class reference or pointer to a base-class), or with the ‘using’ keyword in some cases[See Q19]).
      • We can use casts to write B in terms of a
    • Protected inheritance: in-terms-of relation
      • all the public and protected members/functions from A are protected in B
    • Private inheritance: has-a relation

    • private: void one(); void two(); };

      • All the public and protected members/functions are private in B
      • In terms of code, it’s the same as below
      • We cannot use casts to write B in terms of A (same applies for protected)
  9. Where can we use the virtual keyword?

    • In virtual inheritance
      • It is used to solve the multiple inheritance problem (diamond problem)
      • Because D derives from both B and C classes, which each one of them have their own copy of the data members and methods of the A class, the D class will contain two sub-objects of the given A class. Meaning, any access to the given base class member ‘i’ will end in a compile error, because D will contain two objects under the name i : one under the namespace B::i, and the other from C::i.
      • In order for the compiler to accept the access to the base class members and functions, we will need to virtually derive from them, as below
    • In a function
      • The function that needs to be called will be determined at runtime, based on the object type
      • A virtual function without declaration is called a pure virtual. A class that contain at least one pure virtual function are abstract and cannot be instantiated
  10. What is the size of…

    • An empty class
      • The size is 1.
      • Imagine we have an array of empty classes. If the size of the class would be 0, all of the instances would have the same this ptr
    • A class with a virtual function
      • The size of the vptr (which is the same as the size of any other pointer,).
      • The size required by the system to hold a unique memory address is compiler dependent: 4 bytes (32 bits) on a 32-bit machine, 8 byte on a 64-bit machine.
  11. What is a vtable, vptr? Where are they stored?

    • Virtual table (vTable)
      • Is a lookup table of function pointers used to dynamically bind the virtual functions to objects at runtime.
      • Every class that uses a virtual function (either by having one or inheriting one) receives a vTable for that class.
      • The vTable contains one entry of function pointer for each virtual function.
      • Pure virtual functions have the function pointer set to null.
      • If the virtual function is not overwritten, it will point to the virtual function declared from base
    • Virtual pointer (vPtr)
      • Is the vTable pointer: a hidden pointer added by the compiler to the class.
      • The pointer is pointing to the vtable of that particular class.
      • Each object of a class with virtual functions stores this vptr, and uses it to resolve the call to the virtual function.
      • vTable
  12. What is the difference between deep and shallow copy?

    • Shallow copy is the default code generated by the compiler, when a copy constructor / copy assignment is called. The compiler will copy all the data individually ( using the assignment operator ‘=’ ).
    • Shallow copy works fine for simple classes (with no dynamically allocated content).
    • If a shallow copy is performed on a class with pointer objects, the shallow copy will copy only the memory address of our object – it will not allocate or copy any contents – so both pointers will point to the same address in memory, and changing in one place will have effect in the other one as well.
    • Deep copy is performed by writing our own copy constructors and assignment operator.
    • The overwritten functions should allocate memory for the copy and also copy the actual data in place, so each object will have its own separate copy.
    • Shallow vs deep
  13. What is the difference between new and new[], delete and delete[]?

    • The operator new and delete, new[] and delete[] need to be used correspondingly.
    • new and delete are used to allocate / deallocate memory for a single object
    • new[] and delete[] are used to allocate / deallocate memory for array of objects.
  14. What is the difference between malloc and new?

    • malloc is originally from C, and it is used to allocate memory only (no constructor call!)
    • malloc receives the number of bytes to be allocated, and return a void* that forces the developer to cast that memory to the required one.
    • the new keyword is more type safe, returning the correct type to the user.
    • new can be overwritten by a class.
    • In C++, new will also call the constructor of the object.
  15. What is the difference between NULL and nullptr? [link]

    • NULL is originally from C, and it’s declared as this:
    • having two functions, such as below, calling get(NULL) will call the first one
  16. Which functions does the compiler automatically generate? [copyright]

    • If they are needed, and unless you declare one of your own,the compiler will generate:
      • a default constructor
      • a copy constructor
      • a copy assignment operator
      • a destructor
    • All those are only generated by the compiler when they are needed. (The difference is that, when the compiler cannot create them, that’s Ok as long as they aren’t used.)
    • C++11 adds the following rules, which are also true for C++14:
      • The compiler generates the move constructor if
        • there is no user-declared copy constructor, and
        • there is no user-declared copy assignment operator, and
        • there is no user-declared move assignment operator and
        • there is no user-declared destructor,
        • it is not marked as deleted,
        • and all members and bases are moveable.
      • Similar for the move assignment operator: It is generated if there is no user defined
        • there is no user-declared copy constructor, and
        • there is no user-declared copy assignment operator, and
        • there is no user-declared move constructor and
        • there is no user-declared destructor,
        • it is not marked as deleted,
        • and all members and bases are moveable.
    • Do we still have the default constructor generated if we add one with parameters?
      • No
  17. Can we have a virtual constructor? What about a virtual destructor?

    • We cannot have a virtual constructor, only virtual destructor.
    • Why we should have a virtual destructor?
      • When we delete an object, the proper destructor will be called (the one from sub-class). Otherwise, only the base class destructor would call, and we would not deallocate the memory contained in subclass
  18. Explicit constructor [link]

    • Explicit is used to disable implicit casts.
    • Given classes A and B:
  19. Name hiding / Name shadowing

    • Functions in derived classes which don’t override functions in base classes but which have the same name will hide other functions of the same name in the base class
    • Name lookup stops if it finds a name in one of your bases. It won’t look beyond in other bases.
    • The function in B shadows the function in A
    • In order to use them, we need to either:
      • explicitly tell to the compiler which function we need to call
      • tell the compiler to load them into scope, from inside of the class
  20. Size of array when sent as param

    • Whenever you declare a function that takes an array parameter, the C standard requires the compiler to ignore you and change the parameter to a pointer. So these declarations all behave like the first one (a will be a pointer to int in all cases):
  21. What is volatile? [link]

    • Used to disable compiler value caching, to request it each time its needed because it might be modified from outside
    • “A volatile object is one whose value might change spontaneously. That is, when you declare an object to be volatile, you’re telling the compiler that the object might change state even though no statements in the program appear to change it.”
    • Consider the following code:
    • The compiler sees that the signal is not changed in your code, so it may change it into the following code:
    • Using volatile says that the compiler should not optimize the code because it can change (also, it needs to read it every time it is accessed, not only add it to the registers/cache and read the same value every time)

  22. Default argument overwritten in derived virtual function [link]

    • Simple answer, don’t do that.
    • At compile time, the compiler substitutes the default arguments from the member functions of the static types of the pointers, while the function call is determined at runtime.
  23. How does the compiler knows how many objects should be destructed, when using delete[]?

    • The runtime system stores the number of objects somewhere where it can be retrieved only if you know the pointer. There are two popular techniques to do so, both of them being used in compilers:

    • Over-allocate the array and put the size just to the left of the first object.
    • Use an associative array with the pointer as the key and the size as the value.
  24. Do we have a memory leak if a constructor throws an exception ?

    • No. Creating an object is a two-step process.
    • First of all, the required memory is allocated for the object. Afterwards, the placement new is used to create an object into that memory address. If an error occurs during object’s construction, the memory is deallocated, otherwise, the memory address (this pointer) will be returned to the caller of the new operation.

C++11

  1. Smart pointers

    • Everything regarding smart pointers can be found in my other pages. [link]
  2. What does the keywords override, final, default, delete means

    • override [link]
      • When working with derived classes, it’s easy to create a new virtual function in the derived class when you actually meant to override a function in the base class. This happens when you fail to properly match the function prototype in the derived class with the one in the base class.
    • final [link]
      • There are occasionally times when you don’t want to allow someone to override a virtual function, or even create a derived class. C++11 adds the identifier final to provide this functionality.
    • default [link]
      • The default specifier can only be used with functions that have a default.
      • You can specify that you would like the compiler to provide a default one.
    • delete [link]
      • can be used to disallow a function from being defined or called. One of the best uses of the delete specifier is to make a class uncopyable:

Advanced topics

  1. What is the difference between the stack and the heap?

    • The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and function arguments/return value. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed.
    • The stack is faster than the heap because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented). Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor’s cache.
    • If the stack runs out of memory, then this is called a stack overflow – and could cause the program to crash. This often happens when a lot of nested functions are being called, or if there is an infinite recursive call.
    • Stack
    • The heap is memory set aside for dynamic allocation. Unlike the stack, there’s no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
    • The heap could have memory fragmentation problem (see Q2).
    • Data stored on the heap has to be deleted manually by the programmer using one of the built in keywords like free, delete, or delete[ ]  – otherwise we will end up with a memory leak.
    • The heap is slower than the stack because, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be synchronized with other heap accesses in the program.
    • Stack vs Heap
  2. Memory (heap) fragmentation

    • What is it?
      • Occurs when the available memory on the heap is being stored as noncontiguous (or disconnected) blocks – because used blocks of memory are in between the unused memory blocks. When excessive fragmentation occurs, allocating new memory may be impossible because of the fact that even though there is enough memory for the desired allocation, there may not be enough memory in one big block for the desired amount of memory.
      • Frequent allocation and deallocation of dynamic memory leads to heap fragmentation, especially if the application is running for long periods and allocates small memory chunks.
      • memory fragmentation
    • How to avoid it?
      • use dynamic memory allocating sparingly
        • replace new and new[] with static or automatic storage, or use STL containers
      • try to allocate and deallocate large chunks in every new and delete expression instead of allocating small chunks frequently
      • Write your own custom allocator: Use placement new to preallocate a large buffer and then slice that buffer into smaller units to reduce the risk of heap fragmentation
      • Partition the heap into regions where memory allocation of particular size is allowed
  3. Memory (data structure) alignment [link]

    • What is it?
      • Is the way the data is arranged and accessed in computer memory. The CPU reads or writes data in word sized chunks (4 bytes) or more  This topic can be separated in two parts: Data alignment and data padding.
        • Data alignment means putting the data at a memory address equal to some multiple of the word size. In order to align the data, empty bytes are added (if needed) at the end of the data, which is the data structure padding.
    • When it occurs?
      • Padding is only inserted when a structure member is followed by another one with a larger padding alignment, and at the end of the structure.
      • The size of structures / classes are always multiple of word size, so the next memory address will be properly aligned.
    • How do we handle it ?
      • The best way to reduce structure padding (thus, reduce the number of empty bits, and the overall size of the structure) is to sort the data by descending alignment requirements.
    • memory alignment
  4. Memory barrier / fence (Barrier Synchronization)

    • What is a memory barrier?
      • A barrier is a block on reading from or writing to certain memory locations by certain threads or processes.
      • It is an instruction on certain processor architectures that will ensure certain guarantees about the order of accesses to memory. Operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
    • Why it is needed?
      • Due to modern CPU optimizations that can result in out-of-order execution (memory ordering – reordering of memory operations – loads and stores) that are only visible in a multithreading environment. (See Singleton double-locking problem)
    • memory barrier
  5. Memory cache

    • What is a cached memory ?
      • The CPUs have a few layers of hardware memory used to reduce the cost of accessing data directly from the main memory. When fetching a certain element of an array/matrix/structure etc from memory, elements near it will be fetched as well and stored in a cache line. If the ordering is exploited, this will result in fewer memory accesses (because the next few values which are needed for subsequent computations are already in a cache line).
    • memory cache
  6. What is an atomic instruction (operation) ?

    • An atomic CPU instruction means that only one read/write operation is done into the memory. Even the basic increment is usually contained out of three separate operations. Load, Increment, then Store. (read the value in the register, increment the register, save the new value from the register into the variable).
    • Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definition—they either successfully change the state of the system, or have no apparent effect.
    • For inter-thread synchronization, C++11 comes with atomic variables.
  7. What is a branch prediction?

    • Branch prediction is a computation done by the CPU during its idle time, in order to optimize the speed of the process. It is done by attempting to predict the value of a computation before knowing for sure if it’s correct or not. It is used in conditional statements (if’s) and it is based on previous results. Until the point in time when the result is known for sure, the probable branch will be selected and the upcoming lines of codes will be calculated: If the prediction was correct, the CPU won extra time for the process; if not, the executed instructions will be discarded and recomputed with the correct value. In order to benefit from a proper branch prediction, the data should be as predictable as possible.
    • Example for conditional statement on arrays:
    • For the code from above, the CPU will be unable to predict properly which branch will be executed -> But imagine what will be the outcome if the array was sorted first. The first X values will be predicted properly, afterwards the predict will fail due to outcome being the opposite (> Constant_LIMIT), and it will continue to successfully predict the branches afterwards. Assuming the size of array is very big, the branch prediction is actually very useful.
    • Link for other branch predictions and how to reorganize loops: [link]
    • branching

Multithreading / Concurrency

  1. What is a Critical Section?

    • A critical section is a group of instructions/statements or region of code that need to be executed atomically. Concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed is protected. This protected section is the critical section or critical region. It cannot be executed by more than one process.
    • Typically, the critical section accesses a shared resource, such as a data structure, a peripheral device, or a network connection, that would not operate correctly in the context of multiple concurrent accesses. A thread must acquire a lock prior to executing critical section.
    • critical section
  2. What is a data race? And what about race conditions?

    • A data race occurs when two instructions from different threads try to access the same memory location, and at least one of them is a write operation.
    • A race condition is a situation that occurs when an application behaves differently based on the timing and on the sequences executed by two different threads.
      • For example two threads read and write (increment) the same variable. If the threads are not synchronized, the following situation can occur:
      • data race
  3. Sync mechanisms [link, link]

    • Mutex
      • A mutex is a locking mechanism used to synchronize access to a resource, and it can be shared by multiple processes.
      •  Only one task (can be a thread or process based on OS abstraction) can acquire the mutex. It means there is ownership associated with mutex. Only the process that locked the mutex can unlock it.
      • Mutexes provide priority inversion safety. Since the mutex knows its current owner, it is possible to promote the priority of the owner whenever a higher-priority task starts waiting on the mutex.
      • Mutexes also provide deletion safety, where the process holding the mutex cannot be accidentally deleted.
      • mutex
    • Mutex in C++
      • C++ contains multiple standard-defined mutexes:
        • mutex (C++11): Offers exclusive, non-recursive ownership semantics.
        • timed_mutex (C++11): Offers exclusive, non-recursive ownership semantics. Contains try_lock_for / try_lock_until function that will attempt to acquire the lock for a period of time.
        • recursive_mutex (C++11): Offers exclusive, recursive ownership semantics. The mutex can lock multiple times when recursively calling the function that contains it, but in order to have it unlocked, it needs to call unlock the same number of times.
        • recursive_timed_mutex (C++11): This is a combination of timed_mutex / recursive_mutex.
        • shared_mutex (C++17): This mutex allows multiple threads to access the critical section if they are reading from it, but when it has to write, all the other threads that want to access the critical section will have to wait. This is done using one of two levels of lock access:
          • exclusive – exclusive access, done using lock / unlock.
          • shared – shared access, done using lock_shared / unlock_shared.
        • shared_timed_mutex (C++14): Contains the same functionality as shared_mutex, but also contains the functions that will attempt to acquire the lock for a period of time.
      • A mutex contains two functions, lock and unlock. By forgetting to unlock the mutex or if an exception is thrown, the thread containing the mutex will lock forever. Because mistakes can happen, the rule is that you should NEVER use these functions directly, but to use a mutex ownership wrapper, such as a lock_guard or a unique_lock.
      • A lock_guard receives a mutex as parameter and calls the mutex lock function on the constructor, and unlock on the destructor.
        • A unique_lock is almost the same as a lock_guard, but less restrictive. The unique_lock can wrap the mutex without locking it on constructor call, can attempt to lock a mutex for a specified time durationcan unlock it at any point in its existence, and can transfer the ownership of the lock from one instance to another. Unless we need the capabilities of unique_lock (needed in condition variables), we should always use a lock_guard.
    • Semaphores
      • Semaphore is a signaling mechanism that also provide synchronization services, but can allow more than one thread to enter.
      • In contrast with the mutex, a semaphore has no concept of an owner (any process can unlock a semaphore), do not provide priority inversion safety, nor deletion safety.
      • semaphore
  4. What is a dead lock?

    • A deadlock can be explained by a set of blocked processes each holding a resource and waiting to acquire a resource held by another process in the set.
    • deadlock
  5. What is double-checked locking? [link, link]

    • Double-Checked Locking is widely cited and used as an efficient method for implementing lazy initialization in a multithreaded environment. However, because of compiler optimizations(1), CPU running the operations out of order (2) and due to locally cached copies of memory in the CPU (3), this method does not work at all. (For further information read the links from above)
    • C++11 added thread-safe initialization for local static variables [Q2;C++ topic]

Design patterns

  1. Explain a specific pattern and what problem does it solve (To be checked on the topic designated for each pattern)
    • Singleton – Ensure that only one instance of a class is created and Provide a global access point to the object.
    • Factory(Simplified version of Factory Method) – Creates objects without exposing the instantiation logic to the client and Refers to the newly created object through a common interface.
    • Factory Method – Defines an interface for creating objects, but let subclasses to decide which class to instantiate and Refers to the newly created object through a common interface.
    • Abstract Factory – Offers the interface for creating a family of related objects, without explicitly specifying their classes.
    • Builder – Separate the construction of a complex object from its representation so that the same construction process can create different representations.
    • Prototype – Specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype.
    • Object Pool – reuses and shares objects that are expensive to create.
    • Chain of Responsibility – It avoids attaching the sender of a request to its receiver, giving this way other objects the possibility of handling the request too – The objects become parts of a chain and the request is sent from one object to another across the chain until one of the objects will handle it.
    • Command – Encapsulate a request in an object, Allows the specialization of requests and allows saving them in a queue.
    • Interpreter – Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language / Map a domain to a language, the language to a grammar, and the grammar to a hierarchical object-oriented design.
    • Iterator – Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
    • Mediator – Define an object that encapsulates how a set of objects interact. Mediator promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.
    • Observer – Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
    • Strategy – Define a family of algorithms, encapsulate each one, and make them interchangeable. Strategy lets the algorithm vary independently from clients that use it.
    • Template Method – Define the skeleton of an algorithm in an operation, deferring some steps to subclasses / Template Method lets subclasses redefine certain steps of an algorithm without letting them to change the algorithm’s structure.
    • Visitor – Represents an operation to be performed on the elements of an object structure / Visitor lets you define a new operation without changing the classes of the elements on which it operates.
    • Null Object – Provide an object as a surrogate for the lack of an object of a given type.
    • Adapter – Convert the interface of a class into another interface clients expect. / Adapter lets classes work together, that could not otherwise because of incompatible interfaces.
    • Bridge – The pattern decouples an abstraction from its implementation, so that the two can vary independently.
    • Composite – Compose objects into tree structures to represent part-whole hierarchies. / Composite lets clients treat individual objects and compositions of objects uniformly.
    • Decorator – add additional responsibilities dynamically to an object.
    • Flyweight – use sharing to support a large number of objects that have part of their internal state in common where the other part of state can vary.
    • Memento – capture the internal state of an object without violating encapsulation and thus providing a mean for restoring the object into initial state when needed.
    • Proxy – provide a “Placeholder” for an object to control references to it.

Data structures / Algorithms

  1. Explain a specific data structure (e.g. vector, map, simple/double linked list, set/multiset, map/multimap, etc.)

    • To be checked on the designated pages on this blog
  2. Add / Remove an element from a simple linked list

    • Adding/Removing an element into/from a linked list is not a hard operation, but the developer need to assure that the nodes are updated in the right order. Assuming we have the following structure, the operation is done as below:
    • Adding an element
      • add_single
    • Removing an element
      • remove_single
    • Reverse the nodes of a linked list
      • In order to reverse the nodes of a list, we will need to keep three pointers, prev, current, and next, and iterate until current reaches the end of the list. Assuming we have the reverseList function, that receives the head pointer and returns the new head.
      • reverse single
  3. Add / Remove an element from a doubled linked list
    • Assuming we have the following structure:
    • Adding an element
      • add double
    • Removing an element
      • remove double
  4. When should we use a vector and when should we use a list?

    • A vector is very fast at accessing elements, because they are cached up(continuous memory).
    • A vector is slow at inserting/removing elements, because insertion in a full list would require allocating a new and bigger space and move all the data into the new space, while removing them will imply shifting all the other elements to the left.
    • A list is very fast at adding or deleting elements, requiring only the update of some pointers, but it is slow at iterating through because the next element is not cached (discontinuous memory).
  5. What is binary search?

    • Binary search is an algorithm that is available on a sorted set of values and drastically reduce searching time. It is implemented by always verifying the middle element of the data set and keeping only half of it, on each iteration, based on whether the condition was successful or not. The algorithm stops when the element was found or when the remaining set is empty.
    • Assuming we have a vector of 1000 elements (distinct sorted numbers from 0 to 999) and a number that is searched (780 let’s say), the algorithm will do the following steps:
      • Step 1: Go to the half of the vector ,check if we found the value
        • Compare the value with our number. Pick the left side if higher, right side if lower (since its sorted)
        • 780 > 500 -> select the right side [500, 999]
      • Step 2: Go to the half of the vector:  750 , compare with our value -> select right vector [750,999]
      • Continue the iteration until found or until remaining set of data is empty
        • Assuming we would do that with a regular iteration in step 2 we are at value 750, otherwise we would have needed 750 iterations just to reach this point. But if we needed to check the value ‘3’ for example, linear search would be faster than binary search.
    • Binary search drastically improves performance when used on a big (sorted) collection of data.
    • binary

Operating Systems

  1. What is the role of an operating system?

    • The operating system manages all the software and hardware on the computer, by acting as an interface between the hardware and the programs requesting I/O.
    • Some of the features of an operating system are:
      • Memory management
        • Keep track of the main memory (which parts of it are in use and which are free)
        • Allocate the memory when a process requests it
        • Deallocate the memory when a process is terminated
      • CPU management
        • The CPU scheduler will keep track of the processes that can run
        • Provides the context switching on the CPU
      • File management
        • Keep track of the filesystem (files, location, status, etc)
  2. Scheduler

    • The scheduler is a part of the operating system that decides which process runs at a certain point in time. It usually has the ability to pause a running process, move it to the back of the running queue and start a new process.
    • scheduler
    • Schedulers are of three types:
      • Long-term scheduler (Job scheduler)
        • Select processes from the queue and loads them into memory for execution. (change state of a process to ready)
      • Short-term scheduler (CPU scheduler)
        • Handles the changing state of a process from ready to running, by allocating a process (thread) to a CPU.
        • After a process is selected to run, it is sent to the dispatcher that will save the context for the previously runnable thread and load the context for the new thread.
      • Medium-term scheduler
        • Removes the process from memory, handle the swapped-out processes.
      • ProcessStateTransition
  3. Scheduling Algorithms

    • Scheduling algorithms are used to distribute hardware resources to the many processes / threads that requests them.
    • Preemptive vs non-preemptive scheduling
      • Preemptive scheduling is used when the current task is interrupted for some time and resumed later,  in order to release the CPU for another higher priority task to jump-in. (Such as round robin).
      • In Non-preemptive scheduling, a running task is executed till completion, and cannot be interrupted (Such as FIFO).
    • The most common algorithms used to distribute the resources in an even matter, are:
      • FCFS (First come, first served)
        • The simplest algorithm, the tasks are added into a queue and processed in the order that they arrive.
      • SJF (Shortest job first)
        • The tasks are added into a priority queue, and after each scheduling event occurrence (a task is finished, a new task is added etc), a check is done whether the current running process have the lowest time until completion. If not, the current running process will be interrupted (preemption) and the task with the lowest time remaining will be scheduled to the CPU.
      • Fixed priority pre-emptive
        • The operating system assigns a priority for each task, and the scheduler adds the tasks into the ready queue based on their priority. Lower priority tasks gets interrupted by higher priority tasks.
        • Starvation of low priority tasks (the tasks are scheduled to run in the CPU after a long time waiting for other higher priority tasks to finish) can occur if there are continuous addition of high priority tasks into the queue. In order to avoid/solve that problem, the ageing technique is used, to gradually increase the priority of a task based on its waiting time in the ready queue.
      • Round-robin
        • The scheduler assigns a fixed time unit for each task and cycles through them.

Linux

  1. Processes vs threads [link, link, link]

    • A process is a container, it contains all the resources needed for the program to run.
    • A process has an address space and info of other resources ,such as open files, child processes, etc. It doesn’t run on its own, but only group resources together. Each process contains at least a thread of execution (main), that will be scheduled to run on a CPU.
    • A thread is actually doing the work for a process. Since the process contains a single address space, all the threads within a process share the existent data.
    • process
  2. Context switching: process-level vs thread-level

    • Context switch: the switching of the CPU from one process/thread to another. The switching is based on the operating system’s scheduler algorithm. The switching is done by saving the state of the active thread (stack, program counter), and loading the state of the new thread into the CPU.
    • Thread-level
      • The scheduler will assign the CPU to a different thread (but from the same process) . This is a much cheaper operation than process switching, mainly because the threads share the same memory space. The only things that need to be reinitialized will be the stack and the program counter for the new thread.
    • Process-level 
      • The scheduler will assign the CPU to a different thread (from a different process). This is a slow operation, because all the memory addresses, mappings, page tables etc have to be unloaded, and the ones from the new process will have to be loaded. Since the data for the new process is probably not in the cache, it also has to be loaded from a slower cache layer / main memory.
    • context switch
  3. Interprocess communication (IPC)

    • IPC is a set of mechanisms supported by some operating systems that allows the processes to communicate with each other.
    • Interprocess communication can be achieved by using:
      • local files
      • signals (System message, used to remotely control the other process)
      • pipes
        • unnamed pipe: Unnamed pipes are available only in the process that created them and its descendants. Used with exec / fork to maintain a communication channel between parent and child. Usually, the child closes the reading side of the pipe, while the parent closes the other side of the pipe (the writing). This way, the child can only write in the pipe, while the parent only reads from the child process.
        • named pipe: Used to communicate between two local processes with no relation between them. They are called local sockets, are implemented as FIFO, have names and exist as special files within the filesystem. After being created, the threads can read / write on them. By default, the open calls are blocking the thread until at least a reader and a writer are connected to the pipe.
      • sockets
        • Sockets are a bidirectional channel that support the communication between unrelated processes, even on different machines. The server configure them using three parameters, communication style, namespace, and protocol.
          • Communication style: Control how the socket treats transmitted data and specifies the number of communication partners. When data is sent through a socket, it is packed first into chunks called packets. Communication style can be:
            • Connection style: guarantees delivery of all the packets, in the order they were sent. If the packets are lost or reordered, the receiver automatically requests the redelivery of them.
            • Datagram style:  does not guarantee delivery or arrival order.
          • Namespace: Specifies how socket addresses are written. The clients  connect to the socket through the configured namespace.
          • Protocol: The protocol specifies how data is transmitted, such as TCP/IP.
      • message queue
        • Provides an asynchronous communications protocol using a queue for messaging.
        • The sender and the receiver of the message don’t need to connect simultaneously or be linked in any way. Messages placed on the queue are stored until the receiver retrieves them, but the number of messages that can be stored are limited.
      • shared memory
        • One process will create an area in RAM which other processes can access.
      • memory-mapped file
        • Forms an association between a file and a process’s memory.
        • A process can read the file’s contents using direct memory access, and can write to the file by directly modifying the memory.