Linear vector-oriented programming

* Note

The methods described here aren’t exacly the same as the array-based programming used e.g. in MATLAB, Ada. It’s a derivative of data-oriented programming (or a superset of dod). I just couldn’t think of a better name. The methods described here are simmilar to array- and flow-based design.

* Introduction

Computer don’t operate on objects. They don’t operate on pointers. They don’t operate on functions. In fact, computers don’t even operate on floats, integers or strings. The words ‘float’, ‘integer’ or ‘string’ are just a way we try to visualize, or comprehed data. Which is what computers operate on – data. “What was the last time you had only one of something? One object, oneentity, one enemy, one treasure chest?” – This is how data-oriented design is usually described. Computers don’t operate on units, they operate on whole data.

But data-oriented programming is hard to visualize. It’s hard to code and even people who are experienced with large data flows admit it.

Linear vector-oriented programming is my solution to that problem. It is my way of visualizing data-oriented design and my way of coding it.

* Vector

What’s a vector? A vector is simply a way to process data. It’s not an explicit structure containing some data, it’s a way to comprehend that data. A vector has some length, a size (which is the size of a single data unit, e.g. a dword) and the data itself. The length of a vector can change, but it’s size remains constant. The data is completely mutable. The data in a vector should be dynamically allocated and reallocated whenever its size changes. However, it is advised to pre-allocate enough memory whenever we know the size of the needed memory to avoid unnecessary memory-access overhead and moving the data to a different physical location.

* Flow

What is flow? The word ‘flow’ is used here to annotate the execution of a vector-oriented program. It describes how data is processed, ie. not as single operations on seperate objects, but as some input data flowing through each step of a pipeline, where every step also produces some output data.

For visualisation purposes:

OOP is like trees. It has many branches. Some branches branch off of other branches. OO processing is like plucking fruits off of that tree. There are many and they’re all scattered everywhere.

DOD is like a river. It flows in one direction from a certain point. But noone cares about how it flows. Everyone is concerned about the water that flows in it. Some of it comes from different sources than the rest and some of it flows into other destinations than the rest. In fact, only the water and its contents matter. Everything is fully dependent on that. The content of the river may change in some form.

LVOP is like a pipe with multiple filters that in some way change the contents of the pipe. They may even take out all of it’s content and replace it with something else. That pipe has a beginning and an end. It receives some in-material and returns something different. It doesn’t receive anything from extern sources and doesn’t leak anywhere.

* Language

Is there a language that supports all of this in its standard?

Yes, actually. For one, you can use any scalar language, like C, C++ or Java. Those languages require explicit loops to perform operations on vectors. However, there are some languages that allow for arithmetic operations on arrays. Those languages include: Ada, Fortran, MATLAB (or GNU Octave). There is also the NumPy extention to Python. So array-languages should be better for linear-vector programming, right? In theory, yes. In theory the code will be easier to maintain and write. But that code must then be compiled to machine code, that your machine can actually understand. Sadly, no machine supports real array-based instructions. So your fancy array-code will be translated to scalar expressions with loops anyways. But due to the additional abstraction layer it will likely be slower than optimized C code. The only exception here are SSE instructions, which introduce arithmetic operations on 4-dimensional float (and integer, in SSE2) vectors. Those are extremely useful and the performance boost they give may be stunning, but they only work for 4d vectors and they can be used by both scalar and array languages.

So the language is a matter of preference, as long as the language is compliant to the rules of dod. What I use is C. C has incredibly powerful memory-management and allows for many optimizations. Most modern compilers will produce highly optimized machine code. Some C compilers (like gcc) also have intrinsics directly related to the SSE instructions (see xmmintrin.h).

* Pros

  • cache-friendly
  • faster than OOP for applications with a large amount of data
  • no scattered memory
  • easy-to-implement pipelining
  • easy-to-implement multithreading

* Cons

  • slower than OOP for applications with only singular units of data
  • when using complicated units, their data might be scattered around the memory (ie. when using an entity system)
  • inheritence may be hard or impossible to implement
    • same applies for other OOP paradigms

* Summary

  • same as dod, ie
  • plain old data
    • avoid unnecessary pointers
    • only plain old variables
    • data tightly packed in arrays
  • all data should be tightly packed
  • math is fast, optimize for memory usage
    • avoid unnecessary memory reads and writes
  • don’t access data one by one when accessing it in chunks is possible
  • additionally
    • avoid branching
      • a program should have some input and some output contained in a linear pipeline
      • a procedure shouldn’t access data it’s not supposed to
      • a procedure definitely shouldn’t create data it’s not supposed to
      • every procedure and pipeline should have it’s own purpose and execute it well
        • so never create and never use clusterfucks
      • using if’s isn’t bad
        • it’s in fact better than inheritance based conditionals
      • use SIMD if possible and efficient
      • OOP can coexists with dod, but not with LVOP
    • always KISS/don’t overcomplicate

 

* Links

http://www.dataorienteddesign.com/dodmain/
https://en.wikipedia.org/wiki/Data-oriented_design
https://blog.molecular-matters.com/2011/11/03/adventures-in-data-oriented-design-part-1-mesh-data-3/
https://www.youtube.com/watch?v=rX0ItVEVjHc
https://www.youtube.com/watch?v=fHNmRkzxHWs

Leave a comment