Code style

From Apertium
Revision as of 10:38, 10 June 2016 by Frankier

C++

Which features/libraries to prefer/Semantics

New code should prefer modern C++ within the limits of C++03. Here, "modern C++" is defined in opposition to the "C with classes" style. (Note that quite a bit of existing code would probably qualify as "C with classes".) In practice, for our purposes, this means:

Do

  • Use const and references where possible
  • Prefer C++ casts over C casts
  • Prefer the C++ stdlib over the C standard library
  • Prefer smart pointers over manual memory management when stack/global allocation doesn't do the trick.
  • Prefer containers over home-made data structures.
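A minimal sketch of several of the points above, written to compile as C++03. The function names count_long_words and long_word_ratio are hypothetical and do not come from the Apertium code base; they only illustrate const references, a standard container and a C++ cast.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts the words strictly longer than min_length.
// The container is taken by const reference, so it is neither copied
// nor modified.
std::size_t count_long_words(const std::vector<std::string>& words,
                             std::size_t min_length)
{
  std::size_t count = 0;
  for (std::vector<std::string>::const_iterator it = words.begin();
       it != words.end(); ++it) {
    if (it->size() > min_length) {
      ++count;
    }
  }
  return count;
}

// Hypothetical helper: fraction of words strictly longer than min_length.
// static_cast is used instead of a C-style cast such as (double)x.
double long_word_ratio(const std::vector<std::string>& words,
                       std::size_t min_length)
{
  if (words.empty()) {
    return 0.0;
  }
  return static_cast<double>(count_long_words(words, min_length)) /
         static_cast<double>(words.size());
}
```

Smart pointers are left out of this sketch on purpose: which smart pointer type is available depends on the compiler and libraries in use, so that choice is not assumed here.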

Don't

  • Use void*
  • Use sprintf
  • Use a raw pointer when its lifetime is the same as that of the object it refers to.
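As a rough illustration of two of these points, the sketch below formats a message with std::ostringstream instead of sprintf, and holds a helper object by value instead of through a raw pointer with the same lifetime as its owner. The names format_count, Counter and Report are hypothetical, chosen only for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds "label: count" without sprintf, so there
// is no fixed-size buffer to overflow and no format string to get wrong.
std::string format_count(const std::string& label, int count)
{
  std::ostringstream out;
  out << label << ": " << count;
  return out.str();
}

// Hypothetical class: a running total.
class Counter
{
public:
  Counter() : total(0) {}
  void add(int n) { total += n; }
  int get() const { return total; }

private:
  int total;
};

// Hypothetical class: the Counter lives exactly as long as the Report,
// so it is a plain member rather than a Counter* managed with new/delete.
class Report
{
public:
  explicit Report(const std::string& name_) : name(name_) {}
  void record(int n) { counter.add(n); }
  std::string summary() const { return format_count(name, counter.get()); }

private:
  std::string name;
  Counter counter;
};
```

Holding the member by value means the compiler-generated destructor and copy operations are already correct, which is the main reason to avoid the raw pointer here.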

String encoding

Situation

Approach

  • Use UTF-8 for serialising to files.
  • Use wcerr for outputting errors.
  • In terms of internal encodings, go with the flow for now.
  • You might have to use either wcout or cout depending on the situation, but the two should never be mixed within a single execution of a program. To help with this, apply the following rule: if you expect the program's output to be piped, don't output to wcout at all, just cout. If you expect the output to end up on a terminal, use wcout.
  • If you have a UTF-8 string rather than a 7-bit clean ASCII string, convert it with UtfConverter::fromUtf8 before outputting.

Alternatives

It might be nice in future to use UTF-8 everywhere inside Apertium and re-encode strings only when necessary at API boundaries (e.g. for stdio, re-encode to wstring and use wcout/wcerr only in environments with a non-UTF-8 locale, and otherwise use cout/cerr). This could then be wrapped in a thin portable abstraction. There's some information about this way of doing things here: http://utf8everywhere.org/

I'd like to use $LIB, e.g. Boost:Wurble, or remove ifdefs by using e.g. gnulib

No.

Please?

No.

Why not?

It's going to make it impossible to build for language pair authors.

But if it's a dependency that's built in tree it's just a matter of getting the source there. It could be bootstrapped with a simple script or even (ick) vendorised into Subversion.

This doesn't make any difference. Please just implement whatever you need from scratch.

Formatting/Syntax

This is less important. Currently, several styles coexist throughout the code base:

Most code is Sergio's style. See Emacs_C_style_for_Apertium_hacking for an older attempt at formalising it.

Wiki TODO: Document this below.

Possible project TODO: Run clang-format

m5w

  • loosely following the Clang standards for naming and such
  • strictly for indenting -- clang-format
  • default to Clang naming style, unless I'm talking about some kind of STL class or STL imitation
  • an example of imitation is the serialiser class for pairs, whose parts are named first_Type and second_Type so that we can do pair.first and pair.second
  • keep the STL format for the part of the name that's STL
  • but if it's followed by something not STL, then I finish out with an underscore and then go to CamelCase
  • another divergence is with variables that don't have a whole lot of information e.g. something being passed to a serialiser: they get bland names such as Stream and SerialisedType
  • since those are already used as type names -- just add a trailing underscore

Python

PEP-8?