Difference between revisions of "Code style"

Revision as of 10:38, 10 June 2016

C++

Which features/libraries to prefer/Semantics

New code should prefer modern C++, targeting C++03. Here, modern C++ is defined in opposition to "C with classes" style. (Note that quite a bit of existing code would probably qualify as "C with classes".) In practice, for our purposes, this means:

Do

  • Use const and references where possible
  • Prefer C++ casts over C casts
  • Prefer the C++ stdlib over the C standard library
  • Prefer smart pointers over manual memory management when stack/global allocation doesn't do the trick.
  • Prefer containers over home made data structures.
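As a rough illustration of the first items in the list above, here is a minimal sketch that stays within C++03. The helper names totalLength and averageLength are hypothetical and do not come from the Apertium code base; the sketch only shows const references, a C++ cast and a standard container working together.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sums the lengths of all words.
// The container is taken by const reference: no copy, no mutation.
std::size_t totalLength(const std::vector<std::string> &words) {
  std::size_t total = 0;
  for (std::vector<std::string>::const_iterator it = words.begin();
       it != words.end(); ++it) {
    total += it->size();
  }
  return total;
}

// Hypothetical helper: average word length of a non-empty list.
// static_cast names the intended conversion, unlike a C cast such as (double)x.
double averageLength(const std::vector<std::string> &words) {
  assert(!words.empty());
  return static_cast<double>(totalLength(words)) /
         static_cast<double>(words.size());
}
```

A std::vector of std::string replaces a hand-rolled array of char* here, so no manual allocation is needed at all.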

Don't

  • Use void*
  • Use sprintf
  • Use a raw pointer if the pointer's lifetime is the same as that of the object it refers to.
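One possible way to avoid sprintf and void* is sketched below, assuming std::ostringstream for formatting and a function template for type-generic code. The names toText and formatError are hypothetical, chosen only for this example.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: converts any streamable value to text.
// A template keeps the static type, which a void* parameter would discard.
template <typename T>
std::string toText(const T &value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

// Hypothetical helper: builds "<line>:<column>: <message>".
// The stream grows as needed, so there is no fixed-size buffer to overflow
// and no format string that can disagree with the argument types.
std::string formatError(int line, int column, const std::string &message) {
  std::ostringstream out;
  out << line << ':' << column << ": " << message;
  return out.str();
}
```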

String encoding

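Situation

  • Lots of wstrings are kept in memory all over the place. These are UTF-16 on Windows and UTF-32 on Linux.
  • There are also char* and strings, which are UTF-8 encoded, kept in memory in some places.
  • Mixing wide and narrow character streams is forbidden by the standard (http://stackoverflow.com/questions/8947949/mixing-cout-and-wcout-in-same-program) and can also cause real problems in practice (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42552 https://sourceforge.net/p/apertium/tickets/106/).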

Approach

  • Use UTF-8 for serialising to files.
  • Use wcerr for outputting errors.
  • In terms of internal encodings, go with the flow for now.
  • You might have to use either wcout or cout depending on the situation, but they should never be mixed within a single execution of a program. To help with this, follow this rule: if you expect the program's output to be piped, don't output to wcout at all, just cout. If you expect the output to end up on a terminal, use wcout.
  • If you have a UTF-8 string rather than a 7-bit clean ASCII string, convert it with UtfConverter::fromUtf8 before outputting.
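The distinction in the last item between 7-bit clean strings and general UTF-8 can be sketched as follows. The helpers isSevenBitClean and widenAscii are hypothetical names used only for illustration; they are not part of UtfConverter, and the sketch deliberately does not attempt to decode UTF-8 itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if every byte is below 0x80, i.e. plain ASCII.
bool isSevenBitClean(const std::string &s) {
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    if (static_cast<unsigned char>(s[i]) >= 0x80) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: widens an ASCII-only string one character at a time.
// This is wrong for UTF-8 multi-byte sequences, which need a real decoder
// such as UtfConverter::fromUtf8.
std::wstring widenAscii(const std::string &s) {
  assert(isSevenBitClean(s));
  return std::wstring(s.begin(), s.end());
}
```

A string that fails the isSevenBitClean check contains multi-byte sequences, and copying its bytes into a wstring one by one would produce garbage on wcout or wcerr.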

Alternatives

It might be nice in future to use UTF-8 everywhere inside Apertium and re-encode strings only when necessary at API boundaries (e.g. for stdio, re-encode to wstring and use wcout/wcerr only in environments with a non-UTF-8 locale; otherwise use cout/cerr). This could then be wrapped in a thin portable abstraction. There's some information about this way of doing things here: http://utf8everywhere.org/

I'd like to use $LIB eg Boost:Wurble or remove ifdefs by using eg gnulib

No.

Please?

No.

Why not?

It's going to make it impossible to build for language pair authors.

But if it's a dependency that's built in tree it's just a matter of getting the source there. It could be bootstrapped with a simple script or even (ick) vendorised into Subversion.

This doesn't make any difference. Please just implement whatever you need from scratch.

Formatting/Syntax

This is less important. Several styles currently coexist in the code base.

Most code is in Sergio's style. See Emacs_C_style_for_Apertium_hacking for an older attempt at formalising it.

Wiki TODO: Document this below.

Possible project TODO: Run clang-format

m5w

  • loosely following the Clang standards for naming and such
  • strictly for indenting -- clang-format
  • default to Clang naming style, unless I'm talking about some kind of STL class or STL imitation
  • an example of imitation is the serialiser class for pairs, whose type parameters are named first_Type and second_Type, so we can do pair.first and .second
  • keep the STL format for the part of the name that's STL
  • but if it's followed by something not STL, then I finish out with an underscore and then go to CamelCase
  • another divergence is with variables that don't have a whole lot of information e.g. something being passed to a serialiser: they get bland names such as Stream and SerialisedType
  • since those are already used as type names -- just add a trailing underscore
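The naming rules above might look roughly like this in practice. PairSerialiser is a hypothetical class written only to show the names; it is not the actual Apertium serialiser, and the output format it uses is an assumption.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical class, for illustrating the naming rules only.
// first_Type / second_Type: the STL part keeps STL spelling, then an
// underscore, then CamelCase.
template <typename first_Type, typename second_Type>
struct PairSerialiser {
  typedef std::pair<first_Type, second_Type> SerialisedType;

  // Stream and SerialisedType are already type names, so the matching
  // variables get a trailing underscore.
  template <typename Stream>
  static void serialise(const SerialisedType &SerialisedType_,
                        Stream &Stream_) {
    Stream_ << SerialisedType_.first << ' ' << SerialisedType_.second;
  }
};
```

A call would then read PairSerialiser<int, std::string>::serialise(Pair, Out), with Pair and Out following the Clang convention of capitalised variable names.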

Python

PEP-8?