Code style
Revision as of 10:38, 10 June 2016
C++
Which features/libraries to prefer/Semantics
New code should prefer modern C++ using C++03. Here, "modern C++" is defined in opposition to the "C with classes" style. (Note that quite a bit of existing code would probably qualify as "C with classes".) In practice, for our purposes, this means:
Do
- Use const and references where possible
- Prefer C++ casts over C casts
- Prefer the C++ stdlib over the C standard library
- Prefer smart pointers over manual memory management when stack/global allocation doesn't do the trick.
- Prefer containers over home-made data structures.
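As a rough illustration of the "Do" items, here is a minimal C++03 sketch. The function names total_length and average_length are hypothetical and do not come from the Apertium code base; they only show const references, a standard container and a C++ cast working together.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sums the lengths of all words.
// The container is taken by const reference, so nothing is copied
// and the caller's data cannot be modified.
std::size_t total_length(const std::vector<std::string> &words)
{
  std::size_t total = 0;
  for (std::vector<std::string>::const_iterator it = words.begin();
       it != words.end(); ++it) {
    total += it->size();
  }
  return total;
}

// Hypothetical helper: average word length, 0.0 for an empty list.
double average_length(const std::vector<std::string> &words)
{
  if (words.empty()) {
    return 0.0;
  }
  // static_cast rather than a C-style (double) cast.
  return static_cast<double>(total_length(words)) /
         static_cast<double>(words.size());
}
```

A std::vector of std::string replaces a hand-rolled array of char* here, which also removes any manual memory management.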
Don't
- Use void*
- Use sprintf
- Use a raw pointer when the pointer's lifetime is the same as that of the object it refers to.
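The following C++03 sketch shows one possible reading of the "Don't" items. All names (describe_count, Dictionary, Analyser) are hypothetical. It assumes that "same lifetime" means the object can simply be held by value instead of being allocated and tracked through a raw pointer.

```cpp
#include <sstream>
#include <string>

// Instead of sprintf into a fixed-size char buffer, build the text
// with a string stream: no buffer size to get wrong, no format
// specifiers to mismatch.
std::string describe_count(const std::string &name, int count)
{
  std::ostringstream out;
  out << name << ": " << count;
  return out.str();
}

// Hypothetical class standing in for any object with some state.
class Dictionary {
public:
  explicit Dictionary(const std::string &language) : language_(language) {}
  const std::string &language() const { return language_; }
private:
  std::string language_;
};

// The Dictionary lives exactly as long as the Analyser, so it is a
// plain member rather than a "Dictionary *" created with new and
// released with delete.
class Analyser {
public:
  explicit Analyser(const std::string &language) : dictionary_(language) {}
  std::string describe() const
  {
    return "analyser for " + dictionary_.language();
  }
private:
  Dictionary dictionary_;
};
```

With the member held by value, the compiler-generated destructor cleans up correctly and there is no pointer that could dangle or leak.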
String encoding
Situation
- There are lots of wstrings kept in memory about the place. These are UTF-16 on Windows and UTF-32 on Linux.
- There are also char* and strings which are UTF-8 encoded kept in memory in some places.
- Mixing wide and narrow character streams is forbidden by the standard (http://stackoverflow.com/questions/8947949/mixing-cout-and-wcout-in-same-program) and can also cause real problems in practice (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42552 https://sourceforge.net/p/apertium/tickets/106/).
Approach
- Use UTF-8 for serialising to files.
- Use wcerr for outputting errors.
- In terms of internal encodings, go with the flow for now.
- You might have to use either wcout or cout depending on the situation, but they should never be mixed within a single execution of a program. To help with this, apply this rule: if you expect the program's output to be piped, don't output to wcout at all, just cout. If you expect the output to end up on a terminal, use wcout.
- If you have a UTF-8 string rather than a 7-bit clean ANSI string, you should UtfConverter::fromUtf8 it before outputting.
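The last two points can be sketched as follows. This is only an illustration: from_utf8_sketch is a hypothetical stand-in for UtfConverter::fromUtf8 so that the example compiles on its own, and report_error is likewise invented. The decoder assumes well-formed UTF-8 and a wchar_t wide enough for the decoded code points (as on Linux); real code should call UtfConverter::fromUtf8 instead.

```cpp
#include <cstddef>
#include <ostream>
#include <string>

// Hypothetical stand-in for UtfConverter::fromUtf8.
// Assumes well-formed UTF-8 input; performs no validation.
std::wstring from_utf8_sketch(const std::string &utf8)
{
  std::wstring result;
  std::size_t i = 0;
  while (i < utf8.size()) {
    const unsigned char lead = static_cast<unsigned char>(utf8[i]);
    unsigned long code_point;
    std::size_t extra;  // number of continuation bytes that follow
    if (lead < 0x80) {
      code_point = lead;
      extra = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      code_point = lead & 0x1F;
      extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      code_point = lead & 0x0F;
      extra = 2;
    } else {
      code_point = lead & 0x07;
      extra = 3;
    }
    ++i;
    for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i) {
      code_point = (code_point << 6) |
                   (static_cast<unsigned char>(utf8[i]) & 0x3F);
    }
    result.push_back(static_cast<wchar_t>(code_point));
  }
  return result;
}

// Takes a wide stream only, so a UTF-8 message has to be converted
// before it is written and narrow output cannot slip in here.
void report_error(std::wostream &out, const std::string &utf8_message)
{
  out << from_utf8_sketch(utf8_message) << L'\n';
}
```

A terminal-facing program would then call report_error(std::wcerr, message) and keep all of its other output on wcout, never on cout or cerr.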
Alternatives
It might be nice in future to use UTF-8 everywhere inside Apertium and re-encode strings only when necessary at API boundaries. (E.g. for stdio, re-encode to wstring and use wcout/wcerr only in environments with a non-UTF-8 locale; otherwise use cout/cerr.) This could then be wrapped in a thin portable abstraction. There's some information about this way of doing things here: http://utf8everywhere.org/
I'd like to use $LIB eg Boost:Wurble or remove ifdefs by using eg gnulib
No.
Please?
No.
Why not?
It's going to make it impossible to build for language pair authors.
But if it's a dependency that's built in tree it's just a matter of getting the source there. It could be bootstrapped with a simple script or even (ick) vendorised into Subversion.
This doesn't make any difference. Please just implement whatever you need from scratch.
Formatting/Syntax
This is less important. Currently there are three styles in use across the code base:
- Sergio's style; e.g. fst_processor.cc
- m5w's style; e.g. basic_stream_tagger_trainer.cc
- felipe's style; e.g. apertium_tagger.cc
Most code is Sergio's style. See Emacs_C_style_for_Apertium_hacking for an older attempt at formalising it.
Wiki TODO: Document this below.
Possible project TODO: Run clang-format
m5w
- loosely following the Clang standards for naming and such
- strictly for indenting -- clang-format
- default to Clang naming style, unless I'm talking about some kind of STL class or STL imitation
- example of imitation is the serialiser class for pairs, whose type parameters are named first_Type and second_Type so that we can do pair.first and pair.second
- keep the STL format for the part of the name that's STL
- but if it's followed by something not STL, then I finish out with an underscore and then go to CamelCase
- another divergence is with variables that don't have a whole lot of information e.g. something being passed to a serialiser: they get bland names such as Stream and SerialisedType
- since those are already used as type names -- just add a trailing underscore
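The naming rules above might look like this in practice. This is an interpretation written for illustration, not code taken from the serialiser itself; writeValue, writePair and pairToString are hypothetical names.

```cpp
#include <sstream>
#include <string>
#include <utility>

// SerialisedType and Stream are already taken as type names, so the
// corresponding variables get a trailing underscore.
template <typename SerialisedType, typename Stream>
void writeValue(const SerialisedType &SerialisedType_, Stream &Stream_)
{
  Stream_ << SerialisedType_;
}

// first_Type and second_Type imitate std::pair: the STL part of the
// name keeps its STL spelling, then an underscore, then CamelCase.
template <typename first_Type, typename second_Type, typename Stream>
void writePair(const std::pair<first_Type, second_Type> &Pair,
               Stream &Stream_)
{
  writeValue(Pair.first, Stream_);
  Stream_ << ' ';
  writeValue(Pair.second, Stream_);
}

// Small usage helper: renders a pair as text.
template <typename first_Type, typename second_Type>
std::string pairToString(const std::pair<first_Type, second_Type> &Pair)
{
  std::ostringstream Stream_;
  writePair(Pair, Stream_);
  return Stream_.str();
}
```

Function names start in lower case and variables in upper case, which follows the Clang naming style as far as this sketch assumes it.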
Python
PEP-8?