Code style

Revision as of 10:38, 10 June 2016

C++

Which features/libraries to prefer/Semantics

New code should prefer modern C++ using C++03. Here, modern C++ is defined in opposition to "C with classes" style. (Note that there is quite a bit of existing code which would probably qualify as "C with classes".) In practice, for our purposes, this means:

Do

Use const and references where possible
Prefer C++ casts over C casts
Prefer the C++ stdlib over the C standard library
Prefer smart pointers over manual memory management when stack or global allocation doesn't do the trick.
Prefer standard containers over home-made data structures.

Don't

Use void*
Use sprintf
Use a raw pointer if the pointer's lifetime is the same as that of the object it refers to.
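A minimal sketch of several of these rules in code that compiles as C++03 (and as later C++). The function names are hypothetical, chosen only for illustration. Smart pointers are left out, since the only one in the C++03 standard library is std::auto_ptr, which was deprecated in C++11 and removed in C++17.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Standard container instead of a home-made hash table or linked list;
// the input is taken by const reference, so it is neither copied nor modified.
std::map<std::string, int> countWords(const std::vector<std::string> &words) {
  std::map<std::string, int> counts;
  for (std::vector<std::string>::const_iterator it = words.begin();
       it != words.end(); ++it) {
    ++counts[*it];
  }
  return counts;
}

// std::ostringstream instead of sprintf: there is no fixed-size buffer to overflow.
std::string describe(const std::string &word, double weight) {
  std::ostringstream out;
  out << word << ": " << weight;
  return out.str();
}

// static_cast instead of a C-style cast: the intent is explicit and easy to grep for.
int percentage(std::size_t part, std::size_t whole) {
  if (whole == 0) {
    return 0;
  }
  return static_cast<int>((100.0 * part) / whole);
}
```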

String encoding

Situation

There are lots of wstrings kept in memory all over the place. These are UTF-16 on Windows and UTF-32 on Linux.
In some places there are also char* and std::string values holding UTF-8 encoded text.
Mixing wide and narrow character streams is forbidden by the standard (http://stackoverflow.com/questions/8947949/mixing-cout-and-wcout-in-same-program) and can also cause real problems in practice (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42552 https://sourceforge.net/p/apertium/tickets/106/).

Approach

Use UTF-8 for serialising to files.
Use wcerr for outputting errors.
In terms of internal encodings, go with the flow for now.
You might have to use either wcout or cout depending on the situation, but the two should never be mixed within a single run of a program. To help with this, follow this rule: if you expect the program's output to be piped, don't output to wcout at all, only to cout; if you expect the output to end up on a terminal, use wcout.
If you have a UTF-8 string rather than a 7-bit clean ASCII string, convert it with UtfConverter::fromUtf8 before outputting it.
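The first two points of this approach (UTF-8 on disk, errors on wcerr) could look roughly like the sketch below. encodeUtf8 and saveUtf8 are hypothetical helpers written for this example, not Apertium's UtfConverter API, and the sketch assumes a 32-bit wchar_t (UTF-32 wstrings, as on Linux).

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper, not Apertium's UtfConverter API: encodes a wstring as UTF-8.
// Assumes each wchar_t holds one whole code point (UTF-32, as on Linux); on Windows,
// UTF-16 surrogate pairs would have to be combined first.
std::string encodeUtf8(const std::wstring &text) {
  std::string out;
  for (std::wstring::const_iterator it = text.begin(); it != text.end(); ++it) {
    unsigned long cp = static_cast<unsigned long>(*it);
    // Out-of-range values and lone surrogates become U+FFFD REPLACEMENT CHARACTER.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      cp = 0xFFFD;
    }
    if (cp < 0x80) {                                        // 1 byte: 0xxxxxxx
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {                                // 2 bytes: 110xxxxx 10xxxxxx
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                              // 3 bytes
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                                // 4 bytes
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// Serialises text to a file as UTF-8 and reports failures on wcerr.
// Returns true on success.
bool saveUtf8(const char *path, const std::wstring &text) {
  std::ofstream file(path, std::ios::binary);
  if (!file) {
    std::wcerr << L"Error: cannot open output file" << std::endl;
    return false;
  }
  file << encodeUtf8(text);
  return !file.fail();
}
```

The reverse direction, decoding UTF-8 into a wstring before writing it to wcout, is what the last point above uses UtfConverter::fromUtf8 for.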

Alternatives

It might be nice in future to use UTF-8 everywhere inside Apertium and re-encode strings only when necessary at API boundaries (e.g. for stdio, re-encode to wstring and use wcout/wcerr only in environments with a non-UTF-8 locale; otherwise use cout/cerr). This could then be wrapped in a thin portable abstraction. There's some information about this way of doing things at http://utf8everywhere.org/
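A minimal, POSIX-only sketch of such a stdio boundary, assuming text is kept internally as UTF-8 std::string. decodeUtf8, codesetIsUtf8 and writeOut are hypothetical names; nl_langinfo(CODESET) is the POSIX way of asking which character encoding the current locale uses, so the program must call std::setlocale(LC_ALL, "") before the first writeOut. A portable version would need a different locale check on Windows.

```cpp
#include <clocale>     // std::setlocale, which callers must invoke at startup
#include <cstring>
#include <iostream>
#include <string>
#include <langinfo.h>  // POSIX nl_langinfo and CODESET; not available on Windows

// Hypothetical minimal UTF-8 decoder standing in for UtfConverter::fromUtf8.
// Assumes a 32-bit wchar_t (as on Linux). Invalid or truncated sequences become
// U+FFFD; overlong forms and encoded surrogates are not rejected.
std::wstring decodeUtf8(const std::string &utf8) {
  std::wstring out;
  std::string::size_type i = 0;
  while (i < utf8.size()) {
    unsigned char lead = static_cast<unsigned char>(utf8[i]);
    unsigned long cp;
    std::string::size_type extra;  // number of continuation bytes expected
    if (lead < 0x80) {
      cp = lead; extra = 0;
    } else if ((lead & 0xE0) == 0xC0) {
      cp = lead & 0x1F; extra = 1;
    } else if ((lead & 0xF0) == 0xE0) {
      cp = lead & 0x0F; extra = 2;
    } else if ((lead & 0xF8) == 0xF0) {
      cp = lead & 0x07; extra = 3;
    } else {
      out += static_cast<wchar_t>(0xFFFD);  // stray continuation or invalid byte
      ++i;
      continue;
    }
    bool ok = i + extra < utf8.size();
    for (std::string::size_type k = 1; ok && k <= extra; ++k) {
      unsigned char next = static_cast<unsigned char>(utf8[i + k]);
      if ((next & 0xC0) != 0x80) {
        ok = false;
      } else {
        cp = (cp << 6) | (next & 0x3F);
      }
    }
    if (ok) {
      out += static_cast<wchar_t>(cp);
      i += extra + 1;
    } else {
      out += static_cast<wchar_t>(0xFFFD);
      ++i;
    }
  }
  return out;
}

// glibc reports "UTF-8" as the codeset of UTF-8 locales.
bool codesetIsUtf8(const char *codeset) {
  return codeset != 0 && std::strcmp(codeset, "UTF-8") == 0;
}

// The stdio boundary: text stays UTF-8 internally and is only re-encoded here.
void writeOut(const std::string &utf8) {
  // Decided once and cached, so a run never uses both cout and wcout.
  static const bool utf8Locale = codesetIsUtf8(nl_langinfo(CODESET));
  if (utf8Locale) {
    std::cout << utf8;
  } else {
    std::wcout << decodeUtf8(utf8);
  }
}
```

Because the choice between std::cout and std::wcout is made once and cached, a given run only ever writes to one of them, in line with the no-mixing rule. In a non-UTF-8 locale, characters that the locale cannot represent may make std::wcout fail, so a real abstraction would also have to decide how to handle those.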

I'd like to use $LIB, e.g. Boost::Wurble, or remove ifdefs by using e.g. gnulib

No.

Please?

No.

Why not?

It would make the code impossible for language pair authors to build.

But if it's a dependency that's built in-tree, it's just a matter of getting the source there. It could be bootstrapped with a simple script or even (ick) vendorised into Subversion.

This doesn't make any difference. Please just implement whatever you need from scratch.

Formatting/Syntax

This is less important. Currently there are three styles in use across the code base:

Sergio's style; e.g. fst_processor.cc
m5w's style; e.g. basic_stream_tagger_trainer.cc
felipe's style; e.g. apertium_tagger.cc

Most code is in Sergio's style. See Emacs_C_style_for_Apertium_hacking for an older attempt at formalising it.

Wiki TODO: Document this below.

Possible project TODO: Run clang-format

m5w

loosely following the Clang standards for naming and such
strictly for indenting -- clang-format
default to Clang naming style, unless I'm talking about some kind of STL class or STL imitation
an example of an imitation is the serialiser class for pairs, where the types are named first_Type and second_Type, so we can do pair.first and .second
keep the STL format for the part of the name that's STL
but if it's followed by something not STL, then I finish out with an underscore and then go to CamelCase
another divergence is with variables that don't carry much information, e.g. something being passed to a serialiser: they get bland names such as Stream and SerialisedType
since those are already used as type names, just add a trailing underscore
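A hypothetical Serialiser family, written only to illustrate these naming rules; it is not the actual Apertium serialiser code, and the int specialisation and the toText helper are invented for the example.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Hypothetical Serialiser family, written only to illustrate the naming rules.
template <typename SerialisedType> class Serialiser;

// Ordinary code follows the Clang naming style: CamelCase types and variables.
template <> class Serialiser<int> {
public:
  // Low-information parameters get the bland names Stream and SerialisedType,
  // plus a trailing underscore because those names are already used for types.
  template <typename Stream>
  static void serialise(const int &SerialisedType_, Stream &Stream_) {
    Stream_ << SerialisedType_ << ' ';
  }
};

// STL imitation: the template parameters keep std::pair's "first"/"second",
// then an underscore, then CamelCase ("Type"), matching .first and .second.
template <typename first_Type, typename second_Type>
class Serialiser<std::pair<first_Type, second_Type> > {
public:
  template <typename Stream>
  static void serialise(const std::pair<first_Type, second_Type> &SerialisedType_,
                        Stream &Stream_) {
    Serialiser<first_Type>::serialise(SerialisedType_.first, Stream_);
    Serialiser<second_Type>::serialise(SerialisedType_.second, Stream_);
  }
};

// Convenience wrapper for the usage example: serialise a value into a string.
template <typename SerialisedType>
std::string toText(const SerialisedType &SerialisedType_) {
  std::ostringstream Stream_;
  Serialiser<SerialisedType>::serialise(SerialisedType_, Stream_);
  return Stream_.str();
}
```

For instance, toText(std::make_pair(1, 2)) yields "1 2 ", and nested pairs are handled by the same specialisation recursively.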

Python

PEP-8?

