formater: add support for double-quoted fields

Part of #91.

* spot/misc/formater.cc, spot/misc/formater.hh: Here.
* bin/common_output.cc: Adjust automatic output format.
* doc/org/csv.org: Adjust.
* tests/core/lbt.test, tests/core/ltlfilt.test: More tests.
* NEWS: Mention the changes.
Alexandre Duret-Lutz 2016-08-08 10:13:26 +02:00
parent 6ed0830f87
commit 0d753048ce
7 changed files with 97 additions and 28 deletions

15
NEWS

@@ -94,6 +94,19 @@ New in spot 2.0.3a (not yet released)
 * genltl learned two options, --positive and --negative, to control
 wether formulas should be output after negation or not (or both).
+* The formater used by --format (for ltlfilt, ltlgrind, genltl,
+randltl) or --stats (for autfilt, dstar2tgba, ltl2tgba, ltldo,
+randaut) learned to recognize double-quoted fields and double the
+double-quotes output inbetween as expected from RFC4180-compliant
+CSV files. For instance
+ltl2tgba -f 'a U "b+c"' --stats='"%f",%s'
+will output
+"a U ""b+c""",2
+* The --csv-escape option of genltl, ltlfilt, ltlgrind, and randltl
+is now deprecated. The option is still here, but hidden and
+undocumented.
 * Arguments passed to -x (in ltl2tgba, ltl2tgta, autfilt, dstar2tgba)
 that are not used are now reported as they might be typos.
 This ocurred a couple of times in our test-suite. A similar
@@ -1696,7 +1709,7 @@ New in spot 1.2.1 (2013-12-11)
 columns before the formula) and %> (text after) can be used
 with the --format option to alter this output.
-- ltlfile, genltl, randltl, and ltl2tgba have a --csv-escape option
+- ltlfilt, genltl, randltl, and ltl2tgba have a --csv-escape option
 to help escape formulas in CSV files.
 - Please check

bin/common_output.cc

@@ -50,7 +50,9 @@ static const argp_option options[] =
 { "wring", OPT_WRING, nullptr, 0, "output in Wring's syntax", -20 },
 { "utf8", '8', nullptr, 0, "output using UTF-8 characters", -20 },
 { "latex", OPT_LATEX, nullptr, 0, "output using LaTeX macros", -20 },
-{ "csv-escape", OPT_CSV, nullptr, 0,
+// --csv-escape was deprecated in Spot 2.1, we can remove it at
+// some point
+{ "csv-escape", OPT_CSV, nullptr, OPTION_HIDDEN,
 "quote the formula for use in a CSV file", -20 },
 { "format", OPT_FORMAT, "FORMAT", 0,
 "specify how each line should be output (default: \"%f\")", -20 },
@@ -268,7 +270,30 @@ output_formula(std::ostream& out,
 {
 if (prefix)
 out << prefix << ',';
+// For backward compatibility, we still run
+// stream_escapable_formula when --csv-escape has been given.
+// But eventually --csv-escape should be removed, and the
+// formula printed raw.
+if ((prefix || suffix) && !escape_csv)
+{
+std::ostringstream tmp;
+stream_formula(tmp, f, filename, linenum);
+std::string tmpstr = tmp.str();
+if (tmpstr.find_first_of("\",") != std::string::npos)
+{
+out << '"';
+spot::escape_rfc4180(out, tmpstr);
+out << '"';
+}
+else
+{
+out << tmpstr;
+}
+}
+else
+{
 stream_escapable_formula(out, f, filename, linenum);
+}
 if (suffix)
 out << ',' << suffix;
 }
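
Note on the hunk above: the new branch of output_formula() quotes the formula only when it contains a double-quote or a comma, and doubles any embedded double-quotes. The following is a minimal standalone sketch of that decision, not the code from this commit. The names escape_quotes_sketch, print_csv_field_sketch and csv_field_sketch are hypothetical; escape_quotes_sketch only approximates what spot::escape_rfc4180() is used for here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the escaping step: write s to os,
// doubling every double-quote character.
std::ostream& escape_quotes_sketch(std::ostream& os, const std::string& s)
{
  for (char c : s)
    {
      if (c == '"')
        os << '"';
      os << c;
    }
  return os;
}

// Emit one CSV field.  The field is wrapped in double-quotes only if
// it contains a double-quote or a comma; otherwise it is printed raw.
std::ostream& print_csv_field_sketch(std::ostream& os,
                                     const std::string& field)
{
  if (field.find_first_of("\",") != std::string::npos)
    {
      os << '"';
      escape_quotes_sketch(os, field);
      os << '"';
    }
  else
    {
      os << field;
    }
  return os;
}

// Convenience wrapper (hypothetical) returning the field as a string.
std::string csv_field_sketch(const std::string& field)
{
  std::ostringstream os;
  print_csv_field_sketch(os, field);
  return os.str();
}
```

With this sketch, csv_field_sketch("GFa & GFb") returns the text unchanged, while the formula "a b" U c comes back as """a b"" U c", which is the form used in the new ltlfilt test of this commit.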

doc/org/csv.org

@@ -74,17 +74,18 @@ ltl2tgba -f Xa -f 'G("switch == on" -> F"tab[3,5] < 12")' --stats '%f,%s,%e'
 The second line of this input does no conform to [[https://www.rfc-editor.org/rfc/rfc4180.txt][RFC 4180]] because
 non-escaped fields are not allowed to contain comma or double-quotes.
-To fix this, use =ltl2tgba='s =--csv-escape= option: this causes
-"=%f=" to produce a double-quoted string properly escaped.
+To fix this, simply double-quote the =%f= in the argument to =--stats=:
 #+BEGIN_SRC sh :results verbatim :exports both
-ltl2tgba -f Xa -f 'G("switch == on" -> F"tab[3,5] < 12")' --stats '%f,%s,%e' --csv-escape
+ltl2tgba -f Xa -f 'G("switch == on" -> F"tab[3,5] < 12")' --stats '"%f",%s,%e'
 #+END_SRC
 #+RESULTS:
 : "Xa",3,3
 : "G(!""switch == on"" | F""tab[3,5] < 12"")",2,4
+The formater will detect your double-quote and automatically double
+any double quote output between them, as per [[https://www.rfc-editor.org/rfc/rfc4180.txt][RFC 4180]].
 The tool [[file:ltlcross.org][=ltlcross=]] has its own =--csv=FILENAME= option to format the
 statistics it gathers in a CSV file, but you have very little control
@@ -150,18 +151,6 @@ ltlfilt -F gen.csv/3 --size-min=8 --relabel=abc
 : and-gf,5,GFa & GFb & GFc & GFd & GFe
 : u-left,5,(((a U b) U c) U d) U e
-For security, in case a formula may contain double-quotes or
-commas, you should use the =--csv-escape= option:
-#+BEGIN_SRC sh :results verbatim :exports both
-ltlfilt -F gen.csv/3 --size-min=8 --relabel=abc --csv-escape
-#+END_SRC
-#+RESULTS:
-: and-gf,3,"GFa & GFb & GFc"
-: and-gf,4,"GFa & GFb & GFc & GFd"
-: and-gf,5,"GFa & GFb & GFc & GFd & GFe"
-: u-left,5,"(((a U b) U c) U d) U e"
 The preservation in the output of the text before and after the
 selected column can be altered using the =--format= option. The =%<=
 escape sequence represent the (comma-separated) data of all the
@@ -173,7 +162,7 @@ string.
 For instance this moves the first two columns after the formulas.
 #+BEGIN_SRC sh :results verbatim :exports both
-ltlfilt -F gen.csv/3 --size-min=8 --csv-escape --format='%f,%<'
+ltlfilt -F gen.csv/3 --size-min=8 --format='"%f",%<'
 #+END_SRC
 #+RESULTS:
 : "GFp1 & GFp2 & GFp3",and-gf,3
@@ -181,14 +170,19 @@ ltlfilt -F gen.csv/3 --size-min=8 --csv-escape --format='%f,%<'
 : "GFp1 & GFp2 & GFp3 & GFp4 & GFp5",and-gf,5
 : "(((p1 U p2) U p3) U p4) U p5",u-left,5
+Note that if the =--format= option is not specified, the default
+format is one of: =%f=, =%<,%f=, =%f,%>=, or =%<,%f,%>= depending on
+whether the input CSV had column before and after the selected one.
+Furthermore, the formula field is automatically double-quoted if the
+formula actually use double-quotes, and the input CSV file had more
+than one column.
 Typical uses of =ltlfilt= on CSV file include:
 - Filtering lines based on an LTL criterion, as above.
 - Changing the syntax of LTL formulas. For instance =ltl2tgba='s
 =--stats= option, and =ltlcross='s =--csv= option always output
-formulas in Spot's format. If that is inappropriate, simply
-use =ltlfilt= to rewrite the relevant column in your prefered
-syntax.
+formulas in Spot's format. If that is inappropriate, simply use
+=ltlfilt= to rewrite the relevant column in your preferred syntax.
 * Dealing with header lines

spot/misc/formater.cc

@@ -19,8 +19,10 @@
 #include "config.h"
 #include <spot/misc/formater.hh>
+#include <spot/misc/escape.hh>
 #include <iostream>
 #include <sstream>
+#include <cstring>
 namespace spot
 {
@@ -68,6 +70,20 @@ namespace spot
 formater::format(const char* fmt)
 {
 for (const char* pos = fmt; *pos; ++pos)
+{
+if (*pos == '"')
+{
+*output_ << '"';
+const char* end = strchr(pos + 1, '"');
+if (!end)
+continue;
+std::string tmp(pos + 1, end - (pos + 1));
+std::ostringstream os;
+format(os, tmp);
+escape_rfc4180(*output_, os.str());
+pos = end;
+// the end double-quote will be printed below
+}
 if (*pos != '%')
 {
 *output_ << *pos;
@@ -95,6 +111,7 @@ namespace spot
 break;
 pos = next;
 }
+}
 return *output_;
 }
 }
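
Note on the hunk above: formater::format() now looks for a pair of double-quotes in the format string, expands the text between them into a temporary stream, and copies the result with every double-quote doubled. A minimal standalone sketch of that scanning idea follows. It is an approximation under stated assumptions, not Spot's implementation: expand_sketch is a hypothetical name, and it only knows one %-sequence (%f, replaced by the given formula string) where the real formater dispatches on a table of sequences.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Expand `fmt`, replacing %f by `formula`.  Text between a pair of
// double-quotes is expanded recursively, and every double-quote in
// that expansion is doubled.  A lone, unterminated double-quote is
// copied as-is.  (Hypothetical helper; only %f is supported.)
std::string expand_sketch(const char* fmt, const std::string& formula)
{
  std::string out;
  for (const char* pos = fmt; *pos; ++pos)
    {
      if (*pos == '"')
        {
          const char* end = std::strchr(pos + 1, '"');
          if (end)
            {
              // The quoted region contains no double-quote itself, so
              // the recursive call only expands %-sequences.
              std::string inner(pos + 1, end);
              out += '"';
              for (char c : expand_sketch(inner.c_str(), formula))
                {
                  if (c == '"')
                    out += '"';
                  out += c;
                }
              out += '"';
              pos = end;        // skip past the closing quote
              continue;
            }
        }
      if (pos[0] == '%' && pos[1] == 'f')
        {
          out += formula;
          ++pos;                // consume the 'f'
        }
      else
        {
          out += *pos;
        }
    }
  return out;
}
```

For the formula a U "b+c", the format "%f",x expands to "a U ""b+c""",x with this sketch, while a bare %f outside quotes is substituted without any doubling.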

spot/misc/formater.hh

@@ -173,8 +173,11 @@ namespace spot
 std::ostream&
 format(std::ostream& output, const char* fmt)
 {
+std::ostream* tmp = output_;
 set_output(output);
-return format(fmt);
+format(fmt);
+set_output(*tmp);
+return output;
 }
 /// Expand the %-sequences in \a fmt, write the result on \a output_.
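
Note on the hunk above: since format() may now be called on a temporary stream in the middle of another format() call, the two-argument overload saves the current output pointer and restores it afterwards. The sketch below illustrates that save-and-restore pattern on a hypothetical miniature class, printer_sketch, which is not part of Spot.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical miniature of an object that prints to a "current" stream.
class printer_sketch
{
  std::ostream* output_ = &std::cout;

public:
  void set_output(std::ostream& os)
  {
    output_ = &os;
  }

  // Print on the currently selected stream.
  std::ostream& print(const std::string& s)
  {
    return *output_ << s;
  }

  // Print on `output`, then restore the previously selected stream,
  // so that an enclosing print() keeps writing where it was.
  std::ostream& print(std::ostream& output, const std::string& s)
  {
    std::ostream* saved = output_;
    set_output(output);
    print(s);
    set_output(*saved);
    return output;
  }
};
```

Without the restore step, any text printed after the nested call would end up in the temporary stream instead of the original one.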

tests/core/lbt.test

@@ -1,6 +1,6 @@
 #!/bin/sh
 # -*- coding: utf-8 -*-
-# Copyright (C) 2013 Laboratoire de Recherche et
+# Copyright (C) 2013, 2016 Laboratoire de Recherche et
 # Développement de l'Epita (LRDE).
 #
 # This file is part of Spot, a model checking library.
@@ -101,9 +101,16 @@ test `wc -l < formulas.2` -eq 168
 test `wc -l < formulas.3` -eq 168
 test `wc -l < formulas.4` -eq 168
+# The --csv-escape option is now obsolete and replaced by double
+# quotes in the format string. So eventually the first two lines
+# should disappear.
 run 0 $ltlfilt formulas.2 --csv-escape --format='%L,%f' > formulas.5
 run 0 $ltlfilt formulas.5/2 --csv-escape --format='%L,%f' > formulas.6
+run 0 $ltlfilt formulas.2 --format='%L,"%f"' > formulas.5a
+run 0 $ltlfilt formulas.5/2 --format='%L,"%f"' > formulas.6a
 cmp formulas.5 formulas.6
+cmp formulas.5 formulas.5a
+cmp formulas.5a formulas.6a
 # Make sure ltl2dstar-style litterals always get quoted.
 test "`$ltlfilt -l --lbt-input -f 'G F a'`" = 'G F "a"'

tests/core/ltlfilt.test

@@ -348,4 +348,14 @@ test "`cat out`" = 'G F "a\"\\b"'
 $ltlfilt --size=foo=2..3 2>stderr && exit 1
 grep 'invalid range.*should start with' stderr
+cat >out <<EOF
+test1,"""a b"" U c"
+"""a b"" U c",test2
+EOF
+ltlfilt out/1 > out.1
+ltlfilt out/2 > out.2
+diff out out.1
+diff out out.2
 true