Bug 6884

Summary: the sed applet behaves erratically when the script is read from a file (option -f)
Product: Busybox Reporter: clu <clu>
Component: Standard Compliance     Assignee: unassigned
Status: RESOLVED INVALID    
Severity: normal CC: busybox-cvs
Priority: P5    
Version: 1.22.x   
Target Milestone: ---   
Hardware: PC   
OS: Windows   
Host: Target:
Build:
Attachments: tarball of the web extraction used as sample to reproduce the bug

Description clu 2014-02-14 16:57:31 UTC
Created attachment 5240
tarball of the web extraction used as sample to reproduce the bug

Context (BusyBox is used as bash for the shell and as sed for the operations):
BusyBox v1.23.0-TIG-1381-g883817e (2014-01-21 14:06:02 GMT) multi-call binary.

I perform a series of transformations with BusyBox sed on a given file (see the attachment for a sample).

1. On the command line:
    $ cat samp.htm | sed -r "/sens=achat/! d; s/^[^<]*<//g; s/<tr /\n&/g" | sed -r "s/.*symbole=([^']*)' title=([^>]*)[^<]*<\/a><\/td>/\"\1\";\2;/g"
  works as expected.
  
2. I want to do the same via a sed script file:
    $ cat samp.htm | sed -rf f2.sed
where f2.sed contains the following (without the line numbering):
  1-      /sens=achat/! d
  2-      s/^[^<]*<//g; s/<tr /\n&/g
  3-      s/.*symbole=([^']*)' title=([^>]*)[^<]*<\/a><\/td>/\"\1\";\2;/g
But the script seems not to go beyond line 2 (as if, in the command-line version, the last pipe were skipped!).

For background: this is a web extraction from boursorama.com (collected with BusyBox wget) that
I'm transforming into CSV (with ";" as the separator, French style).

I am available for further tests on my Windows XP Pro platform.
Thank you for your attention.
clu
20140214
Comment 1 Denys Vlasenko 2014-02-21 13:38:12 UTC
(In reply to comment #0)
> but it seems that the script doesn't go beyond line 2 (as if, for the cli
> version, the last pipe was skipped!)

(You need to stop escaping \" in the file version. IOW: \"\1\" should be "\1" there.)
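A minimal sketch of why the escaping differs, using a made-up input line (not the attached sample) and a temporary script file created with `mktemp`; it assumes a sed that accepts `-r`, as GNU and BusyBox sed do. On the command line the shell removes the backslash from `\"` inside double quotes, so sed already receives `"\1"`. A script file is read by sed directly, with no shell in between, so it must contain `"\1"` as written.

```sh
#!/bin/sh
# Made-up one-line sample, not taken from the attachment.
line="<a href='/x?symbole=FR0001' title=Foo>Foo</a></td>"

# Command line: inside double quotes the shell turns \" into ", so sed sees "\1".
cli=$(printf '%s\n' "$line" | sed -r "s/.*symbole=([^']*)' title=([^>]*)[^<]*<\/a><\/td>/\"\1\";\2;/g")

# Script file: no shell processing happens, so the quotes are written plainly.
# The quoted 'EOF' delimiter keeps the heredoc body literal.
script=$(mktemp)
cat > "$script" <<'EOF'
s/.*symbole=([^']*)' title=([^>]*)[^<]*<\/a><\/td>/"\1";\2;/g
EOF
file=$(printf '%s\n' "$line" | sed -r -f "$script")
rm -f "$script"

printf '%s\n%s\n' "$cli" "$file"
```

Both forms should print `"FR0001";Foo;` for this sample line.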

For me, the script does not behave as if line 3 were missing; its output differs from that case as well.

However, without going deeper into what the scripts are doing, I also tried
GNU sed version 4.2.1 on them, and it gave exactly the same results
as BusyBox sed on both of your test cases.

Moreover, 

sed -r \
  -e "/sens=achat/! d; s/^[^<]*<//g; s/<tr /\n&/g" \
  -e "s/.*symbole=([^']*)' title=([^>]*)[^<]*<\/a><\/td>/\"\1\";\2;/g"

that is, a single process instead of two piped seds, works the same as the version with -f FILE.
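The equivalence can be checked with a small sketch. The input line and the `ROW` marker are made up for illustration, and it assumes GNU or BusyBox sed, since POSIX does not define `\n` in the replacement of `s///`. Forms (b) and (c) run both commands in one sed process and should print the same thing; form (a) differs because the second sed re-reads the split text as separate lines.

```sh
#!/bin/sh
# Made-up one-line input (not from the attachment).
input='junk<tr a=1<tr a=2'

# (a) Two seds piped: the second sed reads three lines, so ^<tr matches twice.
piped=$(printf '%s\n' "$input" | sed 's/<tr /\n&/g' | sed 's/^<tr /ROW /')

# (b) One sed, two -e expressions: both commands act on the same pattern space,
#     where ^ anchors only to its very start ("junk"), so nothing is replaced.
with_e=$(printf '%s\n' "$input" | sed -e 's/<tr /\n&/g' -e 's/^<tr /ROW /')

# (c) One sed, the same two commands read from a script file with -f.
script=$(mktemp)
printf '%s\n' 's/<tr /\n&/g' 's/^<tr /ROW /' > "$script"
with_f=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

printf '%s\n---\n%s\n---\n%s\n' "$piped" "$with_e" "$with_f"
```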

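I guess you are inserting \n chars to break lines and expecting sed to treat them as new lines to operate on. This doesn't work: within one sed process, the inserted newlines stay inside the current pattern space, and the following commands still see a single multi-line string, not separate input lines.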
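A minimal sketch of that point, assuming GNU or BusyBox sed (both accept `\n` in the replacement) and a made-up input line. The `$=` command prints the current input line number on the last line: inside the sed that inserts the newlines it is still 1, while a second sed in the pipe counts the split output as separate lines. The `l` command is used to show the pattern space unambiguously, with the embedded newlines escaped.

```sh
#!/bin/sh
# Made-up one-line input (not from the attachment).
input='junk<tr a=1<tr a=2'

# Inside one sed: the \n chars live in the pattern space; it is still input line 1.
inside=$(printf '%s\n' "$input" | sed -n -e 's/<tr /\n&/g' -e '$=')

# After a pipe: the next sed reads the output as separate lines and counts them.
after_pipe=$(printf '%s\n' "$input" | sed 's/<tr /\n&/g' | sed -n '$=')

# 'l' prints the pattern space in unambiguous form: one string with embedded \n.
printf '%s\n' "$input" | sed -n -e 's/<tr /\n&/g' -e 'l'

echo "inside one sed: $inside line(s); after the pipe: $after_pipe line(s)"
```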
Comment 2 clu 2014-02-21 18:50:05 UTC
  Thank you for taking care of my problem.
  
> (You need to stop escaping \" in the file version. IOW: \"\1\"  should be  "\1" there).
  My bad, I didn't notice... you're totally right.

> For me the script doesn't work as if line 3 is missing. The output is different from that too.
  That is precisely my point, but...
...
> I guess you are inserting \n chars to break lines and expect sed to detect those as new lines to operate one. This doesn't work.
  I understand from this that the 3 operations are performed on the same original file, while in the piped version the last two operations
  are applied to the "piped results" of the previous ones...
  Thus even:
  sed -r -e "/sens=achat/! d; s/^[^<]*<//g; s/<tr /\n&/g" 
  is different from :
  sed -r -e "/sens=achat/! d" | sed -r -e "s/^[^<]*<//g; s/<tr /\n&/g"
  because in the first case the substitutions would also be done on the (later) deleted lines
  whereas in the second case they would only be applied to the remaining lines, after deletion.
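This reading can be checked on a made-up two-line input. As a reference point, POSIX specifies that `d` deletes the pattern space and starts the next cycle, so commands after it in the same script are not run on a deleted line; if that holds, the two forms above should print the same result. The sketch assumes GNU or BusyBox sed (for `-r` and `\n` in the replacement) and writes `!d` without the space, which is the most portable spelling.

```sh
#!/bin/sh
# Made-up two-line input: only the second line contains sens=achat.
input='drop<tr me
keep<tr sens=achat<tr x'

# One sed: 'd' ends the cycle for the first line, so the s/// commands after it
# only ever run on the surviving line.
single=$(printf '%s\n' "$input" | sed -r -e '/sens=achat/!d; s/^[^<]*<//g; s/<tr /\n&/g')

# Two seds: deletion first, then the substitutions on what is left.
piped=$(printf '%s\n' "$input" | sed -r -e '/sens=achat/!d' | sed -r -e 's/^[^<]*<//g; s/<tr /\n&/g')

printf '%s\n---\n%s\n' "$single" "$piped"
```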
  
  Am I understanding this right?
  If so, that would explain to me why the commands can't be gathered in a single sed script file.
  
  Again, many thanks for your help.