Created on 16:56 by amonthei, last changed 14:56 by admin.
amonthei, brett.cannon, doerwalter, mark-roberts, runedevik

When processing huge fixed block files of about 7000 bytes wide and several hundred thousand lines long, some pairs of lines get read as one long line with no line break when using "for line in file:". The problem is even worse when using the fileinput module: reading in five or six huge files consisting of 4.8 million records causes several hundred pairs of lines to be read as single lines. When a newline is skipped it is usually followed by several more in the next few hundred lines. I have not noticed any other characters being skipped, only the line break.
Windows (5, 1, 2600, 2, 'Service Pack 2')

Do you happen to have a sample you could upload that triggers the bug?

I cannot upload the files that trigger this because of the data that is in them, but I am working on getting around that. In my data, line 617391 in a fixed block file of 6990 bytes wide gets read in with the next line after it. The line break is 0d0a (same as the others) where the bug happens, so I am wondering if it is a buffer issue where the line break falls at the edge; however, no other characters are ever missed. The total file is 888420 lines and this happens in four spots. I will hopefully have a file to send soon.

This problem is of particular interest to me. What are the min and max widths of the lines? I don't know if this helps: I spent the last little while creating / reading random files that all (seemingly) matched the description you gave us. None of these files failed to read properly (e.g., they had the right number of rows, with a line length that seemingly was the right one). Perusing the file source code, I found a detailed discussion of fgets vs fgetc for finding the next line in the file. Hopefully you're able to reproduce the bug with scrubbed data (because I couldn't construct random data to do so). Have you tried reading the file with fp.read(8192) or similar? Good luck.

Are you using any of the unicode reading features (i.e. codecs.EncodedFile etc.) or are you using plain open() for reading the file?

I am using open() for reading the file, no other features. I have also had fileinput.input(fileList) compound the problem. Each file that this has happened to is a fixed block file of either 6990 or 7700 bytes wide, but I think this is insignificant. When looking at the file in a hex editor everything looks fine, and a small Java program using a buffered reader gives me the correct line count when Python does not. Using something like fp.read(8192) might temporarily solve my problem, I'm sure, but I will keep working on getting a file I can upload.

I have had no luck creating random data to reproduce the problem, which leads me to the conclusion that it was the data itself. Using a hex editor I find no problem with the line breaks. The data that triggers this bug is transferred several times before it gets to me. It originates on a Unix box, then goes to an IBM mainframe, then to my Windows machine, and through many updates along the way. It may be an EBCDIC/ASCII conversion or possibly something to do with the mainframe to PC transfer. The only thing that bothers me is that Java somehow is not affected by this bad data.

Well, with Andy saying he can't reproduce the problem I am going to close as invalid.

Andy, if you ever happen to be able to upload data that triggers it, then please re-open this bug.

I have the same problem with a huge file (8GB) containing long lines. Sometimes two lines are merged into one, and when rerunning the test script that reads the file it's always the same lines that are merged. Also, the merging seems to happen more frequently towards the end of the file. I tried to reproduce with a smaller data set (the 10 lines before the two lines that get merged, the two lines that get merged, and the 10 lines after that) but I was not able to reproduce on this smaller data set.
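The checks discussed in the thread (reading with fp.read(8192) instead of iterating over lines, comparing line counts, and looking for records that are wider than the fixed block width) could be scripted roughly as below. This is only a minimal sketch, assuming Python 3 and binary mode, so it inspects a file independently of text-mode "for line in file:" and is not a reproduction of the reported bug. The function names, the record_width parameter, and the sample file generator are hypothetical and do not come from the thread.

```python
import os
import tempfile

CHUNK_SIZE = 8192  # chunk size suggested in the thread


def count_crlf_chunked(path, chunk_size=CHUNK_SIZE):
    """Count 0d0a pairs by reading fixed-size binary chunks."""
    count = 0
    prev_cr = False  # True when the previous chunk ended with b"\r"
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            # A line break split across the chunk edge: "\r" | "\n".
            if prev_cr and chunk[:1] == b"\n":
                count += 1
            count += chunk.count(b"\r\n")
            prev_cr = chunk.endswith(b"\r")
    return count


def count_lines_iterated(path):
    """Count lines by iterating over the file object in binary mode."""
    with open(path, "rb") as fp:
        return sum(1 for _ in fp)


def find_odd_lines(path, record_width):
    """Yield (line_number, length) for lines that are not record_width + CRLF."""
    expected = record_width + 2  # payload plus b"\r\n"
    with open(path, "rb") as fp:
        for number, line in enumerate(fp, start=1):
            if len(line) != expected:
                yield number, len(line)


def write_fixed_block_file(path, n_records, record_width):
    """Write a synthetic fixed block file (hypothetical test data)."""
    with open(path, "wb") as fp:
        for i in range(n_records):
            body = str(i).encode("ascii").ljust(record_width, b".")
            fp.write(body + b"\r\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.dat")
        write_fixed_block_file(sample, n_records=1000, record_width=6990)
        print("CRLF pairs (chunked):", count_crlf_chunked(sample))
        print("lines (iterated):    ", count_lines_iterated(sample))
        print("odd-length lines:    ", list(find_odd_lines(sample, 6990)))
```

If the chunked count and the iterated count disagree on a real file, or find_odd_lines reports a line of roughly twice the record width, that points at the exact record where two lines were merged, which may make it easier to cut out a scrubbed sample for upload.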