[Micronet] Deleting blank columns in a tab-delimited report?

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[Micronet] Deleting blank columns in a tab-delimited report?

Ian Crew
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Ian Crew
Hi Ryan:

Copying the whole list, because others might be similarly confused.  You've made me realize that a screenshot of the file imported into Excel might be the best illustration here:


As you can see, many of the columns are blank (like columns BK, BL, BO, BP, etc. in the screenshot; and yes, those are represented in the text file as x<tab><tab>z), and others are sparsely populated (like columns BM and BN in the screenshot), but all have a column header in row 1.  I'm interested in getting rid of the columns that are truly blank, but being sure to keep the ones that are just sparsely populated.

I've found the Excel "COUNTA" function with feels like it might come close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"), but I haven't fully figured that out yet.  More unix-y, command-line-y, solutions are definitely welcome.... 

Thanks,

Ian

On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]> wrote:

Hi Ian,

I'm coming at this from the unix/CLI perspective...

Isn't the notion of deleting a blank value a no-op? Or are your blank values actually white space of some sort? Given

x<tab><tab>z

The middle column is blank; it has been deleted (or was never there). If the blank values are only found at the end of a row, then I can see how deleting the tab would delete the empty column.

Just trying to understand the problem. Deleting something that is empty has the ring of a tree falling down -- the values at the end of the row are people who can hear it.

Ryan


Ryan


On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]> wrote:
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Graham Patterson

So this is a sparse spreadsheet, with some (initially unknown) columns
empty except for a header? These solutions would need some fleshing out,
but I've done stranger things.

a) If the original file is actually <tab><tab> and not <tab><space><tab>
then the first step is to put in a sentinel value between the adjacent
tabs. Then you could run it through AWK/Perl/Python/whatever with the
field separator set to <tab> to get a fixed line length output, and then
use cut or AWK to chop out the columns you don't want. This will likely
be two passes, one to work out the data fields for each row, and hence
derive the 'empty' columns, and a second pass for the actual extraction.
You need to accumulate column state for the whole file in order to know
which column to remove. hence solution (c).

b) The simplest option might be to load it into PHP using fgetcsv()
(will handle various separators), and then use PHPExcel to create a
worksheet to your specifications. Or load it into Excel to do the
conversion and save it as an Excel document, then use PHPExcel to
manipulate the column hiding/removal.

c) Finally, you could roll the text data file 90 degrees, and chop out
rows with just the one header field (much easier when the decision is on
a line by line basis). Then roll it back 90 degrees. A quick look for
'transpose' found:
http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
http://stackoverflow.com/questions/19001405/transpose-rows-into-column-in-unix

Since you are not doing this very often, speed of execution is less
important than setup and testing.

Graham

On 5/7/2014 6:14 PM, Ian Crew wrote:

> Hi Ryan:
>
> Copying the whole list, because others might be similarly confused.
>   You've made me realize that a screenshot of the file imported into
> Excel might be the best illustration here:
>
>
> As you can see, many of the columns are blank (like columns BK, BL, BO,
> BP, etc. in the screenshot; and yes, those are represented in the text
> file as x<tab><tab>z), and others are sparsely populated (like columns
> BM and BN in the screenshot), but all have a column header in row 1.
>   I'm interested in getting rid of the columns that are truly blank, but
> being sure to keep the ones that are just sparsely populated.
>
> I've found the Excel "COUNTA" function with feels like it might come
> close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"),
> but I haven't fully figured that out yet.  More unix-y, command-line-y,
> solutions are definitely welcome....
>
> Thanks,
>
> Ian
>
> On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>> Hi Ian,
>>
>> I'm coming at this from the unix/CLI perspective...
>>
>> Isn't the notion of deleting a blank value a no-op? Or are your blank
>> values actually white space of some sort? Given
>>
>> x<tab><tab>z
>>
>> The middle column is blank; it has been deleted (or was never there).
>> If the blank values are only found at the end of a row, then I can see
>> how deleting the tab would delete the empty column.
>>
>> Just trying to understand the problem. Deleting something that is
>> empty has the ring of a tree falling down -- the values at the end of
>> the row are people who can hear it.
>>
>> Ryan
>>
>>
>> Ryan
>>
>>
>> On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     Hi all:
>>
>>     I've got a large report (tab-delimited text, and therefore easy to
>>     suck into Excel if necessary) that includes a /lot/ of blank
>>     columns where every cell _except_ the header cell is empty.
>>
>>     Does anyone know of an easy way to delete or hide those blank
>>     columns, while ensuring that the correct header stays with the
>>     correct column?  My google mojo is failing me today.
>>
>>     I'm not particularly fussy about tool or method (Excel 2011, perl,
>>     unix script, whatever), nor does it have to be fully
>>     automated--this is the sort of thing I'll probably be doing less
>>     than 20 times ever.  What I /really/ don't want to have to do is
>>     manually look through 250+ columns x 900+ rows to try to determine
>>     which columns are empty.  Also, since I'm not going to do it that
>>     many times, I'm hoping that it wouldn't take a huge amount of
>>     set-up time to get whatever solution working...
>>
>>     Any tips?
>>
>>     Thanks!
>>
>>     Ian
>>
>>     ___
>>     Ian Crew
>>     Platform and Services Manager, Research Hub
>>     http://hub.berkeley.edu <http://hub.berkeley.edu/>
>>
>>     IST-Architecture, Platforms and Integration (API)
>>     Earl Warren Hall, Second Floor
>>     University of California, Berkeley
>>
>>
>>
>>     -------------------------------------------------------------------------
>>     The following was automatically added to this message by the list
>>     server:
>>
>>     To learn more about Micronet, including how to subscribe to or
>>     unsubscribe from its mailing list and how to find out about
>>     upcoming meetings, please visit the Micronet Web site:
>>
>>     http://micronet.berkeley.edu <http://micronet.berkeley.edu/>
>>
>>     Messages you send to this mailing list are public and
>>     world-viewable, and the list's archives can be browsed and
>>     searched on the Internet.  This means these messages can be viewed
>>     by (among others) your bosses, prospective employers, and people
>>     who have known you in the past.
>>
>>
>
> ___
> Ian Crew
> Platform and Services Manager, Research Hub
> http://hub.berkeley.edu
>
> IST-Architecture, Platforms and Integration (API)
> Earl Warren Hall, Second Floor
> University of California, Berkeley
>
>
>
>
> -------------------------------------------------------------------------
> The following was automatically added to this message by the list server:
>
> To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:
>
> http://micronet.berkeley.edu
>
> Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
>

--
Graham Patterson, Systems Administrator
Lawrence Hall of Science, UC Berkeley   510-643-2222
"...past the iguana, the tyrannosaurus, the mastodon,
the mathematical puzzles, and the meteorite..." - directions to my office.

 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Ian Crew
In reply to this post by Ian Crew
Hi all:

With many, many thanks to all of you, I've got a great solution:  exactly 2.5 hours after I sent the request, Don Patterson sent me a beautiful perl script that does exactly what we need.  Now, when we load it up in Excel, the rightmost column is CX (102) rather than ML (350).  A pretty huge difference!

Also thank you to Aron, Rick, Ryan, Paul, and Graham for their very helpful suggestions and offers of assistance.

Here's how things look now, which looks much better:


Cheers,

Ian

On May 7, 2014, at 6:14 PM, Ian Crew <[hidden email]> wrote:

Hi Ryan:

Copying the whole list, because others might be similarly confused.  You've made me realize that a screenshot of the file imported into Excel might be the best illustration here:

<PastedGraphic-1.png>

As you can see, many of the columns are blank (like columns BK, BL, BO, BP, etc. in the screenshot; and yes, those are represented in the text file as x<tab><tab>z), and others are sparsely populated (like columns BM and BN in the screenshot), but all have a column header in row 1.  I'm interested in getting rid of the columns that are truly blank, but being sure to keep the ones that are just sparsely populated.

I've found the Excel "COUNTA" function with feels like it might come close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"), but I haven't fully figured that out yet.  More unix-y, command-line-y, solutions are definitely welcome.... 

Thanks,

Ian

On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]> wrote:

Hi Ian,

I'm coming at this from the unix/CLI perspective...

Isn't the notion of deleting a blank value a no-op? Or are your blank values actually white space of some sort? Given

x<tab><tab>z

The middle column is blank; it has been deleted (or was never there). If the blank values are only found at the end of a row, then I can see how deleting the tab would delete the empty column.

Just trying to understand the problem. Deleting something that is empty has the ring of a tree falling down -- the values at the end of the row are people who can hear it.

Ryan


Ryan


On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]> wrote:
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Ian Crew
Hi all:

I've been asked to pass along the solution from Donald Patterson, as well as some of the other suggestions I got off-list.  So here goes (with thanks to the folks that made the suggestions:

1) The solution from Donald Patterson that I ended up using.  He wrote:
What? A tab-delimited file? I'm doing perl. Attached is my suggestion of a perl script to drop the blank (header-only) columns. .

    ./dropblankcols.pl "-inputfile=test-input.txt" > test-output.txt
      where:
         test-input.txt  - is the filename of the input file I tested with
         test-output.txt - is the filename of the output file

I had all files in the same folder. Running Linux.

If the input file isn't that large, one could accumulate the rows in an array in memory so the second pass would go through the in-memory array and we wouldn't have to read the file from disk a second time. 


2) Several suggestions from Aron Roberts:
Not by any means a complete or even useful answer, but two potential
pieces of the puzzle ...

awk: removing columns by column number (once you've identified the
empty columns, that is:

http://www.theunixschool.com/2012/11/awk-examples-insert-remove-update-fields.html

Python: a somewhat similar problem, perhaps?
http://stackoverflow.com/questions/1983902/remove-row-or-column-from-2d-list-if-all-values-in-that-row-or-column-are-none

Visual Basic: pseudocode (untried by its author) - and possibly
applicable to other languages with column-wise operations - for
finding and removing empty columns, by checking whether the first row
cell is blank (you could check the second row in this case - the one
below the column header - if your data happens to be so conveniently
arranged that the 2nd row shouldn't ever be empty *except* in an empty
column)

http://www.vbforums.com/showthread.php?595670-Remove-empty-columns-from-a-tab-delimited-text-file&p=3680457&viewfull=1#post3680457

3) A suggestion from Rick Jaffe:
I find that using Conditional formatting (in Excel, at least) to color the values in a column makes it easy to spot values while scrolling the page. [....] This helps identify empty columns and might alleviate the need to hide them.

4) Several suggestions from Graham Patterson, already posted to Micronet, but included here for completeness:
So this is a sparse spreadsheet, with some (initially unknown) columns 
empty except for a header? These solutions would need some fleshing out, 
but I've done stranger things.

a) If the original file is actually <tab><tab> and not <tab><space><tab> 
then the first step is to put in a sentinel value between the adjacent 
tabs. Then you could run it through AWK/Perl/Python/whatever with the 
field separator set to <tab> to get a fixed line length output, and then 
use cut or AWK to chop out the columns you don't want. This will likely 
be two passes, one to work out the data fields for each row, and hence 
derive the 'empty' columns, and a second pass for the actual extraction. 
You need to accumulate column state for the whole file in order to know 
which column to remove. hence solution (c).

b) The simplest option might be to load it into PHP using fgetcsv() 
(will handle various separators), and then use PHPExcel to create a 
worksheet to your specifications. Or load it into Excel to do the 
conversion and save it as an Excel document, then use PHPExcel to 
manipulate the column hiding/removal.

c) Finally, you could roll the text data file 90 degrees, and chop out 
rows with just the one header field (much easier when the decision is on 
a line by line basis). Then roll it back 90 degrees. A quick look for 
'transpose' found:
http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
http://stackoverflow.com/questions/19001405/transpose-rows-into-column-in-unix

Since you are not doing this very often, speed of execution is less 
important than setup and testing.

Thanks again, all, for all your help.  Micronet rocks!

Ian

On May 8, 2014, at 7:15 AM, Ian Crew <[hidden email]> wrote:

Hi all:

With many, many thanks to all of you, I've got a great solution:  exactly 2.5 hours after I sent the request, Don Patterson sent me a beautiful perl script that does exactly what we need.  Now, when we load it up in Excel, the rightmost column is CX (102) rather than ML (350).  A pretty huge difference!

Also thank you to Aron, Rick, Ryan, Paul, and Graham for their very helpful suggestions and offers of assistance.

Here's how things look now, which looks much better:

<PastedGraphic-2.png>

Cheers,

Ian

On May 7, 2014, at 6:14 PM, Ian Crew <[hidden email]> wrote:

Hi Ryan:

Copying the whole list, because others might be similarly confused.  You've made me realize that a screenshot of the file imported into Excel might be the best illustration here:

<PastedGraphic-1.png>

As you can see, many of the columns are blank (like columns BK, BL, BO, BP, etc. in the screenshot; and yes, those are represented in the text file as x<tab><tab>z), and others are sparsely populated (like columns BM and BN in the screenshot), but all have a column header in row 1.  I'm interested in getting rid of the columns that are truly blank, but being sure to keep the ones that are just sparsely populated.

I've found the Excel "COUNTA" function with feels like it might come close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"), but I haven't fully figured that out yet.  More unix-y, command-line-y, solutions are definitely welcome.... 

Thanks,

Ian

On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]> wrote:

Hi Ian,

I'm coming at this from the unix/CLI perspective...

Isn't the notion of deleting a blank value a no-op? Or are your blank values actually white space of some sort? Given

x<tab><tab>z

The middle column is blank; it has been deleted (or was never there). If the blank values are only found at the end of a row, then I can see how deleting the tab would delete the empty column.

Just trying to understand the problem. Deleting something that is empty has the ring of a tree falling down -- the values at the end of the row are people who can hear it.

Ryan


Ryan


On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]> wrote:
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.

dropblankcols.pl (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Aaron Culich-2
Here's one more solution to add to your list if you have Python installed with the Pandas library <http://pandas.pydata.org/> which has a great CSV parser if you ever have to handle more gnarly files and/or more complex manipulations it will robustly handle data manipulation with just a few simple lines. In this case just 1 one line does the hard work (and the 3 other lines are just to import the library and read/write the data):

You can see the live example (with comments) here in the IPython Notebook Viewer:


Or clone the gist repo from here:


The gist contains the IPython Notebook file, the example input file, and a raw Python script version, as well.

-Aaron


On Thu, May 8, 2014 at 10:40 AM, Ian Crew <[hidden email]> wrote:
Hi all:

I've been asked to pass along the solution from Donald Patterson, as well as some of the other suggestions I got off-list.  So here goes (with thanks to the folks that made the suggestions:

1) The solution from Donald Patterson that I ended up using.  He wrote:
What? A tab-delimited file? I'm doing perl. Attached is my suggestion of a perl script to drop the blank (header-only) columns. .

    ./dropblankcols.pl "-inputfile=test-input.txt" > test-output.txt
      where:
         test-input.txt  - is the filename of the input file I tested with
         test-output.txt - is the filename of the output file

I had all files in the same folder. Running Linux.

If the input file isn't that large, one could accumulate the rows in an array in memory so the second pass would go through the in-memory array and we wouldn't have to read the file from disk a second time. 


2) Several suggestions from Aron Roberts:
Not by any means a complete or even useful answer, but two potential
pieces of the puzzle ...

awk: removing columns by column number (once you've identified the
empty columns, that is:

http://www.theunixschool.com/2012/11/awk-examples-insert-remove-update-fields.html

Python: a somewhat similar problem, perhaps?
http://stackoverflow.com/questions/1983902/remove-row-or-column-from-2d-list-if-all-values-in-that-row-or-column-are-none

Visual Basic: pseudocode (untried by its author) - and possibly
applicable to other languages with column-wise operations - for
finding and removing empty columns, by checking whether the first row
cell is blank (you could check the second row in this case - the one
below the column header - if your data happens to be so conveniently
arranged that the 2nd row shouldn't ever be empty *except* in an empty
column)

http://www.vbforums.com/showthread.php?595670-Remove-empty-columns-from-a-tab-delimited-text-file&p=3680457&viewfull=1#post3680457

3) A suggestion from Rick Jaffe:
I find that using Conditional formatting (in Excel, at least) to color the values in a column makes it easy to spot values while scrolling the page. [....] This helps identify empty columns and might alleviate the need to hide them.

4) Several suggestions from Graham Patterson, already posted to Micronet, but included here for completeness:
So this is a sparse spreadsheet, with some (initially unknown) columns 
empty except for a header? These solutions would need some fleshing out, 
but I've done stranger things.

a) If the original file is actually <tab><tab> and not <tab><space><tab> 
then the first step is to put in a sentinel value between the adjacent 
tabs. Then you could run it through AWK/Perl/Python/whatever with the 
field separator set to <tab> to get a fixed line length output, and then 
use cut or AWK to chop out the columns you don't want. This will likely 
be two passes, one to work out the data fields for each row, and hence 
derive the 'empty' columns, and a second pass for the actual extraction. 
You need to accumulate column state for the whole file in order to know 
which column to remove. hence solution (c).

b) The simplest option might be to load it into PHP using fgetcsv() 
(will handle various separators), and then use PHPExcel to create a 
worksheet to your specifications. Or load it into Excel to do the 
conversion and save it as an Excel document, then use PHPExcel to 
manipulate the column hiding/removal.

c) Finally, you could roll the text data file 90 degrees, and chop out 
rows with just the one header field (much easier when the decision is on 
a line by line basis). Then roll it back 90 degrees. A quick look for 
'transpose' found:
http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
http://stackoverflow.com/questions/19001405/transpose-rows-into-column-in-unix

Since you are not doing this very often, speed of execution is less 
important than setup and testing.

Thanks again, all, for all your help.  Micronet rocks!

Ian

On May 8, 2014, at 7:15 AM, Ian Crew <[hidden email]> wrote:

Hi all:

With many, many thanks to all of you, I've got a great solution:  exactly 2.5 hours after I sent the request, Don Patterson sent me a beautiful perl script that does exactly what we need.  Now, when we load it up in Excel, the rightmost column is CX (102) rather than ML (350).  A pretty huge difference!

Also thank you to Aron, Rick, Ryan, Paul, and Graham for their very helpful suggestions and offers of assistance.

Here's how things look now, which looks much better:

<PastedGraphic-2.png>

Cheers,

Ian

On May 7, 2014, at 6:14 PM, Ian Crew <[hidden email]> wrote:

Hi Ryan:

Copying the whole list, because others might be similarly confused.  You've made me realize that a screenshot of the file imported into Excel might be the best illustration here:

<PastedGraphic-1.png>

As you can see, many of the columns are blank (like columns BK, BL, BO, BP, etc. in the screenshot; and yes, those are represented in the text file as x<tab><tab>z), and others are sparsely populated (like columns BM and BN in the screenshot), but all have a column header in row 1.  I'm interested in getting rid of the columns that are truly blank, but being sure to keep the ones that are just sparsely populated.

I've found the Excel "COUNTA" function with feels like it might come close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"), but I haven't fully figured that out yet.  More unix-y, command-line-y, solutions are definitely welcome.... 

Thanks,

Ian

On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]> wrote:

Hi Ian,

I'm coming at this from the unix/CLI perspective...

Isn't the notion of deleting a blank value a no-op? Or are your blank values actually white space of some sort? Given

x<tab><tab>z

The middle column is blank; it has been deleted (or was never there). If the blank values are only found at the end of a row, then I can see how deleting the tab would delete the empty column.

Just trying to understand the problem. Deleting something that is empty has the ring of a tree falling down -- the values at the end of the row are people who can hear it.

Ryan


Ryan


On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]> wrote:
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.
Reply | Threaded
Open this post in threaded view
|

Re: [Micronet] Deleting blank columns in a tab-delimited report?

Dav Clark
And for posterity, adding to Aaron's answer, if your file is essentially tabular, it makes sense to use the best tabular data structure in whatever platform you're comfortable in (like pandas for python). If you're more comfortable in R, you can read.delim into a data.frame. And if you're more of a GUI gal (or fella), Stat Transfer appears to be the strongest not-terribly-expensive commercial tool in this category, OpenRefine is the dominant open source solution, and upstart DataPad seeks to be the new (web-based) champ for usability and performance.

D


On Thu, May 8, 2014 at 11:12 AM, Aaron Culich <[hidden email]> wrote:
Here's one more solution to add to your list if you have Python installed with the Pandas library <http://pandas.pydata.org/> which has a great CSV parser if you ever have to handle more gnarly files and/or more complex manipulations it will robustly handle data manipulation with just a few simple lines. In this case just 1 one line does the hard work (and the 3 other lines are just to import the library and read/write the data):

You can see the live example (with comments) here in the IPython Notebook Viewer:


Or clone the gist repo from here:


The gist contains the IPython Notebook file, the example input file, and a raw Python script version, as well.

-Aaron


On Thu, May 8, 2014 at 10:40 AM, Ian Crew <[hidden email]> wrote:
Hi all:

I've been asked to pass along the solution from Donald Patterson, as well as some of the other suggestions I got off-list.  So here goes (with thanks to the folks that made the suggestions:

1) The solution from Donald Patterson that I ended up using.  He wrote:
What? A tab-delimited file? I'm doing perl. Attached is my suggestion of a perl script to drop the blank (header-only) columns. .

    ./dropblankcols.pl "-inputfile=test-input.txt" > test-output.txt
      where:
         test-input.txt  - is the filename of the input file I tested with
         test-output.txt - is the filename of the output file

I had all files in the same folder. Running Linux.

If the input file isn't that large, one could accumulate the rows in an array in memory so the second pass would go through the in-memory array and we wouldn't have to read the file from disk a second time. 


2) Several suggestions from Aron Roberts:
Not by any means a complete or even useful answer, but two potential
pieces of the puzzle ...

awk: removing columns by column number (once you've identified the
empty columns, that is:

http://www.theunixschool.com/2012/11/awk-examples-insert-remove-update-fields.html

Python: a somewhat similar problem, perhaps?
http://stackoverflow.com/questions/1983902/remove-row-or-column-from-2d-list-if-all-values-in-that-row-or-column-are-none

Visual Basic: pseudocode (untried by its author) - and possibly
applicable to other languages with column-wise operations - for
finding and removing empty columns, by checking whether the first row
cell is blank (you could check the second row in this case - the one
below the column header - if your data happens to be so conveniently
arranged that the 2nd row shouldn't ever be empty *except* in an empty
column)

http://www.vbforums.com/showthread.php?595670-Remove-empty-columns-from-a-tab-delimited-text-file&p=3680457&viewfull=1#post3680457

3) A suggestion from Rick Jaffe:
I find that using Conditional formatting (in Excel, at least) to color the values in a column makes it easy to spot values while scrolling the page. [....] This helps identify empty columns and might alleviate the need to hide them.

4) Several suggestions from Graham Patterson, already posted to Micronet, but included here for completeness:
So this is a sparse spreadsheet, with some (initially unknown) columns 
empty except for a header? These solutions would need some fleshing out, 
but I've done stranger things.

a) If the original file is actually <tab><tab> and not <tab><space><tab> 
then the first step is to put in a sentinel value between the adjacent 
tabs. Then you could run it through AWK/Perl/Python/whatever with the 
field separator set to <tab> to get a fixed line length output, and then 
use cut or AWK to chop out the columns you don't want. This will likely 
be two passes, one to work out the data fields for each row, and hence 
derive the 'empty' columns, and a second pass for the actual extraction. 
You need to accumulate column state for the whole file in order to know 
which column to remove. hence solution (c).

b) The simplest option might be to load it into PHP using fgetcsv() 
(will handle various separators), and then use PHPExcel to create a 
worksheet to your specifications. Or load it into Excel to do the 
conversion and save it as an Excel document, then use PHPExcel to 
manipulate the column hiding/removal.

c) Finally, you could roll the text data file 90 degrees, and chop out 
rows with just the one header field (much easier when the decision is on 
a line by line basis). Then roll it back 90 degrees. A quick look for 
'transpose' found:
http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash
http://stackoverflow.com/questions/19001405/transpose-rows-into-column-in-unix

Since you are not doing this very often, speed of execution is less 
important than setup and testing.

Thanks again, all, for all your help.  Micronet rocks!

Ian

On May 8, 2014, at 7:15 AM, Ian Crew <[hidden email]> wrote:

Hi all:

With many, many thanks to all of you, I've got a great solution:  exactly 2.5 hours after I sent the request, Don Patterson sent me a beautiful perl script that does exactly what we need.  Now, when we load it up in Excel, the rightmost column is CX (102) rather than ML (350).  A pretty huge difference!

Also thank you to Aron, Rick, Ryan, Paul, and Graham for their very helpful suggestions and offers of assistance.

Here's how things look now, which looks much better:

<PastedGraphic-2.png>

Cheers,

Ian

On May 7, 2014, at 6:14 PM, Ian Crew <[hidden email]> wrote:

Hi Ryan:

Copying the whole list, because others might be similarly confused.  You've made me realize that a screenshot of the file imported into Excel might be the best illustration here:

<PastedGraphic-1.png>

As you can see, many of the columns are blank (like columns BK, BL, BO, BP, etc. in the screenshot; and yes, those are represented in the text file as x<tab><tab>z), and others are sparsely populated (like columns BM and BN in the screenshot), but all have a column header in row 1.  I'm interested in getting rid of the columns that are truly blank, but being sure to keep the ones that are just sparsely populated.

I've found the Excel "COUNTA" function with feels like it might come close (something like "IF COUNTA(BY2:BY1000) == 0, hide this column"), but I haven't fully figured that out yet.  More unix-y, command-line-y, solutions are definitely welcome.... 

Thanks,

Ian

On May 7, 2014, at 5:57 PM, Ryan Lovett <[hidden email]> wrote:

Hi Ian,

I'm coming at this from the unix/CLI perspective...

Isn't the notion of deleting a blank value a no-op? Or are your blank values actually white space of some sort? Given

x<tab><tab>z

The middle column is blank; it has been deleted (or was never there). If the blank values are only found at the end of a row, then I can see how deleting the tab would delete the empty column.

Just trying to understand the problem. Deleting something that is empty has the ring of a tree falling down -- the values at the end of the row are people who can hear it.

Ryan


Ryan


On Wed, May 7, 2014 at 5:02 PM, Ian Crew <[hidden email]> wrote:
Hi all:

I've got a large report (tab-delimited text, and therefore easy to suck into Excel if necessary) that includes a lot of blank columns where every cell except the header cell is empty.  

Does anyone know of an easy way to delete or hide those blank columns, while ensuring that the correct header stays with the correct column?  My google mojo is failing me today.  

I'm not particularly fussy about tool or method (Excel 2011, perl, unix script, whatever), nor does it have to be fully automated--this is the sort of thing I'll probably be doing less than 20 times ever.  What I really don't want to have to do is manually look through 250+ columns x 900+ rows to try to determine which columns are empty.  Also, since I'm not going to do it that many times, I'm hoping that it wouldn't take a huge amount of set-up time to get whatever solution working...

Any tips?

Thanks!

Ian

___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.



___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley


___
Ian Crew
Platform and Services Manager, Research Hub

IST-Architecture, Platforms and Integration (API)
Earl Warren Hall, Second Floor
University of California, Berkeley



-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.




-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.




--
Dav Clark
Data & Learning Scientist
UC Berkeley D-Lab
510-664-7000

 
-------------------------------------------------------------------------
The following was automatically added to this message by the list server:

To learn more about Micronet, including how to subscribe to or unsubscribe from its mailing list and how to find out about upcoming meetings, please visit the Micronet Web site:

http://micronet.berkeley.edu

Messages you send to this mailing list are public and world-viewable, and the list's archives can be browsed and searched on the Internet.  This means these messages can be viewed by (among others) your bosses, prospective employers, and people who have known you in the past.