Wednesday, September 5, 2012

Study changes (NYT editors') understanding of how DNA causes disease

Study Changes Understanding of How DNA Causes Disease
At least four million gene switches that reside in bits of DNA once thought to be inactive turn out to play critical roles in health, researchers reported.

So, what is your first take on what this story is about? Just, say, reading the title and the lede. It sounds like this is some sort of new result about how "junk DNA" actually does something. Wow! And there might be new understanding of (potentially all?) disease! 

Of course, these pieces of junk DNA had been known to be associated with certain diseases for a decade. In fact, the author likely knew this as it is written in the article.
In large studies over the past decade, scientists found that minor changes in human DNA sequences increase the risk that a person will get those diseases.
The earliest papers are from the late 1990s and 2000s. As the Human Genome Project was coming up with much less than expected, scientists pushed into this area.

And of course, gene switches aren't new -- another fact the author likely knew as it too is written in the article.

In recent years, some [scientists] began to find switches in the 99 percent of human DNA that is not genes
I think the author left off what "recent years" meant because 10 years doesn't sound so new. In "recent years" (2007), it was sufficiently established for NOVA to cover it.

In fact, the entire concept has been around for a while. The reason I wrote this particular post is that I personally have known about this simply through the aforementioned NOVA episode. I knew enough about gene switches in 2008 to comment (with proto-spittle-flecked ire) on an idiotic statement by Ray Kurzweil saying the brain is simple because a human DNA sequence consists of only "50 million bytes" of information. I said:
In the worst case, a sizable fraction of all 2^20000 [gene on/off] states could be involved to get from a stem cell to every neuron in its right place of the brain with the proper function.

I don't want to detract from the actual work presented in the article. It is a pretty awesome piece of human genome mapping, and it really sheds light on how complex the whole thing is.  (And it puts some more hurt on Kurzweil since the entire 3D structure along with the switches appears to be important in DNA.) 

Gina Kolata seems to be a stand-up molecular biologist cum journalist. I imagine the editors of the NYT were completely blown away by progress in stuff they hadn't been paying attention to since the 1990s (or maybe ever) and said she should change the lede. 

And I guess it got me to click the link.

Monday, September 3, 2012

Other than the grammar ...

How often do people ask the question "Other than the grammar, how was the speech?" Well, apparently over 298 million times.

Human speech derives its information-carrying capacity from several places, not the least of which is its temporal structure. Sorting on word frequency literally destroys significant quantities of information. The entire information content of the word green next to the word frog (i.e. green frog, for those following along at home) is that green is modifying frog so as both to convey the information that the frog is green and to distinguish said frog from e.g. a poison dart frog (which is not green, but instead blue or yellow). If I take that word green and move it to a different position unrelated to the position of frog, that word green no longer carries any information at all ... and any information you do decide to imbue it with has no foundation whatsoever.
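To make the point concrete, here is a minimal Python sketch. The two sentences and the helper name word_counts are made up for illustration; nothing here comes from the actual speech. The sentences say different things about which animal is green, yet once they are reduced to word counts (roughly what a word cloud keeps) they become indistinguishable.

```python
from collections import Counter


def word_counts(text):
    """Count words, ignoring case and word order (hypothetical helper)."""
    return Counter(text.lower().split())


# Two made-up sentences: same words, different meaning.
a = "the green frog ate the blue fly"
b = "the blue frog ate the green fly"

# The sentences differ, but their word counts are identical,
# so the frequency table cannot tell us which animal is green.
print(a == b)
print(word_counts(a) == word_counts(b))
```

The only thing that separated the two sentences was the position of green relative to frog, and that is exactly what the counting step throws away.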

The above word cloud (apparently also known as a wordle, though that may just be the name of the specific generation software) of Romney's speech to the RNC has removed all of the information except that he might be running for President of America. However, that is information I am adding to this infographic. The speaker simply mentions President and America. It could be in a negative light. The most commonly appearing words in this blog post are green and frog, but I'm not talking about green frogs. In a sense, the creators have only done a half-assed job. Below I lay waste to the information content, reducing the speech to an empirical estimate of the letter frequency in English.
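For anyone who wants to finish the job at home, a rough sketch of that kind of reduction in Python follows. The function name letter_frequencies and the sample string are placeholders of mine, not the actual speech text, and only the letters a through z are counted.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of all a-z letters in text (hypothetical helper)."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # most_common() orders letters from most to least frequent.
    return {ch: n / total for ch, n in counts.most_common()}


# Placeholder input; paste in any transcript instead.
sample = "Green frog!"
for letter, share in letter_frequencies(sample).items():
    print(f"{letter}: {share:.3f}")
```

Fed a long enough transcript, the output should drift toward the usual English letter frequencies no matter who gave the speech or what it said, which is the point.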

