<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Gaël Varoquaux</title>
	<atom:link href="http://gael-varoquaux.info/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://gael-varoquaux.info/blog</link>
	<description>Views on Python, Computational Science, ...</description>
	<pubDate>Wed, 09 May 2012 09:34:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Update on scikit-learn: recent developments for machine learning in Python</title>
		<link>http://gael-varoquaux.info/blog/?p=165</link>
		<comments>http://gael-varoquaux.info/blog/?p=165#comments</comments>
		<pubDate>Tue, 08 May 2012 23:12:54 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[machine learning]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[science]]></category>

		<category><![CDATA[scientific computing]]></category>

		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=165</guid>
		<description><![CDATA[Yesterday, we released version 0.11 of the scikit-learn toolkit for machine learning in Python, and there was much rejoicing.
Major features gained in the last releases
In the last 6 months, there have been many things happening with the scikit-learn. While I do not wish to give an exhaustive summary of features added (it can be found [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday, we released version 0.11 of the <a href="http://scikit-learn.org"><i>scikit-learn</i></a> toolkit for machine learning in Python, and there was much rejoicing.</p>
<h2>Major features gained in the last releases</h2>
<p>In the last 6 months, there have been many things happening with the scikit-learn. While I do not wish to give an exhaustive summary of features added (it can be found <a href="http://scikit-learn.org/stable/whats_new.html">here</a>), let me list a few of the additions that I personally find exciting.</p>
<p><a href="http://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_iris.html"> <img src="http://scikit-learn.org/stable/_images/plot_forest_iris_1.png" width=40% align="right"></a></p>
<h3>Non-linear prediction models</h3>
<p>For complex prediction problems where there is no simple model available, as in computer vision, non-linear models are handy. Good examples of such models are those based on decision trees and model averaging. For instance, random forests are used in the Kinect to locate body parts. As they are intrinsically complex, they may need a large amount of training data. For this reason, they have been implemented in the scikit-learn with special attention to computational efficiency.
</p>
<ul>
<li><a href="http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees">Randomized Forests and extra-trees</a></li>
<li><a href="http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting">Gradient boosted regression trees</a></li>
</ul>
<div style="clear: both"></div>
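<p>To give an idea of what using these tree ensembles looks like, here is a minimal sketch. It assumes a recent scikit-learn release (the import paths and defaults may differ from 0.11), and the choice of the iris dataset and of 50 trees is only for illustration.</p>

```python
# Sketch assuming a recent scikit-learn release; 0.11 may differ in details.
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Two averaging ensembles of randomized decision trees
models = {
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra-trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```

<p>Both estimators share the same <code>fit</code>/<code>predict</code> interface, so swapping one for the other is a one-line change.</p>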
<h3>Dealing with unlabeled instances</h3>
<p>It is often easier to gather unlabeled observations than labeled ones. While prediction of a quantity of interest is then harder or simply impossible, mining this data can be useful.
</p>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"><a href="http://scikit-learn.org/stable/modules/label_propagation.html">Semi-supervised<br />
learning</a>: using unlabeled observations together with labeled ones for better prediction.<br />
<hr/><a href="http://scikit-learn.org/stable/auto_examples/semi_supervised/plot_label_propagation_structure.html"><img src="http://scikit-learn.org/stable/_images/plot_label_propagation_structure_1.png" width=300px/></a>
</div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"><a href="http://scikit-learn.org/stable/modules/outlier_detection.html">Outlier/novelty detection</a>: detect deviant observations.<br />
<hr/><a href="http://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html"><img src="http://scikit-learn.org/stable/_images/plot_oneclass_1.png" width=300px /></a></div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"><a href="http://scikit-learn.org/stable/modules/manifold.html">Manifold learning</a>: discover a non-linear low-dimensional structure in the data.<br />
<hr/><a href="http://scikit-learn.org/stable/modules/manifold.html"><img src="http://scikit-learn.org/stable/_images/plot_compare_methods_1.png" width=300px /></a> </div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"> <a href="http://scikit-learn.org/stable/modules/clustering.html">Clustering</a> with <a href="http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means">an algorithm</a> that can scale to really large datasets using an online approach: fitting small portions of the data one after the other (Mini-batch k-means).<br />
<hr/><a href="http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html"><img src="http://scikit-learn.org/stable/_images/plot_cluster_comparison_1.png" width=300px /></a>
</div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888">
<a href="http://scikit-learn.org/stable/modules/decomposition.html#dictionarylearning">Dictionary learning</a>: learning patterns in the data that represent it sparsely: each observation is a combination of a small number of patterns.</p>
<hr/><a href="http://scikit-learn.org/stable/auto_examples/decomposition/plot_image_denoising.html#example-decomposition-plot-image-denoising-py"><img src="http://scikit-learn.org/stable/_images/plot_image_denoising_1.png" width=300px /></a></div>
<div style="clear: both"></div>
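<p>The online approach of mini-batch k-means can be sketched as follows. This assumes a recent scikit-learn release, where <code>MiniBatchKMeans</code> exposes <code>partial_fit</code>; the synthetic blobs and the batch count are made up for the example.</p>

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic data: three Gaussian blobs (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(1000, 2)) for c in centers])
rng.shuffle(X)  # shuffle the rows in place

model = MiniBatchKMeans(n_clusters=3, random_state=0)

# Feed the data one small portion after the other, as if it were
# streamed from disk and never held in memory all at once
for batch in np.array_split(X, 30):
    model.partial_fit(batch)

labels = model.predict(X)
print(model.cluster_centers_.round(1))
```

<p>Each call to <code>partial_fit</code> only sees 100 observations here, which is what lets the approach scale to datasets that do not fit in memory.</p>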
<h3>Sparse models: when very few descriptors are relevant</h3>
<p>In general, finding which descriptors are useful when there are many of them is like finding a needle in a haystack: it is a very hard problem. However, if you know that only a few of these descriptors actually carry information, you are in a so-called <i>sparse</i> problem, for which specific approaches can work well.
</p>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"><a href="http://scikit-learn.org/stable/modules/linear_model.html#orthogonal-matching-pursuit-omp">Orthogonal matching pursuit</a>: a greedy and fast algorithm for very sparse linear models<br />
<hr/><a href="http://scikit-learn.org/stable/auto_examples/linear_model/plot_omp.html"><img src="http://scikit-learn.org/stable/_images/plot_omp_1.png" width=300px /></a></div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"><a href="http://scikit-learn.org/stable/modules/feature_selection.html#randomized-sparse-models">Randomized sparsity (randomized Lasso)</a>: selecting the relevant descriptors in noisy high-dimensional observations<br />
<hr/> <a href="http://scikit-learn.org/stable/auto_examples/linear_model/plot_sparse_recovery.html"><img src="http://scikit-learn.org/stable/_images/plot_sparse_recovery_11.png" width=300px /></a></div>
<div style="width:10px; float:left;">&nbsp;</div>
<div style="width:300px; float:left; padding: 5px; border: 1px solid #888"> <a href="http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLasso.html#sklearn.covariance.GraphLasso">Sparse inverse covariance</a>: learning graphs of connectivity from correlations in the data</p>
<hr/><a href="http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html#example-applications-plot-stock-market-py"><img src="http://scikit-learn.org/stable/_images/plot_stock_market_1.png" width=300px /></a>
</div>
<div style="clear: both"></div>
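<p>As an illustration of such a sparse problem, here is a small sketch with orthogonal matching pursuit. It assumes a recent scikit-learn release; the data is simulated, with 3 informative descriptors hidden among 200, and the positions and values of the true coefficients are arbitrary.</p>

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_samples, n_features = 100, 200
X = rng.normal(size=(n_samples, n_features))

# Only 3 of the 200 descriptors carry information (arbitrary choice)
true_coef = np.zeros(n_features)
true_support = [3, 50, 120]
true_coef[true_support] = [2.0, -1.5, 1.0]
y = X @ true_coef + 0.01 * rng.normal(size=n_samples)

# Greedily pick at most 3 descriptors
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3)
omp.fit(X, y)

selected = np.flatnonzero(omp.coef_)
print("selected descriptors:", selected)
```

<p>Note that there are more descriptors than observations here: without the sparsity assumption, the linear model would not be identifiable.</p>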
<h1>Getting developers together: the Granada sprint</h1>
<p><object width="400" height="300" align="right"><param name="flashvars" value="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fsearch%2Fshow%2F%3Fq%3Dscikit-learn%26m%3Dtags%26w%3D66885349%2540N03&#038;page_show_back_url=%2Fsearch%2F%3Fq%3Dscikit-learn%26m%3Dtags%26w%3D66885349%2540N03&#038;method=flickr.photos.search&#038;api_params_str=&#038;api_tags=scikit-learn&#038;api_tag_mode=bool&#038;api_user_id=66885349%40N03&#038;api_safe_search=3&#038;api_content_type=7&#038;api_media=all&#038;api_sort=date-posted-desc&#038;jump_to=&#038;start_index=0"></param><param name="movie" value="http://www.flickr.com/apps/slideshow/show.swf?v=109615"></param><param name="allowFullScreen" value="true"></param><embed type="application/x-shockwave-flash" src="http://www.flickr.com/apps/slideshow/show.swf?v=109615" allowFullScreen="true" flashvars="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fsearch%2Fshow%2F%3Fq%3Dscikit-learn%26m%3Dtags%26w%3D66885349%2540N03&#038;page_show_back_url=%2Fsearch%2F%3Fq%3Dscikit-learn%26m%3Dtags%26w%3D66885349%2540N03&#038;method=flickr.photos.search&#038;api_params_str=&#038;api_tags=scikit-learn&#038;api_tag_mode=bool&#038;api_user_id=66885349%40N03&#038;api_safe_search=3&#038;api_content_type=7&#038;api_media=all&#038;api_sort=date-posted-desc&#038;jump_to=&#038;start_index=0" width="400" height="300"></embed></object></p>
<p>Of course, such developments happen only because we have a great team of <a href="https://github.com/scikit-learn/scikit-learn/graphs/contributors">dedicated coders</a>.</p>
<p>Getting along and working together is a critical part of the project. In December 2011, we held the first international <a href="http://scikit-learn.org">scikit-learn</a> sprint in Granada, on the side of the <a href="http://nips.cc">NIPS conference</a>. That was a while ago, and I haven&#8217;t found time to blog about it, maybe because I was too busy merging in the code produced :). Here is a small report from my point of view. Better late than never. </p>
<h2>Participants from all over the globe</h2>
<p>This sprint was a big deal for us, because for the first time, thanks to sponsor money, we were able to fly contributors from overseas and meet the team in person. For the first time I was able to see the faces behind many of the fantastic people that I knew only from the mailing list.</p>
<p>I really think that we must thank our sponsors, <strong>Google</strong> and <strong>tinyclues</strong>, but also the PSF, in particular Jesse Noller and especially <strong>Steve Holden</strong>, whose help was absolutely instrumental in getting sponsor money. This money is what made it possible to unite a good fraction of the team, and it opened the door to great moments of coding, and more.</p>
<h2>Producing code lines and friendship</h2>
<p>An important aspect of the sprint for me was that I really felt the team being united. Granada is a great city and we spent fantastic moments together. Now when I review code, I can often put a face on the author of that code and remember a walk below the Alhambra or an evening in a bar. I am sure it helps reviewing code!
</p>
<h2>Was it worth the money?</h2>
<p><a href="http://gael-varoquaux.info/blog/wp-content/uploads/2012/skl_activity.png"><br />
<img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/skl_activity.png" width=50% align='right'> </a>I really appreciate that the sponsors did not ask for specific returns on investment beyond acknowledgments, but I think that it is useful for us to ask the question: was it worth the money? After all, we got around $5000, and that&#8217;s a lot of money. First of all, as a side effect of the sprint, people who had invested a huge amount of time in a machine learning toolkit without asking anything in return got help to go to a major machine learning conference.</p>
<p>But was there a return on investment in terms of code? If you look at the number of lines of code modified weekly (figure on the right), there is a big spike in December 2011. That&#8217;s our sprint! Importantly, there is still a lot of activity in the months following the sprint. This is actually unusual, as active development usually happens more during the summer break than during the winter, when our developers are busy working on papers or teaching.</p>
<p>The explanation is simple: we were thrilled by the sprint. Overall, it was incredibly beneficial to the project. I am looking forward to the next ones.</p>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=165</wfw:commentRss>
		</item>
		<item>
		<title>3 Google summer of code for scikit-learn and more&#8230;</title>
		<link>http://gael-varoquaux.info/blog/?p=164</link>
		<comments>http://gael-varoquaux.info/blog/?p=164#comments</comments>
		<pubDate>Mon, 23 Apr 2012 21:25:58 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[computational science]]></category>

		<category><![CDATA[machine learning]]></category>

		<category><![CDATA[mayavi]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[science]]></category>

		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=164</guid>
		<description><![CDATA[The scikit-learn got 3 students accepted for the Google summer of code.

Imanuel Bayer will work on making our sparse linear models, for regression and classification, faster. His proposal: Optimizing sparse linear models using coordinate descent and strong rules.
David Marek will implement multi-layer perceptrons for the scikit. His proposal: Multilayer Perceptron
Vlad Niculae will work on speeding [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://scikit-learn.org">scikit-learn</a> got 3 students accepted for the Google summer of code.</p>
<ul>
<li><a href="http://ibayer.blogspot.fr/">Imanuel Bayer</a> will work on making our sparse linear models, for regression and classification, faster. His proposal: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/ibayer/11001">Optimizing sparse linear models using coordinate descent and strong rules</a>.</li>
<li><a href="http://www.davidmarek.cz/">David Marek</a> will implement multi-layer perceptrons for the scikit. His proposal: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/h4wk_cz/24001">Multilayer Perceptron</a></li>
<li><a href="http://blog.vene.ro/">Vlad Niculae</a> will work on speeding up the library in general, catching all the low hanging fruits, and the ones a bit higher. His proposal: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/vladn/26002">Need for scikit-learn speed</a></li>
</ul>
<p>
In addition, other related projects have exciting proposals, for instance <a href="http://statsmodels.sourceforge.net/"><strong>statsmodels</strong></a>:</p>
<ul>
<li>Divyanshu Bandil: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/divyanshu/34002">Extension of Linear to Non Linear Models in Statsmodels Python module</a></li>
<li>Alexandre Crayssac: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/alexandreyc/8001">estimating system of equations</a></li>
<li>Justin Grana: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/j_grana/8001">empirical Likelihood in Statsmodels</a></li>
<li>Georgi Panterov: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/gpanterov/7001">nonparametric estimation</a></li>
</ul>
<p> and <a href="http://www.cython.org">Cython</a>:</p>
<ul>
<li>Philip Herron: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/redbrain1123/28002">pxd generation using gcc-python-plugin</a></li>
<li>Mark Florisson: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/markflorisson88/30002">Fast Numerical Computing with Cython</a></li>
</ul>
<p>finally, in <a href="http://pandas.pydata.org/">Pandas</a>:</p>
<ul>
<li>Vytautas Jancauskas: <a href="http://www.google-melange.com/gsoc/project/google/gsoc2012/bucket_brigade/42002">Plots in pandas</a>
</li>
</ul>
<p>Congratulations to all of the students. This is going to be an exciting summer.</p>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=164</wfw:commentRss>
		</item>
		<item>
		<title>The problems of low statistical power and publication bias</title>
		<link>http://gael-varoquaux.info/blog/?p=163</link>
		<comments>http://gael-varoquaux.info/blog/?p=163#comments</comments>
		<pubDate>Sat, 14 Apr 2012 15:16:33 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[computational science]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[science]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=163</guid>
		<description><![CDATA[
 Lately, I have been in a mood of scientific scepticism: I have the feeling that the worldwide academic system is more and more failing to produce useful research. Christophe Lalanne&#8217;s twitter feed led me to an interesting article in a non-mainstream journal: A farewell to Bonferroni: the problems of low statistical power and publication bias, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://idoubtit.files.wordpress.com/2010/12/coldfusion.jpg" align="right" width="20%" target="http://idoubtit.wordpress.com/2010/12/16/direct-to-the-public-science/"></p>
<p> Lately, I have been in a mood of scientific scepticism: I have the feeling that the worldwide academic system is more and more failing to produce useful research. Christophe Lalanne&#8217;s <a href="https://twitter.com/#!/chlalanne">twitter feed</a> led me to an interesting article in a non-mainstream journal: <a href="http://beheco.oxfordjournals.org/content/15/6/1044.short"><strong>A farewell to Bonferroni: the problems of low statistical power and publication bias</strong></a>, by Shinichi Nakagawa.</p>
<p>Each study performed has a probability of being wrong. Thus performing many studies will lead to some wrong conclusions by chance. This is known in statistics as the <a href="http://en.wikipedia.org/wiki/Multiple_comparisons">multiple comparisons</a> problem. When a working hypothesis is not verified empirically in a study, this null finding is seldom reported, leading to what is called <i>publication bias</i>: <strong>discoveries are further studied; negative results are usually ignored</strong> (Y. Benjamini). Because only <i>discoveries</i>, called <i>detections</i> in statistical terms, are reported, <strong>published results contain more false detections than the individual experiments and very few false negatives</strong>. Arguably, the original investigators can correct for this using the understanding that they gained from the experiments performed, and account in a <i>post-hoc analysis</i> for the fact that some of their working hypotheses could not have been correct. Such a correction can work only in a field where there is a good mechanistic understanding, or models, such as physics, but in my opinion not in life and social sciences.</p>
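<p>A back-of-the-envelope calculation makes both effects concrete. The sketch below assumes independent studies, and the numbers plugged in (a 5% significance level, 10% of tested hypotheses being true, 50% power) are hypothetical values chosen only for illustration.</p>

```python
def prob_at_least_one_false_detection(n_studies, alpha=0.05):
    """Chance that at least one of n independent studies of a
    non-existent effect comes out significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_studies


def false_fraction_among_detections(prior_true, power, alpha=0.05):
    """Fraction of significant results that are false detections,
    if only significant results get reported.

    prior_true: fraction of tested hypotheses that are actually true
    power: probability of detecting an effect that is real
    """
    false_detections = alpha * (1.0 - prior_true)
    true_detections = power * prior_true
    return false_detections / (false_detections + true_detections)


# Multiple comparisons: more studies, more chance findings
for n in (1, 10, 50):
    print(n, round(prob_at_least_one_false_detection(n), 3))

# Publication bias: hypothetical field with few true hypotheses, low power
print(round(false_fraction_among_detections(prior_true=0.1, power=0.5), 3))
```

<p>With these made-up numbers, close to half of the reported detections would be false, even though each individual test was run at the 5% level.</p>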
<p>Let me quote some relevant extracts of <a href="http://beheco.oxfordjournals.org/content/15/6/1044.short">the article</a>, as you may never have access to it thanks to the way scientific publishing works:</p>
<blockquote><p> Recently, Jennions and Moller (2003) carried out a meta-analysis on statistical power in the field of behavioral ecology and animal behavior, reviewing 10 leading journals including Behavioral Ecology. Their results showed dismayingly low average statistical power (note that a meta-analytic review of statistical power is different from post hoc power analysis as criticized in Hoenig and Heisey, 2001). The statistical power of a null hypothesis (Ho) significance test is the probability that the test will reject Ho when a research hypothesis (Ha) is true.<br />&#8230;<br />
The meta-analysis on statistical power by Jennions and Moller (2003) revealed that, in the field of behavioral ecology and animal behavior, statistical power of less than 20% to detect a small effect and power of less than 50% to detect a medium effect existed. This means, for example, that the average behavioral scientist performing a statistical test has a greater probability of making a Type II error (or beta) (<i>i.e.</i>, not rejecting Ho when Ho is false; note that statistical power is equals to 1 - beta) than if they had flipped a coin, when an experiment effect is of medium size.<br />&#8230;<br />
Imagine that we conduct a study where we measure as many relevant variables as possible, 10 variables, for example. We find only two variables statistically significant. Then, what should we do? We could decide to write a paper highlighting these two variables (and not reporting the other eight at all) as if we had hypotheses about the two significant variables in the first place. Subsequently, our paper would be published. Alternatively, we could write a paper including all 10 variables. When the paper is reviewed, referees might tell us that there were no significant results if we had &#8220;appropriately&#8221; employed Bonferroni corrections, so that our study would not be advisable for publication. However, the latter paper is scientifically more important than the former paper. For example, if one wants to conduct a meta-analysis to investigate an overall effect in a specific area of study, the latter paper is five times more informative than the former paper. In the long term, statistical significance of particular tests may be of trivial importance (if not always), although, in the short term, it makes papers publishable. Bonferroni procedures may, in part, be preventing the accumulation of knowledge in the field of behavioral ecology and animal behavior, thus hindering the progress of the field as science.
</p></blockquote>
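<p>To get a feeling for what such power figures mean, here is a rough sketch of a power computation. It is not the computation of the article: it assumes a two-sided, two-sample z-test with known unit variance, and the sample size of 20 per group is a hypothetical choice. The effect sizes 0.2, 0.5 and 0.8 are the conventional small, medium and large standardized effects.</p>

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference (Cohen's d).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Mean of the test statistic when the effect is real
    shift = effect_size * sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


for d in (0.2, 0.5, 0.8):
    print(d, round(two_sample_power(d, n_per_group=20), 2))
```

<p>Under these assumptions, a medium effect with 20 observations per group is missed more often than it is detected, which is the coin-flip comparison made in the quote.</p>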
<p><img src="http://farm6.staticflickr.com/5206/5330056727_a98c97c3c5.jpg" align="right" width="30%"></p>
<p>Some of the concerns raised here are partly a criticism of Bonferroni corrections, <i>i.e.</i> in technical terms correcting for <a href="http://en.wikipedia.org/wiki/Familywise_error_rate">family-wise error rate (FWER)</a>. It is actually the message that the author wants to convey in his paper. Proponents of controlling for <a href="http://en.wikipedia.org/wiki/False_discovery_rate">false discovery rate (FDR)</a> argue that an investigator shouldn&#8217;t be penalized for asking more questions, and that the fraction of errors in the answers should be controlled, rather than their absolute number. That said, FDR, while useful, does not answer the problems of publication bias.</p>
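<p>The difference between the two kinds of correction can be sketched in a few lines. The Bonferroni rule and the Benjamini-Hochberg step-up procedure below are the textbook versions; the ten p-values are invented for the example and do not come from any study.</p>

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m: controls the family-wise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Invented p-values for 10 variables measured in one study
p_values = [0.001, 0.008, 0.012, 0.03, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
print("Bonferroni detections:", sum(bonferroni(p_values)))
print("Benjamini-Hochberg detections:", sum(benjamini_hochberg(p_values)))
```

<p>On these invented values, Bonferroni keeps a single detection while Benjamini-Hochberg keeps three: asking ten questions costs less under FDR control.</p>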
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=163</wfw:commentRss>
		</item>
		<item>
		<title>Want features? Just code</title>
		<link>http://gael-varoquaux.info/blog/?p=162</link>
		<comments>http://gael-varoquaux.info/blog/?p=162#comments</comments>
		<pubDate>Thu, 08 Mar 2012 21:46:52 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[personnal]]></category>

		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[scientific computing]]></category>

		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=162</guid>
		<description><![CDATA[Somebody just sent an email on a user&#8217;s mailing list for an open-source scientific package entitled &#8220;Feature foo: why is package bar not up to the task?&#8221; (names hidden to avoid pointing directly to the person responsible for my wrath). To quote him:
Is there ANY plan for having such a module in package bar?? I think (personally) that [...]]]></description>
			<content:encoded><![CDATA[<p>Somebody just sent an email on a user&#8217;s mailing list for an open-source scientific package entitled <strong>&#8220;<em>Feature foo</em>: why is <em>package bar</em> not up to the task?&#8221;</strong> (names hidden to avoid pointing directly to the person responsible for my wrath). To quote him:</p>
<blockquote><p>Is there ANY plan for having such a module in <em>package bar</em>?? I think (personally) that this is a MUST DO. This is typically the type of routines that I hear people use in e.g., idl etc. If this could be an optimised, fast (and easy to use) routine, all the better.</p></blockquote>
<p>As someone who spends a fair amount of time working on open source software, I hear such remarks quite often. I am finding it harder and harder not to react negatively to these emails. Now I cannot consider myself as a contributor to <em>package bar</em>, and thus I can claim that I am not taking this comment personally.</p>
<p>Why aren&#8217;t packages up to the task? Well, the answer is quite simple: because they are developed by volunteers who do it in their spare time, too often late at night, or by companies that put some of their profits into open source rather than into locking down a market. 90% of the time, the reason the feature isn&#8217;t as good as you would want is lack of time.</p>
<p>I personally find that suggesting that somebody else should put more of the time and money they are already giving away in improving a feature that you need is almost insulting.</p>
<p>I am aware that people do not realize how small the group of people that develop and maintain their toys is. Borrowing the figure below from <a href="http://www.euroscipy.org/file/6459?vid=download">Fernando Perez&#8217;s talk at Euroscipy</a>, the number of people that do 90% of the grunt work to get the core scientific Python ecosystem going is around two handfuls:</p>
<p><img style="vertical-align: middle;" src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/fperez_euroscipy_2011_contributors.jpg" alt="Commits per contributor in various scientific Python packages, from Fernando Perez" /></p>
<p>I&#8217;d like to think that this recruitment problem is due to a lack of skills: users that have the ability to contribute are just too rare. This is not entirely true: there are scores of skilled people on the mailing lists. The poster himself mentioned in his email that he was developing a package. I personally started contributing not knowing anything about software development. I struggled, I did the grunt work like maintaining wikis, answering questions on mailing lists, and writing documentation. These easier tasks were useful to the community, I think, but most importantly, they taught me a lot because I was investing energy in them.</p>
<div>
<div><strong>If people want things to improve, they will have more success sending in pull requests than messages on mailing lists that sound condescending to my ears.</strong></div>
<div>I hope that I haven&#8217;t overreacted too badly :), that email set me off. That said, I am not sure that people realize how much they owe to the open source developers breaking their backs on the packages they use.</div>
<div><img style="vertical-align: middle;" src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/fperez_euroscipy_2011_i_want_you.jpg" alt="" width="334" height="444" /></div>
<div>All credit for images goes to <a href="http://fperez.org/">Fernando Perez</a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=162</wfw:commentRss>
		</item>
		<item>
		<title>Book review: NumPy 1.5 Beginner&#8217;s guide</title>
		<link>http://gael-varoquaux.info/blog/?p=161</link>
		<comments>http://gael-varoquaux.info/blog/?p=161#comments</comments>
		<pubDate>Tue, 10 Jan 2012 07:57:21 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[computational science]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[scientific computing]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=161</guid>
		<description><![CDATA[Packt publishing sent me a copy of NumPy 1.5 Beginner&#8217;s guide by Ivan Idris.


The book actually covers more than only numpy: it is a full introduction to numerical computing with Python. The table of contents is the following:

NumPy Quick Start
Beginning with NumPy Fundamentals
Get into Terms with Commonly Used Functions
Convenience Functions for Your Convenience
Working with Matrices [...]]]></description>
			<content:encoded><![CDATA[<p>Packt publishing sent me a copy of <a href="http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/Book">NumPy 1.5 Beginner&#8217;s guide</a> by Ivan Idris.
</p>
<p><iframe align="right" src="http://rcm.amazon.com/e/cm?t=gaelvaro-20&#038;o=1&#038;p=8&#038;l=as1&#038;asins=1849515301&#038;ref=qf_sp_asin_til&#038;fc1=000000&#038;IS2=1&#038;lt1=_blank&#038;m=amazon&#038;lc1=0000FF&#038;bc1=000000&#038;bg1=FFFFFF&#038;f=ifr" style="width:120px;height:240px;" scrolling="no" marginwidth="0" marginheight="0" frameborder="0"></iframe></p>
<p>The book actually covers more than only <a href="http://numpy.scipy.org/">numpy</a>: it is a full introduction to numerical computing with Python. The <a href="http://www.packtpub.com/toc/numpy-15-beginners-guide-table-contents">table of contents</a> is the following:</p>
<ul>
<li>NumPy Quick Start</li>
<li>Beginning with NumPy Fundamentals</li>
<li>Get into Terms with Commonly Used Functions</li>
<li>Convenience Functions for Your Convenience</li>
<li>Working with Matrices and ufuncs</li>
<li>Move Further with NumPy Modules</li>
<li>Peeking Into Special Routines</li>
<li>Assure Quality with Testing</li>
<li>Plotting with Matplotlib</li>
<li>When NumPy is Not Enough: SciPy and Beyond</li>
</ul>
<p>The book is easy to read, as it requires no specific expertise other than knowing basic Python programming. It is full of examples and exercises, which is really great for learning. I find the style of the author, Ivan Idris, particularly amusing and relaxing, engaging the reader with questions, challenges, or even jokes (<i>&#8220;Have a go hero&#8221;</i>).</p>
<p>With regards to the formatting and the print, the book is written in large fonts, with sectioning information, tips and exercises clearly standing out.</p>
<p>It is full of practical information, such as how to install the software, or where to get help. Finally, one thing that I appreciated is that the examples are typed in <a href="http://ipython.org/">IPython</a>. Each time I teach, I like to use IPython, because it is full of features to help plotting, debugging and profiling numerical code. The book even has a little introduction to some useful IPython features.</p>
<p>After an introduction to the work flow, the book explores array manipulation such as creation or reshaping, followed by some simple numerics and the battery of array-based operations on functions and polynomials. Then it presents linear algebra and signal processing basics (FFT). It also covers the financial functions that are present in numpy and mentions testing, which is very important to achieve quality code. The book finishes with matplotlib and scipy, two modules that are important to know to go further.</p>
<p>The examples are mostly drawn from statistics or financial applications, such as computing running averages on stock quotes. Basic math explanations, such as the definition of the Moore-Penrose pseudo-inverse, are given when needed.</p>
<p>To conclude, I enjoyed this book and I think that it is a nice addition to my library. It delivers exactly what its title promises: it is well-suited for beginners wanting to learn numpy. On the other hand, I would not recommend it as reference material, or as a book to learn more general scientific or numerical computing with Python.</p>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=161</wfw:commentRss>
		</item>
		<item>
		<title>Joblib beta release: fast compressed persistence + Python 3</title>
		<link>http://gael-varoquaux.info/blog/?p=159</link>
		<comments>http://gael-varoquaux.info/blog/?p=159#comments</comments>
		<pubDate>Sat, 07 Jan 2012 18:27:04 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[scientific computing]]></category>

		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=159</guid>
		<description><![CDATA[Joblib 0.6: better I/O and Python 3 support
Happy new year, everyone. I have just released Joblib 0.6.0 beta. The highlights of the 0.6 release are a reworked enhanced pickler, and Python 3 support.
Many thanks go to the contributors to the 0.5.X series (Fabian Pedregosa, Yaroslav Halchenko, Kenneth C. Arnold, Alexandre Gramfort, Lars Buitinck, Bala [...]]]></description>
			<content:encoded><![CDATA[<h1>Joblib 0.6: better I/O and Python 3 support</h1>
<p>Happy new year, everyone. I have just released <a href="">Joblib</a> 0.6.0 beta. The highlights of the 0.6 release are a reworked enhanced pickler, and Python 3 support.</p>
<p>Many thanks go to the contributors to the 0.5.X series (Fabian Pedregosa, Yaroslav Halchenko, Kenneth C. Arnold, Alexandre Gramfort, Lars Buitinck, Bala Subrahmanyam Varanasi, Olivier Grisel, Ralf Gommers, Juan Manuel Caicedo Carvajal, and myself). In particular Fabian made sure that Joblib worked under Python 3. </p>
<p>In this blog post, I&#8217;d like to discuss the compressed persistence engine in a bit more detail, as it illustrates well the key factors in implementing and using compressed serialization. </p>
<h1>Fast compressed persistence</h1>
<p>One of the key components of joblib is its ability to persist arbitrary Python objects, and read them back very quickly. It is particularly efficient for <strong>containers that do their heavy lifting with numpy arrays</strong>. The trick to achieving great speed has been to save the numpy arrays in separate files, and load them via <strong>memmapping</strong>.</p>
<p>However, one drawback of joblib is that the caching mechanism may end up using a lot of disk space. As a result, there is strong interest in having <strong>compressed storage</strong>, provided it doesn&#8217;t slow down the library too much. Another use case that I have in mind for fast compressed persistence is implementing <a href="http://en.wikipedia.org/wiki/Out-of-core_algorithm">out of core computation</a>.</p>
<p>There are some great compressed I/O libraries for Python, for instance <a href='http://pytables.github.com/index.html'>Pytables</a>. You may wonder why there is a need to code yet another one. The answer is that joblib is <strong>pure Python, depending only on the standard library</strong> (numpy is optional), but also that the goal here is <strong>black-box persistence of arbitrary objects</strong>.</p>
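<p>As a rough illustration of the idea of storing a raw buffer in its own file and memory-mapping it back, here is a minimal sketch using only the standard library. It is not joblib&#8217;s implementation: the <i>array</i> payload is a hypothetical stand-in for a numpy array, and the temporary file name is arbitrary.</p>

```python
import mmap
import os
import tempfile
from array import array

# Hypothetical payload standing in for a large numpy array.
values = array("d", [0.5 * i for i in range(1000)])

# Write the raw buffer to its own file, separate from any pickled metadata.
fd, path = tempfile.mkstemp(suffix=".raw")
with os.fdopen(fd, "wb") as f:
    f.write(values.tobytes())

# Map the file instead of reading it: pages are loaded lazily on access.
with open(path, "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # A typed view on the mapping; neither step copies the data.
    with memoryview(mapped) as base, base.cast("d") as view:
        first, last, count = view[0], view[-1], len(view)
    mapped.close()
os.remove(path)

print(count, first, last)
```

<p>Opening the mapping is nearly free; the cost of reading from disk is only paid when elements are actually accessed.</p>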
<h2>Comparing I/O speed and compression to other libraries</h2>
<p>Implementing efficient compressed storage was a bit of a struggle and I learned a lot. Rather than going into the details straight away, let me first discuss a few benchmarks of the resulting code. Benchmarking such a feature is very hard: first, because you are fighting with the disk cache; second, because performance depends very much on the data at hand (some data compress better than others); and last, because there are three interesting metrics: disk space used, write speed, and read speed.</p>
<p><strong>Dataset used</strong> - I chose to compare the different strategies on some datasets that I work with, namely the probabilistic brain atlases MNI 1mm (62Mb uncompressed) and Juelich 2mm (105Mb uncompressed). Whether the data is represented as a Fortran-ordered array or a C-ordered array is important for the I/O performance. This data is normally stored to disk compressed using the domain-specific Nifti format (<i>.nii</i> files), accessed in Python with the <a href="http://nipy.sourceforge.net/nibabel/">Nibabel</a> library.
</p>
<p><strong>Libraries used</strong> - I benched different compression strategies in joblib against Nibabel&#8217;s Nifti I/O, compressed or not, and against using Pytables to store the data buffer (without the meta-information). Pytables exposes a variety of compression strategies, with different speed compromises. In addition, I benched numpy&#8217;s builtin <i>savez_compressed</i>.</p>
<p>I would like to stress that I am comparing a general purpose persistence engine (joblib) to specific I/O libraries either optimized for the data (Nifti), or requiring some massaging to enable persistence (pytables).</p>
<p><center><img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/disk.png" width=70%><br />
<br/></p>
<p><img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/write.png" width=70%><br />
<br/></p>
<p><img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/read.png" width=70%></p>
<p><br/></p>
<p><i>Comparing to other libraries</i></center></p>
<p>Actual numbers can be found <a href="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/results_nii.csv">here</a>.</p>
<p><strong>Take home messages</strong> - The graphs are not crystal-clear, but a few trends appear:
<ul>
<li>Pytables with LZO or blosc compression is the king of the hill for read and write speed.</li>
<li>I/O of compressed data is often faster than with uncompressed data for a good compression algorithm.</li>
<li>Joblib with Zlib compression level 1 performs honorably in terms of speed with only the Python standard library and no compiled code.</li>
<li>Read time of memmapping (with nibabel or joblib) is negligible (it is tiny on the graphs); however, the loading time appears when you start accessing the data.</li>
<li>Passing in arrays with a memory layout (Fortran versus C order) that the I/O library doesn&#8217;t expect can really slow down writing. </li>
<li>Compressing with Zlib compression-level 1 gets you most of the disk space gains for a reasonable cost in write/read speed.</li>
<li>Compressing with Zlib compression-level 9 (not shown on the figures) doesn&#8217;t buy you much in disk space, but costs a lot in writing time.</li>
</ul>
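<p>The trade-off between compression level, size, and time is easy to probe with the standard library alone. The sketch below is not the benchmark code used for the figures: the payload is a hypothetical, highly redundant byte string, and the timings will vary with the machine and the data.</p>

```python
import time
import zlib

# Hypothetical smooth, redundant payload (a stand-in for an atlas buffer).
raw = bytes(i // 64 % 256 for i in range(1_000_000))

def bench(level):
    """Return (compressed size, seconds spent compressing) for one level."""
    start = time.perf_counter()
    packed = zlib.compress(raw, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == raw  # sanity check: lossless round trip
    return len(packed), elapsed

results = {level: bench(level) for level in (1, 3, 9)}
for level, (size, seconds) in results.items():
    print(f"level {level}: {size / len(raw):.4f} of original, {seconds:.3f} s")
```

<p>For a fair comparison on real data, each measurement should be repeated several times, keeping in mind that the disk cache can distort read timings.</p>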
<h2>Benching datasets richer than pure arrays</h2>
<p>The datasets used so far are pretty much composed of one big array, a 4D smooth spatial map. I wanted to test on more datasets, to see how performance varied with data type and richness. For this, I used the datasets of <a href="http://scikit-learn.org">scikit-learn</a>, real-life data of various kinds, described <a href="http://scikit-learn.org/stable/datasets/index.html">here</a>:</p>
<ul>
<li><strong>20 news</strong> - 20 Usenet newsgroups: this data mainly consists of text, and not numpy arrays.</li>
<li><strong>LFW people</strong> - Labeled faces in the wild, many pictures of different people&#8217;s faces.</li>
<li><strong>LFW pairs</strong> - Labeled faces in the wild, pairs of pictures for each individual. This is a high entropy dataset: it does not have much redundant information.</li>
<li><strong>Olivetti</strong> - Olivetti dataset: centered pictures of faces.</li>
<li><strong>Juelich(F)</strong> - Our previous Juelich atlas</li>
<li><strong>Big people</strong> - The LFW people dataset, but repeated 4 times, to put a strain on memory resources.</li>
<li><strong>MNI(F)</strong> - Our previous MNI atlas</li>
<li><strong>Species</strong> - Occurrence of species measured in Latin America, with a lot of missing data.</li>
</ul>
<p><img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/joblib_disk.png" width=32%> <img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/joblib_write.png" width=32%> <img src="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/joblib_read.png" width=32%></p>
<p><center><i>Testing compression strategies on various datasets</i></center></p>
<p>Actual numbers can be found <a href="http://gael-varoquaux.info/blog/wp-content/uploads/2012/joblib_rel_0.6_speed/joblib_results.csv">here</a>.</p>
<p><strong>What this tells us</strong> - The main message from these benchmarks is that datasets with redundant information, i.e. that compress well, give fast I/O. This is not surprising. In particular, good compression can give good I/O on text (20 news). Another result, more of a sanity check, is that compressed I/O on big data (Big people) works as well as on smaller data, whereas earlier code would start to swap. Finally, I conclude from these graphs that compression levels from 1 to 3 buy you most of the gains for reasonable costs, and that going up to 9 is not recommended, unless you know that your data can be compressed a lot (Species).</p>
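<p>The effect of redundancy on compression can be seen on synthetic data. This is a minimal sketch with hypothetical payloads (repeated text versus random bytes), not the datasets benchmarked above.</p>

```python
import os
import zlib

def ratio(payload, level=1):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload, level)) / len(payload)

size = 500_000
line = b"the quick brown fox jumps over the lazy dog\n"
redundant = line * (size // len(line))   # highly repetitive text
high_entropy = os.urandom(size)          # essentially incompressible

redundant_ratio = ratio(redundant)
entropy_ratio = ratio(high_entropy)
print(f"redundant: {redundant_ratio:.3f}, high entropy: {entropy_ratio:.3f}")
```

<p>The redundant payload shrinks to a small fraction of its size, while the random one does not shrink at all, so compressing it only costs time.</p>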
<h2>Lessons learned</h2>
<p>I&#8217;ll keep this paragraph short, because the information is really in <a href="https://github.com/joblib/joblib/blob/0.5.X/joblib/numpy_pickle.py">joblib&#8217;s code and comments</a>. Don&#8217;t hesitate to have a look: it&#8217;s BSD-licensed, so you are free to borrow what you please.</p>
<ol>
<li>Memory copies of arrays, but also of strings and byte streams, can really slow you down with big data.</li>
<li>To avoid copies with numpy arrays, fully embrace numpy&#8217;s strided memory model. For instance, you do not need to save arrays in C order, if they are given to you in a different order. Accessing the memory in the wrong striding direction explains the poor write performance of pytables on Fortran-ordered Juelich.</li>
<li>When dealing with the file system, the OS makes so much magic (e.g. prefetching) that clever hacks tend not to work: always benchmark.</li>
<li>Depending on the size of the data, it may be more efficient to store subsets in different files: it introduces &#8216;chunks&#8217; that avoid filling up the memory too much (parameter <i>cache_size</i> in joblib&#8217;s code). In addition, data of the same nature tends to compress better.</li>
<li>The I/O stream or file object interfaces are abstractions that can hide the data movement and the creation of large temporaries. After experiments with GZipFile and StringIO/BytesIO I found it more efficient to fall back to passing around big buffer objects, numpy arrays, or strings.</li>
<li>For reasons 4 and 5, I ended up avoiding the gzip module: raw access to zlib with buffers gives more control. This explains a good part of the differences in read speed for pure arrays with numpy&#8217;s <i>savez_compressed</i>.</li>
</ol>
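<p>To illustrate the points on copies, chunks, and raw zlib access, here is a minimal sketch that compresses a large buffer piece by piece through <i>memoryview</i> slices. It is not joblib&#8217;s actual code: the function names, the chunk size, and the payload are hypothetical.</p>

```python
import zlib

def compress_chunks(buffer, chunk_size=2 ** 20, level=1):
    """Yield compressed pieces of a bytes-like `buffer`, chunk by chunk."""
    view = memoryview(buffer)          # slicing a memoryview does not copy
    packer = zlib.compressobj(level)
    for start in range(0, len(view), chunk_size):
        piece = packer.compress(view[start:start + chunk_size])
        if piece:                      # zlib may buffer and emit nothing yet
            yield piece
    yield packer.flush()               # emit whatever is still buffered

def decompress_chunks(pieces):
    """Rebuild the original bytes from an iterable of compressed pieces."""
    unpacker = zlib.decompressobj()
    out = bytearray()
    for piece in pieces:
        out += unpacker.decompress(piece)
    out += unpacker.flush()
    return bytes(out)

payload = bytes(range(256)) * 20_000   # about 5 MB of hypothetical data
packed = list(compress_chunks(payload))
restored = decompress_chunks(packed)
print(len(payload), sum(len(p) for p in packed))
```

<p>Working directly on buffers this way keeps temporaries to roughly one chunk at a time, at the price of handling the framing yourself instead of relying on a file-like object.</p>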
<p>One of my conclusions for joblib is that I&#8217;ll probably use Pytables as an optional backend for persistence in a future release.</p>
<h2>Details on the benchmarks</h2>
<p>These benchmarks were run on a Dell Latitude D630 laptop. That&#8217;s a dual-core Intel Core2 Duo box, with 2M of CPU cache.</p>
</p>
<p>The code for the benchmarks can be found in <a href="https://gist.github.com/1551250">a gist</a>.</p>
<h2>Thanks</h2>
<p>I&#8217;d like to thank Francesc Alted for the very useful feedback he gave on this topic. In particular, the <a href="http://sourceforge.net/mailarchive/message.php?msg_id=28609087">following thread</a> on the pytables mailing-list may be of interest to the reader.</p>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=159</wfw:commentRss>
		</item>
		<item>
		<title>Scikit-learn NIPS 2011 sprint: international thanks to our sponsors</title>
		<link>http://gael-varoquaux.info/blog/?p=158</link>
		<comments>http://gael-varoquaux.info/blog/?p=158#comments</comments>
		<pubDate>Fri, 18 Nov 2011 13:47:59 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[mayavi]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=158</guid>
		<description><![CDATA[The NIPS conference: time for a sprint. The NIPS conference, one of the major conferences in machine learning, is hosted in Granada this year. I believe that it is the first time that it is hosted in Europe. As many of the scikit-learn developers are part of the wider NIPS community, but also many live [...]]]></description>
			<content:encoded><![CDATA[<p><strong>The NIPS conference: time for a sprint.</strong> The <a href="http://nips.cc/">NIPS conference</a>, one of the major conferences in machine learning, is hosted in Granada this year. I believe that it is the first time that it is hosted in Europe. As many of the <a href="http://scikit-learn.org">scikit-learn</a> developers are part of the wider NIPS community, and many also live in Europe, we jumped at the opportunity to organize a truly international sprint: the <a href="http://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events">NIPS 2011 scikit-learn sprint</a>. </p>
<p><strong>Finding money.</strong> As often with open source development, a lot of our contributors are young people, investing their free time outside of any request from their hierarchy. In such a situation, it can be hard to find travel money. So we started looking for sponsors. We needed to find a decent sum of money, as we were flying people in from places such as the West coast of the US, or even Japan. The good news is that we found money, and between supervisors pitching in, universities giving travel grants, and our generous sponsors, there will be an impressive list of contributors from all over the world at the sprint. </p>
<p><strong>Thanks to our sponsors.</strong> The first people that we need to thank are Google, who gave us a sizable sponsorship, and the <a href="http://www.python.org/psf/">PSF</a>, who made Google&#8217;s sponsorship possible through their accounting and sprints programs. We also need to thank our other sponsors, namely <a href="http://www.tinyclues.com/">Tinyclues</a>. Thanks to these sponsors, and additional investment from many universities and research groups, we have been able to gather a total of 12 contributors in Granada, a handful coming from overseas. Also, we are indebted to the <a href="http://www.ugr.es/">University of Granada</a>, and the Gnu/Linux Granada Group (GGG), who are providing hosting for the sprint, as well as Régine Bricquet, from INRIA, who did a lot of the trip planning for the sponsored people. </p>
<p>I am very much looking forward to the sprint. It will be the first time that I meet many of the contributors in real life, and judging by the warmth of the on-line exchanges, it will be a great moment. Besides, Granada is known to be a lively and historical city. </p>
<p>If you are around and want to join us, to work on Python in machine learning, send us a mail on the <a href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general">mailing list</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=158</wfw:commentRss>
		</item>
		<item>
		<title>Cython example of exposing C-computed arrays in Python without data copies</title>
		<link>http://gael-varoquaux.info/blog/?p=157</link>
		<comments>http://gael-varoquaux.info/blog/?p=157#comments</comments>
		<pubDate>Wed, 28 Sep 2011 22:42:40 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[scientific computing]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=157</guid>
		<description><![CDATA[Colleagues who are exposing a numerical C code in Python asked me for some advice on the best way to pass arrays from C to Python avoiding copies. They had Cython in mind, and I must agree with them that I have found the Cython code to be more maintainable than hand-written Python C-API code.

When [...]]]></description>
			<content:encoded><![CDATA[<p>Colleagues who are exposing numerical C code in Python asked me for some advice on the best way to pass arrays from C to Python while avoiding copies. They had Cython in mind, and I must agree with them that I have found Cython code to be more maintainable than hand-written Python C-API code.
</p>
<p>When writing my answer, I found out that there was no self-contained example of creating numpy arrays from existing data in Cython. Thus I created my own. The full code, with readme, build, and demo scripts, is available on a <a href="https://gist.github.com/1249305">gist</a>. Here I only give an executive summary.
</p>
<p>
The core functionality is implemented by the <a href="http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#PyArray_SimpleNewFromData">PyArray_SimpleNewFromData</a> function of the C API of numpy that can create an ndarray from a pointer to the data, a simple data type, and the shape of the data. The Cython file just builds around that function:
</p>
<p><script src="https://gist.github.com/1249305.js?file=cython_wrapper.pyx"></script></p>
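<p>For readers who want to experiment with the underlying idea without a compiler, the sketch below wraps existing memory without copying it, using <i>ctypes</i> from the standard library rather than Cython and numpy. It is an analogy, not the code of the gist: the buffer here is a hypothetical stand-in for memory that a C library would have allocated and filled.</p>

```python
import ctypes

n = 4
# Stand-in for memory allocated and filled by a C library (hypothetical data).
c_buffer = (ctypes.c_double * n)(1.0, 2.0, 3.0, 4.0)
address = ctypes.addressof(c_buffer)

# Wrap the existing memory: no allocation, no copy, just a typed view.
wrapped = (ctypes.c_double * n).from_address(address)
wrapped[0] = 10.0          # writes through to the original memory

# The wrapper does not own the memory: `c_buffer` must outlive `wrapped`.
shared = c_buffer[0]
print(shared, list(wrapped))
```

<p>The same ownership question arises with <i>PyArray_SimpleNewFromData</i>: the resulting ndarray does not own its data, so something must keep the memory alive for as long as the array is used, and free it afterwards.</p>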
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=157</wfw:commentRss>
		</item>
		<item>
		<title>Python at scientific conferences</title>
		<link>http://gael-varoquaux.info/blog/?p=156</link>
		<comments>http://gael-varoquaux.info/blog/?p=156#comments</comments>
		<pubDate>Sun, 11 Sep 2011 14:52:54 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[computational science]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[science]]></category>

		<category><![CDATA[scientific computing]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=156</guid>
		<description><![CDATA[
Top notch scientific conferences are starting to add Python tracks to their program. This is good news. Indeed, scientific Python conferences (namely Scipy, EuroSciPy and Scipy India) do a great job of bringing together people who have already heard about Python for science, but we need to reach out to specific Python communities to maximize [...]]]></description>
			<content:encoded><![CDATA[<p>
Top notch scientific conferences are starting to add Python tracks to their program. This is good news. Indeed, scientific Python conferences (namely <a href="http://conference.scipy.org/scipy2011/">Scipy</a>, <a href="http://www.euroscipy.org/">EuroSciPy</a> and <a href="http://scipy.in/scipyin/2011/">Scipy India</a>) do a great job of bringing together people who have already heard about Python for science, but we need to reach out to specific Python communities to maximize impact.
</p>
<h1>ESCO 2012 - European Seminar on Coupled Problems</h1>
<p><a href="http://esco2012.femhub.com/">ESCO 2012</a> is the 3rd event in a series of interdisciplinary meetings dedicated to computational science challenges in multi-physics and PDEs.
</p>
<p>
I was invited to ESCO last year. It was an absolute pleasure, because it is a small conference that is very focused on discussions. I learned a lot and could sit down with people who code top notch PDE libraries such as FEniCS and have technical discussions. Besides, it is hosted in the historical brewery where the Pilsner was invented. Plenty of great beer.
</p>
<p> <strong>Application areas</strong> Theoretical results as well as applications are welcome. Application areas include, but are not limited to: Computational electromagnetics, Civil engineering, Nuclear engineering, Mechanical engineering, Computational fluid dynamics, Computational geophysics, Geomechanics and rock mechanics, Computational hydrology, Subsurface modeling, Biomechanics, Computational chemistry, Climate and weather modeling, Wave propagation, Acoustics, Stochastic differential equations, and Uncertainty quantification. </p>
<p><strong>Minisymposia</strong></p>
<ul>
<li>Multiphysics and Multiscale Problems in Civil Engineering
</li>
<li>Modern Numerical Methods for ODE
</li>
<li>Porous Media Hydrodynamics
</li>
<li>Nuclear Fuel Recycling Simulations
</li>
<li>Adaptive Methods for Eigenproblems
</li>
<li>Discontinuous Galerkin Methods for Electromagnetics
</li>
<li>Undergraduate Projects in Technical Computing</li>
</ul>
</p>
<p><strong>Software afternoon</strong> An important part of each ESCO conference is a software afternoon featuring software projects by participants. Any computational software that has reached a certain level of maturity can be presented, i.e., it is used outside of the author&#8217;s institution, and it has a web page and user documentation. If you would like to present your software project, let us know soon.
</p>
<p><strong>Proceedings</strong> For each ESCO we strive to reserve a special issue of an international journal with impact factor. Proceedings of ESCO 2008 appeared in Math. Comput. Simul., proceedings of ESCO 2010 in CiCP and Appl. Math. Comput. Proceedings of ESCO 2012 will appear in Computing.
</p>
<p>
<strong>Important Dates</strong></p>
<ul>
<li>December 15, 2011: Abstract submission deadline.</li>
<li>December 15, 2011: Minisymposia proposals.</li>
<li>January 15, 2012: Notification of acceptance.</li>
</ul>
<h1>PyHPC: Python for High performance computing</h1>
<p>If you are doing super computing, <a href="http://sc11.supercomputing.org/">SC11, the Super Computing conference</a> is <i>the</i> reference conference. This year there will be a workshop on high performance computing with Python: <a href="http://www.dlr.de/sc/desktopdefault.aspx/tabid-1183/1638_read-31733/">PyHPC</a>.
</p>
<p>At the scipy conference, I was having a discussion with some of the attendees on how people often still do process management and I/O with Fortran in the big computing environment. This is counterproductive. However, as success stories of supercomputing folks using high-level languages are not advertised, this is bound to remain the case. Come and tell us how you use Python for high performance computing!</p>
<p>
<strong>Topics</strong></p>
<ul>
<li>Python-based scientific applications and libraries
</li>
<li>High performance computing
</li>
<li>Parallel Python-based programming languages
</li>
<li>Scientific visualization
</li>
<li>Scientific computing education
</li>
<li>Python performance and language issues
</li>
<li>Problem solving environments with Python
</li>
<li>Performance analysis tools for Python applications
</li>
</ul>
<p>
<strong>Papers</strong> We invite you to submit a paper of up to 10 pages via the submission site. Authors are encouraged to use IEEE two column format.
</p>
<p><strong>Important Dates</strong></p>
<ul>
<li>Full paper submission: September 19, 2011</li>
<li>Notification of acceptance: October 7, 2011</li>
<li>Camera-ready papers: October 31, 2011</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=156</wfw:commentRss>
		</item>
		<item>
		<title>Conference posters</title>
		<link>http://gael-varoquaux.info/blog/?p=155</link>
		<comments>http://gael-varoquaux.info/blog/?p=155#comments</comments>
		<pubDate>Mon, 05 Sep 2011 03:15:02 +0000</pubDate>
		<dc:creator>gael</dc:creator>
		
		<category><![CDATA[computational science]]></category>

		<category><![CDATA[machine learning]]></category>

		<category><![CDATA[python]]></category>

		<category><![CDATA[science]]></category>

		<category><![CDATA[scientific computing]]></category>

		<guid isPermaLink="false">http://gael-varoquaux.info/blog/?p=155</guid>
		<description><![CDATA[
At the request of a friend, I am putting up some of the posters that I recently presented at conferences.
Large-scale functional-connectivity graphical models for individual subjects using population prior.
This is a poster for our NIPS work
Multi-subject dictionary learning to segment an atlas of brain spontaneous activity.
This is a poster for our IPMI work
Mayavi for 3D [...]]]></description>
			<content:encoded><![CDATA[<p>
At the request of a friend, I am putting up some of the posters that I recently presented at conferences.
</p>
<table>
<tr>
<td>
<a href="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_nips.pdf"><br />
<img src="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_nips.png"  width=200></a></td>
<td>
<strong>Large-scale functional-connectivity graphical models for individual subjects using population prior.</strong><br />
This is a poster for <a href="http://hal.inria.fr/inria-00512451/en">our NIPS work</a></td>
</tr>
</table>
<table>
<tr>
<td>
<a href="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_ipmi.pdf"><br />
<img src="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_ipmi.png"  width=200></a></td>
<td>
<strong>Multi-subject dictionary learning to segment an atlas of brain spontaneous activity.</strong><br />
This is a poster for <a href="http://hal.inria.fr/inria-00588898/en">our IPMI work</a></td>
</tr>
</table>
<table>
<tr>
<td>
<a href="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_mayavi.pdf"><br />
<img src="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_mayavi.png"  width=200></a></td>
<td>
<strong>Mayavi for 3D visualization of neuroimaging data: powerful scripting and reusable components in Python.</strong></td>
</tr>
</table>
<table>
<tr>
<td>
<a href="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_scikit.pdf"><br />
<img src="http://gael-varoquaux.info/blog/wp-content/uploads/2011/poster_scikit.png"  width=200></a></td>
<td><strong>Machine learning for fMRI in Python: inverse inference with scikit-learn.</strong></td>
</tr>
</table>
]]></content:encoded>
			<wfw:commentRss>http://gael-varoquaux.info/blog/?feed=rss2&amp;p=155</wfw:commentRss>
		</item>
	</channel>
</rss>

