<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Gaël Varoquaux - programming</title><link href="https://gael-varoquaux.info/" rel="alternate"></link><link href="https://gael-varoquaux.info/feeds/programming.atom.xml" rel="self"></link><id>https://gael-varoquaux.info/</id><updated>2026-01-14T00:00:00+01:00</updated><entry><title>Stepping up as probabl’s CSO to supercharge scikit-learn and its ecosystem</title><link href="https://gael-varoquaux.info/programming/stepping-up-as-probabls-cso-to-supercharge-scikit-learn-and-its-ecosystem.html" rel="alternate"></link><published>2026-01-14T00:00:00+01:00</published><updated>2026-01-14T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2026-01-14:/programming/stepping-up-as-probabls-cso-to-supercharge-scikit-learn-and-its-ecosystem.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="../programming/attachments/probabl_team_2025.png" style="width: 400px;" /&gt;
&lt;p class="caption"&gt;Probabl’s get-together, in fall 2025&lt;/p&gt;
&lt;/div&gt;
&lt;p class="last"&gt;I’m thrilled to announce that I’m stepping up as &lt;a class="reference external" href="https://probabl.ai/?utm_source=employee_blog&amp;amp;utm_medium=social_employee&amp;amp;utm_campaign=202601_probabl_awareness_post"&gt;Probabl&lt;/a&gt;’s CSO (Chief Science Officer) to supercharge
scikit-learn and its ecosystem, pursuing my dream of tools that help people go
from data to impact.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="scikit-learn-a-central-tool"&gt;
&lt;h2&gt;Scikit-learn, a central tool&lt;/h2&gt;
&lt;p&gt;Scikit-learn is central …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="../programming/attachments/probabl_team_2025.png" style="width: 400px;" /&gt;
&lt;p class="caption"&gt;Probabl’s get-together, in fall 2025&lt;/p&gt;
&lt;/div&gt;
&lt;p class="last"&gt;I’m thrilled to announce that I’m stepping up as &lt;a class="reference external" href="https://probabl.ai/?utm_source=employee_blog&amp;amp;utm_medium=social_employee&amp;amp;utm_campaign=202601_probabl_awareness_post"&gt;Probabl&lt;/a&gt;’s CSO (Chief Science Officer) to supercharge
scikit-learn and its ecosystem, pursuing my dream of tools that help people go
from data to impact.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="scikit-learn-a-central-tool"&gt;
&lt;h2&gt;Scikit-learn, a central tool&lt;/h2&gt;
&lt;p&gt;Scikit-learn is central to data scientists’ work: it is &lt;strong&gt;the most used
machine-learning package&lt;/strong&gt;. It has grown over more than a decade,
supported by volunteers’ time, donations, and grant funding, with Inria
playing a central role.&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="../programming/attachments/scikit-learn_clickpy_2025.png" style="width: 350px;" /&gt;
&lt;p class="caption"&gt;Scikit-learn download numbers; &lt;a class="reference external" href="https://clickpy.clickhouse.com/dashboard/scikit-learn"&gt;reproduce and explore on clickpy&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;And the usage numbers keep going up…&lt;/p&gt;
&lt;p&gt;Scikit-learn keeps growing because it enables crucial applications:
machine learning that can easily be adapted to a given problem. This
type of AI does not make the headlines, but it is central to the value
brought by data science. It is used across the board to extract insights
from data and to automate business-specific processes, keeping a wide
variety of activities running smoothly and efficiently.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;And scikit-learn is quietly but steadily advancing. The recent releases
bring progress in all directions: computational foundations (&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html#array-api-support-enables-gpu-computations"&gt;the array
API enabling GPU support&lt;/a&gt;),
user interface (&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html#html-representation-of-estimators"&gt;rich HTML displays&lt;/a&gt;),
new models (eg &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/generated/sklearn.cluster.HDBSCAN.html"&gt;HDBSCAN&lt;/a&gt;,
&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html#temperature-scaling-in-calibratedclassifiercv"&gt;temperature-scaling recalibration&lt;/a&gt; …), and, as always, algorithmic
improvements (release 1.8 brought &lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html#efficiency-improvements-in-linear-models"&gt;marked speed-ups to linear models&lt;/a&gt; and to
&lt;a class="reference external" href="https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_8_0.html#decisiontreeregressor-with-criterion-absolute-error"&gt;trees using the MAE criterion&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-new-opportunity-to-boost-scikit-learn-and-its-ecosystem"&gt;
&lt;h2&gt;A new opportunity to boost scikit-learn and its ecosystem&lt;/h2&gt;
&lt;p&gt;Probabl recently raised a &lt;a class="reference external" href="https://blog.probabl.ai/probabl-raises-a-13m-in-seed-to-accelerate-enterprise-grade-ai?utm_source=employee_blog&amp;amp;utm_medium=social_employee&amp;amp;utm_campaign=202601_blog_awareness_post"&gt;beautiful seed round&lt;/a&gt;
from investors who really understand the value and prospects of
scikit-learn. We have a unique opportunity to accelerate scikit-learn’s
development. Our analysis is that &lt;strong&gt;enterprises need dedicated tooling and
partners to get the most out of scikit-learn&lt;/strong&gt;, and we’re hard at work to provide
this.&lt;/p&gt;
&lt;p&gt;Two thirds of Probabl’s founders are scikit-learn contributors, and we have
been investing in all aspects of scikit-learn: features, releases,
communication, documentation, and training. In addition, part of
scikit-learn’s success has always been to nurture an ecosystem, for
instance via its simple API that has become a standard. Thus Probabl is
not only consolidating scikit-learn, but also this ecosystem: the &lt;a class="reference external" href="https://skops.readthedocs.io/en/stable/"&gt;skops
project, to put scikit-learn based models in production&lt;/a&gt;, the &lt;a class="reference external" href="https://skrub-data.org"&gt;skrub project, that
facilitates data preparation&lt;/a&gt;, the &lt;a class="reference external" href="https://skore.probabl.ai/?utm_source=employee_blog&amp;amp;utm_medium=social_employee&amp;amp;utm_campaign=202601_skore_awareness_post"&gt;young skore
project to track data science&lt;/a&gt;, &lt;a class="reference external" href="https://fairlearn.org/"&gt;fairlearn
to help avoid machine learning that discriminates&lt;/a&gt;, and more upstream projects, such as &lt;a class="reference external" href="https://joblib.readthedocs.io/en/stable/"&gt;joblib
for parallel computing&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="my-obsession-as-probabl-cso-serving-the-data-scientists"&gt;
&lt;h2&gt;My obsession as Probabl CSO: serving the data scientists&lt;/h2&gt;
&lt;p&gt;As CSO (Chief Science Officer) at Probabl, my role is to nourish our
development strategy with an understanding of machine learning, data
science, and open source. Making sure that &lt;strong&gt;scikit-learn and its
ecosystem are enterprise-ready&lt;/strong&gt; will bring resources for scikit-learn’s
sustainability, enabling its ecosystem to grow into a standard-setting
platform for the industry, one that continues &lt;strong&gt;to serve data scientists&lt;/strong&gt;.
This mission will require consolidating the existing tools and patterns,
and inventing new ones.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Probabl is in a unique position for this endeavor: our core is an amazing
team of engineers with deep knowledge of data science. Working directly
with businesses gives us an acute understanding of where the ecosystem
can be improved. On this topic, I also profoundly enjoy working with
people whose DNA differs from the historical DNA of scikit-learn,
with product-research, marketing, and business mindsets. I believe that
the union of our different cultures will make the scikit-learn ecosystem
better.&lt;/p&gt;
&lt;p&gt;Beyond the Probabl team, we have an amazing community, with a broader
group of scikit-learn contributors who do a remarkable job of bringing
together what makes scikit-learn so versatile, and a deep ecosystem of
Python data tools enriched by so many different actors. I’m deeply
grateful to the many scikit-learn and PyData contributors. At Probabl, we
are very attuned to enabling the open-source contributor community. Such
a community is what enables a single tool, scikit-learn, to serve a long
tail of diverse usages.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="open source"></category><category term="growth"></category><category term="communities"></category><category term="scikit-learn"></category><category term="inria"></category><category term="probabl"></category></entry><entry><title>Skrub 0.2.0: tabular learning made easy</title><link href="https://gael-varoquaux.info/programming/skrub-020-tabular-learning-made-easy.html" rel="alternate"></link><published>2024-07-03T00:00:00+02:00</published><updated>2024-07-03T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2024-07-03:/programming/skrub-020-tabular-learning-made-easy.html</id><summary type="html">&lt;img alt="" class="align-center" src="attachments/skrub_schematic.png" style="width: 500px;" /&gt;
&lt;p&gt;We just released &lt;a class="reference external" href="https://skrub-data.org"&gt;skrub 0.2.0&lt;/a&gt;. This release
markedly simplifies learning on complex dataframes.&lt;/p&gt;
&lt;div class="section" id="model-tabular-learner-classifier"&gt;
&lt;h2&gt;&lt;cite&gt;model = tabular_learner('classifier')&lt;/cite&gt;&lt;/h2&gt;
&lt;div class="align-right docutils container"&gt;
Simple, yet solid default baseline&lt;/div&gt;
&lt;p&gt;The highlight of the release is the &lt;a class="reference external" href="https://skrub-data.org/stable/generated/skrub.tabular_learner.html"&gt;tabular_learner&lt;/a&gt;
function, which facilitates creating pipelines that readily perform
machine learning on dataframes, adding preprocessing to a scikit-learn
compatible learner …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;img alt="" class="align-center" src="attachments/skrub_schematic.png" style="width: 500px;" /&gt;
&lt;p&gt;We just released &lt;a class="reference external" href="https://skrub-data.org"&gt;skrub 0.2.0&lt;/a&gt;. This release
markedly simplifies learning on complex dataframes.&lt;/p&gt;
&lt;div class="section" id="model-tabular-learner-classifier"&gt;
&lt;h2&gt;&lt;cite&gt;model = tabular_learner('classifier')&lt;/cite&gt;&lt;/h2&gt;
&lt;div class="align-right docutils container"&gt;
Simple, yet solid default baseline&lt;/div&gt;
&lt;p&gt;The highlight of the release is the &lt;a class="reference external" href="https://skrub-data.org/stable/generated/skrub.tabular_learner.html"&gt;tabular_learner&lt;/a&gt;
function, which facilitates creating pipelines that readily perform
machine learning on dataframes, adding preprocessing to a scikit-learn
compatible learner. The function packs defaults and heuristics
to transform all forms of dataframes to a representation that is well
suited to a learner, and it adapts these transformations to the learner:
&lt;cite&gt;tabular_learner(HistGradientBoostingClassifier())&lt;/cite&gt; encodes categories
differently than &lt;cite&gt;tabular_learner(LogisticRegression())&lt;/cite&gt;.&lt;/p&gt;
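&lt;p&gt;Why would the encoding depend on the learner? The plain-Python sketch
below illustrates one common reason; it is not skrub’s code, and the helper
names and example values are made up for illustration. An ordinal encoding
gives one integer per category, which tree-based models such as gradient
boosting can split on, while a one-hot encoding gives one indicator column
per category, which suits linear models that would otherwise read a
spurious order into the integer codes.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Plain-Python illustration, not skrub's implementation: the same
# categorical column encoded two ways, for two kinds of learners.


def ordinal_encode(values):
    """Hypothetical helper: map each category to an integer code.

    Tree-based learners can split on these codes directly.
    """
    categories = sorted(set(values))
    code_of = {category: code for code, category in enumerate(categories)}
    return [code_of[value] for value in values], categories


def one_hot_encode(values):
    """Hypothetical helper: map each category to an indicator vector.

    Linear learners need this form: an integer code would impose an
    arbitrary order and spacing on the categories.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


# Made-up department codes, for illustration only.
departments = ["POL", "HHS", "POL", "FRS"]
print(ordinal_encode(departments))  # ([2, 1, 2, 0], ['FRS', 'HHS', 'POL'])
print(one_hot_encode(departments))
```
&lt;/pre&gt;
&lt;p&gt;This only shows the contrast between the two encodings; which encoders
skrub actually picks for a given learner follows its own heuristics,
documented with tabular_learner.&lt;/p&gt;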
&lt;p&gt;The heuristics are tuned through extensive benchmarking, and experience shows
that they give good tradeoffs. The default
&lt;cite&gt;tabular_learner('classifier')&lt;/cite&gt; is often a strong baseline.&lt;/p&gt;
&lt;p&gt;The benefits are visible in a really simple example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
&amp;gt;&amp;gt;&amp;gt; # First retrieve data
&amp;gt;&amp;gt;&amp;gt; from skrub.datasets import fetch_employee_salaries
&amp;gt;&amp;gt;&amp;gt; dataset = fetch_employee_salaries()
&amp;gt;&amp;gt;&amp;gt; df = dataset.X
&amp;gt;&amp;gt;&amp;gt; y = dataset.y
&amp;gt;&amp;gt;&amp;gt; # This is a rich and complex dataframe, with columns of various types
&amp;gt;&amp;gt;&amp;gt; df
&lt;/pre&gt;
&lt;img alt="" src="attachments/employee_salaries_df.png" /&gt;
&lt;p&gt;We can then easily build a learner that applies readily to this
dataframe, without any manual preprocessing:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
&amp;gt;&amp;gt;&amp;gt; from skrub import tabular_learner
&amp;gt;&amp;gt;&amp;gt; learner = tabular_learner('regressor')
&amp;gt;&amp;gt;&amp;gt; # The resulting learner can apply all the machine-learning conveniences (eg cross-validation) directly on the dataframe
&amp;gt;&amp;gt;&amp;gt; from sklearn.model_selection import cross_val_score
&amp;gt;&amp;gt;&amp;gt; cross_val_score(learner, df, y)
array([0.89370447, 0.89279068, 0.92282557, 0.92319094, 0.92162666])
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="transformer-tablevectorizer"&gt;
&lt;h2&gt;&lt;cite&gt;transformer = TableVectorizer()&lt;/cite&gt;&lt;/h2&gt;
&lt;div class="align-right docutils container"&gt;
Making encoding complex dataframes easy&lt;/div&gt;
&lt;p&gt;Under the hood, the work is done by the &lt;a class="reference external" href="https://skrub-data.org/stable/generated/skrub.TableVectorizer.html"&gt;skrub.TableVectorizer()&lt;/a&gt;, a
scikit-learn compatible transformer that facilitates combining multiple
transformations on the different columns of a dataframe. The
TableVectorizer is not new in the 0.2.0 release, but we have completely
revamped its internals to handle edge cases really well. Indeed, one
challenge is to make sure that nothing different or strange happens at
test time. In fact, enforcing consistency between train-time and
test-time transformations is the real value of skrub compared to using
pandas or polars to do the transformations.&lt;/p&gt;
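&lt;p&gt;To see why train/test consistency matters, consider a toy one-hot
encoding of a single column. The plain-Python sketch below is a hypothetical
illustration, not the TableVectorizer’s code: the encoder learns its
categories during &lt;cite&gt;fit&lt;/cite&gt; and reuses them at
&lt;cite&gt;transform&lt;/cite&gt; time, whereas re-encoding the test data from
scratch yields columns that no longer line up with those the learner was
trained on.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Hypothetical sketch, not skrub's TableVectorizer: an encoder that learns
# its categories at fit time keeps the test-time output consistent.


class OneHotColumn:
    """One-hot encode a single column with categories learned during fit."""

    def fit(self, values):
        # Remember only the categories present in the training data.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Always one output column per training category; a category never
        # seen during fit becomes an all-zero row rather than a new column.
        return [[1 if value == category else 0 for category in self.categories_]
                for value in values]


def naive_one_hot(values):
    # Re-derives the categories from whatever data it receives.
    categories = sorted(set(values))
    return [[1 if value == category else 0 for category in categories]
            for value in values]


train = ["POL", "HHS", "FRS", "POL"]  # made-up training values
test = ["HHS", "DOT"]  # "DOT" never appears in the training data

encoder = OneHotColumn().fit(train)
print(encoder.transform(test))  # [[0, 1, 0], [0, 0, 0]]: same 3 columns as in training
print(naive_one_hot(test))      # [[0, 1], [1, 0]]: 2 columns, meaning DOT and HHS
```
&lt;/pre&gt;
&lt;p&gt;A model trained on the three training columns can consume the first
output, but not the second, which has a different width and different
column meanings.&lt;/p&gt;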
&lt;/div&gt;
&lt;div class="section" id="increasing-support-of-polars"&gt;
&lt;h2&gt;Increasing support of polars&lt;/h2&gt;
&lt;div class="align-right docutils container"&gt;
Short-term goal of optimized support for pandas and polars&lt;/div&gt;
&lt;p&gt;We have implemented a new mechanism for supporting both pandas and
polars. It has not yet been applied to the whole codebase, hence the support is
still imperfect. However, support for polars in skrub keeps increasing,
and our goal in the short term is to provide rock-solid polars
support.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img alt="" class="align-right" src="attachments/skrub_logo.png" style="width: 200px;" /&gt;
&lt;p&gt;Try skrub out! It’s still young, but in my opinion, it provides a lot
of value to tabular learning.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="skrub"></category><category term="scikit-learn"></category><category term="tabular"></category><category term="machine learning"></category><category term="open source"></category><category term="software"></category></entry><entry><title>Promoting open-source, from inria to :probabl.</title><link href="https://gael-varoquaux.info/programming/promoting-open-source-from-inria-to-probabl.html" rel="alternate"></link><published>2024-06-09T00:00:00+02:00</published><updated>2024-06-09T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2024-06-09:/programming/promoting-open-source-from-inria-to-probabl.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/scikit-learn_at_probabl.png" style="width: 300px;" /&gt;
&lt;p class="last"&gt;Open-source efforts around scikit-learn at Inria are spinning off to a
new enterprise, &lt;a class="reference external" href="https://probabl.ai"&gt;Probabl&lt;/a&gt;, in charge of
sustainable development of a data-science commons.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="contents topic" id="contents"&gt;
&lt;p class="topic-title"&gt;Contents&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference internal" href="#prelude-funding-scikit-learn-is-hard" id="toc-entry-1"&gt;Prelude: funding scikit-learn is hard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#the-birth-of-a-new-ambition" id="toc-entry-2"&gt;The birth of a new ambition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#probabl-a-mission-driven-enterprise" id="toc-entry-3"&gt;Probabl, a mission-driven enterprise&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#probabl-is-already-having-an-impact" id="toc-entry-4"&gt;Probabl is already having an impact&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#my-position-within-probabl-my-vested-interests" id="toc-entry-5"&gt;My position within Probabl …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/scikit-learn_at_probabl.png" style="width: 300px;" /&gt;
&lt;p class="last"&gt;Open-source efforts around scikit-learn at Inria are spinning off to a
new enterprise, &lt;a class="reference external" href="https://probabl.ai"&gt;Probabl&lt;/a&gt;, in charge of
sustainable development of a data-science commons.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="contents topic" id="contents"&gt;
&lt;p class="topic-title"&gt;Contents&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference internal" href="#prelude-funding-scikit-learn-is-hard" id="toc-entry-1"&gt;Prelude: funding scikit-learn is hard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#the-birth-of-a-new-ambition" id="toc-entry-2"&gt;The birth of a new ambition&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#probabl-a-mission-driven-enterprise" id="toc-entry-3"&gt;Probabl, a mission-driven enterprise&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#probabl-is-already-having-an-impact" id="toc-entry-4"&gt;Probabl is already having an impact&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#my-position-within-probabl-my-vested-interests" id="toc-entry-5"&gt;My position within Probabl, my vested interests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#more-to-come" id="toc-entry-6"&gt;More to come&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="prelude-funding-scikit-learn-is-hard"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-1"&gt;Prelude: funding scikit-learn is hard&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Scikit-learn is a &lt;a class="reference external" href="../programming/people-underestimate-how-impactful-scikit-learn-continues-to-be.html"&gt;central software component in today’s machine learning
landscape&lt;/a&gt;,
and it is open source, governed by a community, easy to install, and well
documented. It started many years ago as a project that we did on the
side, and we were joined by many volunteers, which was key to the success
of the project. We soon decided to ensure that scikit-learn was not
&lt;em&gt;only&lt;/em&gt; a volunteer-based effort. Over more than a decade, I’ve dedicated
a lot of energy to this, using a variety of funding mechanisms: first
grants (as an academic), then sponsorship and related contracts with
various actors.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
&lt;em&gt;Digital commons eliminate scarcity and exclusivity&lt;/em&gt;&lt;/div&gt;
&lt;p&gt;Funding digital commons is really hard. People build fortunes by
leveraging competitive advantages, by creating lock-ins, or selling
access to data. What makes a great open-source library, such as scikit-learn,
is exactly what prevents these tricks: we are committed to being
independent, easy to use and install, lightweight…&lt;/p&gt;
&lt;img src="../programming/attachments/probabl_rocket.svg" class="align-right" width="150px"&gt;&lt;/div&gt;
&lt;div class="section" id="the-birth-of-a-new-ambition"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-2"&gt;The birth of a new ambition&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Scikit-learn is very successful, but it could be more. For instance, it
does not facilitate pushing to production as much as TensorFlow, which
can be served, deployed to Android… And scikit-learn is not very
visible to top decision makers: it’s not a line on their budget, a brand
that they know. As a consequence, it is not reaping the benefits of its
success &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.&lt;/p&gt;
&lt;table class="side-hanging docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Many commercial tools sit on top of open-source software
like scikit-learn (Splunk and SageMaker, to name only a few), making
profits without helping in any way the open-source world that they
build upon.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="align-right docutils container"&gt;
&lt;em&gt;The French government is backing us to push the envelope&lt;/em&gt;&lt;/div&gt;
&lt;p&gt;Three years ago, the French government challenged us to go further, to consolidate
the ecosystem into a consistent data-science commons. The strategic
interest of France is to preserve some technological autonomy over data, eg
sensitive data. Thus, the government offered us, at Inria, a funding
opportunity to go further.&lt;/p&gt;
&lt;p&gt;They promised us a lot of money (tens of millions of euros), but with a
specific mission to develop a sustainable “data-science commons” &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;
ecosystem around scikit-learn. I’ll spare you the details of how many
meetings we had and documents we wrote to sketch the outline of the
project. I pushed forward a vision of technical components that fit in
the broader open-source ecosystem, complementing it.&lt;/p&gt;
&lt;table class="side-hanging docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The letter that we received from the French government
specifically defines the objective in these words: “data-science
commons” (“Communs numériques pour la Science des Données”)&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As I moved forward, I faced a difficulty: the French government wanted a
&lt;strong&gt;sustainability plan&lt;/strong&gt;, and private investment to back it. To be honest,
this is not what I’m good at. François Goupil, the COO of the
scikit-learn consortium, was helping me, but we needed more for our
ambitions. And this is when we started talking to &lt;a class="reference external" href="https://www.linkedin.com/in/ylechelle/"&gt;Yann Lechelle&lt;/a&gt;, a tech entrepreneur with an
impressive track record and an interest in France’s impact on the global
tech world.&lt;/p&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/probabl_logo.jpeg" style="width: 100px;" /&gt;
&lt;/div&gt;
&lt;div class="section" id="probabl-a-mission-driven-enterprise"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-3"&gt;Probabl, a mission-driven enterprise&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;With Yann, we built a new vision. Our challenge is to be long-term
sustainable and virtuous for scikit-learn, its broader ecosystem, and its
community. Yann brought in a business point of view, and I tried to bring
that of open-source communities beyond Probabl &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;, for instance by
avoiding getting in the way of others building businesses that
contribute to scikit-learn. Indeed, we are convinced that having a broad
and diverse community around scikit-learn is central to its future.&lt;/p&gt;
&lt;table class="side-hanging docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;One of the first things that Probabl did (Guillaume Lemaître, to
be specific) was to submit a grant application (to the Chan Zuckerberg
Initiative) to fund, via NumFOCUS, a developer employed by
Quansight, with no money transiting via Probabl (one reason being
that we have no operations outside of Europe so far).&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Our sustainability model is still being fine-tuned. What I can say is
that it will involve a mix of professional services, support &amp;amp; sponsorship
agreements, as well as a product-based offer, where we supplement
scikit-learn with enterprise features. Our focus will be on features that
are typically not the focus of open-source developers: integration in
large structures, such as access control, LDAP connection, regulatory
compliance. We will not shoehorn scikit-learn into open-core or dual-licensing
approaches: we want our incentives to be aligned with
scikit-learn and its ecosystem being as complete as possible.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Foster growth and adoption of our open-source stack&lt;/div&gt;
&lt;p&gt;In a sense, our inspiration is that of Red Hat, where the growth of the
company fosters the growth and adoption of the software (Linux in the case
of Red Hat), beyond the company, in an ecosystem, and for a wide variety
of applications.&lt;/p&gt;
&lt;p&gt;Strong growth will mean external capital. To ensure that we do not lose
the focus on our mission, building data-science commons, Yann drafted
a specific governance for the company (and then validated it with
many people, as we are a spin-off from a governmental organization). The
ultimate share structure, and the board, are divided in three electoral
colleges: one for outside investors, one for founders and employees, and
one for public institutions. This ensures a balance of power that
hopefully will keep us aligned to our mission. I think that this
structure sends a strong signal that we are not just another for-profit
that will go from creating useful tech to dark money-generating patterns.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="probabl-is-already-having-an-impact"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-4"&gt;Probabl is already having an impact&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;A strong open-source team&lt;/strong&gt; In February, the whole team developing
scikit-learn at Inria moved to Probabl, joined by Adrin Jalali, a
Berlin-based core developer of scikit-learn and fairlearn. We’ve been
hiring excellent people, and we now have &lt;strong&gt;9 people on open-source&lt;/strong&gt; (see
the &lt;a class="reference external" href="https://probabl.ai/about"&gt;Probabl team&lt;/a&gt;), spending their time
contributing to open source (Jérémie, for instance, has been doing the
last releases for scikit-learn).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fostering an ecosystem&lt;/strong&gt; Probabl is not only about scikit-learn. We are
prioritizing &lt;a class="reference external" href="https://probabl.ai/open-source"&gt;8 libraries&lt;/a&gt;, central to
the machine-learning and data science ecosystem: joblib, fairlearn,
imbalanced-learn… In general, as we have always done, we will not
hesitate to contribute to upstream or related projects. Our goal is to
have a healthy open-source ecosystem around data science.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Not only software&lt;/strong&gt; Not everybody sees the important lines of code.
I’ve become increasingly aware of the need to do outreach and
communication, to coders, but also to decision makers. At Probabl we
dedicate energy to being in business meetings, to participating in the tech
narrative, to teaching how best to do data science, &lt;em&gt;eg&lt;/em&gt; with didactic
videos. We’re starting a mentoring program, we’ll be organizing
sprints… I am convinced that all this is a useful long-term investment.&lt;/p&gt;
&lt;img alt="" class="align-center" src="../programming/attachments/probabl_robot_dog.jpeg" style="width: 360px;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="my-position-within-probabl-my-vested-interests"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-5"&gt;My position within Probabl, my vested interests&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I am a French civil servant (a researcher at Inria, one of our national
research institutes). Such a position comes with strong responsibilities
to control conflicts of interest. The creation of Probabl underwent
strict scrutiny (which took a long, long time). I have recently been
cleared to take an active role: 10% of my time is allocated to be a
&lt;strong&gt;scientific and open-source advisor for Probabl&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I am not paid by Probabl&lt;/strong&gt;. 100% of my salary comes from Inria (and I was
not given a raise because of my involvement in Probabl). I do have financial
interests as a founder, but given that my active role is small, I hold
one of the smallest numbers of shares among the founders.&lt;/p&gt;
&lt;p&gt;My main interest in Probabl is really the success of its mission: the
long-term growth of an open-source data-science ecosystem. Spinning off
from Inria actually continues my efforts in this direction, but with more
agility and breadth. And having a variety of complementary commercial
activities on top of open source makes the ecosystem stronger, by better
answering the needs of some actors.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="more-to-come"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-6"&gt;More to come&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are many things that we are still ironing out. Settling
specific details takes time (for instance, clearing my role took a
while). We have yet to announce the future of the sponsorship program
that we had set up at the Inria foundation. Its mission has been
transferred to Probabl. Currently, Probabl’s open
source team is ensuring continuity of our work with the existing sponsors.
But we will set up broader
partnership opportunities, with a similar governance, that enable
third-parties to invest in open source on a roadmap decided jointly with
the open-source community.&lt;/p&gt;
&lt;p&gt;I believe that we need a lot of &lt;strong&gt;transparency&lt;/strong&gt; in how we decide upon priorities
in our open source team. Our 2024 priorities for scikit-learn are visible
&lt;a class="reference external" href="https://papers.probabl.ai/scikit-learns-priorities-at-probabl"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I look forward to Probabl adding value to scikit-learn
for enterprises, with an offer that enriches scikit-learn and the broader
open-source ecosystem.&lt;/p&gt;
&lt;p&gt;I am acutely aware that good &lt;strong&gt;open source is made of communities&lt;/strong&gt;, and that
communities need to trust and understand big players such as Probabl
(well, so far we are not that big). I hope that with time our actions
will become easy to read and will speak for themselves.&lt;/p&gt;
&lt;img src="../programming/attachments/probabl_machine_heart.svg" class="align-center" width="400px"&gt;&lt;/div&gt;
</content><category term="programming"></category><category term="open source"></category><category term="growth"></category><category term="communities"></category><category term="scikit-learn"></category><category term="inria"></category><category term="probabl"></category></entry><entry><title>People underestimate how impactful Scikit-learn continues to be</title><link href="https://gael-varoquaux.info/programming/people-underestimate-how-impactful-scikit-learn-continues-to-be.html" rel="alternate"></link><published>2023-11-27T00:00:00+01:00</published><updated>2023-11-27T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2023-11-27:/programming/people-underestimate-how-impactful-scikit-learn-continues-to-be.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;François Chollet rightfully said that people often underestimate the
impact of scikit-learn. I give here a few illustrations to back his
claim.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few days ago, François Chollet (the creator of Keras, the library
that democratized deep learning) &lt;a class="reference external" href="https://twitter.com/fchollet/status/1727186047115882624"&gt;posted&lt;/a&gt;:&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://twitter.com/fchollet/status/1727186047115882624"&gt;&lt;img alt="Tweet from François Chollet: &amp;quot;People underestimate how impactful scikit-learn continues to be&amp;quot;" class="align-center" src="../programming/attachments/chollet_scikit_learn_impact.png" /&gt;&lt;/a&gt;
&lt;p&gt;Indeed, scikit-learn continues to be the most popular machine …&lt;/p&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;François Chollet rightfully said that people often underestimate the
impact of scikit-learn. I give here a few illustrations to back his
claim.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A few days ago, François Chollet (the creator of Keras, the library
that democratized deep learning) &lt;a class="reference external" href="https://twitter.com/fchollet/status/1727186047115882624"&gt;posted&lt;/a&gt;:&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://twitter.com/fchollet/status/1727186047115882624"&gt;&lt;img alt="Tweet from François Chollet: &amp;quot;People underestimate how impactful scikit-learn continues to be&amp;quot;" class="align-center" src="../programming/attachments/chollet_scikit_learn_impact.png" /&gt;&lt;/a&gt;
&lt;p&gt;Indeed, scikit-learn continues to be the most popular machine-learning framework in
surveys:&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;a class="reference external image-reference" href="../programming/attachments/kaggle_survey_library_2022.png"&gt;&lt;img alt="" src="../programming/attachments/kaggle_survey_library_2022.png" /&gt;&lt;/a&gt;
&lt;p class="caption"&gt;Most popular machine-learning framework, according to &lt;a class="reference external" href="https://www.kaggle.com/kaggle-survey-2022"&gt;a Kaggle survey&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Scikit-learn is probably the most used machine-learning library&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This popularity is sometimes underestimated because scikit-learn is a small player
in terms of funding and team size, in particular
compared to giants such as TensorFlow and PyTorch. Its size is limited
by the nature of the project: it is community-based, without a strong commercial
entity backing it.&lt;/p&gt;
&lt;p&gt;We target different technology than TensorFlow and PyTorch: we have
by design let the big players focus on deep learning, which demands much
more resources. Rather, we have focused on classic machine learning,
believing that it serves other important needs. While such technologies
make the news less often, they are used a lot, and scikit-learn is massively
used:&lt;/p&gt;
&lt;table border="1" class="noborder docutils align-center"&gt;
&lt;caption&gt;&lt;strong&gt;Usage statistics&lt;/strong&gt; (from GitHub)&lt;/caption&gt;
&lt;colgroup&gt;
&lt;col width="33%" /&gt;
&lt;col width="33%" /&gt;
&lt;col width="33%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/scikit-learn/scikit-learn"&gt;&lt;img alt="sklearn_header" src="../programming/attachments/scikit-learn_header.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/pytorch/pytorch/"&gt;&lt;img alt="pytorch_header" src="../programming/attachments/pytorch_header.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/tensorflow/tensorflow"&gt;&lt;img alt="tensorflow_header" src="../programming/attachments/tensorflow_header.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/scikit-learn/scikit-learn"&gt;&lt;img alt="sklearn_used_by" src="../programming/attachments/scikit-learn_used_by.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/pytorch/pytorch/"&gt;&lt;img alt="pytorch_used_by" src="../programming/attachments/pytorch_used_by.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;a class="reference external image-reference" href="https://github.com/tensorflow/tensorflow"&gt;&lt;img alt="tensorflow_used_by" src="../programming/attachments/tensorflow_used_by.png" /&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;By not focusing on deep learning, does scikit-learn risk becoming
outdated? Surveys show that simple models such as linear models or models
based on trees (including boosting) are actually the most used models:&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;a class="reference external image-reference" href="../programming/attachments/popular_ml_algorithm_2022.png"&gt;&lt;img alt="" src="../programming/attachments/popular_ml_algorithm_2022.png" /&gt;&lt;/a&gt;
&lt;p class="caption"&gt;Most popular machine-learning algorithms, according to &lt;a class="reference external" href="https://www.kaggle.com/code/dhirajkumar612/kaggle-survey-2022-data-analysis"&gt;a Kaggle
survey&lt;/a&gt;
(apologies for the small fonts on the figure, I did not generate it)&lt;/p&gt;
&lt;/div&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Gradient-boosted trees are a good go-to model&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;There is a lot of hype surrounding deep learning, but it is most
often not the right tool to tackle tabular data. Tabular data has
different properties than images or text: it comes with heterogeneous
columns, each meaningful on its own, and tree-based models have the
right inductive bias &lt;a class="reference external" href="https://proceedings.neurips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html"&gt;[Grinsztajn et al 2022]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;a class="reference external image-reference" href="https://proceedings.neurips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html"&gt;&lt;img alt="" src="../programming/attachments/benchmark_tree_models.png" /&gt;&lt;/a&gt;
&lt;p class="caption"&gt;&lt;strong&gt;Benchmark comparing models on tabular data while tuning
hyper-parameters&lt;/strong&gt; (from &lt;a class="reference external" href="https://proceedings.neurips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html"&gt;Grinsztajn et al 2022&lt;/a&gt;). Each value corresponds to the test score of the
best model (selected on the validation set) after a given time spent doing
random search. The
ribbon corresponds to the minimum and maximum scores over 15 shuffles
of the random-search order.
Models HistGradientBoostingTree, GradientBoostingTree, and
RandomForest come from scikit-learn. FTtransformer, Saint, ResNet and
MLP are all deep-learning architectures, with FTtransformer and Saint
specifically developed for tabular data.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;As we can see, scikit-learn’s &lt;a class="reference external" href="https://scikit-learn.org/stable/modules/ensemble.html#histogram-based-gradient-boosting"&gt;HistGradientBoosting&lt;/a&gt; really shines, delivering good prediction performance at a small computational cost. We strive to facilitate data science: making it lightweight, with good documentation and APIs.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Linear models and tree-based models are here to stay. They answer strong
needs in many application settings and they come with a small
operational cost.&lt;/p&gt;
&lt;p&gt;In my opinion, where scikit-learn could really grow to be even more
relevant is by integrating better into a broader ecosystem, going from
databases to production, and being more “enterprise ready” :).&lt;/p&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="open-source"></category><category term="machine learning"></category></entry><entry><title>My Mayavi story: discovering open source communities</title><link href="https://gael-varoquaux.info/programming/my-mayavi-story-discovering-open-source-communities.html" rel="alternate"></link><published>2022-07-10T00:00:00+02:00</published><updated>2022-07-10T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2022-07-10:/programming/my-mayavi-story-discovering-open-source-communities.html</id><summary type="html">&lt;p class="align-right"&gt;&lt;em&gt;The Mayavi Python software, and my personal history: A thread on
Python and scipy ecosystems, building open source codebase, and
meeting really cool and friendly people&lt;/em&gt;&lt;/p&gt;
&lt;img alt="" class="align-right" src="attachments/mayavi/mayavi_ets.png" /&gt;
&lt;p&gt;I am writing today as a goodbye to the project: I used to be one of the
core contributors and maintainers but have been …&lt;/p&gt;</summary><content type="html">&lt;p class="align-right"&gt;&lt;em&gt;The Mayavi Python software, and my personal history: A thread on
Python and scipy ecosystems, building open source codebase, and
meeting really cool and friendly people&lt;/em&gt;&lt;/p&gt;
&lt;img alt="" class="align-right" src="attachments/mayavi/mayavi_ets.png" /&gt;
&lt;p&gt;I am writing today as a goodbye to the project: I used to be one of the
core contributors and maintainers but have been inactive for a while for
lack of time. By common agreement, we recently removed my commit
rights to limit security risks.&lt;/p&gt;
&lt;p&gt;Mayavi brought me so much!&lt;/p&gt;
&lt;div class="section" id="the-start-of-my-adventure-with-mayavi"&gt;
&lt;h2&gt;The start of my adventure with Mayavi&lt;/h2&gt;
&lt;img alt="" class="align-right" src="attachments/mayavi/example_magnetic_field_lines.jpg" /&gt;
&lt;p&gt;I got involved around 2007: I needed 3D visualization of magnetic fields as I was designing coils for my PhD &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This led to an example in the Mayavi docs &lt;a class="reference external" href="http://docs.enthought.com/mayavi/mayavi/auto/example_magnetic_field_lines.html"&gt;http://docs.enthought.com/mayavi/mayavi/auto/example_magnetic_field_lines.html&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;I started as an early user of Mayavi2, a rewrite of Mayavi, and
eventually joined Prabhu Ramachandran and Enthought as a contributor.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="what-is-mayavi"&gt;
&lt;h2&gt;What is Mayavi?&lt;/h2&gt;
&lt;p&gt;Mayavi is a scientific 3D visualization library in Python.&lt;/p&gt;
&lt;p&gt;It enables interactive visualization to understand complex information in
3D, such as multi-physics fields, combined with &lt;a class="reference external" href="https://docs.enthought.com/mayavi/mayavi/mlab.html"&gt;simple scripting&lt;/a&gt; to integrate into a
broader scientific-computing workflow.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Mayavi was designed and founded around 2000 by Prabhu Ramachandran, a
researcher in computational fluid dynamics at IIT Bombay and long-time
open-source and Python figure.&lt;/p&gt;
&lt;p&gt;The key idea was to make VTK, a powerful C++ visualization library,
easy to use from Python.&lt;/p&gt;
&lt;p&gt;Mayavi bridged the gap between VTK’s C++ data structures and efficient Python data structures, exposing the data as numpy arrays without copies.&lt;/p&gt;
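&lt;p&gt;The benefit of sharing memory rather than copying it can be illustrated with plain NumPy. The sketch below is a generic illustration under our own assumptions: a Python bytearray plays the role of memory owned by a C++ library. It is not Mayavi’s actual mechanism for wrapping VTK arrays.&lt;/p&gt;

```python
import numpy as np

# A raw, writable memory buffer standing in for memory owned by a C/C++
# library (illustrative only: not how VTK or Mayavi actually allocate data)
raw = bytearray(4 * 8)  # room for four float64 values

# np.frombuffer wraps the existing memory instead of copying it
array = np.frombuffer(raw, dtype=np.float64)
array[:] = [1.0, 2.0, 3.0, 4.0]  # writes go straight into `raw`

# A second view on the same buffer sees the values written above
other_view = np.frombuffer(raw, dtype=np.float64)
print(np.shares_memory(array, other_view))  # True: no copy was made

# By contrast, np.array makes an independent copy
copied = np.array(array)
array[0] = 42.0
print(other_view[0], copied[0])  # the view follows the change, the copy does not
```

&lt;p&gt;Avoiding copies matters for large 3D datasets: memory use stays flat, and numpy code can operate directly on data that the visualization pipeline also reads.&lt;/p&gt;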
&lt;p&gt;It uses tools from Enthought (namely the Enthought Tool Suite) for an
interactive GUI built on a Python object model: fully scriptable (the
vision is explained in &lt;a class="reference external" href="https://hal.archives-ouvertes.fr/hal-00502548"&gt;an article Prabhu and I wrote&lt;/a&gt;).&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="attachments/mayavi/mayavi_application.png" /&gt;
&lt;p class="caption"&gt;Mayavi is a full-blown interactive application&lt;/p&gt;
&lt;/div&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="attachments/mayavi/mayavi_mlab.jpg" /&gt;
&lt;p class="caption"&gt;Mayavi is also a Python library, for full scripting&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="working-on-mayavi-taught-me-code-and-communities"&gt;
&lt;h2&gt;Working on Mayavi taught me code and communities&lt;/h2&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="attachments/mayavi/mayavi_ipython.png" /&gt;
&lt;p class="caption"&gt;Mayavi used within an interactive IPython session – an image from the
&lt;a class="reference external" href="https://ieeexplore.ieee.org/abstract/document/5725237"&gt;Mayavi paper&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I joined to help with the “mlab” interface, for even simpler Python
scripting built upon functions. My goal was to make Mayavi feel natural to
MATLAB and matplotlib users, a product vision which was probably
important to push popularity even further.&lt;/p&gt;
&lt;p&gt;I was an isolated PhD student in a physics lab. Emboldened by a
discussion with Fernando Perez, I started contributing and discussing
with Prabhu Ramachandran. I remember my first Skype call with
Prabhu: I was very intimidated.&lt;/p&gt;
&lt;p&gt;Understanding this large codebase was hard! And yet, slowly but surely, I
started making more and more meaningful contributions: on mlab, then on
the broader codebase, fixing bugs, a lot of work on documentation and
examples…&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="attachments/mayavi/scipy_conf.jpg" /&gt;
&lt;p class="caption"&gt;Prabhu and myself are in this scipy conference group picture! From &lt;a class="reference external" href="https://slideshare.net/enthought/scientific-computing-with-python-webinar-august-28-2009"&gt;https://slideshare.net/enthought/scientific-computing-with-python-webinar-august-28-2009&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Then Enthought funded my overseas travel to the scipy conference: a big
deal for me, as I was a penniless PhD student.&lt;/p&gt;
&lt;p&gt;My Mayavi story is that of meeting amazing people in the Python, scipy,
and pydata world; people who believe in building a tool stack to
democratize scientific computing; people from all over the world,
friendly, welcoming, passionate.&lt;/p&gt;
&lt;p&gt;It founded my belief in communities.&lt;/p&gt;
&lt;p&gt;This adventure led me to learn software engineering (&lt;a class="reference external" href="https://software-carpentry.org/"&gt;Software Carpentry&lt;/a&gt; really helped me get started), to
work at Enthought (a software startup central to scientific computing in
Python), to change career from physics to computing, to join Inria (the French
national research institute in mathematics and computing), and to work on other open-source
projects…&lt;/p&gt;
&lt;p&gt;Mayavi was crucial to my personal adventure. Thank you Prabhu! Thank you
Enthought! Thank you, Scipy community!!&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="mayavi"></category><category term="python"></category><category term="science"></category><category term="conferences"></category></entry><entry><title>Hiring an engineer and post-doc to simplify data science on dirty data</title><link href="https://gael-varoquaux.info/programming/hiring-an-engineer-and-post-doc-to-simplify-data-science-on-dirty-data.html" rel="alternate"></link><published>2021-10-29T00:00:00+02:00</published><updated>2021-10-29T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2021-10-29:/programming/hiring-an-engineer-and-post-doc-to-simplify-data-science-on-dirty-data.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Join us to work on reinventing data-science practices and tools to
produce robust analysis with less data curation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It is well known that data cleaning and preparation are a heavy burden to
the data scientist.&lt;/p&gt;
&lt;img alt="" class="align-center" src="attachments/big_data_borat_cleaning_data.png" style="width: 400px;" /&gt;
&lt;div class="section" id="dirty-data-research"&gt;
&lt;h2&gt;Dirty data research&lt;/h2&gt;
&lt;p&gt;In the &lt;a class="reference external" href="https://project.inria.fr/dirtydata/"&gt;dirty data project&lt;/a&gt;, we
have been conducting machine-learning research …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Join us to work on reinventing data-science practices and tools to
produce robust analysis with less data curation.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;It is well known that data cleaning and preparation are a heavy burden to
the data scientist.&lt;/p&gt;
&lt;img alt="" class="align-center" src="attachments/big_data_borat_cleaning_data.png" style="width: 400px;" /&gt;
&lt;div class="section" id="dirty-data-research"&gt;
&lt;h2&gt;Dirty data research&lt;/h2&gt;
&lt;p&gt;In the &lt;a class="reference external" href="https://project.inria.fr/dirtydata/"&gt;dirty data project&lt;/a&gt;, we
have been conducting machine-learning research to see how better
statistical models could readily ingest non-curated data, and reduce the
need for data preparation in data science. We now have a growing
understanding of the problems, theoretical and practical, which lie
across statistical and database topics.&lt;/p&gt;
&lt;p&gt;Machine learning leads to different tradeoffs than traditional
inferential statistics (because it can rely on more powerful models). For
instance, we now have a good understanding of the case of missing values:
in &lt;a class="reference external" href="https://arxiv.org/abs/2106.00311"&gt;Le Morvan et al&lt;/a&gt;, we showed that
with traditional methods, ignorable missingness &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt; and “good”
imputation are important, but that for prediction, flexible
predictors are what matters, and they can work with any missingness
mechanism.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;“Missing at Random”, where missingness is independent of the
hidden values&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Similarly, we have made good progress on tolerating normalization errors
and typos. We find that rather than attempting to deduplicate the entries or
fix the typos, it is best to expose similarities and ambiguities to
a flexible learning algorithm. The simplest and most reliable methods are
implemented in the &lt;a class="reference external" href="http://dirty-cat.github.io/"&gt;dirty-cat&lt;/a&gt; library, to
make life easier for data scientists.&lt;/p&gt;
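&lt;p&gt;The idea of exposing similarities instead of fixing typos can be sketched in a few lines of plain Python: each dirty string is encoded by its similarity to a few reference categories, here a Jaccard similarity between sets of character 3-grams. This is a simplified illustration in the spirit of dirty-cat’s similarity encoding, not its actual implementation or API; the column values and the choice of prototypes are hypothetical.&lt;/p&gt;

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a.intersection(grams_b)) / len(grams_a.union(grams_b))


def similarity_encode(values, prototypes, n=3):
    """Encode each string as a vector of similarities to the prototypes."""
    return [[ngram_similarity(v, p, n) for p in prototypes] for v in values]


# Hypothetical dirty column: the same two job titles, with typos and variants
column = ["Police Officer", "police officer", "Polce Officer", "Fire Fighter", "Firefighter"]
prototypes = ["Police Officer", "Fire Fighter"]

encoded = similarity_encode(column, prototypes)
for value, row in zip(column, encoded):
    print(value, [round(s, 2) for s in row])
```

&lt;p&gt;The resulting numeric columns can be fed to any scikit-learn estimator: close variants such as “Polce Officer” get vectors close to that of “Police Officer”, so the learner can exploit the similarity without a separate deduplication step.&lt;/p&gt;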
&lt;/div&gt;
&lt;div class="section" id="reinventing-data-science"&gt;
&lt;h2&gt;Reinventing data science&lt;/h2&gt;
&lt;p&gt;With this understanding (and even more exciting ongoing research), we
want to revisit data science. Machine learning can provide flexible
models for many usages of data science. Our goal is to use it to help
assemble and analyze datasets while minimizing human effort. For
this, we need tools that can answer typical data-science questions using
machine learning, starting from the raw data, often spread across multiple
files or multiple tables of a database. Building these tools requires
data-science research, a new vision of data-science APIs, and careful
software crafting.&lt;/p&gt;
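&lt;p&gt;For contrast, here is what assembling such data by hand typically looks like with pandas: aggregate a one-to-many table, then join it onto the table of interest. The tables, columns, and choice of aggregates below are hypothetical; picking those aggregations manually is precisely the kind of human effort such tools aim to reduce.&lt;/p&gt;

```python
import pandas as pd

# Hypothetical raw data spread over two tables
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "country": ["FR", "DE", "FR"], "churned": [0, 1, 0]}
)
orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3, 3, 3],
        "amount": [20.0, 35.5, 12.0, 8.0, 15.0, 40.0],
    }
)

# Aggregate the one-to-many table down to one row per customer
order_stats = (
    orders.groupby("customer_id")["amount"]
    .agg(["count", "mean", "sum"])
    .add_prefix("order_")
    .reset_index()
)

# Join onto the main table to get a single table ready for learning
features = customers.merge(order_stats, on="customer_id", how="left")
print(features)
```

&lt;p&gt;Every choice here (which table to aggregate, which statistics, which join) is made by hand; learning such aggregations from the raw tables is one of the directions mentioned for this research.&lt;/p&gt;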
&lt;/div&gt;
&lt;div class="section" id="join-us-in-this-adventure"&gt;
&lt;h2&gt;Join us in this adventure&lt;/h2&gt;
&lt;p&gt;We have an &lt;a class="reference external" href="https://project.inria.fr/dirtydata/team/"&gt;awesome team&lt;/a&gt;,
with a great mix of people of different seniority and different expertise
(statistics, machine learning, databases, software engineering), sharing
offices with the &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/home/"&gt;scikit-learn at Inria&lt;/a&gt; team. But we have too many
exciting ideas, so we are growing this team.&lt;/p&gt;
&lt;div class="section" id="a-data-science-engineer-new-software-with-new-ideas"&gt;
&lt;h3&gt;A data-science engineer: new software with new ideas&lt;/h3&gt;
&lt;p&gt;We are looking for someone with a background in data science or numerical
Python programming to join us, to help with designing a new data-science
library, evolving from &lt;a class="reference external" href="http://dirty-cat.github.io/"&gt;dirty-cat&lt;/a&gt;, and
to help with data-science experimentation for the research.&lt;/p&gt;
&lt;p&gt;We like people who care about data and about designing good tools, and who have a vision
of data science. We are happy to consider different levels of
experience. Apply via &lt;a class="reference external" href="https://jobs.inria.fr/public/classic/fr/offres/2021-04182"&gt;the job offer&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-post-doc-researcher-science-joining-data-engineering-to-deep-learning"&gt;
&lt;h3&gt;A post-doc researcher: science joining data engineering to deep learning&lt;/h3&gt;
&lt;p&gt;We will soon be announcing a post-doc position to join the team for
research in this scope. We are interested in questions around learning on
relational or tabular data, or learning data integration. We have plenty
of ideas to explore around embeddings in databases, learning to
aggregate, learning on sets, graph neural networks for databases, or
distributional matching for entity and schema alignment.
We expect to borrow tools (conceptual and practical) from deep
learning, but to blend them with techniques from data integration,
knowledge graphs, and databases.&lt;/p&gt;
&lt;p&gt;The job posting will be out soon, but I am running out of the office
right now for vacations (work-life balance also matters to us).&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Diversity is important&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://project.inria.fr/dirtydata/team/"&gt;Our team&lt;/a&gt; is not as
diverse as I would like it to be (though probably doing better than
the typical computer-science team). We love diverse candidates. Do not
hesitate.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="machine learning"></category><category term="data science"></category><category term="dirty data"></category><category term="hiring"></category></entry><entry><title>Hiring someone to develop scikit-learn community and industry partners</title><link href="https://gael-varoquaux.info/programming/hiring-someone-to-develop-scikit-learn-community-and-industry-partners.html" rel="alternate"></link><published>2021-09-14T00:00:00+02:00</published><updated>2021-09-14T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2021-09-14:/programming/hiring-someone-to-develop-scikit-learn-community-and-industry-partners.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;With the growth of scikit-learn and the wider PyData ecosystem, we
want to recruit in the Inria scikit-learn team for &lt;a class="reference external" href="https://recrutement.inria.fr/public/classic/en/offres/2021-04058"&gt;a new role&lt;/a&gt;.
Departing from our usual focus on excellence in algorithms,
statistics, or code, we want to add to the team someone with some
technical understanding, but an …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;With the growth of scikit-learn and the wider PyData ecosystem, we
want to recruit in the Inria scikit-learn team for &lt;a class="reference external" href="https://recrutement.inria.fr/public/classic/en/offres/2021-04058"&gt;a new role&lt;/a&gt;.
Departing from our usual focus on excellence in algorithms,
statistics, or code, we want to add to the team someone with some
technical understanding, but an eye for people dynamics. Are you
passionate about developing open-source communities for data science?
This job is a unique opportunity.&lt;/p&gt;
&lt;p class="last"&gt;The mandate will be, on the one hand, to develop the wider community
behind scikit-learn and, on the other hand, to foster the foundation’s
partnerships, as they are our source of funding.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="context-scikit-learn-inria-foundation"&gt;
&lt;h2&gt;Context: Scikit-learn &amp;#64; Inria foundation&lt;/h2&gt;
&lt;div class="section" id="the-growth-of-scikit-learn"&gt;
&lt;h3&gt;The growth of Scikit-learn&lt;/h3&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/scikit-learn-logo.png" style="width: 200px;" /&gt;
&lt;p&gt;Scikit-learn is used massively, from schools to major companies. It
underpins business-intelligence analyses or automates processes. Its
reliability is crucial for the enterprise. Its well-documented methods
help data scientists run valid analyses.&lt;/p&gt;
&lt;p&gt;Scikit-learn has grown hugely and is still growing in terms of user base
and expectations of quality. These days, the development team is large,
with many grassroots volunteers and some contributors spending a
sizeable fraction of their work time on it.&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="../programming/attachments/sklearn_website_access.png" style="width: 450px;" /&gt;
&lt;p class="caption"&gt;Number of monthly website visits&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="scikit-learn-inria-foundation"&gt;
&lt;h3&gt;Scikit-learn &amp;#64; Inria foundation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Birth of a foundation&lt;/strong&gt;
To ensure reliable funding to a small core of scikit-learn developers, we
set up a foundation &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt; a few years ago. The goal was to make sure that
we did not lose our experienced developers.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;See &lt;a class="reference external" href="http://gael-varoquaux.info/programming/a-foundation-for-scikit-learn-at-inria.html"&gt;the motivating announcement&lt;/a&gt; and the &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr"&gt;website&lt;/a&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Achieving sustainability&lt;/strong&gt;
The resulting structure is set up to provide a career path to a few of
our core people. As a consequence, it is a French legal entity, acting as
an employer, funded via sponsorship agreements with a few
major economic users of scikit-learn (check out &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr"&gt;the list of our
sponsors&lt;/a&gt;). The priorities of
the team are set &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/how-are-the-priorities-of-the-consortium-defined/"&gt;jointly between the sponsors and the open-source
community&lt;/a&gt;. The setup is not without flaws (in particular, it forces us to employ people &lt;a class="reference external" href="https://www.inria.fr/en/centre-inria-saclay-ile-de-france"&gt;on campus&lt;/a&gt;), but it enables us to give proper benefits to these contributors.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The team&lt;/strong&gt; The &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/people/"&gt;scikit-learn team at Inria foundation&lt;/a&gt; currently comprises 4
very experienced developers. In addition, we have other sources of
funding (research projects, &lt;a class="reference external" href="https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/"&gt;the scikit-learn MOOC&lt;/a&gt;)
that we use to create a larger team (currently 3 full-time positions).
Finally, various researchers on campus are heavily invested in
scikit-learn or related projects such as joblib. As a result, the depth
of technical skills is staggering.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Long story short, we want to add new DNA to this awesome team: someone
into peopleware as much as software.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="mandate"&gt;
&lt;h2&gt;Mandate&lt;/h2&gt;
&lt;p&gt;The goal of &lt;a class="reference external" href="https://recrutement.inria.fr/public/classic/en/offres/2021-04058"&gt;the new position&lt;/a&gt; is
to talk both to our wider open-source world and our corporate partners.
Both are crucial to fostering growth for scikit-learn.&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="https://recrutement.inria.fr/public/classic/en/offres/2021-04058"&gt;official job posting&lt;/a&gt;
doesn’t convey as well as I would like what is behind this position. I’m
probably to blame :).&lt;/p&gt;
&lt;div class="section" id="growing-our-open-source-community"&gt;
&lt;h3&gt;Growing our open-source community&lt;/h3&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/herdingcats.jpg" style="width: 300px;" /&gt;
&lt;p&gt;As both the scikit-learn and the PyData communities have grown,
communication has become a bottleneck. There are so many little things
needed to make an open-source community productive: facilitating on-boarding,
dividing the workload efficiently, documenting decision making well,
organizing fun sprints, making sure that issue triaging is efficient…&lt;/p&gt;
&lt;p&gt;We are looking for someone who is passionate about open-source
communities and wants to herd such cats.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="increasing-our-corporate-visibility"&gt;
&lt;h3&gt;Increasing our corporate visibility&lt;/h3&gt;
&lt;p&gt;Scikit-learn is one of the most widely used data-science tools. However,
when we talk to senior decision makers, we find that their perception
sometimes differs. Indeed, we are competing for visibility with many powerful actors.&lt;/p&gt;
&lt;p&gt;We must communicate beyond the open-source world to develop
a strong brand for scikit-learn. Good communication will help us find new
sponsors, a key ingredient of growth and sustainability for scikit-learn.&lt;/p&gt;
&lt;p&gt;We need to communicate about our progress and our actions, as people are
often surprised by the breadth of our contributions &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For instance, the foundation team has contributed &lt;a class="reference external" href="https://youtu.be/UVL4LFy8ch0?t=1437"&gt;improvements to
CPython itself&lt;/a&gt; and maintains
&lt;a class="reference external" href="https://github.com/cloudpipe/cloudpickle"&gt;cloudpickle&lt;/a&gt;, a central
component of the data ecosystem.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;As a foundation, we need to be transparent and accountable, which is
harder than it seems.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="a-good-fit"&gt;
&lt;h2&gt;A good fit&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://www.flickr.com/photos/randychiu/4602851011/"&gt;&lt;img alt="One Man Band, CCby2.0 from randychiu" class="align-right" src="../programming/attachments/one_man_band.jpg" style="width: 250px;" /&gt;&lt;/a&gt;
&lt;p&gt;We are looking for someone into open source who also likes writing
blog posts, engaging on social networks, organizing events, presenting
scikit-learn, and improving processes.&lt;/p&gt;
&lt;p&gt;We believe that such a job is best done by someone who has some technical
interest in scikit-learn: good advocacy requires good understanding.&lt;/p&gt;
&lt;p&gt;Maybe this sounds daunting? Few people have all the skills, let alone the
experience. We are actually &lt;strong&gt;looking for a passionate and promising
candidate&lt;/strong&gt;, whatever the length of the resume. We believe that
&lt;strong&gt;talented people can learn&lt;/strong&gt;, when they like what they do.&lt;/p&gt;
&lt;p&gt;This is a job about open-source, for open source! It’s not a perfect job:
we have many administrative constraints in running the foundation, and we
pay ourselves less than we would earn in a non-open-source job.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Apply now&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We are looking forward to your application. You can submit it via
&lt;a class="reference external" href="https://recrutement.inria.fr/public/classic/en/offres/2021-04058"&gt;the official job offer&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="open source"></category><category term="growth"></category><category term="communities"></category><category term="scikit-learn"></category><category term="inria"></category><category term="foundation"></category></entry><entry><title>Technical discussions are hard; a few tips</title><link href="https://gael-varoquaux.info/programming/technical-discussions-are-hard-a-few-tips.html" rel="alternate"></link><published>2020-05-28T00:00:00+02:00</published><updated>2020-05-28T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2020-05-28:/programming/technical-discussions-are-hard-a-few-tips.html</id><summary type="html">&lt;!-- Emma, Eliz, Rashema, Ralf Gommers to read this --&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;This post discusses the difficulties of communicating while developing
open-source projects and tries to give some simple advice.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A large software project is above all a social exercise in which technical
experts try to reach good decisions together, for instance on GitHub
pull requests. But communication is difficult, in …&lt;/p&gt;</summary><content type="html">&lt;!-- Emma, Eliz, Rashema, Ralf Gommers to read this --&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;This post discusses the difficulties of communicating while developing
open-source projects and tries to give some simple advice.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;A large software project is above all a social exercise in which technical
experts try to reach good decisions together, for instance on GitHub
pull requests. But communication is difficult, in particular between
diverging points of view. It is easy to
underestimate how much well-intentioned people can misunderstand
each other and get hurt, in open source as elsewhere. Understanding
where communication challenges come from can help, as can applying a few
simple rules.&lt;/p&gt;
&lt;img alt="" class="align-right" src="../programming/attachments/communication.png" style="width: 300px;" /&gt;
&lt;div class="contents topic" id="contents"&gt;
&lt;p class="topic-title"&gt;Contents&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference internal" href="#maintainer-s-anxiety" id="toc-entry-1"&gt;Maintainer’s anxiety&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#contributor-s-fatigue" id="toc-entry-2"&gt;Contributor’s fatigue&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#communication-is-hard" id="toc-entry-3"&gt;Communication is hard&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#little-things-that-help" id="toc-entry-4"&gt;Little things that help&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The first challenge is to understand the other’s point of view: the
different parties see the problem differently.&lt;/p&gt;
&lt;div class="section" id="maintainer-s-anxiety"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-1"&gt;Maintainer’s anxiety&lt;/a&gt;&lt;/h2&gt;
&lt;div class="section" id="open-source-can-be-anxiety-generating-for-the-maintainers"&gt;
&lt;h3&gt;Open source can be anxiety-generating for the maintainers&lt;/h3&gt;
&lt;p&gt;Maintainers ensure the quality and the long-term life of an open-source
project. As such, &lt;strong&gt;they feel responsible for any shortcoming in
the product&lt;/strong&gt;. In addition, they often do this work because they care,
even though it may not bring them any financial support.
But they can quickly become the focal point of anxiety-generating
feedback:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Code has bugs; the more code, the more bugs. Watching an issue tracker
fill up with a long list of bugs is frightening to people who
feel in charge.&lt;/li&gt;
&lt;li&gt;Given that maintainers are visible and qualified, they become the
target of constant requests for attention: from pleas to prioritize a
specific issue to solicitations for advice.&lt;/li&gt;
&lt;li&gt;A small fraction of these interactions come as outright
aggression. I have been insulted many times by dissatisfied
users. Each time, it hurts me a lot. My policy is to
disengage from the conversation, but I am left shaking and staring at
my computer in the evening.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="side-hanging small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;&lt;strong&gt;Related writings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Ralf Gommers discusses &lt;a class="reference external" href="https://rgommers.github.io/2019/06/the-cost-of-an-open-source-contribution/"&gt;the cost of an open source
contribution&lt;/a&gt;, from the point of view of the maintainer.&lt;/p&gt;
&lt;p&gt;Ilya Grigorik suggests: &lt;a class="reference external" href="https://www.igvita.com/2011/12/19/dont-push-your-pull-requests/"&gt;Don’t push your pull request&lt;/a&gt;.&lt;/p&gt;
&lt;p class="last"&gt;Brett Cannon: &lt;a class="reference external" href="https://snarky.ca/setting-expectations-for-open-source-participation/"&gt;Setting expectations for open source participation&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The more popular a project, the more weight it puts on its maintainers’
shoulders. A consequence is that &lt;strong&gt;maintainers are tired&lt;/strong&gt;, and can
sometimes approach discussions in a defensive way. Also, we may be plainly
scared of integrating code that we do not fully comprehend.&lt;/p&gt;
&lt;p&gt;Open-source developers may even, unconsciously, adopt a simple, but
unfortunate, protection mechanism: being rude. The logic is flawless: if
I am nasty to people, or I set unreasonable expectations, people will leave me alone.
Alas, this strategy leads to toxic environments. It not only makes people
unhappy but also harms the community dynamics that ground the excellence
of open source.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-danger-abusive-gatekeeping"&gt;
&lt;h3&gt;The danger of abusive gatekeeping&lt;/h3&gt;
&lt;p&gt;A maintainer quickly learns that every piece of code, no matter how cute
it might be, will create work for him or her in the long run, &lt;a class="reference external" href="https://snarky.ca/setting-expectations-for-open-source-participation/#submittingacontribution"&gt;just like a puppy&lt;/a&gt;. This
is unavoidable given that the complexity of code grows faster than its number of
features &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;, and, even for a company as rich as Google,
project maintenance becomes intractable on huge projects &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="side-hanging docutils container"&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;a class="reference external" href="https://ieeexplore.ieee.org/document/1702600"&gt;An Experiment on Unit Increase in Problem Complexity, Woodfield 1979&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;To quote TensorFlow developers:
&lt;a class="reference external" href="https://github.com/tensorflow/tensorflow/pull/33460"&gt;“Every [code addition] takes around 16 CPU/GPU
hours of [quality control]. As such, we cannot just run every
[code addition] through the [quality control] infrastructure.”&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;A maintainer’s job is to say no often&lt;/strong&gt;, to protect the project. But,
like any gatekeeping, it can unfortunately become an exercise in unchecked
power. Making objective choices in these difficult decisions is hard,
and we all naturally tend to place more trust in people we know.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Most often, we are not even aware of our shortcomings, let alone
indulging in them on purpose.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="contributor-s-fatigue"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-2"&gt;Contributor’s fatigue&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;A new contributor starting a conversation with a group of seasoned
project maintainers may easily &lt;strong&gt;feel like an impostor&lt;/strong&gt;. The
new contributor knows less about the project. In addition, he or she is engaging
with a group of people who know each other well, and is not yet part of
that &lt;em&gt;inner&lt;/em&gt; group.&lt;/p&gt;
&lt;p&gt;This person does not know the code base, or the conventions, and must &lt;strong&gt;make
extra efforts&lt;/strong&gt;, compared to the seasoned developers, to propose a
contribution suitable for the project. Often, he or she does
not fully understand the reasons for the project guidelines, or for the
feedback given. Requests for changes can easily be perceived as nitpicking.&lt;/p&gt;
&lt;p&gt;Integrating the contribution can often be a lengthy process, in
particular in scikit-learn. Indeed, it will involve not only shaping up
the contribution, but also learning the skills and discovering the
process. These &lt;strong&gt;long cycles can undermine motivation&lt;/strong&gt;: humans need
successes to feel enthusiasm. Also, the contributor may legitimately
worry: Will all these efforts be fruitful? Will the contribution make its
way to the project?&lt;/p&gt;
&lt;p&gt;Note that for these reasons, it is recommended to start contributing with
very simple features, and to seek feedback on the scope of the
contribution before writing the code.&lt;/p&gt;
&lt;p&gt;Finally, contributors are seldom paid to work on the project, and there
is no single chain of command that makes decisions and controls incentives
for all the people on the project. No one is responsible when things go
astray, which means that the weight falls on the shoulders of
individuals.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The danger behind the lengthy cycle of reviews and improvements needed to
contribute is &lt;strong&gt;death by a thousand cuts&lt;/strong&gt;. The contributor loses
motivation, and no longer finds the energy to finish the work.&lt;/p&gt;
&lt;div class="grey docutils container"&gt;
&lt;p&gt;&lt;strong&gt;How about users?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This article is focused on developers. Yet, users are also an
important part of the discussion around open source.&lt;/p&gt;
&lt;p&gt;Communication failures with users are often due to frustration:
frustration at being unable to use the software, at hitting a bug, at
seeing an important issue still not addressed. This frustration stems
from incorrect expectations, which can often be traced to a
misunderstanding of the processes and the dynamics. Managing
expectations, via the documentation or via notes on the issue tracker,
is important to improve the dialogue.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="communication-is-hard"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-3"&gt;Communication is hard&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Communication is hard: messages are sometimes received differently than
we would like. &lt;strong&gt;Overworked people discussing very technically
challenging issues&lt;/strong&gt; only makes the matter worse. I have seen people
come across badly, even though I know they are absolutely lovely and caring.&lt;/p&gt;
&lt;p&gt;We are human beings; we are limited; we misunderstand things, and we have
feelings.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Emotions&lt;/strong&gt; –
My most vivid memory of a communication failure was when I was a sailing
instructor. Trainees under my responsibility had put themselves
at risk, causing me a lot of worry. During the debrief, I was angry. My
failure to convey the messages without emotional loading undermined my
leadership of the group, putting everybody at risk for the rest of the
week.&lt;/p&gt;
&lt;p&gt;Inability to understand the others’ point of view, or to communicate
ours, can bring in emotions. Emotions most often impede technical
communication.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Limited attention&lt;/strong&gt; –
We, in particular maintainers, are bombarded with email, notifications,
text and code to read.
As a consequence, it is easy to read things too fast, to stop in the
middle, to forget.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Language barriers&lt;/strong&gt; –
Most discussions happen in English; but most of us are not native English
speakers. We may hide our difficulties well, but nuances are often lost.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Clique effects&lt;/strong&gt; –
Most interactions in open source are done in writing, with low
communication bandwidth. It can be much harder to convince a maintainer
on the other side of the world than a colleague in the same room. Schools
of thought naturally emerge when people work a lot together. These
create bubbles, where we have the impression that everything we say is
obvious and uncontroversial, and yet we fail to convince people outside
of our bubble.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="little-things-that-help"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-4"&gt;Little things that help&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Communication can be improved by continuously working on it &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;.
It may be obvious to some, but it personally took me many years to learn.&lt;/p&gt;
&lt;table class="side-hanging docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Training materials for managers often discuss communication, and
give tricks. I am sure that there are better references than my
list below. But that’s the best I can do.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="hear-the-other-exchange"&gt;
&lt;h3&gt;Hear the other: exchange&lt;/h3&gt;
&lt;div class="side-hanging small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;&lt;strong&gt;Related presentation&lt;/strong&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;a class="reference external" href="https://docs.google.com/presentation/d/1mEMjGQXErZC-mBeCt0quLz7b5ODQnehmfwwnCeggzcU/edit#slide=id.g5135b4b0eb_1_14"&gt;How can we have healthier technical discussions?&lt;/a&gt; by Nathaniel J. Smith&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Foster multiway discussions&lt;/strong&gt; – The goal of a technical discussion is to
come up with the best solution. Better solutions emerge via confronting
different points of view: a single brilliant individual
probably cannot find or recognize the best solution alone.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Integrate input from as many perspectives as possible.&lt;/li&gt;
&lt;li&gt;Make sure everyone feels heard.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Don’t seek victory&lt;/strong&gt; – Most important to keep in mind is that giving
up on an argument and accepting the other point of view is a perfectly
valid option. I am naturally biased toward thinking that my view on topics dear to
me is the right one. However, I’ve learned that adopting the view of the
other could bring a lot to the social dynamics of a project: we are often
debating over details and the bigger benefit comes from moving forward.&lt;/p&gt;
&lt;p&gt;In addition, if several very bright people reach different conclusions
from mine about something that they’ve thought about a lot, who am I to disagree?&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="convey-ideas-well-pedagogy"&gt;
&lt;h3&gt;Convey ideas well: pedagogy&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Explain&lt;/strong&gt; – Give the premises of your thoughts. Unroll your thought
processes. People are not sitting in your head, and need to hear not only
your conclusion, but how you got there.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repeat things&lt;/strong&gt; – Account for the fact that people can forget, and
never hesitate to gently restate important points. Rephrasing can
also help to explain.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Keep it short&lt;/strong&gt; – A typical reading speed is around 200 words a
minute. People have limited time and attention span. The greatest help
you can provide to your reader is to condense your ideas: let us avoid
long threads that require tens of minutes to read and digest.
There is a tension between this point and the above. My suggestion:
remove every word that is not useful, move details to footnotes or
postscripts.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="cater-for-emotions-tone"&gt;
&lt;h3&gt;Cater for emotions: tone&lt;/h3&gt;
&lt;div class="side-hanging small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;&lt;strong&gt;Related good advice&lt;/strong&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;a class="reference external" href="https://www.mozilla.org/en-US/about/governance/policies/participation/#expected-behavior"&gt;Mozilla participation guide, expected behavior section&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Stay technical&lt;/strong&gt; – Always try to get to the technical aspect of the
matter, and never the human. Give specific code and wording suggestions.
When explaining a decision, give technical arguments, even if they feel
obvious to you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Be positive&lt;/strong&gt; – Being positive in general helps people feel happy and
motivated. It is well known that positive feedback leads to quicker
progress than negative feedback, as revealed &lt;em&gt;e.g.&lt;/em&gt; by studies of classrooms. I am
particularly guilty of this: I always forget to say something nice,
although I may be super impressed by a contribution. Likewise, avoid
negative words when giving feedback (stay technical).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Avoid “you”&lt;/strong&gt; – The mere use of the pronoun “you” puts the person we are
talking to at the center of the message. But the message should not be about
the person, it should be about the work. It’s very easy to react
emotionally when it’s about us. The passive voice can be useful to avoid
putting people as the topic. If the topic is indeed people, sometimes “we”
is an adequate substitute for “you”.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Assume good faith&lt;/strong&gt; – There are so many misunderstandings that can
happen. People forget things, people make mistakes, people fail to convey
their messages. Most often, all these failures are in good faith, and
misunderstandings are legitimate. In the rare cases where there might
be some bad faith, accounting for it will only make communication worse,
not better. Along the same lines, we should let it go when we feel assaulted
or insulted, and avoid replying in kind.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Choose words wisely&lt;/strong&gt; – The choice of words matters, because words convey
implicit messages. In particular, avoid terms that carry value
judgements: “good” or “bad”. For example, “This is done wrong” (note that this
sentence already avoids “you”) could be replaced by “There might be a more
numerically stable / efficient way of doing it” (note also the use of
precise technical wording rather than the generic term “better”).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Use moderating words&lt;/strong&gt; – Try to leave room for the other in the
discussion. Overly assertive statements close the door to different points
of view: “this must be changed” (note the lack of “you”) should be
avoided while “this should be changed” is better. For this reason, this
article is riddled with words such as “tend”, “often”, “feel”, “may”,
“might”.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Don’t blame someone else&lt;/strong&gt; – If you feel that there is some pattern that
you would like to change, do not point fingers, do not blame others.
Rather, put yourself at the center of the story, find an example of
this pattern in your own work, and frame the message as “this is a pattern
that &lt;em&gt;we&lt;/em&gt; should avoid”. &lt;em&gt;“We”&lt;/em&gt; is such a powerful term. It unites; it
builds a team.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Give your understanding&lt;/strong&gt; – If you feel that there is a misunderstanding,
explain how you are feeling. But do it using “I”, and not “you”, and
acknowledge the subjectivity: “I feel ignored” rather than “you are
ignoring me”. Even better: only talk about the feeling: “I am losing
motivation, because this is not moving forward”, or “I think that I am
failing to convey why this numerical problem is such an important issue”
(note the use of “I think”, which avoids casting the situation as
necessarily true).&lt;/p&gt;
&lt;div class="side-hanging small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;&lt;strong&gt;Implicit messages&lt;/strong&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Four-sides_model"&gt;The four sides&lt;/a&gt;
view of communication highlights the multiple messages present even in
simple statements.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I hope this can be useful. I personally try to apply these rules, because
I want to work better with others.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Thanks&lt;/p&gt;
&lt;p&gt;to many who gave me feedback: Adrin Jalali, Andreas Mueller,
Elizabeth DuPre, Emmanuelle Gouillart, Guillaume
Lemaitre, Joel Nothman, Joris Van den Bossche, Nicolas Hug.&lt;/p&gt;
&lt;/div&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;PS: note how many times I’ve used “you” above. I can clearly get better
at communication!&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="programming"></category><category term="open source"></category><category term="people"></category></entry><entry><title>Getting a big scientific prize for open-source software</title><link href="https://gael-varoquaux.info/programming/getting-a-big-scientific-prize-for-open-source-software.html" rel="alternate"></link><published>2019-12-01T06:00:00+01:00</published><updated>2019-12-01T06:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2019-12-01:/programming/getting-a-big-scientific-prize-for-open-source-software.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;An important acknowledgement for a different view of doing science:
open, collaborative, and more than a proof of concept.&lt;/p&gt;
&lt;/div&gt;
&lt;img alt="" class="align-right" src="attachments/sklearn_prize_academie/prize.jpg" style="width: 350px;" /&gt;
&lt;p&gt;A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand
Thirion, and myself received the &lt;a class="reference external" href="https://www.academie-sciences.fr/fr/Laureats/prix-inria-academie-des-sciences-2019-vincent-hayward-equipe-scikit-learn-et-maria-naya-plasencia.html"&gt;“Académie des Sciences Inria prize for transfer”&lt;/a&gt;,
for our contributions to the scikit-learn project …&lt;/p&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;An important acknowledgement for a different view of doing science:
open, collaborative, and more than a proof of concept.&lt;/p&gt;
&lt;/div&gt;
&lt;img alt="" class="align-right" src="attachments/sklearn_prize_academie/prize.jpg" style="width: 350px;" /&gt;
&lt;p&gt;A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand
Thirion, and myself received the &lt;a class="reference external" href="https://www.academie-sciences.fr/fr/Laureats/prix-inria-academie-des-sciences-2019-vincent-hayward-equipe-scikit-learn-et-maria-naya-plasencia.html"&gt;“Académie des Sciences Inria prize for transfer”&lt;/a&gt;,
for our contributions to the scikit-learn project. To put things simply,
it’s quite a big deal to me, because I feel that it illustrates a change
of culture in academia.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Recognizing an open view of scientific contributions&lt;/div&gt;
&lt;p&gt;It is a great honor, because the selection was made by the members of the
Académie des Sciences, very accomplished scientists with impressive
contributions to science. The “Académie” is the hallmark of fundamental
academic science in France. To me, this prize is also symbolic because it
recognizes an open view of academic research and transfer, a view that
sometimes felt like not playing by the incentives. We started
scikit-learn as a crazy endeavor, a bit of a &lt;em&gt;hippy&lt;/em&gt; science thing.
People didn’t really take us seriously. We were working on software, and
not publications. We were doing open source, while industrial transfer is
made by creating startups or filing patents. We were doing Python, while
academic machine learning was then done in Matlab, and industrial
transfer in C++. We were not pursuing the latest publications, while
these are thought to be research’s best assets. We were interested in
reaching out to non-experts, while the partners considered
interesting had qualified staff.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Quality and openness, at the cost of quantity and control&lt;/div&gt;
&lt;p&gt;No. We did it differently. We reached out to an open community. We wrote
BSD-licensed code. We worked to achieve quality at the cost of quantity. We
cared about installation issues, on-boarding biologists or medical
doctors, playing well with the wider scientific Python ecosystem.
We gave decision power to people outside of Inria, sometimes people we had
never met in real life. We made sure that Inria was never the sole actor,
the sole stakeholder. We never pushed our own scientific publications in
the project. We limited complexity, trading off performance for ease of
use, ease of installation, ease of understanding.&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;object data="attachments/sklearn_prize_academie/sklearn_website_stats_white.svg" style="width: 25%;" type="image/svg+xml"&gt;&lt;/object&gt;
&lt;/div&gt;
&lt;p&gt;As a consequence, we slowly but surely assembled a large community. In
such a community, the whole is greater than the sum of its parts. The
breadth of interlocutors and cultures slows movement down, but creates
better results, because these results are understandable to many and
usable on a diversity of problems. Thanks to this quality, scikit-learn
was progressively used in more and more places: industrial
data-science labs, startups, research in applied or fundamental
statistical learning, teaching. Ironically, the institutional world did
not notice. It got hard, next to impossible, to get funding &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;. A few years
ago, I was told by a central governmental agency that we, open-source
zealots, were destroying an incredible amount of value by giving away
for free the production of research &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;. The French report on AI, led by a
Fields medalist, cited TensorFlow and Theano (the latter now discontinued), but
ignored scikit-learn; maybe because we were doing “boring science”?&lt;/p&gt;
&lt;p&gt;But scikit-learn’s amazing community kept plowing forward. We grew
so much that we were heard from the top. The prize from the Académie shows
that we managed to capture the attention of senior scientists with
open-source software, because this software is really having a worldwide
impact in many disciplines.&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="attachments/sklearn_prize_academie/academie_presentation.jpeg" style="width: 70%;" /&gt;
&lt;p class="caption"&gt;Presenting scikit-learn at the Academie Des Sciences&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="align-right docutils container"&gt;
An accomplishment of the community&lt;/div&gt;
&lt;p&gt;There were only five of us on stage, as the prize is for Inria permanent
staff. But this is of course not a fair account of how the project has
grown and what made it successful.&lt;/p&gt;
&lt;p&gt;In 2011, at &lt;a class="reference external" href="scikit-learn-nips-2011-sprint-international-thanks-to-our-sponsors.html"&gt;the first international sprint&lt;/a&gt;,
I felt something was happening: incredible people whom I had never met
before were sitting next to me, working very hard on solving problems
with me. This experience of being united to solve difficult problems is
amazing. And I deeply thank every single person who has worked
on this project, the 1500 contributors, many of whom I have never
met, in particular &lt;a class="reference external" href="https://scikit-learn.org/stable/about.html#authors"&gt;the core team&lt;/a&gt;, which is committed
to making sure that every detail of scikit-learn is solid and serves the
users. The team that has assembled over the years is of incredible
quality.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="align-right docutils container"&gt;
The promises of data science need open source&lt;/div&gt;
&lt;p&gt;The world does not understand how much the promises of data science,
for today and tomorrow, need open source projects, easy to install and to use
by everybody. These projects are like &lt;a class="reference external" href="https://www.fordfoundation.org/work/learning/research-reports/roads-and-bridges-the-unseen-labor-behind-our-digital-infrastructure/"&gt;roads and bridges&lt;/a&gt;:
they are needed for growth though no one wants to pay for maintaining
them. I hope that I can use the podium that the prize will give us to
stress the importance of the battle that we are fighting.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Getting funding from the government implied too much politics and
risks. For these reasons, I turned to private donors, in a
&lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/"&gt;foundation&lt;/a&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Inria &lt;em&gt;always&lt;/em&gt; supported us, and often paid developers in my team
out of its own pockets.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;PS: As another illustration of the culture change toward openness in
science, it was announced during the ceremony that the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Comptes_rendus_de_l%27Acad%C3%A9mie_des_Sciences"&gt;“Comptes Rendus de
l’Académie des Sciences”&lt;/a&gt; is becoming open access, without publication
charges!&lt;/p&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="science"></category><category term="scientific computing"></category><category term="open source"></category><category term="software"></category></entry><entry><title>A foundation for scikit-learn at Inria</title><link href="https://gael-varoquaux.info/programming/a-foundation-for-scikit-learn-at-inria.html" rel="alternate"></link><published>2018-09-17T00:00:00+02:00</published><updated>2018-09-17T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2018-09-17:/programming/a-foundation-for-scikit-learn-at-inria.html</id><summary type="html">&lt;p&gt;We have just announced that a foundation will be supporting scikit-learn
at Inria &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;: &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr"&gt;scikit-learn.fondation-inria.fr&lt;/a&gt;&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Growth and sustainability&lt;/div&gt;
&lt;p&gt;This is an exciting turn for us, because it enables us to receive private
funding. As a result, we will be able to have secure employment for some
existing core …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We have just announced that a foundation will be supporting scikit-learn
at Inria &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;: &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr"&gt;scikit-learn.fondation-inria.fr&lt;/a&gt;&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Growth and sustainability&lt;/div&gt;
&lt;p&gt;This is an exciting turn for us, because it enables us to receive private
funding. As a result, we will be able to have secure employment for some
existing core contributors, and to hire more people on the team. The goal
is to help sustain quality (more frequent releases?) and to tackle
some ambitious features.&lt;/p&gt;
&lt;div class="section" id="a-foundation-what-and-why"&gt;
&lt;h2&gt;A foundation? What and why?&lt;/h2&gt;
&lt;p&gt;Open source lives and thrives by its base, the community of developers.
And scikit-learn is a fantastic example of these dynamics. Because of its
grassroots origins, it has focused on features that matter for the small
and the many, such as ease of use and statistical models that work well
in data-poor situations. Over the years, decisions have been based on
technical merit, rather than on the appeal of displaying a list of trendy
features. A consequence of the breadth of contributors with different
backgrounds is that the library tends to be well-suited for
many applications, including some models that are less mainstream.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
People with dedicated time to support the community&lt;/div&gt;
&lt;p&gt;That said, over time there is an increasing need for a core team of
maintainers. As the library gets bigger, it is more and more difficult to
have a full view of what is happening. Integration of new features,
quality assurance, and releases are best done by developers who can
dedicate a large amount of time to the library. Also, ambitious changes
to the library, such as improving the parallel computing engine, need
long efforts. For many years, we have had people with dedicated
time to support the community. In France, we were jumping through hoops to
find public money to fund them. As someone who has made this effort, I
can tell you that it is a complicated one &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The ability to receive money from sponsors will enable us to scale up our
operations. I was initially worried that we would have difficulties
finding partners willing to give us money without asking for
control over the project. However, I was proven wrong, and we have found a
small set of &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/en/home/#sponsors"&gt;great partners&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="what-will-people-work-on-how-will-decisions-be-made"&gt;
&lt;h2&gt;What will people work on? How will decisions be made?&lt;/h2&gt;
&lt;p&gt;It can be a difficult exercise to balance how money is used in a
community-driven project. The project should not lose the dynamics in which
the community of developers plays a central role. Interests of the sponsors
should not take precedence over interests of the user base.&lt;/p&gt;
&lt;p&gt;We will make sure that the money that the foundation receives is invested
for the interest of the community. We have a technical committee that
supervises the activity of the foundation. Its decisions will be informed
by the community &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;. For this, we have an advisory board composed of
core contributors of scikit-learn. Besides the advisory board, the
technical committee also comprises a delegate from each sponsor. I am
excited about the input that our partners will provide us on
their priorities, as they represent various industries.
Votes will be split so that sponsors and the community have equal
voting power.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="why-not-an-existing-foundation-such-as-numfocus-or-the-psf"&gt;
&lt;h2&gt;Why not an existing foundation such as NumFOCUS, or the PSF?&lt;/h2&gt;
&lt;p&gt;There are several reasons why we chose this particular legal vessel. Our
endeavor is slightly different from the prominent foundations in our ecosystem,
&lt;a class="reference external" href="https://numfocus.org"&gt;NumFOCUS&lt;/a&gt; and the &lt;a class="reference external" href="https://www.python.org/psf"&gt;PSF (Python Software
Foundation)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The first important aspect is that we want to employ full-time
developers. Different countries have very different legal frameworks, and
it is really hard for a non-profit to transfer money overseas. Handling physical
assets, such as employing people or owning real estate, is even harder. We
needed something in France. And there might be a need for something else
in another country at some point.&lt;/p&gt;
&lt;p&gt;Another reason to be embedded in the Inria foundation is that it is
giving us a really good deal. We basically get legal advice, accounting,
office space, and IT support, for an 8% overhead. This is an excellent
deal and is part of the sponsoring efforts that Inria will keep doing.&lt;/p&gt;
&lt;p&gt;Last, we feel that a foundation targeting specifically scikit-learn can
raise money from different people than other foundations. I think that
there is value in having multiple foundations seeking money for open-source
software. Indeed, a foundation builds a case and an image, to convince
donors. Different donors require a different case and a different image.
For instance the president of NumFOCUS &lt;a class="reference external" href="https://twitter.com/aterrel/status/1039488246454083585"&gt;argues for a name less focused on
numerics&lt;/a&gt;. Yet,
too wide a scope can dilute the image.&lt;/p&gt;
&lt;p&gt;We have in mind to make it easy for other foundations to support
scikit-learn. We have major contributors at leading institutions, such
as Andreas Mueller at Columbia or Joel Nothman at the University of Sydney. It
is important that these institutions can easily gather donations too, in
the legal framework suited to their country. Hence the name reflects that
the foundation is embedded at Inria, leaving room for other initiatives.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="what-s-the-scope"&gt;
&lt;h2&gt;What’s the scope?&lt;/h2&gt;
&lt;p&gt;The scope of our work is everything scikit-learn related. It is not the
whole pydata or scipy ecosystem: it is focused on scikit-learn. But we
will not hesitate to contribute fixes and enhancements to neighboring
projects, like in the past, even all the way up to core Python &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-4"&gt;[4]&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;I am very excited. A strong team of full-time contributors will allow
us to do ambitious things with scikit-learn.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Join us&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We will be recruiting! See &lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/people"&gt;our positions&lt;/a&gt;. Come work with us
in Paris.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I want to end by thanking the amazing men and women who have been
contributing to scikit-learn, and are with us in this fantastic
adventure! The energy in this project is incredible. We are
launching this effort thanks to you, and to empower you even more.&lt;/p&gt;
&lt;img alt="" class="align-center" src="attachments/code_sklearn_crop.jpg" style="width: 90%;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;I am quite proud that over the years, my group has employed
&lt;a class="reference external" href="https://github.com/ogrisel"&gt;Olivier Grisel&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/jorisvandenbossche"&gt;Joris van den Bossche&lt;/a&gt; (working on pandas in
addition to scikit-learn), &lt;a class="reference external" href="https://github.com/glemaitre"&gt;Guillaume Lemaître&lt;/a&gt; (working on imbalanced-learn in
addition to scikit-learn), &lt;a class="reference external" href="https://github.com/jeremiedbb"&gt;Jérémie du Boisberranger&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/tomMoral"&gt;Tom Moreau&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/lesteve"&gt;Loic Estève&lt;/a&gt;,
&lt;a class="reference external" href="https://github.com/fabianp"&gt;Fabian Pedregosa&lt;/a&gt;, to name only a
few. All these people, and the many others students that we have
payed part time to work on software, have had an structuring
impact on our ecosystem, going beyond the bounds of scikit-learn
and touching many aspects of computing in Python. However, because
of the constraints of research funding in France, public money
forced my to hire them with short-term contracts.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Technically, it is a tax-deductible scikit-learn consortium inside
the Inria foundation, which is an non-profit entity related to Inria.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Details on the goverance of the foundation can be found at
&lt;a class="reference external" href="https://scikit-learn.fondation-inria.fr/en/mission-and-governance"&gt;https://scikit-learn.fondation-inria.fr/en/mission-and-governance&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For instance Olivier and Tom have been making parallelism more
robust in Python 3.7 (amongst various issues
&lt;a class="reference external" href="https://bugs.python.org/issue33056"&gt;https://bugs.python.org/issue33056&lt;/a&gt; and
&lt;a class="reference external" href="https://bugs.python.org/issue31699"&gt;https://bugs.python.org/issue31699&lt;/a&gt;). Olivier helped defining the
&lt;a class="reference external" href="https://www.python.org/dev/peps/pep-0574/"&gt;new pickling protocol&lt;/a&gt;, crucial to
efficient persistence.
This is hard work. Yet it is
important, because it benefits all libraries.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="open-source"></category><category term="sustainabilty"></category><category term="scientific software"></category></entry><entry><title>Sprint on scikit-learn, in Paris and Austin</title><link href="https://gael-varoquaux.info/programming/sprint-on-scikit-learn-in-paris-and-austin.html" rel="alternate"></link><published>2018-08-01T00:00:00+02:00</published><updated>2018-08-01T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2018-08-01:/programming/sprint-on-scikit-learn-in-paris-and-austin.html</id><summary type="html">&lt;p&gt;Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is
a brief report on progress and challenges.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Several sprints&lt;/p&gt;
&lt;p&gt;We actually held two sprints in Austin: one open sprint, at the scipy
conference sprints, which was open to new contributors, and one core
sprint, for more …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is
a brief report on progress and challenges.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Several sprints&lt;/p&gt;
&lt;p&gt;We actually held two sprints in Austin: one open sprint, at the scipy
conference sprints, which was open to new contributors, and one core
sprint, for more advanced contributors. Thank you to all who joined
the scipy conference sprint. As I wasn’t there, I cannot report on
it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="many-achievements"&gt;
&lt;h2&gt;Many achievements&lt;/h2&gt;
&lt;p&gt;Too many things were done to be listed here. Here is a brief overview:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;OPTICS got merged&lt;/strong&gt;: &lt;a class="reference external" href="http://scikit-learn.org/dev/modules/clustering.html#optics"&gt;The OPTICS clustering algorithm&lt;/a&gt; is a
density-based clustering algorithm, like DBSCAN, but with hyperparameters
that are more flexible and easier to set. Our implementation is also more
scalable for very large numbers of samples. The &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/1984"&gt;pull request&lt;/a&gt; was opened
in 2013, and received many improvements over the years.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yeo-Johnson&lt;/strong&gt;: &lt;a class="reference external" href="http://scikit-learn.org/dev/modules/preprocessing.html#mapping-to-a-gaussian-distribution"&gt;The Yeo-Johnson transform&lt;/a&gt;
is a simple parametric transformation of the data that can be used to
make it more Gaussian. It is similar to the Box-Cox transform but can
deal with negative data
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11520"&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Novelty versus outlier detection&lt;/strong&gt;: Novelty detection attempts to
find observations in new data that differ from the training data. Outlier
detection assumes that even the training data contain aberrant
observations. New modes in scikit-learn enable both usage scenarios with
the same algorithms (see &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/issues/8693"&gt;this issue&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/10700"&gt;this
PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Missing-value indicator&lt;/strong&gt;: a new transform that adds indicator columns
marking missing data
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/8075"&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PyPy support&lt;/strong&gt;: PyPy support was merged
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11010"&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Random Forest with 100 estimators&lt;/strong&gt;: The default of &lt;cite&gt;n_estimators&lt;/cite&gt; in
RandomForest was changed from 10, which was fast but statistically
poor, to 100 (&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11542"&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changing to 5-fold&lt;/strong&gt;: we changed the default of cross-validation from
3-fold to 5-fold
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11557"&gt;PR&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Toward release 0.20&lt;/strong&gt;: most of the effort of the sprint was actually
spent on addressing issues for the 0.20 release: a long list of quality
improvements
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/milestone/24"&gt;milestone&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
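&lt;p&gt;To make the Yeo-Johnson item above more concrete, here is a minimal pure-Python sketch of the transform for a single value and a fixed parameter, following the textbook formula. The function name &lt;cite&gt;yeo_johnson&lt;/cite&gt; and the parameter name &lt;cite&gt;lmbda&lt;/cite&gt; are hypothetical, chosen for illustration; this is not scikit-learn’s code, whose &lt;cite&gt;PowerTransformer&lt;/cite&gt; also estimates the parameter from the data. The interesting part is the second branch, which handles negative values where Box-Cox cannot.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import math


def yeo_johnson(y, lmbda):
    """Yeo-Johnson transform of one real value y for a fixed lmbda.

    Hypothetical helper for illustration only: it does not fit lmbda,
    whereas a real transformer would estimate it from the data.
    """
    if y >= 0:
        # Non-negative branch: Box-Cox applied to y + 1
        if lmbda == 0:
            return math.log1p(y)
        return ((y + 1) ** lmbda - 1) / lmbda
    # Negative branch: mirrored formula with exponent 2 - lmbda
    if lmbda == 2:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lmbda) - 1) / (2 - lmbda)


# Unlike Box-Cox, negative inputs are handled, and order is preserved
values = [-3.0, -0.5, 0.0, 0.5, 3.0]
transformed = [yeo_johnson(v, 0.5) for v in values]
```
&lt;/pre&gt;
&lt;p&gt;With &lt;cite&gt;lmbda=1&lt;/cite&gt; both branches reduce to the identity, which makes a convenient sanity check.&lt;/p&gt;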
&lt;/div&gt;
&lt;div class="section" id="scikit-learn-is-hard-work"&gt;
&lt;h2&gt;Scikit-learn is hard work&lt;/h2&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="attachments/dev_scikit-learn.png" style="width: 300px;" /&gt;
&lt;p class="caption"&gt;Even for the almighty &amp;#64;amueller&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Two days of intense group work on scikit-learn reminded me how hard
this work is. I thought it might be a good idea to illustrate
why.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Mathematical errors&lt;/strong&gt;: maintaining the library requires mathematical
understanding of the models. For instance Ivan Panico &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11585"&gt;fixed the sparse
PCA&lt;/a&gt;, for
which the transform was mathematically incorrect.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Numerical instabilities&lt;/strong&gt;: sometimes, however, when models give a
result different from the expected one, this is due to numerical
instability. For instance, Sergül Aydöre &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11587"&gt;changed the tolerance for
certain variants of ridge&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keeping examples and documentation up to date&lt;/strong&gt;:
Each change requires updating all the documentation and examples, and we have
many of these. For instance, Alexandre Boucaud &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11557"&gt;had to update many examples and
documentation pages when changing the default cross-validation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clean deprecation path&lt;/strong&gt;: We make sure that our changes do not break
users’ code, and therefore we provide a smooth update path, with
progressive deprecations. For instance, &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11557"&gt;the change of default
cross-validation&lt;/a&gt; introduced
an intermediate step where the default is kept the same and a warning
announces that it will change in two releases.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistent behavior across the library&lt;/strong&gt;:
One of the acclaimed values of scikit-learn is that it has a very
consistent behavior across different models. We enforce this with “common
tests”, which check certain properties on all estimators at once. For
instance, Sergül implemented &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11558"&gt;common tests for sample weights&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extensive testing&lt;/strong&gt;: We test many many things in scikit-learn:
that the code snippets in the documentation are correct, that &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/11421"&gt;the
docstring conventions&lt;/a&gt; are
respected, and that no deprecation warnings are raised, including from
our dependencies. As a result, continuous integration is a core part
of our development. During the sprint, we flooded our cloud-based
continuous integration, so iteration really slowed down.
&lt;a class="reference external" href="https://travis-ci.org/"&gt;TravisCI&lt;/a&gt; were kind enough to fix this by
allocating us more computing power for free.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supporting many versions&lt;/strong&gt;: Last but not least, one constraint that
makes development hard with scikit-learn is that we support many
different versions of Python, of our dependencies, of linear-algebra
libraries, and of operating systems. This makes development harder, and
continuous integration slower. But we feel that this is very valuable
for a core library: narrowing the supported versions means that users
are more likely to end up in unsatisfiable dependencies situations,
where different parts of a project want different version numbers of a
dependency.&lt;/li&gt;
&lt;/ul&gt;
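&lt;p&gt;The clean deprecation path described above can be sketched with Python’s standard &lt;cite&gt;warnings&lt;/cite&gt; module. This is a hypothetical illustration rather than scikit-learn’s actual code: the helper &lt;cite&gt;resolve_n_folds&lt;/cite&gt; and its &lt;cite&gt;"warn"&lt;/cite&gt; sentinel default are assumptions. The sentinel lets the code tell apart users who rely on the default (they keep the old value of 3 and get a &lt;cite&gt;FutureWarning&lt;/cite&gt;) from users who pass a value explicitly (no warning).&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import warnings

OLD_DEFAULT = 3  # value kept during the transition period
NEW_DEFAULT = 5  # value the default will switch to


def resolve_n_folds(cv="warn"):
    """Return the number of folds, warning if the caller relies on the default.

    Hypothetical helper: the "warn" sentinel means "argument not passed".
    """
    if cv == "warn":
        warnings.warn(
            "The default value of cv will change from %d to %d in two "
            "releases. Pass cv explicitly to silence this warning."
            % (OLD_DEFAULT, NEW_DEFAULT),
            FutureWarning,
        )
        return OLD_DEFAULT
    return cv


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = resolve_n_folds()   # old default, emits a FutureWarning
    explicit = resolve_n_folds(5)  # explicit choice, no warning
```
&lt;/pre&gt;
&lt;p&gt;In this pattern, once the deprecation period is over, the sentinel and the warning are removed and the new default takes effect.&lt;/p&gt;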
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="admonition warning"&gt;
&lt;p class="first admonition-title"&gt;Warning&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dropping support for Python 2&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Supporting many versions slows development. It also prevents
implementing new features: supporting Python 2 makes it harder to
provide better parallelism or traceback management.&lt;/p&gt;
&lt;p class="last"&gt;Python 3 has been out for 10 years. It is solid and comes with many
improvements over Python 2. Alongside &lt;a class="reference external" href="http://python3statement.org"&gt;many other projects&lt;/a&gt;, we will be requiring Python 3 for
the future releases of scikit-learn (0.21 and later). scikit-learn
0.20 will be the last release to support Python 2. Dropping it will enable
us to develop a better toolkit faster.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="credits-and-acknowledgments"&gt;
&lt;h2&gt;Credits and acknowledgments&lt;/h2&gt;
&lt;div class="section" id="contributors-to-the-sprint"&gt;
&lt;h3&gt;Contributors to the sprint&lt;/h3&gt;
&lt;div class="sidebar"&gt;
&lt;p class="first sidebar-title"&gt;Women contributors&lt;/p&gt;
&lt;p class="last"&gt;We deeply regret having only one woman in this long list of
contributors. We care about diversity and welcome contributors from
under-represented groups &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[*]&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;In Paris&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="columns simple"&gt;
&lt;li&gt;Albert Thomas, Huawei&lt;/li&gt;
&lt;li&gt;Alexandre Boucaud, Inria&lt;/li&gt;
&lt;li&gt;Alexandre Gramfort, Inria&lt;/li&gt;
&lt;li&gt;Eric Lebigot, CFM&lt;/li&gt;
&lt;li&gt;Gaël Varoquaux, Inria&lt;/li&gt;
&lt;li&gt;Ivan Panico, Deloitte&lt;/li&gt;
&lt;li&gt;Jean-Baptiste Schiratti, Telecom ParisTech&lt;/li&gt;
&lt;li&gt;Jérémie du Boisberranger, Inria&lt;/li&gt;
&lt;li&gt;Léo Dreyfus-Schmidt, Dataiku&lt;/li&gt;
&lt;li&gt;Nicolas Goix&lt;/li&gt;
&lt;li&gt;Samuel Ronsin, Dataiku&lt;/li&gt;
&lt;li&gt;Sebastien Treguer, Independent&lt;/li&gt;
&lt;li&gt;Sergül Aydöre, Stevens Institute of Technology&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;In Austin&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="columns simple"&gt;
&lt;li&gt;Andreas Mueller, Columbia&lt;/li&gt;
&lt;li&gt;Guillaume Lemaître, Inria&lt;/li&gt;
&lt;li&gt;Jan van Rijn, Columbia&lt;/li&gt;
&lt;li&gt;Joan Massich, Inria&lt;/li&gt;
&lt;li&gt;Joris Van den Bossche, Inria&lt;/li&gt;
&lt;li&gt;Loïc Estève, Inria&lt;/li&gt;
&lt;li&gt;Nicolas Hug, Columbia&lt;/li&gt;
&lt;li&gt;Olivier Grisel, Inria&lt;/li&gt;
&lt;li&gt;Roman Yurchak, independent&lt;/li&gt;
&lt;li&gt;William de Vazelhes, Inria&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Remote&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="columns simple"&gt;
&lt;li&gt;Hanmin Qin, Peking University&lt;/li&gt;
&lt;li&gt;Joel Nothman, University of Sydney&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="sponsors"&gt;
&lt;h3&gt;Sponsors&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://franceisai.com/"&gt;France Is AI&lt;/a&gt; payed the travel of the French
contributors to Austin&lt;/li&gt;
&lt;li&gt;The NSF and the Sloan foundation payed the travel of the people from
Columbia.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://scipy2018.scipy.org"&gt;SciPy 2018&lt;/a&gt; organizers (and sponsors) hosted the first part of the sprint in Austin,&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://www.enthought.com/"&gt;Enthought&lt;/a&gt; hosted the second part of the sprint in Austin,&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://www.dataiku.com/"&gt;Dataiku&lt;/a&gt; hosted us in Paris&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://travis-ci.org/"&gt;TravisCI&lt;/a&gt; raised our number of workers for
online testing&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://www.meetup.com/Paris-Machine-learning-applications-group/"&gt;ParisML meetup&lt;/a&gt; helped us with the organization&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thank you all for your support.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Also thanks to Andy Mueller and Olivier Grisel for feedback on this blog post.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[*]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;We aspire to treat everybody exactly the same way. However,
acknowledging the fact that there is currently a lack of diversity, we
are happy to do some outreach and give extra help onboarding
newcomers.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="open-source"></category><category term="reproducible research"></category><category term="scientific software"></category></entry><entry><title>Beyond computational reproducibility, let us aim for reusability</title><link href="https://gael-varoquaux.info/programming/beyond-computational-reproducibility-let-us-aim-for-reusability.html" rel="alternate"></link><published>2017-09-19T12:10:00+02:00</published><updated>2017-09-19T12:10:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2017-09-19:/programming/beyond-computational-reproducibility-let-us-aim-for-reusability.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Scientific progress calls for reproducing results. Due to limited
resources, this is difficult even in computational sciences. Yet,
reproducibility is only a means to an end. It is not enough by itself
to enable new scientific results. Rather, new discoveries must build
on reuse and modification of the state …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Scientific progress calls for reproducing results. Due to limited
resources, this is difficult even in computational sciences. Yet,
reproducibility is only a means to an end. It is not enough by itself
to enable new scientific results. Rather, new discoveries must build
on reuse and modification of the state of the art. As time goes, this
state of the art must be consolidated in software libraries, just as
scientific knowledge has been consolidated on the bookshelves of
brick-and-mortar libraries.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="small docutils container"&gt;
I am reposting &lt;a class="reference external" href="https://openlab-flowers.inria.fr/uploads/default/original/1X/65addc14bb2a6a7feaf7690865fa3708d5b0990f.pdf"&gt;an essay&lt;/a&gt;
that I wrote on reproducible science and software libraries. The full
discussion is in &lt;a class="reference external" href="https://openlab-flowers.inria.fr/t/ieee-cis-newsletter-on-cognitive-and-developmental-systems/129/1"&gt;IEEE CIS TC Cognitive and Developmental Systems&lt;/a&gt;,
but I’ve been told that it is hard to find.&lt;/div&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Science is based on the ability to falsify claims. Thus, reproduction or
replication of published results is central to the progress of science.
Researchers failing to reproduce a result will raise questions:
Are these investigators not skilled enough? Did they misunderstand the
original scientific endeavor? Or is the scientific claim unfounded? For
this reason, the quality of the methods description in a research paper
is crucial. Beyond papers, computers —central to science in our digital
era— bring the hope of automating reproduction. Indeed, computers excel
at doing the same thing several times.&lt;/p&gt;
&lt;p&gt;However, there are many challenges to computational reproducibility. To
begin with, computers enable reproducibility only if all steps of a
scientific study are automated. In this sense, interactive environments
—productivity-boosters for many— are detrimental unless they enable easy
recording and replay of the actions performed. Similarly, as a
computational-science study progresses, it is crucial to keep track of
changes to the corresponding data and scripts. With a
software-engineering perspective, version control is the solution. It
should be in the curriculum of today’s scientists. But it does not
suffice. Automating a computational study is difficult. This is because
it comes with a large maintenance burden: operations change rapidly,
straining limited resources —processing power and storage. Saving
intermediate results helps, as does devising light experiments that are
easier to automate. Such light experiments are crucial to the progress of science, as are
laboratory classes or thought experiments in physics. A software
engineer would relate them to unit tests, elementary operations checked
repeatedly to ensure the quality of a program.&lt;/p&gt;
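&lt;p&gt;As a rough illustration of how saving intermediate results eases automation, here is a minimal sketch of an on-disk cache. It assumes that arguments and results can be pickled, and that equal arguments pickle to identical bytes. The &lt;tt class="docutils literal"&gt;cache_to_disk&lt;/tt&gt; decorator and &lt;tt class="docutils literal"&gt;expensive_step&lt;/tt&gt; function are hypothetical, standard-library-only stand-ins for dedicated tools such as joblib’s &lt;tt class="docutils literal"&gt;Memory&lt;/tt&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Minimal sketch of caching intermediate results on disk, keyed by a hash of
# the function name and its arguments. Hypothetical helper, not joblib's API;
# assumes arguments and results can be pickled, and that equal arguments
# pickle to identical bytes.
import functools
import hashlib
import os
import pickle
import tempfile

# A fresh directory keeps this demo self-contained; a real study would use a
# fixed path inside the project so that reruns find previous results.
CACHE_DIR = tempfile.mkdtemp(prefix="study_cache_")


def cache_to_disk(func):
    """Store each result of func in CACHE_DIR and reload it on later calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        path = os.path.join(CACHE_DIR, hashlib.sha256(payload).hexdigest() + ".pkl")
        if os.path.exists(path):  # already computed: reload from disk
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper


calls = []  # records actual computations, to show the cache at work


@cache_to_disk
def expensive_step(n):
    calls.append(n)
    return sum(i * i for i in range(n))


print(expensive_step(1000), expensive_step(1000), calls)
```
&lt;/pre&gt;
&lt;p&gt;The second call is served from disk, so &lt;tt class="docutils literal"&gt;calls&lt;/tt&gt; records a single computation.&lt;/p&gt;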
&lt;div class="align-right docutils container"&gt;
Archiving computers in thermally-regulated nuclear-proof vaults?&lt;/div&gt;
&lt;p&gt;Once a study is automated and published, ensuring reproducibility should
be easy; just a matter of archiving the computer used, preferably in a
thermally-regulated nuclear-proof vault. Maybe, dear reader, the
scientist in you frowns at this solution. Indeed, studies should also be
reproduced by new investigators. Hardware and software variations then
get in the way. Portability, &lt;em&gt;i.e.&lt;/em&gt;, achieving identical results across
platforms, is well known in the software industry to be a difficult
problem. It faces great hurdles due to incompatibilities in compilers,
libraries, or operating systems. Beyond these issues, portability also
faces numerical and statistical stability issues in scientific computing.
Hiding instability problems with heavy restrictions on the environment is
like rearranging deck chairs on the Titanic. While enough freezing will
recover reproducibility, unstable operations cast doubt upon the scientific
conclusions they might lead to. Computational reproducibility is more
than a software engineering challenge; it must build upon solid numerical
and statistical methods.&lt;/p&gt;
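&lt;p&gt;The numerical side of portability can be made concrete with a tiny example. Floating-point addition is not associative, so software that sums the same numbers in a different order can return a different result. This can happen, for instance, because of a different compiler, BLAS, or number of threads. Whether a given stack reorders operations depends on its implementation; this standard-library snippet only shows the underlying effect.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Floating-point addition is not associative: summing the same numbers in a
# different order can change the result.
import math

left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right, right_to_left)    # 0.6000000000000001 0.6
print(left_to_right == right_to_left)  # False

# math.fsum tracks rounding errors and returns a correctly rounded sum,
# so its result does not depend on the order of the terms.
print(math.fsum([0.1, 0.2, 0.3]) == math.fsum([0.3, 0.2, 0.1]))  # True
```
&lt;/pre&gt;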
&lt;p&gt;Reproducibility is not enough. It is only a means to an end, scientific
progress. Setting in stone a numerical pipeline that produces a figure is
of little use to scientific thinking if it is a black box. Researchers
need to understand the corresponding set of operations to relate them to
modeling assumptions. New scientific discoveries will arise from varying
those assumptions, or applying the methodology to new questions or new
data. Future studies build upon past studies, standing on the shoulders
of giants, as Isaac Newton famously wrote. In this process, published
results need to be modified and adapted, not only reproduced. Enabling
reuse is an important goal.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Libraries as reusable computational experiments&lt;/div&gt;
&lt;p&gt;To a software architect, a reusable computational experiment may sound
like a library. Software libraries are not only a good analogy, but also
an essential tool. The demanding process of designing a good library
involves isolating elementary steps, ensuring their quality, and
documenting them. It is akin to the editorial work needed to assemble a
textbook from the research literature.&lt;/p&gt;
&lt;p&gt;Science should value libraries made of code, and not only bookshelves.
But they are expensive to develop, and even more so to maintain. Where should
we draw the line? It is clear that in physics not every experimental setup
can be stored for later reuse. Costs are less tangible with computational
science; but they should not be underestimated. In addition, the race to
publish creates legions of studies. As an example, Google Scholar lists
28,000 publications on compressive sensing in 2015. Arguably, many
are incremental and research could do with fewer publications. Yet the
very nature of research is to explore new ideas, not all of which are to
stay.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Identifying and consolidating major results for reuse&lt;/div&gt;
&lt;p&gt;Computational research will best create scientific progress by
identifying and consolidating the major results. It is a difficult but
important task. These studies should be made reusable. Limited resources
imply that the remainder will suffer from “code rot”, with results
becoming harder and harder to reproduce as their software environment
becomes obsolete. Libraries, curated and maintained, are the building
blocks that can enable progress.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="grey docutils container"&gt;
If you want to cite this essay in an academic publication, please
cite the version in
&lt;a class="reference external" href="https://openlab-flowers.inria.fr/t/ieee-cis-newsletter-on-cognitive-and-developmental-systems/129/1"&gt;IEEE CIS TC Cognitive and Developmental Systems&lt;/a&gt;
(volume 32, number 2, 2016).&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Related posts&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="software-for-reproducible-science-lets-not-have-a-misunderstanding.html"&gt;Software for reproducible science: let’s not have a misunderstanding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="mloss-2015-wising-up-to-building-open-source-machine-learning.html"&gt;MLOSS 2015: wising up to building open-source machine learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="../science/publishing-scientific-software-matters.html"&gt;Publishing scientific software matters&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="science"></category><category term="scientific computing"></category><category term="publishing"></category><category term="software"></category><category term="reproducible research"></category></entry><entry><title>Scikit-learn Paris sprint 2017</title><link href="https://gael-varoquaux.info/programming/scikit-learn-paris-sprint-2017.html" rel="alternate"></link><published>2017-06-23T00:00:00+02:00</published><updated>2017-06-23T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2017-06-23:/programming/scikit-learn-paris-sprint-2017.html</id><summary type="html">&lt;object class="align-right" data="attachments/scikit-learn-logo.svg" style="width: 400px;" type="image/svg+xml"&gt;&lt;/object&gt;
&lt;p&gt;Two weeks ago, we held a large international sprint in Paris on
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;. It was incredibly productive
and fun, as always. We are still busy merging in the work, but I think
that now is a good time to summarize the sprint.&lt;/p&gt;
&lt;div class="section" id="a-massive-workforce"&gt;
&lt;h2&gt;A massive workforce&lt;/h2&gt;
&lt;img alt="" class="align-center" src="attachments/sklearn_sprint_2017/P1060011.jpg" style="width: 100%;" /&gt;
&lt;p&gt;We had a …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;object class="align-right" data="attachments/scikit-learn-logo.svg" style="width: 400px;" type="image/svg+xml"&gt;&lt;/object&gt;
&lt;p&gt;Two weeks ago, we held a large international sprint in Paris on
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;. It was incredibly productive
and fun, as always. We are still busy merging in the work, but I think
that now is a good time to summarize the sprint.&lt;/p&gt;
&lt;div class="section" id="a-massive-workforce"&gt;
&lt;h2&gt;A massive workforce&lt;/h2&gt;
&lt;img alt="" class="align-center" src="attachments/sklearn_sprint_2017/P1060011.jpg" style="width: 100%;" /&gt;
&lt;p&gt;We had a mix of core contributors and newcomers, which is a great
combination, as it enables us to be productive, but also to foster the
new generation of core developers. Present were:&lt;/p&gt;
&lt;ul class="columns simple"&gt;
&lt;li&gt;Albert Thomas&lt;/li&gt;
&lt;li&gt;Alexandre Abadie&lt;/li&gt;
&lt;li&gt;Alexandre Gramfort&lt;/li&gt;
&lt;li&gt;Andreas Mueller&lt;/li&gt;
&lt;li&gt;Arthur Imbert&lt;/li&gt;
&lt;li&gt;Aurélien Bellet&lt;/li&gt;
&lt;li&gt;Bertrand Thirion&lt;/li&gt;
&lt;li&gt;Denis Engemann&lt;/li&gt;
&lt;li&gt;Elvis Dohmatob&lt;/li&gt;
&lt;li&gt;Gael Varoquaux&lt;/li&gt;
&lt;li&gt;Guillaume Lemaitre&lt;/li&gt;
&lt;li&gt;Jan Margeta&lt;/li&gt;
&lt;li&gt;Joan Massich&lt;/li&gt;
&lt;li&gt;Joris Van den Bossche&lt;/li&gt;
&lt;li&gt;Laurent Direr&lt;/li&gt;
&lt;li&gt;Loic Esteve&lt;/li&gt;
&lt;li&gt;Mohamed Maskani Filali&lt;/li&gt;
&lt;li&gt;Nathalie Vauquier&lt;/li&gt;
&lt;li&gt;Nicolas Cordier&lt;/li&gt;
&lt;li&gt;Nicolas Goix&lt;/li&gt;
&lt;li&gt;Olivier Grisel&lt;/li&gt;
&lt;li&gt;Patricio Cerda&lt;/li&gt;
&lt;li&gt;Paul Lagrée&lt;/li&gt;
&lt;li&gt;Raghav RV&lt;/li&gt;
&lt;li&gt;Roman Yurchak&lt;/li&gt;
&lt;li&gt;Sebastien Treger&lt;/li&gt;
&lt;li&gt;Sergei Lebedev&lt;/li&gt;
&lt;li&gt;Thierry Guillemot&lt;/li&gt;
&lt;li&gt;Thomas Moreau&lt;/li&gt;
&lt;li&gt;Tom Dupré la Tour&lt;/li&gt;
&lt;li&gt;Vlad Niculae&lt;/li&gt;
&lt;/ul&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Manoj Kumar (could not come to Paris because of visa issues)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And many more people participated remotely; I am pretty certain that I
have forgotten some people.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="support-and-hosting"&gt;
&lt;h2&gt;Support and hosting&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Hosting&lt;/strong&gt;:
As the sprint extended through a French bank holiday and the weekend,
we were hosted in a variety of venues:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://lapaillasse.org"&gt;La paillasse&lt;/a&gt;, a Paris bio-hacker space&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.criteo.com"&gt;Criteo&lt;/a&gt;, a French company doing worldwide
ad-banner placement. The venue there was absolutely gorgeous, with a
beautiful terrace on the roofs of Paris. They even hosted a social
event with free drinks one evening.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Guillaume Lemaître did most of the organization, and at Criteo Ibrahim
Abubakari was our host. We were treated like kings during the whole stay;
each host welcomed us as well as they could.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Financial support by France Is AI&lt;/strong&gt;: Beyond our hosts, we need to thank
&lt;a class="reference external" href="https://franceisai.com/"&gt;France Is AI&lt;/a&gt;, who funded the sprint, covering
some of the lunches, accommodation, and travel expenses to bring in our
contributors from abroad (3000 euros for travel &amp;amp; accommodation, and 1000
euros for food and a venue during the weekend).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="some-achievements-during-the-sprint"&gt;
&lt;h2&gt;Some achievements during the sprint&lt;/h2&gt;
&lt;p&gt;It would be hard to list everything that we did during the sprint (have a
look at the &lt;a class="reference external" href="http://scikit-learn.org/dev/whats_new.html#version-0-14"&gt;development changelog&lt;/a&gt; if you’re curious). Here are some highlights:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;Quantile transformer, to map the data distribution to a uniform
or a Gaussian distribution
(&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/8363"&gt;PR&lt;/a&gt;,
&lt;a class="reference external" href="http://scikit-learn.org/dev/auto_examples/preprocessing/plot_all_scaling.html"&gt;example&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before&lt;/strong&gt;&lt;/p&gt;
&lt;img alt="" src="attachments/sklearn_sprint_2017/original_distributions.png" style="width: 500px;" /&gt;
&lt;p&gt;&lt;strong&gt;After&lt;/strong&gt;&lt;/p&gt;
&lt;img alt="" src="attachments/sklearn_sprint_2017/quantile_transform.png" style="width: 500px;" /&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Memory savings by avoiding casting to float64 when X is given as float32:
we are slowly making sure that, as much as possible, all models avoid
using internal representations of dtype float64 when the data is
given as float32. This significantly reduces memory usage and can
speed up computation by up to a factor of two.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;API tests on instances rather than classes. This is to facilitate testing
packages in &lt;a class="reference external" href="https://github.com/scikit-learn-contrib/scikit-learn-contrib"&gt;scikit-learn-contrib&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Many small API fixes to ensure better consistency of models, as well as
cleaning the codebase, making sure that examples display well under
matplotlib 2.x.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Many bug fixes, including fixes for corner cases in our average precision metric,
which was dear to me (&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/9017"&gt;PR&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
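&lt;p&gt;The idea behind the quantile transformer can be sketched with the standard library alone. Each value is replaced by its empirical quantile, which lies in [0, 1]. That quantile can then be passed through the inverse Gaussian CDF. This is a simplified illustration, not scikit-learn’s &lt;tt class="docutils literal"&gt;QuantileTransformer&lt;/tt&gt;, which estimates quantiles on training data and interpolates between them. The function name and the clipping constant are assumptions for this sketch.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Simplified, hypothetical quantile transform: rank-based mapping to a uniform
# distribution, optionally followed by the inverse Gaussian CDF.
from statistics import NormalDist


def quantile_transform(values, output_distribution="uniform"):
    n = len(values)
    # Rank of each value (ties get distinct ranks here, for simplicity)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank
    # Map ranks to empirical quantiles in [0, 1]
    uniform = [r / max(n - 1, 1) for r in ranks]
    if output_distribution == "uniform":
        return uniform
    # Clip away 0 and 1, where the inverse Gaussian CDF is infinite
    eps = 1e-7
    gauss = NormalDist()
    return [gauss.inv_cdf(min(max(u, eps), 1 - eps)) for u in uniform]


skewed = [1, 2, 3, 5, 8, 13, 21, 34, 55, 1000]  # heavy right tail
print([round(u, 3) for u in quantile_transform(skewed)])
# [0.0, 0.111, 0.222, 0.333, 0.444, 0.556, 0.667, 0.778, 0.889, 1.0]
print([round(z, 2) for z in quantile_transform(skewed, "normal")])
```
&lt;/pre&gt;
&lt;p&gt;Because only ranks matter, the outlier 1000 ends up at the top quantile instead of stretching the scale, which is what makes this transform robust to heavy tails.&lt;/p&gt;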
&lt;p&gt;&lt;strong&gt;Work soon to be merged&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;ColumnTransformer (&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/9012"&gt;PR&lt;/a&gt;): from
pandas dataframe to feature matrix, by applying different transformers
to different columns.&lt;/li&gt;
&lt;li&gt;Fixing t-SNE (&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/9032"&gt;PR&lt;/a&gt;): our
t-SNE implementation was extremely memory-inefficient, and on top of
this had minor bugs. We are fixing it.&lt;/li&gt;
&lt;/ul&gt;
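&lt;p&gt;The goal of the ColumnTransformer work can be illustrated with a small standard-library sketch. A table given as named columns becomes a numeric feature matrix by applying a different transformation to each column and concatenating the results side by side. The helpers &lt;tt class="docutils literal"&gt;standardize&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;one_hot&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;column_transform&lt;/tt&gt; are hypothetical and do not reflect the API of the pull request.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Hypothetical sketch of column-wise transformation: each column gets its own
# transformer, and the resulting feature blocks are concatenated per row.
from statistics import mean, pstdev


def standardize(column):
    # Numeric column -> one feature, centered and scaled
    mu, sigma = mean(column), pstdev(column) or 1.0
    return [[(x - mu) / sigma] for x in column]


def one_hot(column):
    # Categorical column -> one indicator feature per category
    categories = sorted(set(column))
    return [[1.0 if x == c else 0.0 for c in categories] for x in column]


def column_transform(table, transformers):
    """Apply each (function, column_name) pair and stack features side by side."""
    blocks = [func(table[name]) for func, name in transformers]
    n_rows = len(next(iter(table.values())))
    return [sum((block[i] for block in blocks), []) for i in range(n_rows)]


table = {"age": [20, 30, 40], "city": ["Paris", "Austin", "Paris"]}
X = column_transform(table, [(standardize, "age"), (one_hot, "city")])
for row in X:
    print(row)
```
&lt;/pre&gt;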
&lt;p&gt;There is a lot more pending work that the sprint helped move forward.
You can also glance at the &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pulse/monthly"&gt;monthly activity report on github&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Joblib progress&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pythonhosted.org/joblib/"&gt;Joblib&lt;/a&gt;, the parallel-computing
engine used by scikit-learn, is being extended to work in distributed
settings, for instance using dask distributed as a &lt;a class="reference external" href="http://distributed.readthedocs.io/en/latest/joblib.html"&gt;backend&lt;/a&gt;.
At the sprint, we made progress running a grid-search on Criteo’s Hadoop
cluster.&lt;/p&gt;
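&lt;p&gt;Why a pluggable backend matters for grid search can be sketched without joblib. A grid search is an embarrassingly parallel map over parameter combinations, so only the “map” step depends on where the work runs. In this standard-library sketch, the built-in &lt;tt class="docutils literal"&gt;map&lt;/tt&gt; and a &lt;tt class="docutils literal"&gt;concurrent.futures&lt;/tt&gt; executor stand in for joblib backends such as dask distributed. This is not joblib’s API, and &lt;tt class="docutils literal"&gt;score&lt;/tt&gt; is a hypothetical toy objective.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Grid search as a map over parameter combinations; the backend is simply the
# map function that is passed in. Hypothetical sketch, not joblib's API.
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score(params):
    alpha, degree = params
    # Toy objective standing in for a cross-validated model evaluation
    return -((alpha - 0.1) ** 2 + (degree - 2) ** 2)


def grid_search(param_grid, map_func=map):
    candidates = list(product(*param_grid.values()))
    scores = list(map_func(score, candidates))  # the only backend-specific step
    best = max(range(len(candidates)), key=scores.__getitem__)
    return dict(zip(param_grid, candidates[best])), scores[best]


grid = {"alpha": [0.01, 0.1, 1.0], "degree": [1, 2, 3]}
sequential = grid_search(grid)  # built-in map: one core
with ThreadPoolExecutor(max_workers=4) as executor:
    # Threads keep the demo portable; CPU-bound work would rather use a
    # process pool or a cluster-backed executor with the same map interface
    threaded = grid_search(grid, executor.map)
print(sequential)
print(threaded)
```
&lt;/pre&gt;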
&lt;img alt="" class="align-center" src="attachments/sklearn_sprint_2017/P1060014.jpg" style="width: 100%;" /&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="sprint"></category><category term="scikit-learn"></category><category term="python"></category><category term="machine learning"></category></entry><entry><title>Data science instrumenting social media for advertising is responsible for todays politics</title><link href="https://gael-varoquaux.info/programming/data-science-instrumenting-social-media-for-advertising-is-responsible-for-todays-politics.html" rel="alternate"></link><published>2016-11-11T00:00:00+01:00</published><updated>2016-11-11T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2016-11-11:/programming/data-science-instrumenting-social-media-for-advertising-is-responsible-for-todays-politics.html</id><summary type="html">&lt;p&gt;&lt;em&gt;To my friends developing data science for the social media, marketing, and
advertising industries,&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It is time to accept that we have our share of responsibility in the outcome of
the US elections and the vote on Brexit. We are not creating the
society that we would like. Facebook,
Twitter …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;em&gt;To my friends developing data science for the social media, marketing, and
advertising industries,&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;It is time to accept that we have our share of responsibility in the outcome of
the US elections and the vote on Brexit. We are not creating the
society that we would like. Facebook,
Twitter, targeted advertising, customer profiling, are harmful to truth
and have helped bring about Brexit and the election of Trump. Journalism
has been replaced by social media and commercial content tailored to
influence the reader: your own personal distorted reality.&lt;/p&gt;
&lt;p&gt;There are many deep reasons why Trump won the election. Here, as a
data scientist, I want to talk about the factors created by data science.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Rumor replaces truth&lt;/strong&gt;: the way we, data-miners, aggregate and
recommend content is based on its popularity, on readership statistics.
In no way is it based on the truthfulness of the content. As a
result, Facebook, Twitter, Medium, and the like amplify rumors and
sensational news, with no reality check &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This is nothing new: clickbait and tabloids build upon it. However, social networking and
active recommendation make things significantly worse. Indeed, birds of
a feather flock together, reinforcing their own biases. &lt;strong&gt;We receive
filtered information&lt;/strong&gt;: have you noticed that every single argument you
heard was overwhelmingly against (or in favor of) Brexit? To make matters
even worse, our brain loves it: to resolve cognitive dissonance we avoid
information that contradicts our biases &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;We all more readily believe information that confirms our biases&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Gossiping, rumors, and propaganda have always made sane decisions
difficult. The &lt;strong&gt;filter bubble&lt;/strong&gt;, the algorithmically-tuned rose-colored
glasses of Facebook, escalates this problem into a major dysfunction of
our society. It amplifies messy and false information better than
anything before. Soviet-style propaganda builds on carefully-crafted
lies; post-truth politics build on a flood of information that does not
even pretend to be credible in the long run.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Active distortion of reality&lt;/strong&gt;: amplifying biases to the point that
they drown truth is bad. Social networks actually do worse: they give
tools for active manipulation of our perception of the world. Indeed, the
revenue of today’s Internet information engines comes from advertising.
For this purpose they are designed to learn as much as possible about the
reader. Then they sell this information bundled with a slot where the
buyer can insert the optimal message to influence the reader.&lt;/p&gt;
&lt;a class="reference external image-reference" href="https://www.flickr.com/photos/benterrett/6929895752/"&gt;&lt;img alt="" class="align-right" src="https://farm8.staticflickr.com/7212/6929895752_2e359557b8_z_d.jpg" style="width: 25%;" /&gt;&lt;/a&gt;
&lt;p&gt;The Trump campaign used targeted Facebook ads presenting
unenthusiastic Democrats with information about Clinton tuned to discourage
them from voting. For instance, &lt;a class="reference external" href="http://www.theverge.com/2016/10/27/13434246/donald-trump-targeted-dark-facebook-ads-black-voters"&gt;portraying her as racist to black voters&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Information manipulation works. The Trump campaign has been a smear
campaign aimed at suppressing the votes of his opponent. Release of
negative information on Clinton &lt;a class="reference external" href="https://medium.com/&amp;#64;jonathonmorgan/we-are-more-than-our-partisanship-4ea179592c1f"&gt;did affect her supporters’ allegiance&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tech created the perfect mind-control tool, with an eye on
sales revenue. Someone used it for politics.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The tech industry is mostly socially-liberal and highly educated,
wishing the best for society. But it must accept its share of the blame.
My friends improving machine learning for customer profiling and ad
placement, &lt;strong&gt;you are helping shape a world of lies and deception&lt;/strong&gt;. I will
not blame you for accepting this money: if it were not for you, others
would do it. But we should all be thinking about how we can improve this
system. How do we use data science to build a world based on objectivity,
transparency, and truth, rather than Internet-based marketing?&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;References analysing the erosion of truth&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.economist.com/news/briefing/21706498-dishonesty-politics-nothing-new-manner-which-some-politicians-now-lie-and"&gt;Must-read article in the economist on lies in politics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Post-truth_politics"&gt;Wikipedia page on Post-truth politics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://nymag.com/selectall/2016/11/donald-trump-won-because-of-facebook.html"&gt;Donald Trump won because of Facebook&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://inverseprobability.com/2016/06/23/the-real-story-behind-todays-referendum"&gt;The real story behind today’s referendum&lt;/a&gt;: Neil Lawrence’s analysis of the filter-bubble effect in Brexit&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://users.polisci.wisc.edu/behavior/Papers/Toff&amp;amp;Kim2013.pdf"&gt;A 2013 academic study showing that Twitter increases partisan
polarization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Digression: other social issues of data science&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The tech industry is &lt;strong&gt;increasing inequalities&lt;/strong&gt;, making the rich richer and
leaving the poor behind. Data science, with its ability to automate
actions and wield large sources of information, is a major contributor
to these inequalities.&lt;/li&gt;
&lt;li&gt;Internet-based marketing is building &lt;strong&gt;a huge spying machine&lt;/strong&gt; that
infers as much as possible about the user. The Trump campaign was able
to target a specific population, black voters leaning towards
democrats. What if this data were used for direct executive action? This
could come sooner than we think, given how intelligence agencies tap
into social media.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I preferred to focus this post on how data-science can help distort truth.
Indeed, it is a problem too often ignored by data scientists who like to
think that they are empowering users.&lt;/p&gt;
&lt;/div&gt;
&lt;!-- The wikileaks dumps of Clinton's mail resemble the
`Kompromat &lt;https://en.wikipedia.org/wiki/Kompromat&gt;`_ techniques used
by post-soviet regimes, using private information on opponents to
control them. --&gt;
&lt;p class="align-right"&gt;In memory of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Aaron_Swartz"&gt;Aaron Swartz&lt;/a&gt;
who fought centralized power on the Internet.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Facebook was until recently using human curators, &lt;a class="reference external" href="http://arstechnica.com/business/2016/08/facebook-fires-human-editors-algorithm-immediately-posts-fake-news/"&gt;but fired them,
leading to a loss of control over veracity&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;It is a well-known and well-studied cognitive bias that
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Cognitive_dissonance"&gt;individuals strive to reduce cognitive dissonance and actively avoid
situations and information likely to increase it&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;a class="reference external image-reference" href="https://www.flickr.com/photos/cdevers/4602805654"&gt;&lt;img alt="" class="align-center" src="https://farm2.staticflickr.com/1376/4602805654_db8b6569fb_z_d.jpg" style="width: 80%;" /&gt;&lt;/a&gt;
</content><category term="programming"></category><category term="politics"></category><category term="data science"></category><category term="software"></category><category term="machine learning"></category><category term="society"></category></entry><entry><title>Better Python compressed persistence in joblib</title><link href="https://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html" rel="alternate"></link><published>2016-05-20T00:00:00+02:00</published><updated>2016-05-20T00:00:00+02:00</updated><author><name>Alexandre Abadie &amp; Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2016-05-20:/programming/new_low-overhead_persistence_in_joblib_for_big_data.html</id><summary type="html">&lt;p class="first last"&gt;New persistence in joblib enables low-overhead storage of big data contained in arbitrary objects&lt;/p&gt;
</summary><content type="html">&lt;div class="section" id="problem-setting-persistence-for-big-data"&gt;
&lt;h2&gt;Problem setting: persistence for big data&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://pythonhosted.org/joblib/"&gt;Joblib&lt;/a&gt; is a powerful Python package
for management of computation: parallel computing, caching, and
primitives for out-of-core computing. It is handy when working on so-called
&lt;strong&gt;big data&lt;/strong&gt;, which can consume more than the available RAM (several GB
nowadays). In such situations, objects in the working space must be
persisted to disk, for out-of-core computing, distribution of jobs, or
caching.&lt;/p&gt;
&lt;p&gt;An efficient strategy to write code dealing with big data is to rely on
&lt;strong&gt;numpy arrays to hold large chunks of structured data&lt;/strong&gt;.
The code then handles objects or arbitrary containers (list, dict) holding
numpy arrays. For data management, joblib provides transparent disk
persistence that is very efficient with such objects. The internal
mechanism relies on specializing &lt;a class="reference external" href="https://docs.python.org/3/library/pickle.html"&gt;pickle&lt;/a&gt; to better handle numpy
arrays.&lt;/p&gt;
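&lt;p&gt;The standard library offers hooks for specializing pickle in this spirit. The sketch below uses them to route large binary buffers out of the pickle stream, much as large arrays can be stored apart from the rest of an object. A &lt;tt class="docutils literal"&gt;pickle.Pickler&lt;/tt&gt; subclass does this via &lt;tt class="docutils literal"&gt;persistent_id&lt;/tt&gt;, and a matching &lt;tt class="docutils literal"&gt;pickle.Unpickler&lt;/tt&gt; restores the buffers via &lt;tt class="docutils literal"&gt;persistent_load&lt;/tt&gt;. This is not joblib’s implementation; the class names and the size threshold are arbitrary assumptions.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
# Specializing pickle with the standard persistent_id / persistent_load hooks:
# large bytes objects go to a side store instead of the main pickle stream.
import io
import pickle

THRESHOLD = 1024  # bytes; hypothetical cut-off for "large" buffers


class BufferPickler(pickle.Pickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store  # side storage for large buffers

    def persistent_id(self, obj):
        if isinstance(obj, bytes) and len(obj) >= THRESHOLD:
            self.store.append(obj)
            return ("buffer", len(self.store) - 1)
        return None  # everything else is pickled as usual


class BufferUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store

    def persistent_load(self, pid):
        tag, index = pid
        if tag != "buffer":
            raise pickle.UnpicklingError("unknown persistent id")
        return self.store[index]


obj = {"name": "experiment", "data": [b"x" * 10_000, b"small"]}
store = []
stream = io.BytesIO()
BufferPickler(stream, store).dump(obj)
restored = BufferUnpickler(io.BytesIO(stream.getvalue()), store).load()
print(len(stream.getvalue()), len(store), restored == obj)
```
&lt;/pre&gt;
&lt;p&gt;The main stream stays small while the 10 kB buffer lives in &lt;tt class="docutils literal"&gt;store&lt;/tt&gt;. A real implementation would write such buffers to disk in an efficient binary format.&lt;/p&gt;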
&lt;p&gt;&lt;a class="reference external" href="https://github.com/joblib/joblib/pull/260"&gt;Recent improvements&lt;/a&gt;
reduce vastly the memory overhead of data persistence.&lt;/p&gt;
&lt;div class="section" id="limitations-of-the-old-implementation"&gt;
&lt;h3&gt;Limitations of the old implementation&lt;/h3&gt;
&lt;p&gt;❶ Dumping/loading persisted data &lt;strong&gt;with compression&lt;/strong&gt; was a memory hog
because of internal copies of the data, which limited the maximum size
of data usable with compressed persistence:&lt;/p&gt;
&lt;img alt="" class="large" src="https://gael-varoquaux.info/programming/attachments/old_pickle_mem_profile.png" /&gt;
&lt;p&gt;We see the increased memory usage during the calls to the &lt;tt class="docutils literal"&gt;dump&lt;/tt&gt; and
&lt;tt class="docutils literal"&gt;load&lt;/tt&gt; functions, profiled using the &lt;a class="reference external" href="https://pypi.python.org/pypi/memory_profiler"&gt;memory_profiler package&lt;/a&gt; with this &lt;a class="reference external" href="https://gist.github.com/aabadie/7cba3385406d1cec7d3dd4407ba3f164"&gt;gist&lt;/a&gt;.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;❷ Another drawback was that large numpy arrays (&amp;gt;10MB) contained in an
arbitrary Python object were dumped in separate &lt;tt class="docutils literal"&gt;.npy&lt;/tt&gt; files, increasing
the load on the file system &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;joblib&lt;/span&gt; &lt;span class="c1"&gt;# joblib version: 0.9.4&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;

&lt;span class="c1"&gt;# 3 files are generated:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl_01.npy.z&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl_02.npy.z&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="what-s-new-compression-low-memory"&gt;
&lt;h2&gt;What’s new: compression, low memory…&lt;/h2&gt;
&lt;p&gt;❶ &lt;strong&gt;Memory usage is now stable&lt;/strong&gt;:&lt;/p&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/new_pickle_mem_profile.png" /&gt;
&lt;p&gt;❷ &lt;strong&gt;All numpy arrays are persisted in a single file&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;joblib&lt;/span&gt; &lt;span class="c1"&gt;# joblib version: 0.10.0 (dev)&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;

&lt;span class="c1"&gt;# only 1 file is generated:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;❸ &lt;strong&gt;Persistence in a file handle&lt;/strong&gt; (ongoing work in a &lt;a class="reference external" href="https://github.com/joblib/joblib/pull/351"&gt;pull request&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;❹ &lt;strong&gt;More compression formats are available&lt;/strong&gt;&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Backward compatibility&lt;/p&gt;
&lt;p&gt;Existing joblib users can be reassured: the new version is &lt;strong&gt;still
compatible with pickles generated by older versions&lt;/strong&gt; (&amp;gt;= 0.8.4). You
are encouraged to update (that is, rebuild) your cache if you want to take
advantage of this new version.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="benchmarks-speed-and-memory-consumption"&gt;
&lt;h2&gt;Benchmarks: speed and memory consumption&lt;/h2&gt;
&lt;p&gt;Joblib strives to have &lt;strong&gt;minimal dependencies&lt;/strong&gt; (only numpy) and to
&lt;strong&gt;be agnostic to the input data&lt;/strong&gt;. Hence the goal is to deal with any
kind of data while trying to &lt;strong&gt;be as efficient as possible with numpy arrays&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;To illustrate the benefits and cost of the new persistence implementation, let’s
now compare a real-life use case
(&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_lfw_people.html"&gt;LFW dataset from scikit-learn&lt;/a&gt;)
with different libraries:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Joblib, in two different versions:
0.9.4 and master (dev)&lt;/li&gt;
&lt;li&gt;Pickle&lt;/li&gt;
&lt;li&gt;Numpy&lt;/li&gt;
&lt;/ul&gt;
&lt;img alt="" class="large" src="https://gael-varoquaux.info/programming/attachments/persistence_lfw_bench.png" /&gt;
&lt;p&gt;The first four lines use uncompressed persistence strategies; the last
four use zlib/gzip compression &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;. Code to reproduce the
benchmarks is available in this &lt;a class="reference external" href="https://gist.github.com/aabadie/2ba94d28d68f19f87eb8916a2238a97c"&gt;gist&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;⚫ &lt;strong&gt;Speed&lt;/strong&gt;: the results for joblib 0.9.4 and 0.10.0 (dev) are
similar, whereas &lt;strong&gt;numpy and pickle are clearly slower than joblib&lt;/strong&gt; in both
the compressed and uncompressed cases.&lt;/p&gt;
&lt;p&gt;⚫ &lt;strong&gt;Memory consumption&lt;/strong&gt;: without compression, the old and
new joblib versions are the same; with compression, the new joblib version uses
much less memory than the old one.
&lt;strong&gt;Joblib clearly outperforms pickle and numpy in terms of
memory consumption&lt;/strong&gt;. This is because numpy relies on
pickle if the object is not a pure numpy array (a list or a dict with arrays, for
example), so in this case it inherits the memory drawbacks of pickle. When
persisting pure numpy arrays (not tested here), numpy uses its internal save/load
functions, which are efficient in terms of speed and memory consumption.&lt;/p&gt;
&lt;p&gt;⚫ &lt;strong&gt;Disk used&lt;/strong&gt;: results are as expected: uncompressed files have
the same size as the in-memory data; compressed files are smaller.&lt;/p&gt;
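&lt;p&gt;The full benchmark code lives in the gist linked above. As a much smaller, hypothetical stand-in, the sketch below times plain &lt;tt class="docutils literal"&gt;pickle&lt;/tt&gt; with and without &lt;tt class="docutils literal"&gt;gzip&lt;/tt&gt; compression and reports the serialized size. It works on a made-up in-memory object (not the LFW data) and does not measure memory.&lt;/p&gt;

```python
import gzip
import pickle
import time


def measure(obj, compresslevel=None):
    """Return (seconds, number of bytes) to serialize obj, gzip-compressed if a level is given."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if compresslevel is not None:
        payload = gzip.compress(payload, compresslevel=compresslevel)
    return time.perf_counter() - start, len(payload)


# Toy data standing in for a real dataset (hypothetical).
data = {"targets": list(range(200_000)), "names": ["person"] * 1_000}

results = {}
for level in (None, 1, 6, 9):
    seconds, size = measure(data, level)
    results[level] = size
    label = "uncompressed" if level is None else f"gzip level {level}"
    print(f"{label:13s} {seconds * 1000:8.1f} ms {size:12,d} bytes")
```

&lt;p&gt;Timings vary between runs and machines. A serious comparison would repeat each measurement and also track peak memory, for instance with memory_profiler as in the gists above.&lt;/p&gt;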
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Caveat Emptor: performance is data-dependent&lt;/p&gt;
&lt;p&gt;Some data compress more easily than others. Speed and disk usage will
vary depending on the data. Key considerations are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Fraction of data in arrays&lt;/strong&gt;: joblib is efficient if much of the
data is contained in numpy arrays. The worst-case scenario is
something like a large dictionary with random numbers as keys and
values.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Entropy of the data&lt;/strong&gt;: an array full of zeros will compress well
and fast. A fully random array will compress slowly and use a lot
of disk. Real data is often somewhere in the middle.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
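&lt;p&gt;The effect of entropy is easy to check. The minimal sketch below uses the standard-library &lt;tt class="docutils literal"&gt;zlib&lt;/tt&gt; module on raw bytes rather than on numpy arrays, so that it has no dependencies. It compares an all-zero buffer with random bytes of the same size.&lt;/p&gt;

```python
import os
import zlib

size = 1_000_000  # one megabyte in each case

samples = {
    "zeros": bytes(size),        # lowest entropy: every byte is 0
    "random": os.urandom(size),  # highest entropy: uniformly random bytes
}

results = {}
for name, data in samples.items():
    results[name] = len(zlib.compress(data, 6))
    print(f"{name:6s} {results[name]:9,d} bytes after zlib (from {size:,d})")
```

&lt;p&gt;The all-zero buffer shrinks to roughly a kilobyte. The random buffer does not shrink at all; zlib even adds a few bytes of framing.&lt;/p&gt;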
&lt;/div&gt;
&lt;div class="section" id="extra-improvements-in-compressed-persistence"&gt;
&lt;h2&gt;Extra improvements in compressed persistence&lt;/h2&gt;
&lt;div class="section" id="new-compression-formats"&gt;
&lt;h3&gt;New compression formats&lt;/h3&gt;
&lt;p&gt;Joblib can use new compression formats based on Python standard library modules:
&lt;strong&gt;zlib, gzip, bz2, lzma and xz&lt;/strong&gt; (the last two require Python
3.3 or later, which introduced the lzma module). &lt;strong&gt;The compressor is
selected automatically when the file name has an explicit extension&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.z&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# zlib&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.z&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# gzip&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.bz2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# bz2&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.bz2&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.lzma&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# lzma&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.lzma&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.xz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# xz&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.xz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
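&lt;p&gt;Selecting a compressor from the extension amounts to a lookup table. The sketch below reproduces the idea with standard-library openers and plain &lt;tt class="docutils literal"&gt;pickle&lt;/tt&gt;. The names &lt;tt class="docutils literal"&gt;open_by_extension&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;dump_by_extension&lt;/tt&gt; are hypothetical and are not part of joblib's API.&lt;/p&gt;

```python
import bz2
import gzip
import lzma
import os
import pickle

# Hypothetical mapping from file extension to a standard-library opener.
# Raw zlib (".z") is left out: the standard library has no file object for
# that format (joblib ships its own BinaryZlibFile for it).
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lambda path, mode: lzma.open(path, mode, format=lzma.FORMAT_XZ),
    ".lzma": lambda path, mode: lzma.open(path, mode, format=lzma.FORMAT_ALONE),
}


def open_by_extension(path, mode="rb"):
    """Open path with the compressor implied by its extension (plain file otherwise)."""
    extension = os.path.splitext(path)[1].lower()
    opener = OPENERS.get(extension, open)
    return opener(path, mode)


def dump_by_extension(obj, path):
    """Pickle obj into path, compressing according to the extension."""
    with open_by_extension(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
```

&lt;p&gt;For example, &lt;tt class="docutils literal"&gt;dump_by_extension(obj, '/tmp/test.pkl.xz')&lt;/tt&gt; writes an xz file, while a name without a known extension is written uncompressed.&lt;/p&gt;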
&lt;p&gt;The compression level can be tuned by setting the compressor explicitly:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.compressed&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;zlib&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.compressed&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;/tmp/test.compressed&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;compress&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;lzma&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.compressed&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
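&lt;p&gt;The &lt;tt class="docutils literal"&gt;(method, level)&lt;/tt&gt; pair can be read as a dispatch on the method name followed by a call with the level. The hypothetical sketch below illustrates this with the one-shot functions of the standard library, not with joblib's internals. The name &lt;tt class="docutils literal"&gt;compress_bytes&lt;/tt&gt; is made up for the example.&lt;/p&gt;

```python
import bz2
import lzma
import zlib

# Hypothetical dispatch table: method name -> one-shot compression function.
# Every method accepts a level from 1 (fast) to 9 (small).
COMPRESSORS = {
    "zlib": lambda data, level: zlib.compress(data, level),
    "bz2": lambda data, level: bz2.compress(data, level),
    "lzma": lambda data, level: lzma.compress(data, preset=level),
}


def compress_bytes(data, compress=("zlib", 6)):
    """Compress data from a (method, level) pair, in the spirit of compress=('zlib', 6)."""
    method, level = compress
    if method not in COMPRESSORS:
        raise ValueError(f"unknown compression method {method!r}")
    return COMPRESSORS[method](data, level)


sample = b"joblib " * 10_000
for method in COMPRESSORS:
    for level in (1, 9):
        print(method, level, len(compress_bytes(sample, (method, level))))
```

&lt;p&gt;Higher levels usually trade speed for size, but the gain depends on the data. It is therefore worth comparing both numbers on a representative sample.&lt;/p&gt;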
&lt;p&gt;On loading, joblib uses the magic number of the file to determine the
right decompression method. This makes loading compressed pickles transparent:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.compressed&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
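&lt;p&gt;Magic-number detection boils down to comparing the first bytes of the data with the signature of each supported format. The sketch below is a hypothetical stand-alone version that works on in-memory bytes. The signatures are the standard ones for each format: zlib streams with the default window start with 0x78, and &lt;tt class="docutils literal"&gt;.lzma&lt;/tt&gt; files written with default settings start with 0x5D 0x00. Joblib's own table may check more or different bytes.&lt;/p&gt;

```python
import bz2
import gzip
import lzma
import zlib

# Standard signatures ("magic numbers") of compressed formats, with the
# matching one-shot decompressor. lzma.decompress auto-detects xz and .lzma.
SIGNATURES = (
    (b"\x1f\x8b", "gzip", gzip.decompress),
    (b"BZh", "bz2", bz2.decompress),
    (b"\xfd7zXZ\x00", "xz", lzma.decompress),
    (b"\x5d\x00", "lzma", lzma.decompress),
    (b"\x78", "zlib", zlib.decompress),
)


def sniff_and_decompress(blob):
    """Return (format name, decompressed bytes); unknown data is returned unchanged."""
    for prefix, name, decompress in SIGNATURES:
        if blob.startswith(prefix):
            return name, decompress(blob)
    return "raw", blob


payload = b"pickled bytes would go here " * 100
for compressed in (gzip.compress(payload), bz2.compress(payload),
                   lzma.compress(payload), zlib.compress(payload)):
    name, data = sniff_and_decompress(compressed)
    print(name, data == payload)
```

&lt;p&gt;Because the decision relies only on the first bytes, the file extension does not matter when loading. This is why the &lt;tt class="docutils literal"&gt;/tmp/test.compressed&lt;/tt&gt; example above can be loaded without naming a compressor.&lt;/p&gt;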
&lt;p&gt;Importantly, the generated compressed files use a &lt;strong&gt;standard
compression file format&lt;/strong&gt;: for instance, regular command line tools (zip/unzip,
gzip/gunzip, bzip2, lzma, xz) can be used to compress/uncompress a pickled file
generated with joblib. Joblib can also load caches compressed with those
tools.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Toward more and faster compression&lt;/p&gt;
&lt;p&gt;Specific compression strategies, such as &lt;a class="reference external" href="http://google.github.io/snappy/"&gt;snappy&lt;/a&gt;, &lt;a class="reference external" href="http://www.blosc.org/"&gt;blosc&lt;/a&gt;, LZO or LZ4, have been developed for fast
compression, sometimes even faster than disk reads. With a file-like interface, they should be
readily usable with joblib.&lt;/p&gt;
&lt;p&gt;In the benchmarks above, loading and dumping with compression is
slower than without (though only by a factor of 3 for loading). These
benchmarks were run on a computer with an SSD, hence with very fast I/O. In a
situation with slower I/O, such as &lt;strong&gt;on a network drive, compression could
save time&lt;/strong&gt;. With faster compressors, compression will save time on most
hardware.&lt;/p&gt;
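&lt;p&gt;Whether compression saves time is simple arithmetic. Reading compressed data costs the transfer of fewer bytes plus the decompression time. The sketch below encodes this back-of-the-envelope model. All figures in it are made-up assumptions for illustration, not measurements from the benchmarks above, and the model ignores any overlap between I/O and decompression.&lt;/p&gt;

```python
def read_time(size_mb, bandwidth_mb_s, ratio=1.0, decompress_mb_s=None):
    """Seconds to bring size_mb of (uncompressed) data into memory.

    ratio: original size divided by stored size (1.0 means no compression).
    decompress_mb_s: decompressor throughput, in MB/s of output, or None.
    """
    transfer = size_mb / ratio / bandwidth_mb_s
    decompress = 0.0 if decompress_mb_s is None else size_mb / decompress_mb_s
    return transfer + decompress


def break_even_bandwidth(ratio, decompress_mb_s):
    """Disk bandwidth (MB/s) below which compression makes reading faster.

    Solves size / (ratio * B) + size / D == size / B for B.
    """
    return decompress_mb_s * (1 - 1 / ratio)


# Made-up figures, for illustration only.
SIZE_MB = 1000.0                # 1 GB of data
RATIO, DECOMPRESS = 3.0, 300.0  # a zlib-like compressor

for label, bandwidth in (("local SSD", 500.0), ("network drive", 50.0)):
    plain = read_time(SIZE_MB, bandwidth)
    packed = read_time(SIZE_MB, bandwidth, RATIO, DECOMPRESS)
    print(f"{label:13s} uncompressed {plain:5.1f} s, compressed {packed:5.1f} s")

print("break-even bandwidth:", round(break_even_bandwidth(RATIO, DECOMPRESS)), "MB/s")
```

&lt;p&gt;With these made-up figures, compression doubles the read time on the SSD (4 s instead of 2 s) but halves it on the network drive (10 s instead of 20 s). The break-even bandwidth is 200 MB/s, and a faster decompressor raises it.&lt;/p&gt;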
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="compressed-persistence-into-a-file-handle"&gt;
&lt;h3&gt;Compressed persistence into a file handle&lt;/h3&gt;
&lt;p&gt;Now that everything is stored in a
single file using standard compression formats, joblib can
persist to an &lt;a class="reference external" href="https://github.com/joblib/joblib/pull/351"&gt;open file handle&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;wb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This also works with the compression file objects available in the standard library,
like &lt;tt class="docutils literal"&gt;gzip.GzipFile&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;bz2.BZ2File&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;lzma.LZMAFile&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;gzip&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GzipFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;wb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;    &lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gzip&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GzipFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;    &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When loading this way, make sure that the decompressor matches the compression
used in the file. If unsure, simply use &lt;tt class="docutils literal"&gt;open&lt;/tt&gt;: joblib will &lt;strong&gt;select the right decompressor&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;/tmp/test.pkl.gz&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;rb&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt;     &lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;joblib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
 &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt; &lt;span class="mf"&gt;0.47006195&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.5436392&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.1218267&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="mf"&gt;0.48592789&lt;/span&gt;&lt;span class="p"&gt;]])]&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
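&lt;p&gt;The standard-library sketch below shows why a mismatched decompressor is a problem. It uses plain &lt;tt class="docutils literal"&gt;pickle&lt;/tt&gt;, not joblib. It writes a bz2-compressed pickle and then tries to read it through &lt;tt class="docutils literal"&gt;gzip.GzipFile&lt;/tt&gt;. Gzip rejects the file as soon as it checks the header, whereas the matching &lt;tt class="docutils literal"&gt;bz2.BZ2File&lt;/tt&gt; reads it back.&lt;/p&gt;

```python
import bz2
import gzip
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.pkl.bz2")
with bz2.BZ2File(path, "wb") as f:  # the data is compressed with bz2...
    pickle.dump([1, 2, 3], f)

try:
    with gzip.GzipFile(path, "rb") as f:  # ...but read back through gzip
        pickle.load(f)
    mismatch_detected = False
except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
    print("gzip refused the file:", exc)
    mismatch_detected = True

with bz2.BZ2File(path, "rb") as f:  # the matching decompressor works
    print(pickle.load(f))  # prints [1, 2, 3]
```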
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Towards dumping to elaborate stores&lt;/p&gt;
&lt;p&gt;Working with file handles opens the door to &lt;strong&gt;storing cache data in database blobs or cloud
storage such as Amazon S3, Amazon Glacier and Google Cloud Storage&lt;/strong&gt;
(for instance via the Python package &lt;a class="reference external" href="https://github.com/boto/boto"&gt;boto&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
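&lt;p&gt;The essential ingredient is an in-memory file object. The sketch below illustrates the idea with plain &lt;tt class="docutils literal"&gt;pickle&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;gzip&lt;/tt&gt; writing into an &lt;tt class="docutils literal"&gt;io.BytesIO&lt;/tt&gt; buffer, whose bytes could then be handed to a database or a cloud client. The helper names are hypothetical. The upload itself is deliberately left out because it depends on the storage service.&lt;/p&gt;

```python
import gzip
import io
import pickle


def dumps_compressed(obj, level=6):
    """Serialize obj into gzip-compressed bytes kept in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return buffer.getvalue()  # closing the GzipFile leaves the BytesIO open


def loads_compressed(blob):
    """Inverse of dumps_compressed (hypothetical helper)."""
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as f:
        return pickle.load(f)


blob = dumps_compressed({"coef": [0.25] * 10_000})
# A real application would now store `blob`, e.g. with a cloud SDK (not shown).
print(len(blob), "bytes ready to be stored")
```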
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;A Pickle Subclass&lt;/strong&gt;: joblib relies on subclassing the Python Pickler/Unpickler
&lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;. These are state machines that walk the graph of nested objects (a
dict may contain a list, which may contain…), creating a string
representation of each object encountered. The new implementation
proceeds as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Pickling an arbitrary object&lt;/strong&gt;: when an &lt;tt class="docutils literal"&gt;np.ndarray&lt;/tt&gt; object is reached,
instead of using the default pickling functions (__reduce__()), the joblib
Pickler replaces the ndarray in the pickle stream with a wrapper object containing
all important array metadata (shape, dtype, flags). Then it writes the array
content directly to the pickle file. Note that this step breaks compatibility
with the standard pickle format. One benefit is that it enables fast,
copy-free handling of the numpy array. For compression, we pass chunks
of the data to a compressor object (using the buffer protocol to avoid
copies).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unpickling from a file&lt;/strong&gt;: when the Unpickler reaches the array wrapper,
the file handle is positioned at the beginning of the array content, since that
content was written right after the wrapper in the pickle stream. So at this point the Unpickler simply
constructs an array based on the metadata contained in the wrapper and then
fills the array buffer directly from the file. The object returned is the
reconstructed array; the wrapper itself is discarded. A benefit is that
if the data is stored uncompressed, &lt;strong&gt;the array can be directly memory
mapped from the storage&lt;/strong&gt; (the &lt;tt class="docutils literal"&gt;mmap_mode&lt;/tt&gt; option of &lt;a class="reference external" href="https://pythonhosted.org/joblib/generated/joblib.load.html"&gt;joblib.load&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
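&lt;p&gt;The two steps above can be sketched with the standard &lt;tt class="docutils literal"&gt;pickle&lt;/tt&gt; hooks &lt;tt class="docutils literal"&gt;persistent_id&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;persistent_load&lt;/tt&gt;. This is a simplified illustration, not joblib’s code: joblib writes each array right after its wrapper inside the stream, whereas this sketch, to stay short, appends the raw array buffers after the pickle stream, in the order the wrappers were emitted. The class and function names are hypothetical, only plain numeric arrays are intercepted, and memory order (C/Fortran) is not preserved.&lt;/p&gt;

```python
import io
import pickle

import numpy as np


class ArrayPickler(pickle.Pickler):
    """Hypothetical: replace plain numeric arrays by a (tag, shape, dtype) wrapper."""

    def __init__(self, file):
        super().__init__(file, protocol=pickle.HIGHEST_PROTOCOL)
        self.arrays = []  # arrays whose raw bytes are written after the stream

    def persistent_id(self, obj):
        # Only plain ndarrays of bool/int/uint/float/complex dtype are
        # intercepted; everything else uses the default pickling machinery.
        # persistent_id runs before the memo, so an array referenced twice
        # is stored twice in this sketch.
        if type(obj) is np.ndarray and obj.dtype.kind in "biufc":
            self.arrays.append(obj)
            return ("ndarray", obj.shape, obj.dtype)
        return None


class ArrayUnpickler(pickle.Unpickler):
    """Hypothetical: rebuild arrays from their wrapper, content filled in later."""

    def __init__(self, file):
        super().__init__(file)
        self.arrays = []

    def persistent_load(self, pid):
        tag, shape, dtype = pid
        if tag != "ndarray":
            raise pickle.UnpicklingError(f"unsupported persistent id: {tag!r}")
        arr = np.empty(shape, dtype=dtype)  # allocated once, filled in place
        self.arrays.append(arr)
        return arr


def dump(obj, file):
    pickler = ArrayPickler(file)
    pickler.dump(obj)
    for arr in pickler.arrays:
        # No copy if the array is already C-contiguous: the buffer protocol
        # hands the raw memory straight to file.write.
        flat = np.ascontiguousarray(arr).reshape(-1)
        file.write(memoryview(flat).cast("B"))


def load(file):
    unpickler = ArrayUnpickler(file)
    obj = unpickler.load()  # stops right after the pickle STOP opcode
    for arr in unpickler.arrays:
        # Read the bytes directly into the preallocated array buffer.
        view = memoryview(arr.reshape(-1)).cast("B")
        if file.readinto(view) != arr.nbytes:
            raise EOFError("truncated array data")
    return obj


if __name__ == "__main__":
    buf = io.BytesIO()
    dump({"weights": np.arange(6.0).reshape(2, 3), "name": "demo"}, buf)
    buf.seek(0)
    print(load(buf))
```

&lt;p&gt;Since the arrays are allocated before their content is read, references to them elsewhere in the object graph stay valid once they are filled.&lt;/p&gt;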
&lt;p&gt;This technique lets joblib pickle all objects into a single file while keeping
dump and load memory-efficient.&lt;/p&gt;
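&lt;p&gt;The memory-mapping benefit can be illustrated for a single uncompressed array. In this minimal sketch (hypothetical helper names, not joblib’s file format), a small pickled header is followed by the raw array bytes, and &lt;tt class="docutils literal"&gt;numpy.memmap&lt;/tt&gt; maps the array from the file offset where the header ends, so loading does not read the data into RAM up front. It assumes a non-empty array of numeric dtype.&lt;/p&gt;

```python
import os
import pickle
import tempfile

import numpy as np


def save_uncompressed(path, arr):
    """Hypothetical: a pickled (shape, dtype) header, then the raw C-ordered bytes."""
    arr = np.ascontiguousarray(arr)
    with open(path, "wb") as f:
        pickle.dump((arr.shape, arr.dtype), f)
        f.write(memoryview(arr.reshape(-1)).cast("B"))


def load_mmap(path, mode="r"):
    """Hypothetical: map the array straight from disk; the OS reads pages lazily."""
    with open(path, "rb") as f:
        shape, dtype = pickle.load(f)
        offset = f.tell()  # first byte after the header
    # Assumes a non-empty array: numpy cannot memory-map a zero-length region.
    return np.memmap(path, dtype=dtype, mode=mode, offset=offset, shape=shape)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "array.bin")
        save_uncompressed(path, np.arange(1_000_000, dtype=np.float64))
        data = load_mmap(path)
        print(type(data).__name__, data[:3])
        del data  # release the mapping before the directory is removed
```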
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;A fast compression stream&lt;/strong&gt;: as the pickling refactoring opens the door
to using file objects, joblib is now able to persist data in any kind of file
object: &lt;tt class="docutils literal"&gt;open&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;gzip.GzipFile&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;bz2.BZ2File&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;lzma.LZMAFile&lt;/tt&gt;. For
performance and usability reasons, the new joblib version uses its own file
object, &lt;tt class="docutils literal"&gt;BinaryZlibFile&lt;/tt&gt;, for zlib compression. Compared to
&lt;tt class="docutils literal"&gt;GzipFile&lt;/tt&gt;, it skips the CRC computation, which brings a performance gain of 15%.&lt;/p&gt;
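&lt;p&gt;A minimal sketch of such a file object, assuming only the standard &lt;tt class="docutils literal"&gt;zlib&lt;/tt&gt; module; it is not joblib’s &lt;tt class="docutils literal"&gt;BinaryZlibFile&lt;/tt&gt;, and the class name and default level are assumptions. Input is passed to a &lt;tt class="docutils literal"&gt;zlib.compressobj&lt;/tt&gt; in chunks taken as memoryview slices, so the buffer protocol avoids copying it. The zlib container ends with an Adler-32 checksum rather than the CRC-32 that gzip adds.&lt;/p&gt;

```python
import io
import zlib

import numpy as np


class ZlibWriter:
    """Hypothetical write-only file object compressing its input with zlib."""

    def __init__(self, fileobj, level=3, chunk_size=2 ** 20):
        self._fileobj = fileobj
        self._chunk_size = chunk_size
        # wbits=zlib.MAX_WBITS selects the zlib container (Adler-32 trailer).
        self._compressor = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS)

    def write(self, data):
        # Any C-contiguous bytes-like object works: bytes, memoryview, ndarray.
        view = memoryview(data).cast("B")
        for start in range(0, view.nbytes, self._chunk_size):
            # Slicing a memoryview does not copy the underlying data.
            chunk = view[start:start + self._chunk_size]
            self._fileobj.write(self._compressor.compress(chunk))
        return view.nbytes

    def close(self):
        # Emit any buffered compressed data and the end-of-stream trailer.
        self._fileobj.write(self._compressor.flush())

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


if __name__ == "__main__":
    arr = np.linspace(0.0, 1.0, 1_000_000)
    sink = io.BytesIO()
    with ZlibWriter(sink, chunk_size=64 * 1024) as out:
        out.write(arr)  # the array buffer is streamed in 64 KiB chunks
    restored = np.frombuffer(zlib.decompress(sink.getvalue()), dtype=arr.dtype)
    print(np.array_equal(restored, arr))
```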
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Speed penalties of on-the-fly writes&lt;/p&gt;
&lt;p&gt;When compression is used, the new joblib version is slightly slower than
the old one on dict/list objects.
The old version pickled the data into an &lt;tt class="docutils literal"&gt;io.BytesIO&lt;/tt&gt; buffer and then
compressed it in one go, whereas the new version writes compressed chunks
of pickled data to the file “on the fly”.
Because of this internal buffer, the old implementation was not memory safe: it
copied the data in memory before compressing it. The small speed penalty
was judged acceptable to avoid this memory duplication.&lt;/p&gt;
&lt;/div&gt;
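&lt;p&gt;The trade-off can be illustrated with the standard library alone (not joblib’s code; the function names are hypothetical): the first function builds the whole pickle in an in-memory buffer before compressing it, so the uncompressed bytes and their compressed copy coexist in RAM, while the second pickles directly into a &lt;tt class="docutils literal"&gt;gzip.GzipFile&lt;/tt&gt;, which compresses the data as it is written.&lt;/p&gt;

```python
import gzip
import io
import pickle
import zlib


def dump_buffered(obj, fileobj, level=3):
    """Old strategy: pickle everything in memory, then compress in one go."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer, protocol=pickle.HIGHEST_PROTOCOL)
    # getbuffer() avoids one extra copy, but the whole uncompressed
    # pickle still sits in RAM next to its compressed version.
    fileobj.write(zlib.compress(buffer.getbuffer(), level))


def dump_streaming(obj, fileobj, level=3):
    """New strategy: data is compressed as the pickler produces it."""
    with gzip.GzipFile(fileobj=fileobj, mode="wb", compresslevel=level) as gz:
        pickle.dump(obj, gz, protocol=pickle.HIGHEST_PROTOCOL)


def load_buffered(fileobj):
    return pickle.loads(zlib.decompress(fileobj.read()))


def load_streaming(fileobj):
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        return pickle.load(gz)
```

&lt;p&gt;Closing the &lt;tt class="docutils literal"&gt;GzipFile&lt;/tt&gt; flushes the compressor but leaves the underlying file object open, so the caller keeps control of it.&lt;/p&gt;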
&lt;/div&gt;
&lt;div class="section" id="conclusion-and-future-work"&gt;
&lt;h2&gt;Conclusion and future work&lt;/h2&gt;
&lt;p&gt;Memory copies were a limitation when caching very large numpy arrays
on disk, e.g. arrays with a size close to the available RAM on the computer.
The problem was solved via intensive buffering and a lot of hacking on top of
pickle and numpy. Unfortunately, our strategy performs poorly on
big dictionaries or lists compared to &lt;tt class="docutils literal"&gt;cPickle&lt;/tt&gt;, hence try to use
numpy arrays in your internal data structures (note that something like
scipy sparse matrices works well, as they build on arrays).&lt;/p&gt;
&lt;p&gt;In the future, numpy’s pickle methods could perhaps be improved to make
better use of the &lt;a class="reference external" href="https://www.python.org/dev/peps/pep-3154/#bit-opcodes-for-large-objects"&gt;64-bit opcodes for large objects&lt;/a&gt;
that were recently introduced in Python.&lt;/p&gt;
&lt;p&gt;Pickling using file handles is a first step toward pickling into
sockets, enabling data to be broadcast between computing units
on a network. This will be priceless with &lt;a class="reference external" href="https://github.com/joblib/joblib/pull/325"&gt;joblib’s new distributed backends&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Other improvements will come from better compressors, making everything
faster.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;The pull request was implemented by &lt;a class="reference external" href="https://github.com/aabadie"&gt;&amp;#64;aabadie&lt;/a&gt;. He thanks &lt;a class="reference external" href="https://github.com/lesteve"&gt;&amp;#64;lesteve&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/ogrisel"&gt;&amp;#64;ogrisel&lt;/a&gt;
and &lt;a class="reference external" href="https://github.com/GaelVaroquaux"&gt;&amp;#64;GaelVaroquaux&lt;/a&gt; for the valuable
help, reviews and support.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;The load created by multiple files on the filesystem is
particularly detrimental for network filesystems, as it triggers
multiple requests and isn’t cache friendly.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;gzip is based on zlib with additional crc checks and a default
compression level of 3.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;A drawback of subclassing the Python Pickler/Unpickler is that it
is done for the pure-Python version, and not the “cPickle” version.
The latter is much faster when dealing with a large number of Python
objects. Once again, joblib is efficient when most of the data is
represented as numpy arrays or subclasses.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="joblib"></category><category term="persistence"></category><category term="big data"></category></entry><entry><title>Of software and Science. Reproducible science: what, why, and how</title><link href="https://gael-varoquaux.info/programming/of-software-and-science-reproducible-science-what-why-and-how.html" rel="alternate"></link><published>2015-12-16T00:00:00+01:00</published><updated>2015-12-16T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-12-16:/programming/of-software-and-science-reproducible-science-what-why-and-how.html</id><summary type="html">&lt;p&gt;At &lt;a class="reference external" href="mloss-2015-wising-up-to-building-open-source-machine-learning.html"&gt;MLOSS 15&lt;/a&gt; we
brainstormed on reproducible science, discussing &lt;strong&gt;why we care about
software in computer science&lt;/strong&gt;. Here is a summary blending &lt;a class="reference external" href="https://gist.github.com/GaelVaroquaux/33e7a7b297425890fefa"&gt;notes from
the discussions&lt;/a&gt; with my
opinion.&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
“Without engineering, science is not more than philosophy”
&amp;nbsp; &amp;nbsp; —  &amp;nbsp; &amp;nbsp;
&lt;a class="reference external" href="https://twitter.com/GaelVaroquaux/status/619767624654786560"&gt;the community&lt;/a&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we enable better Science? Why do we do software …&lt;/strong&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;At &lt;a class="reference external" href="mloss-2015-wising-up-to-building-open-source-machine-learning.html"&gt;MLOSS 15&lt;/a&gt; we
brainstormed on reproducible science, discussing &lt;strong&gt;why we care about
software in computer science&lt;/strong&gt;. Here is a summary blending &lt;a class="reference external" href="https://gist.github.com/GaelVaroquaux/33e7a7b297425890fefa"&gt;notes from
the discussions&lt;/a&gt; with my
opinion.&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
“Without engineering, science is not more than philosophy”
&amp;nbsp; &amp;nbsp; —  &amp;nbsp; &amp;nbsp;
&lt;a class="reference external" href="https://twitter.com/GaelVaroquaux/status/619767624654786560"&gt;the community&lt;/a&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;How do we enable better Science? Why do we do software in science?&lt;/strong&gt;
These are the questions that we were interested in.&lt;/p&gt;
&lt;div class="grey docutils container"&gt;
&lt;strong&gt;Improving the reproducibility of our scientific studies makes us more
efficient at doing good science in the long run&lt;/strong&gt;: even inside a lab, new
research efforts build upon previous work.&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="forms-of-reproducible-science-reproduction-replication-reuse"&gt;
&lt;h2&gt;Forms of reproducible science: reproduction, replication, &amp;amp; reuse&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://politicalsciencereplication.wordpress.com/2013/02/24/is-there-a-difference-between-replication-reproduction-and-re-analysis/"&gt;The classic concepts of reproducible science&lt;/a&gt;
are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;: being able to rerun an experiment as it was run,
for instance by reanalysing data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Replicability&lt;/strong&gt;: being able to redo an experiment from scratch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;em&gt;reproducible science&lt;/em&gt; movement argues that sharing the source code of
experiments is necessary for &lt;em&gt;reproduction&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;For reproduction, fields like computer science (development of methods)
and biology (challenging data acquisition) have very different
constraints, with the complexity allocated differently between data and
code.&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
“Machine learning people use hugely complex algorithms on trivially
simple datasets. Biology does trivially simple algorithms on hugely
complex datasets.”
&amp;nbsp; &amp;nbsp; —  &amp;nbsp; &amp;nbsp;
&lt;em&gt;an MLOSS15 attendee&lt;/em&gt;&lt;/blockquote&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;We felt that computer science needed an additional notion, complementing
replication and reproduction:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Reusability&lt;/strong&gt;: applying the process to a new yet similar question.
For instance, for a paper contributing a data-analysis method, applying it
to new data.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="align-right docutils container"&gt;
Reusability is more valuable than reproducibility.&lt;/div&gt;
&lt;p&gt;Reproducibility without reusability in method development may hinder the
advancement of science, as it pushes everyone to do the same
things, &lt;em&gt;eg&lt;/em&gt; always running experiments on the same data.&lt;/p&gt;
&lt;p&gt;Reusability enables results that the original investigator did not have in
mind. It implies that the experimental protocol extends further than the
exact scope of the question initially asked. For software development, it
is also harder to achieve, as it requires more robustness and flexibility.&lt;/p&gt;
&lt;p&gt;Finally, sharing source code is not enough: &lt;strong&gt;readability&lt;/strong&gt; of the code is
necessary.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="roadblocks-to-reproducible-science"&gt;
&lt;h2&gt;Roadblocks to reproducible science&lt;/h2&gt;
&lt;div class="section" id="man-power"&gt;
&lt;h3&gt;Man power&lt;/h3&gt;
&lt;p&gt;Reusability, readability, and support of released code all take a
lot of time, even though this is seldom acknowledged in talks about
reproducible science. Given fixed manpower, it is impossible to
achieve reusability and high quality for everything.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="computing-power"&gt;
&lt;h3&gt;Computing power&lt;/h3&gt;
&lt;p&gt;Some numerical experiments or complex data analyses require weeks of
cluster time to run. These will be much harder to reproduce. Also, rerunning
an analysis from scratch on a regular basis is a good recipe to achieve a
robust path from data to results. The more computing power is a limiting
resource, the more likely it is that a glitch goes undetected.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="data-availability"&gt;
&lt;h3&gt;Data availability&lt;/h3&gt;
&lt;p&gt;No access, or restricted access, to data is a show stopper for
reproducibility. Data-sharing requirements are becoming common, from
funding agencies or journals. However, privacy concerns or confidential
information get in the way of making data public, for instance in medical
research or microeconomics. Often, these concerns serve as a pretext
for people who actually do not want to relinquish &lt;em&gt;control&lt;/em&gt; &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;A related post by Dorothy Bishop: &lt;a class="reference external" href="http://deevybee.blogspot.co.uk/2015/11/whos-afraid-of-open-data.html?m=1"&gt;Who’s afraid of open data&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="section" id="incentives-problem"&gt;
&lt;h3&gt;Incentives problem&lt;/h3&gt;
&lt;p&gt;Fancy new results are what matters for success in academia. “High impact”
journals such as Nature or Science accept papers that amaze and impress,
often with subpar inspection of the materials and methods &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;. The rate of
publication in many leading groups is incompatible with the consolidation
efforts required for strong reproducibility.&lt;/p&gt;
&lt;p&gt;On the other hand, it is hard to tell beforehand if a new idea is a good
one. Hence giving free rein to imagination, to foster impossible and
improbable ideas, is a good path to innovation. The underlying questions
are: What are the best community rules for the advancement of knowledge?
What do we want from the way science moves forward? Rapid publication of
many incremental ideas, &lt;em&gt;eg&lt;/em&gt; at a conference, gives food for thought,
possibly at the expense of reproducibility.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;“Science, Nature and Cell, had a higher rate of retractions” –
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Invalid_science"&gt;Wikipedia: Invalid science&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="how-to-improve-the-situation"&gt;
&lt;h2&gt;How to improve the situation&lt;/h2&gt;
&lt;div class="section" id="docker-containers-and-virtual-machines"&gt;
&lt;h3&gt;Docker, containers, and virtual machines&lt;/h3&gt;
&lt;p&gt;Docker, or other virtual machine technologies, enable shipping a software
environment. They diminish the challenges of building software and
setting up an analysis. Virtual machines are used as a way to avoid
software packaging issues. This seems to me like a plaster on a wooden leg.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Containers give easy reproduction, at the cost of hard
replication and reuse.&lt;/div&gt;
&lt;p&gt;Indeed, an analysis that lives in a box can be reproduced, but can it be
understood, modified, or applied to new data? New science is likely going
to come from modifying this analysis, or combining it with other tools,
or new data. If these other tools live in a different virtual machine,
the combination will be challenging.&lt;/p&gt;
&lt;p&gt;In addition, people are using containers as an excuse to avoid tackling
the need for proper documentation of requirements, and the process to set
them up. They sometimes even try to justify binary blobs &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;. This is
wrong. An analysis should be runnable without requiring the stars to
align, and it should be understandable.&lt;/p&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;See also Titus Brown’s post: &lt;a class="reference external" href="http://ivory.idyll.org/blog/2014-containers.html"&gt;The post-apocalyptic world of binary
containers&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="section" id="version-control-wear-your-seatbelt"&gt;
&lt;h3&gt;Version control: wear your seatbelt&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control"&gt;Version control&lt;/a&gt;
is like a time machine: if used with regular commits, it enables rolling
back to any point in time. For my work, it has always been crucial
to reproducing what my students or I did a while ago. I often meet
researchers who feel they lack the time to learn it. I really cannot support
this position. &lt;a class="reference external" href="http://try.github.io"&gt;http://try.github.io&lt;/a&gt; is an easy way to learn version
control.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Hint&lt;/em&gt;: use a “tag” to pinpoint a position in the history that you might
want to repeat, such as making a figure or the publication of an article.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sotware-libraries-curated-and-maintained"&gt;
&lt;h3&gt;Software libraries, curated and maintained&lt;/h3&gt;
&lt;p&gt;Consolidating an analysis pipeline, a standard visualization, or any
computational aspect of a paper into a software library is a sure way to
make the paper more reproducible. It will also make the steps reusable,
and a replication easier. If continued effort is put into the library,
chances are that computational efficiency will improve over time, thus
helping in the long run with the challenge of computing power.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Tough choices: not every variant of an analysis can be forever
reproducible.&lt;/div&gt;
&lt;p&gt;Maintaining the library will ensure that results are still reproducible
on new hardware, or with evolution of the general software stack (a new
Python or Matlab release, for instance). Documentation and curated
examples will lower the bar to reuse and facilitate replication of the
original scientific results.&lt;/p&gt;
&lt;p&gt;To avoid feature creep and technical debt, a library calls for focused
efforts on selecting the most important operations.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="datasets-serving-as-model-experiments-tractable-and-open"&gt;
&lt;h3&gt;Datasets, serving as model experiments, tractable and open&lt;/h3&gt;
&lt;p&gt;Sometimes researchers create a toy dataset, with a well-posed question, that
is curated and open, small enough to be tractable yet large enough to be
relevant to the application field. This is an invaluable service to the
field. One example is the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Netflix_Prize"&gt;netflix prize&lt;/a&gt; in machine learning,
which led to a standard dataset. Unfortunately, the dataset was taken
down some years later due to privacy concerns. But it has been
replaced, &lt;em&gt;eg&lt;/em&gt; by the &lt;a class="reference external" href="http://grouplens.org/datasets/movielens/"&gt;movielens dataset&lt;/a&gt;. For computer vision, a
series of datasets (&lt;a class="reference external" href="http://www.vision.caltech.edu/Image_Datasets/Caltech101/"&gt;Caltech101&lt;/a&gt;, &lt;a class="reference external" href="https://www.cs.toronto.edu/~kriz/cifar.html"&gt;CIFAR&lt;/a&gt;, &lt;a class="reference external" href="http://www.image-net.org/"&gt;ImageNet&lt;/a&gt;…) has led to continuous progress of the
field. In bioinformatics, standard data are regularly created, for
instance by the &lt;a class="reference external" href="http://dreamchallenges.org/"&gt;DREAM challenges&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These reference open datasets serve as benchmarks and therefore foster
competition. They also define a canonical experiment, helping a wider
scientific community understand the questions that they ask. Ultimately,
they result in better software tools to solve the problem at hand, as
this problem becomes a standard example and application of tools.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Sage_Bionetworks"&gt;Sage bionetworks&lt;/a&gt;, for
instance, is a non-profit that collects and makes biomedical data
available. These people believe, as I do, that such data will lead to
better medical care.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="changing-incentives-setting-the-right-goals"&gt;
&lt;h3&gt;Changing incentives: setting the right goals&lt;/h3&gt;
&lt;p&gt;Making sustainable, quality scientific work that facilitates reproduction
needs to be a clearly-visible benefit to researchers, young and senior.
Such contributions should help them get jobs and grants.&lt;/p&gt;
&lt;p&gt;An unsophisticated publication count is the basis of scientific
evaluation. We need to accept publications about data, software, and
replication of prior work in high-quality journals. They need to be
strictly reviewed, to establish high standards on these contributions.
This change is happening. &lt;a class="reference external" href="http://www.gigasciencejournal.com/"&gt;Gigascience&lt;/a&gt;, amongst other venues, publishes
data. The &lt;a class="reference external" href="http://jmlr.org/mloss/"&gt;MLOSS (machine learning open source software) track&lt;/a&gt; of the JMLR (Journal of Machine Learning
Research) publishes software, with a tough review of the software quality
of the project.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Researchers should cite the software they use.&lt;/div&gt;
&lt;p&gt;Yet software is still often under-cited: many people use software
implementing a method, and only cite the original paper that proposed the
method. Another remaining challenge is how to give credit for continued
development and maintenance.&lt;/p&gt;
&lt;p&gt;Fast-paced science is probably useful even if fragile. But the difference
between a quick proof of concept and solid, reproducible and reusable
work needs to be acknowledged. It is important to select for publication
not only impressive results, but also sound reusable material and
methods. The latter are the foundation of future scientific developments,
but high-impact journals tend to focus on the former.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Related posts&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="software-for-reproducible-science-lets-not-have-a-misunderstanding.html"&gt;Software for reproducible science: let’s not have a misunderstanding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="mloss-2015-wising-up-to-building-open-source-machine-learning.html"&gt;MLOSS 2015: wising up to building open-source machine learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="reproducible research"></category><category term="science"></category><category term="software"></category><category term="machine learning"></category><category term="scientific software"></category></entry><entry><title>Nilearn 0.2: more powerful machine learning for neuroimaging</title><link href="https://gael-varoquaux.info/programming/nilearn-02-more-powerful-machine-learning-for-neuroimaging.html" rel="alternate"></link><published>2015-12-13T00:00:00+01:00</published><updated>2015-12-13T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-12-13:/programming/nilearn-02-more-powerful-machine-learning-for-neuroimaging.html</id><summary type="html">&lt;div class="small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;Nilearn’s goals&lt;/p&gt;
&lt;p class="last"&gt;Make advanced machine learning techniques easy for neuroimaging
research.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;After 6 months of effort, we just released version 0.2 of &lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt;, dedicated to making &lt;strong&gt;machine learning in
neuroimaging easier and more powerful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This release integrates the features of the &lt;a class="reference external" href="nilearn_july_2015_sprint.html"&gt;July sprint&lt;/a&gt;, and &lt;a class="reference external" href="http://nilearn.github.io/whats_new.html"&gt;more&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h2&gt;Highlights&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Better documentation …&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;Nilearn’s goals&lt;/p&gt;
&lt;p class="last"&gt;Make advanced machine learning techniques easy for neuroimaging
research.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;After 6 months of effort, we just released version 0.2 of &lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt;, dedicated to making &lt;strong&gt;machine learning in
neuroimaging easier and more powerful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This release integrates the features of the &lt;a class="reference external" href="nilearn_july_2015_sprint.html"&gt;July sprint&lt;/a&gt;, and &lt;a class="reference external" href="http://nilearn.github.io/whats_new.html"&gt;more&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h2&gt;Highlights&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Better documentation with narrative examples&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Examples can now be broken down into blocks (as &lt;a class="reference external" href="http://nilearn.github.io/auto_examples/connectivity/plot_signal_extraction.html#sphx-glr-auto-examples-connectivity-plot-signal-extraction-py"&gt;here&lt;/a&gt;)
for a better narration (thanks to &lt;a class="reference external" href="http://sphinx-gallery.readthedocs.org/en/latest/"&gt;sphinx-gallery&lt;/a&gt;).&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="figure align-right"&gt;
&lt;a class="reference external image-reference" href="http://nilearn.github.io/auto_examples/decoding/plot_mixed_gambles_space_net.html"&gt;&lt;img alt="" src="http://nilearn.github.io/_images/sphx_glr_plot_mixed_gambles_space_net_001.png" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Space net: spatial regularizations in decoding&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://nilearn.github.io/decoding/space_net.html"&gt;“SpaceNet” decoder&lt;/a&gt; uses spatial
regularizations such as TV-L1 or Graph-Net to identify predictive regions
in decoding.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="figure align-right"&gt;
&lt;a class="reference external image-reference" href="http://nilearn.github.io/auto_examples/connectivity/plot_compare_resting_state_decomposition.html"&gt;&lt;img alt="" src="http://nilearn.github.io/_images/sphx_glr_plot_compare_resting_state_decomposition_002.png" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Dictionary learning for resting-state parcellations&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dictionary learning is a &lt;a class="reference external" href="http://nilearn.github.io/connectivity/resting_state_networks.html#beyond-ica-dictionary-learning"&gt;promising alternative to ICA to learn networks&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="figure align-right"&gt;
&lt;a class="reference external image-reference" href="http://nilearn.github.io/auto_examples/manipulating_visualizing/plot_prob_atlas.html#sphx-glr-auto-examples-manipulating-visualizing-plot-prob-atlas-py"&gt;&lt;img alt="" src="http://nilearn.github.io/_images/sphx_glr_plot_prob_atlas_003.png" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Plotting sets of probabilistic maps&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With &lt;a class="reference external" href="http://nilearn.github.io/manipulating_visualizing/plotting.html#different-plotting-functions"&gt;a simple function&lt;/a&gt;,
you can plot outlines for multiple maps.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="figure align-right"&gt;
&lt;a class="reference external image-reference" href="http://nilearn.github.io/auto_examples/manipulating_visualizing/plot_extract_rois_statistical_maps.html"&gt;&lt;img alt="" src="http://nilearn.github.io/_images/sphx_glr_plot_extract_rois_statistical_maps_003.png" /&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Separating regions out of maps&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We have a set of functions to &lt;a class="reference external" href="http://nilearn.github.io/auto_examples/manipulating_visualizing/plot_extract_rois_statistical_maps.html"&gt;separate regions on maps&lt;/a&gt; or &lt;a class="reference external" href="http://nilearn.github.io/auto_examples/connectivity/plot_extract_regions_canica_maps.html"&gt;turn networks into a probabilistic parcellation&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;strong&gt;Classification on connectomes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We now have advanced connectivity measures to do &lt;a class="reference external" href="http://nilearn.github.io/auto_examples/connectivity/plot_connectivity_measures.html"&gt;comparisons across
connectomes for classification&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;Thanks&lt;/p&gt;
&lt;p&gt;Thanks to Alexandre Abraham, who led the effort, and &lt;a class="reference external" href="http://nilearn.github.io/whats_new.html#contributors"&gt;all the
contributors&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="neuroimaging"></category><category term="python"></category><category term="scientific computing"></category><category term="scipy"></category></entry><entry><title>MLOSS 2015: wising up to building open-source machine learning</title><link href="https://gael-varoquaux.info/programming/mloss-2015-wising-up-to-building-open-source-machine-learning.html" rel="alternate"></link><published>2015-11-28T00:00:00+01:00</published><updated>2015-11-28T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-11-28:/programming/mloss-2015-wising-up-to-building-open-source-machine-learning.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The 2015 edition of the machine learning open
source software (MLOSS) workshop was full of very mature discussions
that I strive to report here.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;em&gt;I give links to the videos. Some machine-learning researchers have
great thoughts about growing communities of coders, about code as a
process and a deliverable …&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The 2015 edition of the machine learning open
source software (MLOSS) workshop was full of very mature discussions
that I strive to report here.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;em&gt;I give links to the videos. Some machine-learning researchers have
great thoughts about growing communities of coders, about code as a
process and a deliverable.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I was a co-organizer of the &lt;a class="reference external" href="https://mloss.org/workshop/icml15/"&gt;MLOSS 2015 workshop&lt;/a&gt;, held during &lt;a class="reference external" href="http://icml.cc/2015/"&gt;ICML 2015&lt;/a&gt;. As I have finally figured out where the
videos are, now is a good time to summarize my impressions of the
workshop.&lt;/p&gt;
&lt;img alt="" src="attachments/mloss/mloss_t_shirt_white.png" style="width: 100%;" /&gt;
&lt;div class="section" id="online-videos-of-the-talks"&gt;
&lt;h2&gt;Online videos of the talks&lt;/h2&gt;
&lt;div class="small sidebar"&gt;
&lt;p class="first sidebar-title"&gt;Graphics &amp;amp; T-shirts&lt;/p&gt;
&lt;p&gt;The graphics were printed on T-shirts. We ran out, but the material is
&lt;a class="reference external" href="attachments/mloss/mloss_t_shirt_graphics.zip"&gt;here&lt;/a&gt; for you to
print.&lt;/p&gt;
&lt;p class="last"&gt;&lt;em&gt;Does anyone want to help set up online T-shirt ordering?&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The videos of all the talks are online:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/4216268dc28148c89d8b6e4eba1ad6e51d"&gt;Python and Parallelism or Dask&lt;/a&gt;
by &lt;em&gt;Matthew Rocklin&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/afe6f76b3bb1452790fc8982e28112641d"&gt;Collaborative filtering via matrix decomposition in mlpack&lt;/a&gt;
by &lt;em&gt;Ryan Curtin&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/9cd947554ddf404b9a40ca2601e44b4c1d"&gt;BLOG: a probabilistic programming language for open-universe contingent
Bayesian networks&lt;/a&gt;
by &lt;em&gt;Yi Wu&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/45c3bb312a37491dbce1af25f1aeba001d"&gt;Spotlights&lt;/a&gt;:&lt;ul&gt;
&lt;li&gt;Nilearn, machine learning for neuroimaging in Python (Alexandre
Abraham)&lt;/li&gt;
&lt;li&gt;KeLP: a Kernel-based Learning Platform in Java (Simone Filice)&lt;/li&gt;
&lt;li&gt;DiffSharp: Automatic Differentiation Library (Atılım Güneş Baydin)&lt;/li&gt;
&lt;li&gt;The FAST toolkit for Unsupervised Learning of HMMs (José P.
González-Brenes)&lt;/li&gt;
&lt;li&gt;OpenML: a Networked Science Platform for Machine Learning (Joaquin
Vanschoren)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/2529ebcb20794942874d5c277c5dcc981d"&gt;Julia’s Approach to Open Source Machine Learning&lt;/a&gt;
by &lt;em&gt;John Myles White&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/da4f7869f07745f7bbc5a2e5f31761b61d"&gt;Do it yourself deep learning with the Caffe community&lt;/a&gt;
by &lt;em&gt;Evan Shelhamer&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/2bc15b283f324784a945d79d9a06c76c1d"&gt;From flop to success in academic software development&lt;/a&gt;
by &lt;em&gt;Gaël Varoquaux&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="mloss-a-maturing-community"&gt;
&lt;h2&gt;MLOSS: a maturing community&lt;/h2&gt;
&lt;p&gt;When Antti Honkela and Cheng Soon Ong approached me to co-organize an
MLOSS workshop, I felt that it was important to do it for the sake of
open source scientific software. But I didn’t feel very enthusiastic
about the event or the talks themselves. Boy, was I wrong.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Huge attendance: open-source ML software is now mainstream.&lt;/div&gt;
&lt;p&gt;My first MLOSS workshop was at the ICML 2011 conference, in Haifa. The
workshop was in a tiny cramped room, with a couple dozen geeks,
and it felt like a clique of people on the sidelines of the conference. This
year, we had a huge room and more than 200 people showed up.&lt;/p&gt;
&lt;p&gt;I am used to talks about code that a grad student or young researcher
has put on the web along with a paper, with an open license but no
vision. This year, people were presenting actual projects, with long-term
goals and the desire to solve a problem larger than their latest research.
This might explain why the attendance was huge: people came because the talks
could genuinely help them.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;With Cheng and Antti, we had chosen &lt;em&gt;“open ecosystems”&lt;/em&gt; as a theme,
because ecosystems are the key to scaling computing and science. Between
you and me, imposing a theme on a workshop is challenging, as people
submit abstracts, good or bad, and one has to make do with what one gets.
However, a lot of talks mentioned how the projects slot into a wider
picture and interact with a community. For instance, Evan attributes
part of the success of Caffe to the &lt;a class="reference external" href="https://github.com/BVLC/caffe/wiki/Model-Zoo"&gt;“Model Zoo”&lt;/a&gt;, in which the community
contributes fitted models. At the other end of the spectrum, OpenML is a
full online project whose goal is to foster collaboration and comparison.
Project developers showed in their talks that they are very conscious
of other projects that might be used together with theirs.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="accepting-the-sustainability-challenges"&gt;
&lt;h2&gt;Accepting the sustainability challenges&lt;/h2&gt;
&lt;p&gt;Over time, I have gradually realized the importance of community
building, &lt;em&gt;ie&lt;/em&gt; project management and goal setting, more than technical
virtuosity. Historically, the scientific culture of code has put the
emphasis on the genius ideas behind the code, and the craftsmanship of
the implementation, at the cost of sustainability.&lt;/p&gt;
&lt;div class="align-right docutils container"&gt;
Alone, I go fast. Together, we go far.&lt;/div&gt;
&lt;p&gt;I was surprised to see that the MLOSS community was growing very aware of
the mechanisms of long-term project life, in particular the human factors.&lt;/p&gt;
&lt;p&gt;I was asked by my co-organizers to give &lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/2bc15b283f324784a945d79d9a06c76c1d"&gt;a talk on factors of success of
open source scientific software&lt;/a&gt;.
I touched upon &lt;strong&gt;software engineering&lt;/strong&gt;, &lt;strong&gt;project vision&lt;/strong&gt;,
&lt;strong&gt;licensing&lt;/strong&gt;, &lt;strong&gt;governance&lt;/strong&gt;, and &lt;strong&gt;community building&lt;/strong&gt;: all topics
deemed &lt;em&gt;“non scientific”&lt;/em&gt; and thus so often despised and left out. I was
astonished to find out that the talks before mine were giving very good
advice on these. I found that I only had to summarize and comment on what
had been said before. This evolution of the scientific community makes me
very hopeful for the future.&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
&lt;p&gt;Every line of code you write is debt. You should be ashamed of every line
of code you have written. […]&lt;/p&gt;
&lt;p&gt;You have a supply of labor. These are the people who are contributors
[…].
The people who are users and not contributors are actually a source of
demand […] they mostly consume sources of labor rather than produce it.
&amp;nbsp; &amp;nbsp; &amp;nbsp; —  &amp;nbsp; &amp;nbsp;
&lt;a class="reference external" href="http://k4webcast.mediasite.com/Mediasite/Play/2529ebcb20794942874d5c277c5dcc981d"&gt;John Myles White&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Thanks to our sponsors&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.facebook.com"&gt;Facebook&lt;/a&gt; and &lt;a class="reference external" href="http://www.continuum.io"&gt;Continuum&lt;/a&gt; sponsored the trips of our keynote
speakers. Thank you very much, the keynotes were great!&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://www.datascience-paris-saclay.fr/"&gt;Paris-Saclay Center for Data Science (CDS)&lt;/a&gt; gave us our main operating
funds, which are critical for organizing an event. In general, I must
say that the CDS has been hugely supportive of open source data
science in the Paris area, having a significant impact on training as
well as development.&lt;/p&gt;
&lt;p&gt;I must also acknowledge support from &lt;a class="reference external" href="http://www.inria.fr/"&gt;Inria&lt;/a&gt; for the accounting and administration
of the event.&lt;/p&gt;
&lt;p&gt;Finally, &lt;strong&gt;our reviewers were amazing&lt;/strong&gt;. Most of them reviewed the
project, ie its code, its documentation, its support. They rose above
the typical petty fights that we see in academia and focused on what
the project was bringing to the scientific community. Their
reviews were often longer and more informative than the submitted
abstracts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Related posts&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="software-for-reproducible-science-lets-not-have-a-misunderstanding.html"&gt;Software for reproducible science: let’s not have a misunderstanding&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="../science/publishing-scientific-software-matters.html"&gt;Publishing scientific software matters&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="conferences"></category><category term="science"></category><category term="software"></category><category term="machine learning"></category><category term="reproducible research"></category><category term="scientific software"></category></entry><entry><title>Nilearn sprint: hacking neuroimaging machine learning</title><link href="https://gael-varoquaux.info/programming/nilearn-sprint-hacking-neuroimaging-machine-learning.html" rel="alternate"></link><published>2015-08-04T00:00:00+02:00</published><updated>2015-08-04T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-08-04:/programming/nilearn-sprint-hacking-neuroimaging-machine-learning.html</id><summary type="html">&lt;p&gt;A couple of weeks ago, we had in Paris the second international &lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt; sprint, dedicated to making &lt;strong&gt;machine learning
in neuroimaging easier and more powerful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It was such a fantastic experience, as nilearn is really shaping up as a
simple yet powerful tool, and there is a lot of enthusiasm …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A couple of weeks ago, we had in Paris the second international &lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt; sprint, dedicated to making &lt;strong&gt;machine learning
in neuroimaging easier and more powerful&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;It was such a fantastic experience, as nilearn is really shaping up as a
simple yet powerful tool, and there is a lot of enthusiasm. For me, this
sprint is a turning point, as I could see people other than the original
core team (which spun out of &lt;a class="reference external" href="https://team.inria.fr/parietal/"&gt;our research team&lt;/a&gt;) excited about the project’s future.
Thank you to all who came:&lt;/p&gt;
&lt;ul class="columns simple"&gt;
&lt;li&gt;Ahmed Kanaan&lt;/li&gt;
&lt;li&gt;Andres Hoyos Idrobo&lt;/li&gt;
&lt;li&gt;Alexandre Abraham&lt;/li&gt;
&lt;li&gt;Arthur Mensch&lt;/li&gt;
&lt;li&gt;Ben Cipolli (remote)&lt;/li&gt;
&lt;li&gt;Bertrand Thirion&lt;/li&gt;
&lt;li&gt;Chris Filo Gorgolewski&lt;/li&gt;
&lt;li&gt;Danilo Bzdok&lt;/li&gt;
&lt;li&gt;Elvis Dohmatob&lt;/li&gt;
&lt;li&gt;Julia Huntenburg&lt;/li&gt;
&lt;li&gt;Kamalaker Dadi&lt;/li&gt;
&lt;li&gt;Loic Esteve&lt;/li&gt;
&lt;li&gt;Martin Perez&lt;/li&gt;
&lt;li&gt;Michael Hanke&lt;/li&gt;
&lt;li&gt;Oscar Nájera, working on
&lt;a class="reference external" href="http://sphinx-gallery.readthedocs.org/"&gt;sphinx-gallery&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;img alt="" src="attachments/nilearn_july_2015_sprint/nilearn_sprint.jpg" style="width: 100%;" /&gt;
&lt;p&gt;The sprint was a joint sprint with the &lt;a class="reference external" href="http://martinos.org/mne/stable/mne-python.html"&gt;MNE-Python&lt;/a&gt; team, which makes MEG
processing awesome. We also need to thank &lt;a class="reference external" href="http://alexandre.gramfort.net"&gt;Alex Gramfort&lt;/a&gt;, who did most of the work to set up the
sprint, as well as &lt;a class="reference external" href="https://www.universite-paris-saclay.fr/en/research/project/lidex-neurosaclay"&gt;NeuroSaclay&lt;/a&gt;
for funding, and &lt;a class="reference external" href="http://lapaillasse.org/"&gt;La paillasse&lt;/a&gt;, &lt;a class="reference external" href="http://www.telecom-paristech.fr"&gt;Telecom&lt;/a&gt;, and &lt;a class="reference external" href="http://www.inria.fr/en/centre/saclay"&gt;INRIA&lt;/a&gt; for hosting.&lt;/p&gt;
&lt;div class="section" id="highlights-of-the-sprints-results"&gt;
&lt;h2&gt;Highlights of the sprint’s results&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Plotting of multiple maps&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external image-reference" href="https://circle-artifacts.com/gh/nilearn/nilearn/128/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/connectivity/plot_canica_resting_state.html"&gt;&lt;img alt="" class="align-right" src="attachments/nilearn_july_2015_sprint/plot_canica_resting_state_001.png" style="width: 200px;" /&gt;&lt;/a&gt;
&lt;p&gt;A function to visualize overlays of various maps, eg for a
probabilistic atlas, with defaults that try to adapt to the number of
maps (see the &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/128/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/manipulating_visualizing/plot_prob_atlas.html"&gt;example&lt;/a&gt;).
It’s very useful, for instance, for &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/128/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/connectivity/plot_canica_resting_state.html"&gt;easily visualizing ICA components&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Sign of activation in glass brain&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external image-reference" href="https://circle-artifacts.com/gh/nilearn/nilearn/287/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/manipulating_visualizing/plot_demo_glass_brain_extensive.html"&gt;&lt;img alt="" class="align-right" src="attachments/nilearn_july_2015_sprint/plot_demo_glass_brain_extensive_005.png" style="width: 200px;" /&gt;&lt;/a&gt;
&lt;p&gt;Our glass brain plotting was greatly improved, adding, amongst other
things, the option to capture the sign of the activation in the color
(see this &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/287/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/manipulating_visualizing/plot_demo_glass_brain_extensive.html"&gt;example&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Spatially-regularized decoder&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external image-reference" href="https://circle-artifacts.com/gh/nilearn/nilearn/287/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/decoding/plot_haxby_space_net.html"&gt;&lt;img alt="" class="align-right" src="attachments/nilearn_july_2015_sprint/plot_haxby_space_net_002.png" style="width: 200px;" /&gt;&lt;/a&gt;
&lt;p&gt;Decoders based on GraphNet and total variation have finally landed in
nilearn. This has required a lot of work to get fast convergence and
robust parameter selection. At the end of the day, these decoders are much slower
than an SVM, but the maps look splendid
(see this &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/287/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/decoding/plot_haxby_space_net.html"&gt;example&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Sparse dictionary learning&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external image-reference" href="https://circle-artifacts.com/gh/nilearn/nilearn/282/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/connectivity/plot_dict_learning_resting_state.html"&gt;&lt;img alt="" class="align-right" src="attachments/nilearn_july_2015_sprint/plot_dict_learning_resting_state_001.png" style="width: 200px;" /&gt;&lt;/a&gt;
&lt;p&gt;We have almost merged sparse dictionary learning as an alternative to ICA.
Experience shows that on resting-state data, it gives a more contrasted
segmentation of networks
(see this &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/282/artifacts/0/home/ubuntu/nilearn/doc/_build/html/auto_examples/connectivity/plot_dict_learning_resting_state.html"&gt;example&lt;/a&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;New installation docs&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
New webpage layout using tabs to display only the installation
instructions relevant to the user’s OS (see &lt;a class="reference external" href="https://circle-artifacts.com/gh/nilearn/nilearn/287/artifacts/0/home/ubuntu/nilearn/doc/_build/html/introduction.html#installation"&gt;here&lt;/a&gt;).
The result is more compact and clearer instructions, which I hope
will make our users’ lives easier.&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;CircleCI integration&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
We now use &lt;a class="reference external" href="https://circleci.com/gh/nilearn/nilearn"&gt;CircleCI&lt;/a&gt; to
run the examples and build the docs. This is challenging because our
examples are real cases of neuroimaging data analysis, and thus require
heavy datasets and computing horsepower.&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Neurodebian packaging&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
There are now &lt;a class="reference external" href="http://neuro.debian.net/pkgs/python-nilearn.html"&gt;neurodebian packages&lt;/a&gt; for nilearn.&lt;/blockquote&gt;
&lt;p&gt;And much more!&lt;/p&gt;
&lt;div class="admonition warning"&gt;
&lt;p class="first admonition-title"&gt;Warning&lt;/p&gt;
&lt;p class="last"&gt;Features listed above are &lt;strong&gt;not&lt;/strong&gt; in the released version of nilearn.
You will need to wait a month or so.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="neuroimaging"></category><category term="python"></category><category term="scientific computing"></category><category term="scipy"></category></entry><entry><title>Software for reproducible science: let’s not have a misunderstanding</title><link href="https://gael-varoquaux.info/programming/software-for-reproducible-science-lets-not-have-a-misunderstanding.html" rel="alternate"></link><published>2015-05-18T00:00:00+02:00</published><updated>2015-05-18T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-05-18:/programming/software-for-reproducible-science-lets-not-have-a-misunderstanding.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;tl;dr:&lt;/strong&gt; &amp;nbsp; &lt;em&gt;Reproducibility is a noble cause and scientific
software a promising vessel. But an excess of reproducibility can be at
odds with the housekeeping required for good software engineering.
Code that “just works” should not be taken for granted.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;em&gt;This post advocates for a progressive consolidation effort of
scientific …&lt;/em&gt;&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;tl;dr:&lt;/strong&gt; &amp;nbsp; &lt;em&gt;Reproducibility is a noble cause and scientific
software a promising vessel. But an excess of reproducibility can be at
odds with the housekeeping required for good software engineering.
Code that “just works” should not be taken for granted.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;em&gt;This post advocates for a progressive consolidation effort of
scientific code, rather than putting too high a bar on code release.&lt;/em&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a class="reference external" href="http://ivory.idyll.org/blog/"&gt;Titus Brown&lt;/a&gt; recently shared &lt;a class="reference external" href="http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html"&gt;an
interesting war story&lt;/a&gt;
in which a reviewer refuses to review a paper until he can run the code
on his own files. Titus’s comment boils down to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;blockquote class="epigraph"&gt;
&lt;a class="reference external" href="http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html"&gt;“Please destroy this software after publication”&lt;/a&gt;.&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Reproducible science: Does the emperor have clothes?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;In other words, code for a publication is often not reusable. This
point of view is very interesting from someone like Titus, who is a
&lt;a class="reference external" href="http://ivory.idyll.org/blog/a-conversation-on-reproducibility.html"&gt;vocal proponent&lt;/a&gt; of
reproducible science. His words caused some surprise, which led Titus
to wonder if &lt;a class="reference external" href="http://ivory.idyll.org/blog/2015-we-live-in-a-bubble.html"&gt;some of the reproducible science crowd live in a
bubble&lt;/a&gt;. I
was happy to see &lt;a class="reference external" href="https://twitter.com/ctitusbrown/status/589171853031186434"&gt;the discussion&lt;/a&gt; unfold, as
I think that there is a strong risk of creating a bubble around
reproducible science. Such a bubble will backfire.&lt;/p&gt;
&lt;div class="section" id="replication-is-a-must-for-science-and-society"&gt;
&lt;h2&gt;Replication is a must for science and society&lt;/h2&gt;
&lt;p&gt;Science advances by accumulating knowledge built upon
observations. It’s easy to forget that these observations, and the
corresponding paradigmatic conclusions, are not always as simple to
establish as the fact that hot air rises: &lt;strong&gt;replicating the
scientific process many times transforms evidence into truth&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;One striking example of scientific replication is &lt;a class="reference external" href="http://www.nature.com/news/first-results-from-psychology-s-largest-reproducibility-test-1.17433"&gt;the on-going effort in
psychology&lt;/a&gt;
to replay the evidence behind well-accepted findings central to
current lines of thought in psychological science. It implies setting up
the experiments according to the seminal publications, acquiring the
data, and processing it to arrive at the same conclusions. Surprisingly,
not everything that was taken for granted holds.&lt;/p&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Findings later discredited backed economic policy&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Another example, with massive consequences on Joe Average’s everyday life, is
the failed replication of Reinhart and Rogoff’s &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Growth_in_a_Time_of_Debt"&gt;“Growth in a Time of
Debt”&lt;/a&gt;
publication. The original paper, published in 2010 in the American
Economic Review, claimed empirical findings linking high public debt
to a failure of GDP growth. In the context of an economic crisis, it was used
by policy makers as a justification for restricting public spending.
However, while pursuing a mere homework assignment to replicate these
findings, &lt;a class="reference external" href="http://www.bbc.com/news/magazine-22223190"&gt;a student uncovered methodological flaws with the paper&lt;/a&gt;. Understanding the
&lt;a class="reference external" href="http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems"&gt;limitations&lt;/a&gt;
of the original study took a while, and &lt;strong&gt;discredited the academic
backing for the economic doctrine of austerity&lt;/strong&gt;. Critically, the
analysis of the publication was possible only because Reinhart and Rogoff
&lt;strong&gt;released their spreadsheet, with data and analysis details&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sharing-code-can-make-science-reproducible"&gt;
&lt;h2&gt;Sharing code can make science reproducible&lt;/h2&gt;
&lt;p&gt;A great example of sharing code to make a publication reproducible is the
recent paper on &lt;a class="reference external" href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126255"&gt;orthogonalization of regressors in fMRI models&lt;/a&gt;,
by Mumford, Poline and Poldrack. The paper is a didactic refutation
of unjustified data processing practices. The authors made their
point much stronger by providing &lt;a class="reference external" href="http://nbviewer.ipython.org/github/jmumford/orthogonalizaton_ipynb/blob/master/orthogonalization.ipynb"&gt;an IPython notebook&lt;/a&gt;
to reproduce their figures. The recipe works perfectly here, because the
ideas underlying the publication are simple and can be illustrated on
synthetic data with relatively inexpensive computation. A short IPython
notebook is all it takes to convince the reader.&lt;/p&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Sharing complex code… chances are it won’t run on new data.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;At the other end of the spectrum, a complex analysis pipeline will not be
as easy to share. For instance, a feat of strength such as Miyawaki &lt;em&gt;et
al&lt;/em&gt;’s &lt;a class="reference external" href="http://www.cell.com/neuron/abstract/S0896-6273%2808%2900958-6"&gt;visual image
reconstruction from brain activity&lt;/a&gt;
requires complex statistical signal processing to extract weak
signatures. Miyawaki &lt;em&gt;et al&lt;/em&gt; shared the data. They might share the code, but
it would be a large chunk of code, probably fragile to changes in the
environment (Matlab version, OS…). Chances are that it wouldn’t run on
new data. This is the scenario that prompted Titus’s words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;blockquote class="epigraph"&gt;
&lt;a class="reference external" href="http://ivory.idyll.org/blog/2015-how-should-we-think-about-research-software.html"&gt;“Please destroy this software after publication”&lt;/a&gt;.&lt;/blockquote&gt;
&lt;/blockquote&gt;
&lt;p&gt;I have good news: you can reproduce Miyawaki’s work with &lt;a class="reference external" href="http://nilearn.github.io/auto_examples/decoding/plot_miyawaki_reconstruction.html"&gt;an example&lt;/a&gt;
in &lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt;, a library for
machine learning on brain images. The example itself is concise,
readable, and it reliably produces figures close to those of the paper.&lt;/p&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Maintained libraries make feats of strength routinely
reproducible.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This easy replication is only possible because &lt;strong&gt;the corresponding code
leverages a set of libraries that encapsulate the main steps of the
analysis&lt;/strong&gt;, mainly &lt;a class="reference external" href="http://scikit-learn.org/stable/"&gt;scikit-learn&lt;/a&gt; and
&lt;a class="reference external" href="http://nilearn.github.io"&gt;nilearn&lt;/a&gt; here. These libraries are
&lt;a class="reference external" href="https://travis-ci.org/nilearn/nilearn"&gt;tested&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/nilearn/nilearn/issues?q=is%3Aissue+is%3Aclosed"&gt;maintained&lt;/a&gt;
and &lt;a class="reference external" href="http://gael-varoquaux.info/programming/scikit-learn-015-release-highlights.html"&gt;released&lt;/a&gt;.
They enable us to go from a feat of strength to routine replication.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="reproducibility-is-not-sustainable-for-everything"&gt;
&lt;h2&gt;Reproducibility is not sustainable for everything&lt;/h2&gt;
&lt;blockquote class="epigraph"&gt;
Thinking is easy, acting is difficult &amp;nbsp; &amp;nbsp; &amp;nbsp;
—  &amp;nbsp; &amp;nbsp; &amp;nbsp;  &lt;em&gt;Goethe&lt;/em&gt;&lt;/blockquote&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Keeping a physics apparatus running for replication years later?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I started my scientific career doing physics, and fairly &lt;a class="reference external" href="http://gael-varoquaux.info/science/general-relativity-quantum-physics-freely-falling-planes-and-bayesian-statistics.html"&gt;“heavy” physics&lt;/a&gt;:
vacuum systems, lasers, free-falling airplanes. In such settings, the
cost of maintaining an experiment is apparent to the layman. No-one is
expected to keep an apparatus running for replication years later. The
pinnacle of reproducible research is when the work becomes doable in a
student lab. Such progress is often supported by improved
technology, driven by wider applications of the findings.&lt;/p&gt;
&lt;p&gt;However, not every experiment will give rise to a student lab.
Replicating the others will not be easy. Even if the instruments are
still around the lab, they will require setting up, adjusting and wiring.
And chances are that connectors or cables will be missing.&lt;/p&gt;
&lt;p&gt;Software is no different. Storing and sharing it is cheaper, but
technology evolves very fast and every setup is different. Code for a
scientific paper has seldom been built for easy maintenance: it lacks
tests, has a profusion of exotic dependencies, and has no documentation.
Robustness, portability, and isolation would be desirable, but they are
difficult and costly to achieve.&lt;/p&gt;
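&lt;p&gt;One cheap first step against the “exotic dependencies” problem is for an analysis to record the environment it ran in. The following is a minimal sketch, assuming Python 3.8 or later. The function name environment_record and the package list are illustrative choices, not an established tool. It uses only the standard library (importlib.metadata, platform, sys).&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages):
    """Return the Python and package versions an analysis ran with.

    packages: names of installed distributions to look up (illustrative).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical usage: save this JSON next to the figures of a paper.
    print(json.dumps(environment_record(["numpy", "scipy"]), indent=2))
```
&lt;/pre&gt;
&lt;p&gt;Saving such a record does not make the code portable. It does tell whoever attempts a replication which versions to reinstall.&lt;/p&gt;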
&lt;p&gt;Software developers know that understanding the constraints to design a
good program requires writing a prototype. &lt;strong&gt;Code for a scientific paper
is very much a prototype&lt;/strong&gt;: it’s a first version of an idea that proves
its feasibility. Common sense in software engineering says that
&lt;a class="reference external" href="http://blog.codinghorror.com/the-prototype-pitfall/"&gt;prototypes are designed to be thrown away&lt;/a&gt;. Prototype code
is fragile: it’s untested, and probably buggy for some uses. Releasing
prototypes amounts to distributing semi-functioning code. This is the
case for most code accompanying a publication, and it is to be expected
given the very nature of research: exploration and prototyping &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.&lt;/p&gt;
&lt;!-- Quality scientific software require making choices --&gt;
&lt;!-- Doing less, better --&gt;
&lt;!-- Quality scientific software, only for a happy few --&gt;
&lt;/div&gt;
&lt;div class="section" id="no-success-without-quality"&gt;
&lt;h2&gt;No success without quality, …&lt;/h2&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Highly-reliable is more useful than state-of-the-art.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;My experience with scientific code has taught me that success requires
quality. Having a good implementation of simple, well-known methods
seems to matter more than doing something fancy. This is what the
success of scikit-learn has taught us: we are really providing classic
“old” machine learning methods, but with a good API, good docs,
computational performance, and stable numerics controlled by stringent
tests. There are plenty of more sophisticated machine-learning
methods, including some that I have developed specifically for my data.
Yet, I find myself advising my co-workers to use the methods in
scikit-learn, because I know that the implementation is reliable and that
they will be able to use them &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
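&lt;p&gt;A toy example can make “stable numerics controlled by stringent tests” concrete. It is unrelated to scikit-learn’s actual code. It compares two ways of computing a variance and checks that the result does not change when the data are shifted by a large constant. This is a minimal sketch, and the function names and test data are hypothetical. The textbook one-pass formula suffers from catastrophic cancellation, whereas subtracting the mean first does not.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import math
import statistics


def variance_naive(x):
    """Textbook one-pass formula E[x^2] - E[x]^2: prone to cancellation."""
    n = len(x)
    mean = sum(x) / n
    return sum(v * v for v in x) / n - mean * mean


def variance_two_pass(x):
    """Subtract the mean first, then average the squared deviations."""
    n = len(x)
    mean = math.fsum(x) / n
    return math.fsum((v - mean) ** 2 for v in x) / n


def check_variance(func, offset):
    """A variance must not depend on a constant shift of the data."""
    base = [4.0, 7.0, 13.0, 16.0]  # population variance is exactly 22.5
    shifted = [v + offset for v in base]
    # statistics.pvariance computes with exact fractions: a trusted reference
    return math.isclose(func(shifted), statistics.pvariance(base), rel_tol=1e-9)


for func in (variance_naive, variance_two_pass):
    print(func.__name__, check_variance(func, 0.0), check_variance(func, 1e9))
```
&lt;/pre&gt;
&lt;p&gt;Both functions pass without an offset. With an offset of 1e9, only the two-pass version passes, and the naive one can even return a negative variance. A test of this kind catches the instability before users do.&lt;/p&gt;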
&lt;p&gt;This quality is indeed central to doing science with code. What good is a
data analysis pipeline if it crashes when I fiddle with the data? How can
I draw conclusions from simulations if I cannot change their parameters?
As soon as I need to trust code supporting a scientific
finding, I find myself tinkering with its input, and often breaking it.
Good scientific code is code that can be reused, that can lead to
large-scale experiments validating its underlying assumptions.&lt;/p&gt;
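&lt;p&gt;Tinkering with inputs can itself be automated. Below is a minimal sketch of such a stress test for a hypothetical analysis step called zscore. The list of edge cases is illustrative and not exhaustive. The code uses only the standard library (statistics.fmean requires Python 3.8 or later).&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import random
import statistics


def zscore(values):
    """Hypothetical analysis step: standardize a list of numbers."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def perturbed_inputs(seed=0):
    """Yield edge cases a curious colleague might try, plus random data."""
    rng = random.Random(seed)
    yield "constant", [3.0, 3.0, 3.0]
    yield "single value", [1.0]
    yield "empty", []
    yield "large offset", [1e12 + rng.random() for _ in range(10)]
    for i in range(3):
        yield f"random {i}", [rng.gauss(0, 1) for _ in range(20)]


def stress_test(func):
    """Run func on every perturbed input; return (input name, error) pairs."""
    failures = []
    for name, data in perturbed_inputs():
        try:
            func(data)
        # statistics.StatisticsError is a subclass of ValueError
        except (ArithmeticError, ValueError) as exc:
            failures.append((name, type(exc).__name__))
    return failures


print(stress_test(zscore))
```
&lt;/pre&gt;
&lt;p&gt;Here the constant, single-value and empty inputs all crash zscore, which is exactly the kind of breakage that fiddling with the data reveals. Whether the fix should raise a clearer error or return zeros is a design decision. Such decisions are part of what makes consolidating code costly.&lt;/p&gt;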
&lt;div class="figure align-right"&gt;
&lt;a class="reference external image-reference" href="https://twitter.com/divineomega/status/576165762911608833"&gt;&lt;img alt="" src="../programming/attachments/sqlite_code.png" /&gt;&lt;/a&gt;
&lt;p class="caption"&gt;Sqlite is so much used that its developers have been woken up at
night by users.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;You might say that I am putting the bar too high; that slightly buggy
code is more useful than no code. But I frown at the idea of releasing
code for which I am unable to do proper quality assurance. I may have
done too much of that in the past. And because I am a prolific coder, many
people are using code that has been through my hands. My mailbox looks
like a battlefield, and when I go to the coffee machine I find myself
answering questions.&lt;/p&gt;
&lt;!-- Pour vivre heureux, vivons cachés.
http://en.wikipedia.org/wiki/Jean-Pierre_Claris_de_Florian --&gt;
&lt;/div&gt;
&lt;div class="section" id="and-making-difficult-choices"&gt;
&lt;h2&gt;… and making difficult choices&lt;/h2&gt;
&lt;!-- diminishing returns --&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Craftsmanship is about trade-offs&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Achieving quality requires making choices, not only because time
is limited, but also because the difficulty of maintaining and improving a
codebase increases much faster than the number of features &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;. This
phenomenon is actually frightening to watch: adding a feature to
scikit-learn these days is much harder than it used to be in
the early days. Interactions between features are a killer: when you
modify something, something unrelated breaks. For a given
functionality, &lt;strong&gt;nothing makes the code more incomprehensible than
cyclomatic complexity&lt;/strong&gt;: the multiplicity of branches, if/then clauses,
and for loops. This complexity naturally appears when supporting different
input types, or minor variants of the same method.&lt;/p&gt;
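&lt;p&gt;To make cyclomatic complexity concrete, here is a rough counter built on Python’s ast module. It approximates McCabe’s measure as one plus the number of decision points. It is a simplified sketch, not a full implementation, and dedicated tools count more constructs. The two center functions are hypothetical. The second one adds a copy option, sparse-input handling and a median variant to the first.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import ast
import textwrap

# Node types that each add one decision point (a simplified McCabe count).
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate complexity of source code: 1 + number of decision points."""
    tree = ast.parse(textwrap.dedent(source))
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, BRANCH_NODES):
            decisions += 1  # an elif is parsed as a nested If: counted too
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one branch per extra operand
            decisions += len(node.values) - 1
    return 1 + decisions


SIMPLE = '''
def center(X):
    return X - X.mean(axis=0)
'''

VARIANTS = '''
def center(X, method="mean", sparse=False, copy=True):
    if copy:
        X = X.copy()
    if sparse:
        if method == "mean":
            raise ValueError("cannot mean-center sparse data")
        return X
    if method == "mean":
        X -= X.mean(axis=0)
    elif method == "median":
        X -= np.median(X, axis=0)
    else:
        raise ValueError(method)
    return X
'''

# The snippets are only parsed, never executed, so np need not be defined.
print(cyclomatic_complexity(SIMPLE), cyclomatic_complexity(VARIANTS))
```
&lt;/pre&gt;
&lt;p&gt;The plain version scores 1 and the variant-rich one scores 6. Every additional path through the function is one more case to test.&lt;/p&gt;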
&lt;p&gt;The consequence is that ensuring quality for many variants of a method is
prohibitively costly. This limit is a real problem for reproducible
science, as science builds upon comparing and opposing models. However,
ignoring it simply leads to code that fails to do what it claims to do.
What this tells us is that if we are really trying to achieve long-term
reproducibility, we &lt;strong&gt;need to identify successful and important research
and focus our efforts on it&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If you agree with my earlier point that the code of a publication is
a prototype, this iterative process seems natural. Various ideas
can be thought of as competing prototypes. Some will not lead to
publication at all, while others will end up having a high impact.
Knowing beforehand which will succeed is impossible, and focusing too early
on achieving high quality is counterproductive. What matters is &lt;strong&gt;progressively
consolidating the code&lt;/strong&gt;.&lt;/p&gt;
&lt;!-- XXX rephrase the above to avoid 'what matters'? --&gt;
&lt;!-- I am sorry to say that my publications are not based on code with 90% test coverage. --&gt;
&lt;!-- say that my methods in machine learning will probably never make it to
scikit-learn --&gt;
&lt;/div&gt;
&lt;div class="section" id="reproducible-science-a-rich-trade-off-space"&gt;
&lt;h2&gt;Reproducible science, a rich trade-off space&lt;/h2&gt;
&lt;div class="admonition align-right note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;Verbatim replication or reuse?&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Does Reinhart and Rogoff’s &lt;em&gt;“Growth in a Time of Debt”&lt;/em&gt; paper face the
same challenges as the manuscript under review by Titus? One is
describing mechanisms while the other is introducing a method. The code
of the former is probably much simpler than that of the latter. Different
publications come with different goals and code that is more or less easy
to share. For verbatim replication of the analysis of a paper, a simple
IPython notebook without tests or API is enough. Going beyond that requires
applying the analysis to different problems or data: reuse. Reuse is
very difficult and cannot be a requirement for all publications.&lt;/p&gt;
&lt;!-- As someone who spends a lot of time on method development, I think a lot
in terms of code reuse. On the contrary, --&gt;
&lt;p&gt;Conventional wisdom in academia is that science builds upon ideas and
concepts rather than methods and code. Galileo is known for his
contribution to our understanding of the cosmos. Yet, methods
development underpins science. Galileo also built his own telescopes,
greatly improving on the first Dutch designs, which was a huge technical
achievement. He needed to develop them to back his cosmological theories.
Today, Galileo’s measurements are easy to reproduce because telescopes are
readily available as consumer products.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote class="epigraph"&gt;
Standing on the shoulders of giants &amp;nbsp; &amp;nbsp; —  &amp;nbsp; &amp;nbsp;
&lt;em&gt;Isaac Newton, on software libraries&lt;/em&gt;&lt;/blockquote&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Related posts&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="../science/publishing_scientific_software_matters.html"&gt;Publishing scientific software matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="views_on_scientific_computing.html"&gt;Personal views on scientific computing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;!-- With great powers come great responsibility --&gt;
&lt;!-- Some publications, including computational ones, strive to contribute an idea. --&gt;
&lt;!-- The way I understand Titus's
phrase *"Please destroy this software after publication"* is that some
methods publication --&gt;
&lt;!-- Is the output of a paper the idea, or the code? It depends? (example of
the ICML) --&gt;
&lt;!-- Different code complexity, different trade-off (loops back to the point
above with Poldrack) --&gt;
&lt;!-- XXX: need to point to the donoho paper and cite it --&gt;
&lt;!-- Recommendations (in a separate blog post?):

* What the difficulties are (evolving APIs, plus configuration problems)
  (skip this point?)

* don't publish method work on non open data (very restrictive, I have
  been criticized for working on 'old', 'uninteresting' data). --&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;To make my point very clear, releasing buggy untested code is not
a good thing. However, it is not possible to ask for all research
papers to come with industrial-quality code. I am trying here to push
for a collective, reasoned undertaking of consolidation.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Theory tells us that there is there is no universal machine
learning algorithm. Given a specific machine-learning application, it
is always possible to devise a custom strategy that out-performs a
generic one. However, &lt;a class="reference external" href="http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf"&gt;do we need hundreds of classifiers to solve
real world classification problems?&lt;/a&gt;
Empirical results &lt;a class="reference external" href="http://jmlr.org/papers/volume15/delgado14a/delgado14a.pdf"&gt;[Delgado 2014]&lt;/a&gt; show
that most of the benefits can be achieved with a small number of
strategies. Is it desirable and sustainable to distribute and keep
alive the code of every machine learning paper?&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Empirical studies on the workload for programmers to achieve a
given task showed that a 25 percent increase in problem complexity results in
a 100 percent increase in programming complexity: &lt;a class="reference external" href="http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F32%2F35909%2F01702600.pdf%3Farnumber%3D1702600&amp;amp;authDecision=-203"&gt;An Experiment on
Unit increase in Problem Complexity, Woodfield 1979&lt;/a&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p class="small"&gt;I need to thank my colleague &lt;a class="reference external" href="http://multiplecomparisons.blogspot.fr"&gt;Chris Filo Gorgolewski&lt;/a&gt; and my sister &lt;a class="reference external" href="http://cbio.ensmp.fr/~nvaroquaux/"&gt;Nelle
Varoquaux&lt;/a&gt; for their
feedback on this note.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="science"></category><category term="software"></category><category term="reproducible research"></category><category term="scientific software"></category></entry><entry><title>MLOSS: machine learning open source software workshop @ ICML 2015</title><link href="https://gael-varoquaux.info/programming/mloss-machine-learning-open-source-software-workshop-icml-2015.html" rel="alternate"></link><published>2015-04-23T00:00:00+02:00</published><updated>2015-04-23T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-04-23:/programming/mloss-machine-learning-open-source-software-workshop-icml-2015.html</id><summary type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This year again we will have an exciting workshop on the
leading-edge machine-learning open-source software. This subject is
central to many, because software is how we propagate, reuse, and
apply progress in machine learning.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;strong&gt;Want to present a project? The deadline for the call for papers is
Apr 28th …&lt;/strong&gt;&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This year again we will have an exciting workshop on the
leading-edge machine-learning open-source software. This subject is
central to many, because software is how we propagate, reuse, and
apply progress in machine learning.&lt;/em&gt;&lt;/p&gt;
&lt;p class="last"&gt;&lt;strong&gt;Want to present a project? The deadline for the call for papers is
Apr 28th, in a few days&lt;/strong&gt;
: &lt;a class="reference external" href="http://mloss.org/workshop/icml15/"&gt;http://mloss.org/workshop/icml15/&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The workshop will be held at the &lt;a class="reference external" href="http://icml.cc/2015/"&gt;ICML conference&lt;/a&gt;, in Lille, France, on July 10th. ICML
(International Conference on Machine Learning) is the leading venue for
academic research in machine learning. It’s a fantastic place to hold
such a workshop, as the actors of theoretical progress are all around.
Software is the bridge that brings this progress beyond papers.&lt;/p&gt;
&lt;p&gt;There is a &lt;a class="reference external" href="http://mloss.org/workshop/"&gt;long tradition&lt;/a&gt; of MLOSS
workshop, with one every year and a half. Last time, at NIPS 2013, I
could feel a bit of a turning point, as people started feeling that
different software slotted together, to create an efficient and
state-of-the art working environment. For this reason, we have entitled
this year’s workshop ‘open ecosystems’, stressing that contributions in
the scope of the workshop, that build a thriving work environment, are
not only machine learning software, but also better statistics or
numerical tools.&lt;/p&gt;
&lt;p&gt;We have two keynotes with important contributions to such ecosystems:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.johnmyleswhite.com/"&gt;John Myles White&lt;/a&gt; (Facebook), lead
developer of Julia statistics and machine learning: “Julia for machine
learning: high-level syntax with compiled-code speed”&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://matthewrocklin.com"&gt;Matthew Rocklin&lt;/a&gt; (Continuum Analytics),
developer of Python computational tools, in particular Blaze (confirmed):
“Blaze, a modern numerical engine with out-of-core and out-of-order
computations”.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There will also be a practical presentation on how to set up an
open-source project, discussing hosting, community development, quality
assurance, license choice, by yours truly.&lt;/p&gt;
</content><category term="programming"></category><category term="conferences"></category><category term="machine learning"></category><category term="scientific computing"></category><category term="scipy"></category></entry><entry><title>Job offer: working on open source data processing in Python</title><link href="https://gael-varoquaux.info/programming/job-offer-working-on-open-source-data-processing-in-python.html" rel="alternate"></link><published>2015-04-02T00:00:00+02:00</published><updated>2015-04-02T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-04-02:/programming/job-offer-working-on-open-source-data-processing-in-python.html</id><summary type="html">&lt;p&gt;We, &lt;a class="reference external" href="https://team.inria.fr/parietal/"&gt;Parietal team&lt;/a&gt; at &lt;a class="reference external" href="http://www.inria.fr/"&gt;INRIA&lt;/a&gt;, are recruiting software developers to work on
open source machine learning and neuroimaging software in Python.&lt;/p&gt;
&lt;p&gt;In general, we are looking for people who:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;have a mathematical mindset,&lt;/li&gt;
&lt;li&gt;are curious about data (ie like looking at data and understanding it)&lt;/li&gt;
&lt;li&gt;have an affinity for problem-solving …&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;</summary><content type="html">&lt;p&gt;We, &lt;a class="reference external" href="https://team.inria.fr/parietal/"&gt;Parietal team&lt;/a&gt; at &lt;a class="reference external" href="http://www.inria.fr/"&gt;INRIA&lt;/a&gt;, are recruiting software developers to work on
open source machine learning and neuroimaging software in Python.&lt;/p&gt;
&lt;p&gt;In general, we are looking for people who:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;have a mathematical mindset&lt;/li&gt;
&lt;li&gt;are curious about data (i.e., like looking at data and understanding it)&lt;/li&gt;
&lt;li&gt;have an affinity for problem-solving tradeoffs&lt;/li&gt;
&lt;li&gt;love high-quality code&lt;/li&gt;
&lt;li&gt;worry about users&lt;/li&gt;
&lt;li&gt;are good scientific Python coders&lt;/li&gt;
&lt;li&gt;enjoy interacting with a community of developers&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;We welcome candidates who do not have all the skills but are strongly
motivated to acquire them. Prior open-source experience is a big plus.&lt;/p&gt;
&lt;p&gt;One example of such a position, with an application to neuroimaging, is:
&lt;a class="reference external" href="http://gael-varoquaux.info/programming/hiring-a-programmer-for-a-brain-imaging-machine-learning-library.html"&gt;http://gael-varoquaux.info/programming/hiring-a-programmer-for-a-brain-imaging-machine-learning-library.html&lt;/a&gt;.
It was opened a year ago and has now resulted in nilearn:
&lt;a class="reference external" href="http://nilearn.github.io/"&gt;http://nilearn.github.io/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other positions may be more focused on general machine learning or
computing tools such as scikit-learn and joblib, which are reference
open-source libraries for data processing in Python.&lt;/p&gt;
&lt;p&gt;We are a tightly knit team, with a high degree of programming, data
analysis and neuroimaging skills.&lt;/p&gt;
&lt;p&gt;Please contact Olivier Grisel and me if you are interested.&lt;/p&gt;
</content><category term="programming"></category><category term="jobs"></category><category term="machine learning"></category><category term="neuroimaging"></category><category term="python"></category></entry><entry><title>Euroscipy 2015: Call for paper</title><link href="https://gael-varoquaux.info/programming/euroscipy-2015-call-for-paper.html" rel="alternate"></link><published>2015-03-28T00:00:00+01:00</published><updated>2015-03-28T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-03-28:/programming/euroscipy-2015-call-for-paper.html</id><summary type="html">&lt;p&gt;EuroScipy 2015, the annual conference on Python in science will take
place in Cambridge, UK on 26-30 August 2015. The conference features two
days of tutorials followed by two days of scientific talks &amp;amp; posters and
an extra day dedicated to developer sprints. It is the major event in
Europe in …&lt;/p&gt;</summary><content type="html">&lt;p&gt;EuroScipy 2015, the annual conference on Python in science will take
place in Cambridge, UK on 26-30 August 2015. The conference features two
days of tutorials followed by two days of scientific talks &amp;amp; posters and
an extra day dedicated to developer sprints. It is the major event in
Europe in the field of technical/scientific computing within the Python
ecosystem. Scientists, PhDs, students, data scientists, analysts, and
quants from more than 20 countries attended the conference last year.&lt;/p&gt;
&lt;p&gt;The topics presented at EuroSciPy are very diverse, with a focus on advanced
software engineering and original uses of Python and its scientific libraries,
either in theoretical or experimental research, from both academia and the
industry.&lt;/p&gt;
&lt;p&gt;Submissions for posters, talks &amp;amp; tutorials (beginner and advanced) are welcome
on our website at &lt;a class="reference external" href="http://www.euroscipy.org/2015/"&gt;http://www.euroscipy.org/2015/&lt;/a&gt;.
Sprint proposals should be addressed directly to the organisation at
&lt;em&gt;euroscipy-org&amp;#64;python.org&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important dates&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;Apr 30, 2015&lt;/em&gt; Talk and tutorials submission deadline&lt;/li&gt;
&lt;li&gt;&lt;em&gt;May 1, 2015&lt;/em&gt; Registration opens&lt;/li&gt;
&lt;li&gt;&lt;em&gt;May 30, 2015&lt;/em&gt; Final program announced&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Jun 15, 2015&lt;/em&gt; Early-bird registration ends&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Aug 26-27, 2015&lt;/em&gt; Tutorials&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Aug 28-29, 2015&lt;/em&gt; Main conference&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Aug 30, 2015&lt;/em&gt; Sprints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We look forward to an exciting conference and hope to see you in Cambridge.&lt;/p&gt;
&lt;p&gt;The EuroSciPy 2015 Team - &lt;a class="reference external" href="http://www.euroscipy.org/2015/"&gt;http://www.euroscipy.org/2015/&lt;/a&gt;&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="conferences"></category></entry><entry><title>PRNI 2016: call for organization</title><link href="https://gael-varoquaux.info/programming/prni-2016-call-for-organization.html" rel="alternate"></link><published>2015-01-01T00:00:00+01:00</published><updated>2015-01-01T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2015-01-01:/programming/prni-2016-call-for-organization.html</id><summary type="html">&lt;p class="first last"&gt;The steering committee of PRNI (Pattern Recognition for NeuroImaging) is opening a call for bid to organize the conference in June 2016, in Europe&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.prni.org"&gt;PRNI (Pattern Recognition for NeuroImaging)&lt;/a&gt; is
an IEEE conference about applying pattern recognition and machine
learning to brain imaging. It is a mid-sized conference (about 150
attendees), and is a satellite of OHBM (the annual “Human Brain Mapping”
meeting).&lt;/p&gt;
&lt;p&gt;The steering committee is calling for bids to organize the conference in
June 2016, in Europe, as a satellite of the OHBM meeting in Geneva.&lt;/p&gt;
</content><category term="programming"></category><category term="neuroimaging"></category><category term="conferences"></category><category term="science"></category><category term="machine learning"></category></entry><entry><title>Improving your programming style in Python</title><link href="https://gael-varoquaux.info/programming/improving-your-programming-style-in-python.html" rel="alternate"></link><published>2014-09-29T00:00:00+02:00</published><updated>2014-09-29T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-09-29:/programming/improving-your-programming-style-in-python.html</id><summary type="html">&lt;p class="first last"&gt;Some references on software development techniques and patterns to help write better code.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Here are some references on software development techniques and patterns
to help write better code. They are intended for the casual programmer,
and certainly not for advanced developers.&lt;/p&gt;
&lt;p&gt;They are listed in order of difficulty.&lt;/p&gt;
&lt;div class="section" id="software-carpentry"&gt;
&lt;h2&gt;Software carpentry&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://swc.scipy.org"&gt;http://swc.scipy.org&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;These are the original notes from Greg Wilson’s course on software
engineering at the University of Toronto. This course is specifically
intended for scientists rather than computer science students. It is very
basic and does not cover design issues.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-tutorial-introduction-to-python"&gt;
&lt;h2&gt;A tutorial introduction to Python&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.informit.com/articles/article.asp?p=23100&amp;amp;seqNum=3&amp;amp;rl=1"&gt;http://www.informit.com/articles/article.asp?p=23100&amp;amp;seqNum=3&amp;amp;rl=1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This tutorial is easier to follow than &lt;a class="reference external" href="http://www.python.org/doc/"&gt;Guido’s tutorial&lt;/a&gt;, though it does not go into as much depth.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="python-essential-reference"&gt;
&lt;h2&gt;Python Essential Reference&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.informit.com/articles/article.asp?p=453682&amp;amp;rl=1"&gt;http://www.informit.com/articles/article.asp?p=453682&amp;amp;rl=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.informit.com/articles/article.asp?p=459269&amp;amp;rl=1"&gt;http://www.informit.com/articles/article.asp?p=459269&amp;amp;rl=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;These are two chapters out of David Beazley’s excellent book &lt;a class="reference external" href="http://www.amazon.com/Python-Essential-Reference-David-Beazley/dp/0735710910"&gt;Python
Essential Reference&lt;/a&gt;.
They give a deeper understanding of how Python works. I strongly recommend
this book to anybody serious about Python.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="an-introduction-to-regular-expressions"&gt;
&lt;h2&gt;An Introduction to Regular Expressions&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.informit.com/articles/article.asp?p=20454&amp;amp;rl=1"&gt;http://www.informit.com/articles/article.asp?p=20454&amp;amp;rl=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;If you are going to do any sort of text manipulation, you absolutely need
to know how to use regular expressions: powerful search and replace patterns.&lt;/p&gt;
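&lt;p&gt;Here is a minimal example of search and replace with Python’s re module. The task, normalizing dates written as DD/MM/YYYY to the ISO YYYY-MM-DD format, is a hypothetical illustration. The groups captured by the pattern are reordered in the replacement string.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import re

# Three captured groups: day, month and year, delimited by word boundaries.
DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def to_iso_dates(text):
    """Rewrite every DD/MM/YYYY date in text as YYYY-MM-DD."""
    return DATE.sub(r"\3-\2-\1", text)


def find_dates(text):
    """Return all (day, month, year) tuples of strings found in text."""
    return DATE.findall(text)


print(to_iso_dates("Sprints on 30/08/2015, tutorials from 26/08/2015."))
```
&lt;/pre&gt;
&lt;p&gt;This prints “Sprints on 2015-08-30, tutorials from 2015-08-26.”. Compiling the pattern once and reusing it also keeps the intent in one named place.&lt;/p&gt;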
&lt;/div&gt;
&lt;div class="section" id="software-design-for-maintainability"&gt;
&lt;h2&gt;Software design for maintainability&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="./software-design-for-maintainability.html"&gt;My own post&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A case of shameless plug: this is a post that I wrote a few years ago. I
think that it is still relevant.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="writing-a-graphical-application-for-scientific-programming-using-traitsui"&gt;
&lt;h2&gt;Writing a graphical application for scientific programming using TraitsUI&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://gael-varoquaux.info/computers/traits_tutorial/index.html"&gt;http://gael-varoquaux.info/computers/traits_tutorial/index.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Building interactive graphical applications is a difficult problem. I have
found that the TraitsUI module provides a great answer to this problem.
This is a tutorial intended for the non-programmer.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="an-introduction-to-python-iterators"&gt;
&lt;h2&gt;An introduction to Python iterators&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.informit.com/articles/article.asp?p=26895&amp;amp;rl=1"&gt;http://www.informit.com/articles/article.asp?p=26895&amp;amp;rl=1&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This article may not be terribly easy to follow, but iterators are a
great feature of Python, so it is definitely worth reading.&lt;/p&gt;
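&lt;p&gt;Here is a short illustration that is not taken from the article. Generator functions produce lazy iterators that can be chained, and each item flows through the whole pipeline before the next one is read. The function names are hypothetical.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
def read_numbers(lines):
    """Lazily parse one float per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values):
    """Lazily yield the mean of all values seen so far."""
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        yield total / count


# Nothing is computed until the consumer asks for the next item.
means = running_mean(read_numbers(["1", "", "2", "6"]))
print(list(means))
```
&lt;/pre&gt;
&lt;p&gt;This prints [1.0, 1.5, 3.0]. Because nothing is materialized, the same code works on a file object with millions of lines.&lt;/p&gt;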
&lt;/div&gt;
&lt;div class="section" id="functional-programming"&gt;
&lt;h2&gt;Functional programming&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.ibm.com/developerworks/linux/library/l-prog.html?open&amp;amp;l=766,t=gr,p=PrmgPyth"&gt;http://www.ibm.com/developerworks/linux/library/l-prog.html?open&amp;amp;l=766,t=gr,p=PrmgPyth&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Functional programming is a programming style where mathematical
functions are successively applied to immutable objects, going from the
inputs of the program to its outputs through a succession of transformations.
It is appreciated by some because such code is easy to analyze and to prove correct.
In certain cases it can be very readable.&lt;/p&gt;
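&lt;p&gt;Below is a minimal sketch of this style in Python, not taken from the linked article. Small pure functions return new tuples instead of modifying their input, and they are chained with a compose helper built on functools.reduce. The compose helper is hypothetical, because the standard library does not provide one.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
from functools import reduce


def compose(*funcs):
    """compose(f, g, h)(x) == f(g(h(x))): apply functions right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), funcs, lambda x: x)


def shift(offset):
    """Return a pure function that adds offset to every element."""
    return lambda xs: tuple(x + offset for x in xs)


def square(xs):
    """Pure function: return a new tuple, leaving the input untouched."""
    return tuple(x * x for x in xs)


# Read right to left: shift by -1, square, then sum.
pipeline = compose(sum, square, shift(-1))
data = (1, 2, 3)
result = pipeline(data)  # data is an immutable tuple and is never modified
print(result)
```
&lt;/pre&gt;
&lt;p&gt;The result is 5, because the input is shifted to (0, 1, 2), squared to (0, 1, 4) and then summed. Each step can be tested in isolation, which is what makes this style easy to reason about.&lt;/p&gt;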
&lt;/div&gt;
&lt;div class="section" id="patterns-in-python"&gt;
&lt;h2&gt;Patterns in Python&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.suttoncourtenay.org.uk/duncan/accu/pythonpatterns.html"&gt;http://www.suttoncourtenay.org.uk/duncan/accu/pythonpatterns.html&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This document presents a few design patterns in Python. Design patterns
are solutions to recurring development problems using object oriented
programming. I suggest this reading only if you are familiar with OOP.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="idiomatic-python"&gt;
&lt;h2&gt;Idiomatic Python&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;Jeff Knupp’s post, a summary of his book:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.jeffknupp.com/blog/2012/10/04/writing-idiomatic-python/"&gt;http://www.jeffknupp.com/blog/2012/10/04/writing-idiomatic-python/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;The &lt;a class="reference external" href="https://scipy-lectures.github.io"&gt;scipy-lectures&lt;/a&gt; chapter on
advanced Python:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://scipy-lectures.github.io/advanced/advanced_python/index.html"&gt;https://scipy-lectures.github.io/advanced/advanced_python/index.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="general-object-oriented-programming-advice"&gt;
&lt;h2&gt;General Object-Oriented programming advice&lt;/h2&gt;
&lt;p&gt;Designing object-oriented code actually requires some care: when you are
building your set of abstractions, you are designing the world in which
you are going to be condemned to live (or actually to code). I would
advise people to keep things as simple as possible, and to follow the SOLID
principles:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://mmiika.wordpress.com/oo-design-principles/"&gt;http://mmiika.wordpress.com/oo-design-principles/&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="using-decorators-to-do-meta-programming-in-python"&gt;
&lt;h2&gt;Using decorators to do meta-programming in Python&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www-128.ibm.com/developerworks/linux/library/l-cpdecor.html"&gt;http://www-128.ibm.com/developerworks/linux/library/l-cpdecor.html&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A beautiful article for the advanced Python user. Meta-programming
is a technique in which a program modifies code, including its own, at
run time. This makes it possible to add new abstractions to the code the
programmer writes, effectively creating a “meta-language”. The article
illustrates this very well.&lt;/p&gt;
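&lt;p&gt;A minimal sketch of the idea, assuming Python 3: the hypothetical decorator count_calls below wraps a function when the def statement runs and attaches a call counter to it, without touching the function’s body. functools.wraps copies the wrapped function’s name and docstring onto the wrapper.&lt;/p&gt;

```python
import functools


def count_calls(func):
    """Wrap func so that each call increments a counter on the wrapper."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0  # counter stored as an attribute of the wrapper
    return wrapper


@count_calls
def add(x, y):
    return x + y


add(1, 2)
add(3, 4)
print(add.calls)     # 2
print(add.__name__)  # add (preserved by functools.wraps)
```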
&lt;/div&gt;
&lt;div class="section" id="a-primer-on-python-metaclass-programming"&gt;
&lt;h2&gt;A Primer on Python Metaclass Programming&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.onlamp.com/lpt/a/3388"&gt;http://www.onlamp.com/lpt/a/3388&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Metaclasses make it possible to define new kinds of objects, with different
calling, creation, or inheritance rules. This is way over my head, but I
am referencing it here for the record.&lt;/p&gt;
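&lt;p&gt;For the record, a minimal sketch in Python 3 syntax (the linked article predates it): the hypothetical RegistryMeta metaclass customizes class creation by recording every class built with it, subclasses included. This is one common use of metaclasses, not the only one.&lt;/p&gt;

```python
class RegistryMeta(type):
    """Metaclass that records every class created with it."""

    registry = {}

    def __new__(mcls, name, bases, namespace):
        # Build the class normally, then register it under its name.
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry[name] = cls
        return cls


class Plugin(metaclass=RegistryMeta):
    pass


class CSVReader(Plugin):  # inherits the metaclass, so it is registered too
    pass


print(sorted(RegistryMeta.registry))  # ['CSVReader', 'Plugin']
```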
&lt;/div&gt;
&lt;div class="section" id="iterators-in-python"&gt;
&lt;h2&gt;Iterators in Python&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://docs.python.org/2/library/itertools.html#recipes"&gt;https://docs.python.org/2/library/itertools.html#recipes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Learn to use itertools (but don’t abuse it)!&lt;/p&gt;
&lt;p&gt;Related to the producer/consumer problem with iterators, see:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.oluyede.org/blog/2007/04/09/producerconsumer-in-python/"&gt;http://www.oluyede.org/blog/2007/04/09/producerconsumer-in-python/&lt;/a&gt;&lt;/p&gt;
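&lt;p&gt;A minimal sketch of a producer/consumer pipeline built from generators; the function names are hypothetical. It is single-threaded: each stage lazily pulls items from the previous one, and itertools.islice truncates a stream without materializing it. Concurrent producers and consumers would need something like a queue shared between threads.&lt;/p&gt;

```python
import itertools


def produce(n):
    """Producer: lazily yield the integers 0 to n - 1."""
    for i in range(n):
        yield i


def square(items):
    """Intermediate stage: transform each item as it flows through."""
    for item in items:
        yield item * item


def consume(items):
    """Consumer: pull every item from the pipeline and accumulate it."""
    return sum(items)


pipeline = square(produce(10))
# Take only the first five squares from a fresh pipeline, lazily.
first_five = list(itertools.islice(square(produce(10)), 5))
print(first_five)         # [0, 1, 4, 9, 16]
print(consume(pipeline))  # 285
```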
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="software engineering"></category><category term="selected"></category></entry><entry><title>Hiring an engineer to mine large functional-connectivity databases</title><link href="https://gael-varoquaux.info/programming/hiring-an-engineer-to-mine-large-functional-connectivity-databases.html" rel="alternate"></link><published>2014-09-20T00:00:00+02:00</published><updated>2014-09-20T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-09-20:/programming/hiring-an-engineer-to-mine-large-functional-connectivity-databases.html</id><summary type="html">&lt;p&gt;&lt;strong&gt;Work with us to leverage leading-edge machine learning for
neuroimaging&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At &lt;a class="reference external" href="https://team.inria.fr/parietal"&gt;Parietal&lt;/a&gt;, my research team,
we work on improving the way brain images are analyzed, for medical
diagnostic purposes, or to understand the brain better. We develop
new machine-learning tools and investigate new methodologies
for quantifying brain function from …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Work with us to leverage leading-edge machine learning for
neuroimaging&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At &lt;a class="reference external" href="https://team.inria.fr/parietal"&gt;Parietal&lt;/a&gt;, my research team,
we work on improving the way brain images are analyzed, for medical
diagnostic purposes, or to understand the brain better. We develop
new machine-learning tools and investigate new methodologies
for quantifying brain function from MRI scans.&lt;/p&gt;
&lt;p&gt;One of our important avenues of contribution is deciphering “functional
connectivity”: analyzing the correlations of brain activity to measure
interactions across the brain. This direction of research is exciting
because it can be used to probe the neural support of &lt;em&gt;functional&lt;/em&gt;
deficits in incapacitated patients, and thus lead to new biomarkers of
functional pathologies, such as autism. Indeed, unlike most functional
imaging approaches, functional connectivity can be computed without
resorting to complicated cognitive tasks. The flip side is that exploiting
such “resting-state” signals requires advanced multivariate statistical
tools, something at which the Parietal team excels.&lt;/p&gt;
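&lt;p&gt;To make “correlations of brain activity” concrete, here is a toy sketch in pure Python. Given one time series per brain region (the signals below are made up), a basic functional-connectivity matrix can be estimated as the matrix of pairwise Pearson correlations. Real analyses involve preprocessing and more careful estimators; the function names are hypothetical and this is not nilearn’s API.&lt;/p&gt;

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length signals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_matrix(signals):
    """Correlation between every pair of regional time series."""
    return [[pearson(s, t) for t in signals] for s in signals]


# Hypothetical toy time series for three brain regions.
region_a = [1.0, 2.0, 3.0, 4.0, 5.0]
region_b = [2.0, 4.1, 5.9, 8.2, 9.9]  # closely follows region_a
region_c = [5.0, 3.0, 4.0, 1.0, 2.0]  # tends to move against region_a

matrix = connectivity_matrix([region_a, region_b, region_c])
for row in matrix:
    print([round(v, 2) for v in row])
```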
&lt;p&gt;For such multivariate processing of brain imaging data, Parietal has an
ecosystem of &lt;a class="reference external" href="https://team.inria.fr/parietal/software"&gt;leading-edge high-quality tools&lt;/a&gt;. In particular, we have built
the foundations of the most successful Python machine learning library,
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;, and we are growing dedicated
software, &lt;a class="reference external" href="http://nilearn.github.io/"&gt;nilearn&lt;/a&gt;, that leverages
machine learning for neuroimaging. To support this ecosystem, we have
dedicated top-notch programmers, led by the well-known
&lt;a class="reference external" href="http://ogrisel.com/"&gt;Olivier Grisel&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We are looking for a data-processing engineer to join our team and work
on &lt;strong&gt;applying our tools to very large neuroimaging databases to
learn specific biomarkers of pathologies&lt;/strong&gt;. The work will be
shared with the &lt;a class="reference external" href="http://www.cati-neuroimaging.com/"&gt;CATI&lt;/a&gt;, the French
platform for multicentric neuroimaging studies, located in the same
building as us. The general context of the job is the &lt;a class="reference external" href="https://team.inria.fr/parietal/research/spatial_patterns/niconnect/"&gt;NiConnect&lt;/a&gt;
project, a multi-organisational research project that I lead and
that focuses on improving diagnostic tools based on resting-state functional
connectivity. We have access to unique algorithms and datasets, before
they are published. What we are now missing is a link between those two,
and that link could be you.&lt;/p&gt;
&lt;p&gt;If you want more details, they can be found on the &lt;a class="reference external" href="https://team.inria.fr/parietal/job-offers"&gt;job offer&lt;/a&gt;. This post motivates
the job in a personal way that I cannot use in an official posting.&lt;/p&gt;
&lt;div class="section" id="why-take-this-job"&gt;
&lt;h2&gt;Why take this job?&lt;/h2&gt;
&lt;p&gt;I don’t expect anyone to take this job only because it pays the bills. To be
clear, the kind of person I am looking for would have no difficulty finding a
job elsewhere. So, if you are that person, why would you take the job?&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;To join &lt;a class="reference external" href="https://team.inria.fr/parietal/team-members/"&gt;a great team&lt;/a&gt;
with many experts, focused on finding elegant solutions to hard
problems at the intersection of machine learning, cognitive science,
and software. Choose to work with great people, knowledgeable,
passionate, and &lt;a class="reference external" href="https://team.inria.fr/parietal/inria-winter-party-2014/"&gt;fun&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;To work on interesting problems that matter. They are interesting
because they are challenging but we have the skills to solve them. They
matter because they can make brain research better.&lt;/li&gt;
&lt;li&gt;To learn. NeuroImaging + machine learning is a quickly growing topic.
If you come from a NeuroImaging background and want to add genuine
expertise in machine learning for NeuroImaging to your CV, this is the
place to be.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="what-would-make-me-excited-in-a-resume"&gt;
&lt;h2&gt;What would make me excited in a resume?&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Genuine experience in neuroimaging data processing, especially on large
databases.&lt;/li&gt;
&lt;li&gt;Talent with computers and ideally some Python experience.&lt;/li&gt;
&lt;li&gt;The unlikely combination of research training (graduate or
undergraduate) and experience in a non academic setting.&lt;/li&gt;
&lt;li&gt;A problem-solving mindset.&lt;/li&gt;
&lt;li&gt;The ability to write well in English about neuroimaging and data
processing: who knows, if everything goes to plan, you could very well be
publishing about new biomarkers.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Now if you are interested and feel up for the challenge, read the real
&lt;a class="reference external" href="https://team.inria.fr/parietal/job-offers"&gt;job offer&lt;/a&gt;, and send me
your resume.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="jobs"></category><category term="neuroimaging"></category><category term="python"></category></entry><entry><title>Scikit-learn 2014 sprint: a report</title><link href="https://gael-varoquaux.info/programming/scikit-learn-2014-sprint-a-report.html" rel="alternate"></link><published>2014-07-25T00:00:00+02:00</published><updated>2014-07-25T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-07-25:/programming/scikit-learn-2014-sprint-a-report.html</id><summary type="html">&lt;p&gt;A week ago, the 2014 edition of the
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; sprint was held in Paris.
This was the third time that we held an international sprint and it was
hugely productive, and great fun, as always.&lt;/p&gt;
&lt;div class="section" id="great-people-and-great-venues"&gt;
&lt;h2&gt;Great people and great venues&lt;/h2&gt;
&lt;img alt="" class="align-center" src="https://pbs.twimg.com/media/BsqD4BeCQAEnT6w.jpg" style="width: 65%;" /&gt;
&lt;p&gt;We had a mix of core contributors and newcomers, which …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;A week ago, the 2014 edition of the
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; sprint was held in Paris.
This was the third time that we held an international sprint and it was
hugely productive, and great fun, as always.&lt;/p&gt;
&lt;div class="section" id="great-people-and-great-venues"&gt;
&lt;h2&gt;Great people and great venues&lt;/h2&gt;
&lt;img alt="" class="align-center" src="https://pbs.twimg.com/media/BsqD4BeCQAEnT6w.jpg" style="width: 65%;" /&gt;
&lt;p&gt;We had a mix of core contributors and newcomers, which is a great
combination, as it enables us to be productive but also to foster the
new generation of core developers. Present were:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Laurent Direr&lt;/li&gt;
&lt;li&gt;Michael Eickenberg&lt;/li&gt;
&lt;li&gt;Loic Esteve&lt;/li&gt;
&lt;li&gt;Alexandre Gramfort&lt;/li&gt;
&lt;li&gt;Olivier Grisel&lt;/li&gt;
&lt;li&gt;Arnaud Joly&lt;/li&gt;
&lt;li&gt;Kyle Kastner&lt;/li&gt;
&lt;li&gt;Manoj Kumar&lt;/li&gt;
&lt;li&gt;Balazs Kegl&lt;/li&gt;
&lt;li&gt;Nicolas Le Roux&lt;/li&gt;
&lt;li&gt;Andreas Mueller&lt;/li&gt;
&lt;li&gt;Vlad Niculae&lt;/li&gt;
&lt;li&gt;Fabian Pedregosa&lt;/li&gt;
&lt;li&gt;Amir Sani&lt;/li&gt;
&lt;li&gt;Danny Sullivan&lt;/li&gt;
&lt;li&gt;Gabriel Synnaeve&lt;/li&gt;
&lt;li&gt;Roland Thiolliere&lt;/li&gt;
&lt;li&gt;Gael Varoquaux&lt;/li&gt;
&lt;/ul&gt;
&lt;img alt="" class="align-center" src="https://pbs.twimg.com/media/BsqRedvCEAE5Opw.jpg" style="width: 65%;" /&gt;
&lt;p&gt;As the sprint extended through a French bank holiday and the weekend,
we were hosted in a variety of venues:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://lapaillasse.org"&gt;La paillasse&lt;/a&gt;, a Paris bio-hacker space&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.inria.fr"&gt;INRIA&lt;/a&gt;, the French national research
institute for computer science, and the place where I work :)&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.criteo.com"&gt;Criteo&lt;/a&gt;, a French company doing worldwide
ad-banner placement. The venue there was absolutely gorgeous, with a
beautiful terrace on the roofs of Paris. And they even had a social
event with free drinks one evening.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.tinyclues.com"&gt;Tinyclues&lt;/a&gt;, a French startup mining
e-commerce data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I must say that we were treated like kings during the whole stay, each
host welcoming us as well as they could. Thank you to all of our hosts!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sponsored-by-the-digicosm-labex"&gt;
&lt;h2&gt;Sponsored by the Digicosme Labex&lt;/h2&gt;
&lt;p&gt;Beyond our hosts, we need to thank the &lt;a class="reference external" href="https://digicosme.lri.fr/tiki-index.php"&gt;Digicosme Labex&lt;/a&gt;.
Digicosme gave us funding that covered some of the lunches, accommodation,
and travel expenses to bring in our contributors from abroad.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="achievements-during-the-sprint"&gt;
&lt;h2&gt;Achievements during the sprint&lt;/h2&gt;
&lt;p&gt;The first day of the sprint was dedicated to polishing the &lt;a class="reference external" href="http://www.scikit-learn.org/stable/whats_new.html"&gt;0.15
release&lt;/a&gt;, which
was finally released on the morning of the second day, after 10 months
of development.&lt;/p&gt;
&lt;p&gt;A large part of the sprint’s effort was dedicated to improving
the code base, rather than directly adding new features. Some files
were reorganized. The input validation code was cleaned up (opening the
way for better support of pandas structures in scikit-learn). We hunted
dead code, deprecation warnings, numerical instabilities, and randomly
failing tests. We made the test suite faster, and refactored our
common tests that scan all the models.&lt;/p&gt;
&lt;p&gt;Some work of our GSoC student, Manoj Kumar, was merged, making some
linear models faster.&lt;/p&gt;
&lt;p&gt;Our &lt;a class="reference external" href="http://scikit-learn.org/dev"&gt;online documentation&lt;/a&gt; was improved,
with the &lt;a class="reference external" href="http://scikit-learn.org/stable/modules/classes.html"&gt;API
documentation&lt;/a&gt;
now pointing to examples and source code.&lt;/p&gt;
&lt;p&gt;Still work in progress:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Faster stochastic gradient descent (with AdaGrad, ASGD, and one day
SAG)&lt;/li&gt;
&lt;li&gt;Calibration of probabilities for models that do not have a
‘predict_proba’ method&lt;/li&gt;
&lt;li&gt;Warm restart in random forests to add more estimators to an existing
ensemble.&lt;/li&gt;
&lt;li&gt;Infomax ICA algorithm.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="sprint"></category><category term="scikit-learn"></category><category term="python"></category><category term="machine learning"></category></entry><entry><title>Scikit-learn 0.15 release: highlights</title><link href="https://gael-varoquaux.info/programming/scikit-learn-015-release-highlights.html" rel="alternate"></link><published>2014-07-15T00:00:00+02:00</published><updated>2014-07-15T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-07-15:/programming/scikit-learn-015-release-highlights.html</id><summary type="html">&lt;p&gt;We have just released the 0.15 version of scikit-learn. Hurray!! Thanks
to all
&lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html#people"&gt;involved&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="a-long-development-stretch"&gt;
&lt;h2&gt;A long development stretch&lt;/h2&gt;
&lt;p&gt;It’s been a while since the &lt;a class="reference external" href="http://gael-varoquaux.info/programming/scikit-learn-014-release-features-and-benchmarks.html"&gt;last release of
scikit-learn&lt;/a&gt;. So a lot has
happened. Exactly 2611 commits according to my count. Quite clearly, we
have more and more existing code …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;We have just released the 0.15 version of scikit-learn. Hurray!! Thanks
to all
&lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html#people"&gt;involved&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="a-long-development-stretch"&gt;
&lt;h2&gt;A long development stretch&lt;/h2&gt;
&lt;p&gt;It’s been a while since the &lt;a class="reference external" href="http://gael-varoquaux.info/programming/scikit-learn-014-release-features-and-benchmarks.html"&gt;last release of
scikit-learn&lt;/a&gt;. So a lot has
happened. Exactly 2611 commits according to my count. Quite clearly, we
have more and more existing code, more and more features to support.
This means that when we modify an algorithm, for instance to make it
faster, something else might break, due to numerical instability or to
some obscure option being exercised. The good news is that we have tight
continuous integration, mostly thanks to
&lt;a class="reference external" href="https://travis-ci.org/scikit-learn/scikit-learn"&gt;travis&lt;/a&gt; (but
Windows continuous integration is on its way), and we keep growing our
test suite. Thus while it is getting harder and harder to change
something in scikit-learn, scikit-learn is also becoming more and more
robust.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="highlights"&gt;
&lt;h2&gt;Highlights&lt;/h2&gt;
&lt;a class="reference external image-reference" href="https://twitter.com/t3kcit/status/434378452901187584"&gt;&lt;img alt="" src="https://pbs.twimg.com/media/Bgc45seCUAAbze1.png" /&gt;&lt;/a&gt;
&lt;p&gt;&lt;strong&gt;Quality&lt;/strong&gt; — Looking at the commit log, there has been a huge amount of
work to &lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html#id7"&gt;fix minor annoying
issues&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Speed&lt;/strong&gt; — A huge effort has gone into making many parts of
scikit-learn faster, through small improvements all over the codebase. We
hope that you’ll find that your applications run faster. For instance, we
find that, in the worst case, Ward clustering is 1.5 times faster in
0.15 than in 0.14. K-means clustering is often 1.1 times faster. KNN, when
used in brute-force mode, got faster by a factor of 2 or 3.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Random Forest and various tree methods&lt;/strong&gt; — The random forest and
various tree methods are much faster, make much better use of parallel
computing, and use less memory. For instance, the picture on the right
shows the scikit-learn random forest running in parallel on a fat Amazon
node, nicely using all the CPUs with little RAM usage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hierarchical agglomerative clustering&lt;/strong&gt; — &lt;a class="reference external" href="http://scikit-learn.org/dev/modules/clustering.html#different-linkage-type-ward-complete-and-average-linkage"&gt;Complete linkage and average
linkage clustering have been
added&lt;/a&gt;.
The benefit of these approaches compared to the existing Ward clustering
is that they can take &lt;a class="reference external" href="http://scikit-learn.org/stable/modules/clustering.html#varying-the-metric"&gt;an arbitrary distance
matrix&lt;/a&gt;.&lt;/p&gt;
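&lt;p&gt;Why can these linkages use any distance? Average linkage only ever looks up pairwise distances, whereas Ward merges the clusters that least increase within-cluster variance, which assumes Euclidean geometry. The naive pure-Python sketch below clusters directly from a precomputed distance matrix. The function name is hypothetical, it is slow (roughly cubic in the number of samples), and it is not scikit-learn’s implementation; see the scikit-learn documentation of your version for how to pass a precomputed matrix to its estimator.&lt;/p&gt;

```python
import itertools


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a precomputed
    distance matrix (list of lists). Returns a list of clusters,
    each a list of sample indices."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a is always the smaller index).
        a, b = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Arbitrary (not necessarily Euclidean) dissimilarities between 4 samples.
dist = [
    [0, 1, 9, 8],
    [1, 0, 7, 9],
    [9, 7, 0, 2],
    [8, 9, 2, 0],
]
print(average_linkage(dist, 2))  # [[0, 1], [2, 3]]
```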
&lt;p&gt;&lt;strong&gt;Robust linear models&lt;/strong&gt; — Scikit-learn now includes
&lt;a class="reference external" href="http://scikit-learn.org/0.15/modules/linear_model.html#robustness-to-outliers-ransac"&gt;RANSAC&lt;/a&gt;
for robust linear regression.&lt;/p&gt;
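&lt;p&gt;RANSAC (random sample consensus) repeatedly fits a model to a random minimal subset of the data, counts the points that agree with it within a threshold (the inliers), keeps the model with the most inliers, and refits on those. The pure-Python sketch below applies this idea to fitting a line; the function names and default parameters are hypothetical, and it is not the code behind scikit-learn’s RANSACRegressor.&lt;/p&gt;

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def ransac_line(points, n_trials=100, threshold=1.0, seed=0):
    """Fit lines to random 2-point subsets, keep the one with the most
    inliers, then refit by least squares on those inliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_trials):
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue  # same x: these two points cannot define y = f(x)
        slope, intercept = fit_line(sample)
        inliers = [(x, y) for x, y in points
                   if threshold >= abs(y - (slope * x + intercept))]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line(best_inliers), best_inliers


# Points on y = 2x + 1, plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -30)]
(slope, intercept), inliers = ransac_line(points)
print(round(slope, 3), round(intercept, 3))  # 2.0 1.0
```

&lt;p&gt;A plain least-squares fit on all twelve points would be pulled away by the two outliers; RANSAC ignores them because no line through them gathers as many inliers.&lt;/p&gt;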
&lt;p&gt;&lt;strong&gt;HMMs are deprecated&lt;/strong&gt; — We have long been discussing removing
HMMs, which do not fit scikit-learn’s focus on predictive
modeling. We have created a separate
&lt;a class="reference external" href="https://github.com/hmmlearn/hmmlearn"&gt;hmmlearn&lt;/a&gt; repository for the
HMM code. It is looking for maintainers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;And much more&lt;/strong&gt; — plenty of &lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html"&gt;“minor
things”&lt;/a&gt;, such as
better support for sparse data, better support for multi-label data…&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="machine learning"></category><category term="python"></category></entry><entry><title>Google summer of code projects for scikit-learn</title><link href="https://gael-varoquaux.info/programming/google-summer-of-code-projects-for-scikit-learn.html" rel="alternate"></link><published>2014-04-23T00:00:00+02:00</published><updated>2014-04-23T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-04-23:/programming/google-summer-of-code-projects-for-scikit-learn.html</id><summary type="html">&lt;p&gt;I’d like to welcome the four students that were accepted for the GSoC
this year:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Issam: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/issamou/5733935958982656"&gt;Extending Neural networks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hamzeh: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/hamsal/5709068098338816"&gt;Sparse Support for Ensemble Methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Manoj: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/manojkumar/5673522948997120"&gt;Making Linear models faster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Maheshakya: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/maheshakya/5754903989321728"&gt;Locality Sensitive Hashing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Welcome to all of you. Your submissions were excellent, and you
demonstrated a good will …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I’d like to welcome the four students that were accepted for the GSoC
this year:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Issam: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/issamou/5733935958982656"&gt;Extending Neural networks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hamzeh: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/hamsal/5709068098338816"&gt;Sparse Support for Ensemble Methods&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Manoj: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/manojkumar/5673522948997120"&gt;Making Linear models faster&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Maheshakya: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/details/google/gsoc2014/maheshakya/5754903989321728"&gt;Locality Sensitive Hashing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Welcome to all of you. Your submissions were excellent, and you
demonstrated a willingness to integrate into the project, with its social and
coding dynamics. It is a privilege to work with you.&lt;/p&gt;
&lt;p&gt;I’d also like to thank all the mentors, Alex, Arnaud, Daniel, James,
Jaidev, Olivier, Robert and Vlad. Mentoring is a lot of work, and
mentors are not only making it possible for great code to enter
scikit-learn, but also shaping a future generation of scikit-learn
contributors.&lt;/p&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="machine learning"></category></entry><entry><title>Hiring a programmer for a brain imaging machine-learning library</title><link href="https://gael-varoquaux.info/programming/hiring-a-programmer-for-a-brain-imaging-machine-learning-library.html" rel="alternate"></link><published>2014-02-12T00:00:00+01:00</published><updated>2014-02-12T00:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2014-02-12:/programming/hiring-a-programmer-for-a-brain-imaging-machine-learning-library.html</id><summary type="html">&lt;p&gt;&lt;strong&gt;Work with us on putting machine learning in the hand of cognitive
scientists&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Parietal is a research team that creates advanced data-analysis methods to mine
functional brain images and solve medical and cognitive science problems.
Our day to day work is to write machine-learning and statistics code to
understand and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;Work with us on putting machine learning in the hand of cognitive
scientists&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Parietal is a research team that creates advanced data-analysis methods to mine
functional brain images and solve medical and cognitive science problems.
Our day-to-day work is to write machine-learning and statistics code to
better understand and use images of brain function (most often fMRI). Our
purpose is to be useful to the NeuroImaging community, mostly medical and
cognitive science researchers, in understanding brain function. What
is limiting us in this respect is that, to reach end users, we need to turn
our algorithms into usable software.&lt;/p&gt;
&lt;p&gt;This is why Parietal has a long tradition of investing in building an
ecosystem of &lt;a class="reference external" href="https://team.inria.fr/parietal/software"&gt;high-quality libraries and tools&lt;/a&gt;: we build, layer by layer, an
environment in which we can do our research, and with which we hope to
one day reach the user. We chose Python, as a high-level general-purpose
language with which we can do scientific computing and, one day, GUIs
or web servers. We contribute to the scipy ecosystem; we have built the
foundations of the most successful Python machine learning library,
&lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;. We are invested in the
&lt;a class="reference external" href="http://nipy.org"&gt;neuroimaging in Python ecosystem&lt;/a&gt;. Our students and
team members send patches to scientific Python projects, teach courses
on how to use them, and speak at conferences.&lt;/p&gt;
&lt;p&gt;But to go all the way, we need support from people whose sole goal is
software. To put the finishing touch on the quality of our
end-user libraries, we need full-time programmers. In an academic
setting, they can be hard to justify, but we have always had dedicated
top-notch engineers at Parietal, our latest hire being the well-known
&lt;a class="reference external" href="http://ogrisel.com/"&gt;Olivier Grisel&lt;/a&gt;. This is where &lt;strong&gt;you&lt;/strong&gt; can come
in.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://team.inria.fr/parietal/research/spatial_patterns/niconnect/"&gt;NiConnect&lt;/a&gt;
is a specific research project in which we are developing leading-edge
algorithmic tools. For this project, we have funding for a full-time
programmer: someone who will help us turn our understanding of how to
process brain images into a software tool that a cognitive science
researcher can use. We have started work on such software in the
&lt;a class="reference external" href="http://nilearn.github.io/"&gt;nilearn&lt;/a&gt; project. What we need is someone
who drives the project and makes sure that the pieces fit together
well, so that the code solving the user’s problem is not our research code
but a clean and lean library, just like scikit-learn is an elegant
answer to day-to-day machine learning tasks.&lt;/p&gt;
&lt;p&gt;If you want more details, they can be found on the &lt;a class="reference external" href="https://team.inria.fr/parietal/job-offers"&gt;job offer&lt;/a&gt;. This post motivates
the job in a personal way that I cannot use in an official posting.&lt;/p&gt;
&lt;div class="section" id="why-take-this-job"&gt;
&lt;h2&gt;Why take this job?&lt;/h2&gt;
&lt;p&gt;I don’t expect anyone to take this job only because it pays the bills. To be
clear, the kind of person I am looking for would have no difficulty finding a
well-paid job elsewhere. So, if you are that person, why would you take
the job?&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;To join &lt;a class="reference external" href="https://team.inria.fr/parietal/team-members/"&gt;a great team&lt;/a&gt;
that is focused on finding elegant solutions to hard problems at the
intersection of machine learning, cognitive science, and software.
Choose to work with great people, knowledgeable, passionate, and &lt;a class="reference external" href="https://team.inria.fr/parietal/inria-winter-party-2014/"&gt;fun&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;To work on interesting problems that matter. They are interesting
because they are challenging but we have the skills to solve them. They
matter because these skills need to be used to make brain research
better.&lt;/li&gt;
&lt;li&gt;To have a boss (&lt;a class="reference external" href="https://github.com/GaelVaroquaux"&gt;me&lt;/a&gt;) that
actually codes and gives you feedback on your code.&lt;/li&gt;
&lt;li&gt;To learn. Data science + Python is &lt;em&gt;the&lt;/em&gt; combination of skills to have.
At Parietal we have unique expertise in both. Add to this a fine
understanding of algorithms, high-performance computing, statistics,
and software quality, and you have the perfect lines on a CV.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="what-would-make-me-excited-in-a-resume"&gt;
&lt;h2&gt;What would make me excited in a resume?&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Open source contributions (there is no better coding CV than a GitHub
account).&lt;/li&gt;
&lt;li&gt;Experience in agile-like situations&lt;/li&gt;
&lt;li&gt;A passion for code quality&lt;/li&gt;
&lt;li&gt;Good Python experience&lt;/li&gt;
&lt;li&gt;The unlikely combination of research-like training (e.g. undergraduate)
and experience in a non-academic, non-scientific setting (say, web
development).&lt;/li&gt;
&lt;li&gt;Evidence that you care about user experience, about understanding and
solving the user’s problems.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Now if you are interested and feel up for the challenge, read the real
&lt;a class="reference external" href="https://team.inria.fr/parietal/job-offers"&gt;job offer&lt;/a&gt;, and send me
your resume.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="jobs"></category><category term="neuroimaging"></category><category term="python"></category></entry><entry><title>Scikit-learn 0.14 release: features and benchmarks</title><link href="https://gael-varoquaux.info/programming/scikit-learn-014-release-features-and-benchmarks.html" rel="alternate"></link><published>2013-08-08T00:00:00+02:00</published><updated>2013-08-08T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2013-08-08:/programming/scikit-learn-014-release-features-and-benchmarks.html</id><summary type="html">&lt;p&gt;I have tagged and released the scikit-learn 0.14 release yesterday
evening, after more than 6 months of heavy development from the team. I
would like to give a quick overview of the highlights of this release in
terms of features but also in terms of performance. Indeed, the
scikit-learn …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have tagged and released the scikit-learn 0.14 release yesterday
evening, after more than 6 months of heavy development from the team. I
would like to give a quick overview of the highlights of this release in
terms of features but also in terms of performance. Indeed, the
scikit-learn developers believe that &lt;strong&gt;performance matters&lt;/strong&gt; and strive
to be fast and efficient on fairly large datasets.&lt;/p&gt;
&lt;p&gt;In this article, I will show on a couple of benchmarks that we have
significant performance improvements and are competitive with the fastest
libraries, such as the proprietary WiseRF.&lt;/p&gt;
&lt;div class="section" id="prohiminent-new-features"&gt;
&lt;h2&gt;Prominent new features&lt;/h2&gt;
&lt;p&gt;Most of the new features of this release have been described
in more detail on &lt;a class="reference external" href="http://peekaboo-vision.blogspot.de/2013/07/scikit-learn-sprint-and-014-release.html"&gt;Andy Mueller’s
blog&lt;/a&gt;.
I am just giving a quick list here for completeness (see also the &lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html"&gt;full
list of changes&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Major new estimators&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;AdaBoost&lt;/strong&gt; (by &lt;a class="reference external" href="http://noel.dawe.me"&gt;Noel Dawe&lt;/a&gt; and &lt;a class="reference external" href="http://www.montefiore.ulg.ac.be/~glouppe/"&gt;Gilles
Louppe&lt;/a&gt;): the classic
boosting algorithm. This implementation can be applied to any
estimator that supports sample weights, but uses trees by default.
AdaBoost builds a strong learner out of simple ones by focusing
successively on the samples that are poorly predicted. Typically, the
simple learners (called &lt;em&gt;weak learners&lt;/em&gt;) can be rules as simple
as thresholds on a single observed quantity (these form &lt;em&gt;decision
stumps&lt;/em&gt;).
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/ensemble.html#AdaBoost"&gt;Documentation&lt;/a&gt;
—
&lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_twoclass.html"&gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Biclustering&lt;/strong&gt; (by &lt;a class="reference external" href="http://www.kemaleren.com"&gt;Kemal Eren&lt;/a&gt;):
clustering rows and columns of the data matrices.
Suppose you have access to the shopping lists of many consumers:
biclustering groups both the consumers and the products they bought,
to come up with stories such as “geeks buy computers and
phones”, where “geeks” would be a group of consumers and “computers”
and “phones” would be groups of products.
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/biclustering.html"&gt;Documentation&lt;/a&gt;
—
&lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html"&gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Missing value imputation&lt;/strong&gt; (by &lt;a class="reference external" href="http://nicolastr.com/"&gt;Nicolas
Tresegnie&lt;/a&gt;): simple transformer filling
missing data with means or medians.
If your data acquisition has failures, human or material, you can
easily end up with some descriptors missing for some observations. It
would be a pity to throw away either those observations or those
descriptors. “Imputation” fills in the blanks with simple strategies.
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/preprocessing.html#imputation-of-missing-values"&gt;Documentation&lt;/a&gt;
—
&lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/imputation.html"&gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RBMs (Restricted Boltzmann Machines)&lt;/strong&gt; (by &lt;a class="reference external" href="http://ynd.github.io/"&gt;Yann
Dauphin&lt;/a&gt;): a neural network model useful
for unsupervised learning of features.
Restricted Boltzmann machines learn a set of hidden (latent) factors
that have, for each observation, a probability of being activated.
These activations are found so that they explain the data well,
when combined across all the hidden factors with connection weights.
Typically, they form a new feature set that can be useful in a
prediction task.
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/neural_networks.html#restricted-boltzmann-Machines"&gt;Documentation&lt;/a&gt;
—
&lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/plot_rbm_logistic_classification.html"&gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RandomizedSearchCV&lt;/strong&gt; (by &lt;a class="reference external" href="http://peekaboo-vision.blogspot.com"&gt;Andreas
Mueller&lt;/a&gt;): setting
meta-parameters of estimators using a randomized exploration of the
parameter space rather than a grid, as in grid search.
A CV (cross-validated) meta-estimator sets the parameters of an
estimator by maximizing its cross-validated prediction score. This
entails fitting the estimator for each parameter value tried. The
randomized search samples the parameter space randomly, avoiding the
exponential growth, with the number of parameters, of the number of
points that a grid search must fit.
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/grid_search.html#randomized-parameter-optimization"&gt;Documentation&lt;/a&gt;
—
&lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/randomized_search.html"&gt;Example&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
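&lt;p&gt;To make the reweighting idea behind AdaBoost concrete, here is a minimal sketch of textbook discrete AdaBoost with decision stumps, in plain Python. It is not scikit-learn’s implementation. The function names are hypothetical, labels are assumed to be +1/-1, and each stump thresholds a single feature.&lt;/p&gt;

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump.

    A stump predicts `polarity` when row[feature] > threshold, and
    -polarity otherwise. Labels are +1/-1; weights w sum to 1.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below every value, then the midpoints.
        thresholds = [values[0] - 1.0] + [
            (a + b) / 2.0 for a, b in zip(values, values[1:])
        ]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, yi, wi in zip(X, y, w)
                    if stump_predict(row, j, t, polarity) != yi
                )
                if best is None or best[0] > err:
                    best = (err, j, t, polarity)
    return best


def stump_predict(row, j, t, polarity):
    return polarity if row[j] > t else -polarity


def adaboost(X, y, n_rounds=10):
    """Fit n_rounds stumps; return a list of (alpha, feature, threshold, polarity)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, polarity = fit_stump(X, y, w)
        err = max(err, 1e-10)  # guard against a perfect stump (division by zero)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, t, polarity))
        # Increase the weight of misclassified samples, decrease the others,
        # so that the next stump focuses on what is still poorly predicted.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(row, j, t, polarity))
            for row, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    # Weighted vote of all the stumps.
    score = sum(alpha * stump_predict(row, j, t, p) for alpha, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy 1D data that no single threshold can separate.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, n_rounds=3)
print([predict(ensemble, row) for row in X])
```

&lt;p&gt;On this toy data no single stump is perfect, but three rounds are enough for the weighted vote to classify every point correctly. The second and third stumps are fitted on weights that emphasize the points misclassified so far.&lt;/p&gt;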
&lt;p&gt;&lt;strong&gt;Infrastructure work&lt;/strong&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;New website&lt;/strong&gt; (mostly by &lt;a class="reference external" href="http://www.montefiore.ulg.ac.be/~glouppe/"&gt;Gilles
Louppe&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/nellev"&gt;Nelle
Varoquaux&lt;/a&gt;, Vincent Michel and &lt;a class="reference external" href="http://peekaboo-vision.blogspot.com"&gt;Andreas
Mueller&lt;/a&gt;). The redesign of
the website had two objectives: &lt;em&gt;i)&lt;/em&gt; unclutter the pages to help
prioritize information, &lt;em&gt;ii)&lt;/em&gt; make it easier for users to find the
stable documentation when they follow an external link to the
documentation of a previous release. I think that it also looks
prettier &lt;em&gt;:)&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python 3 support&lt;/strong&gt; (&lt;a class="reference external" href="https://github.com/justinvf"&gt;Justin
Vincent&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/larsmans"&gt;Lars
Buitinck&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/smoitra87"&gt;Subhodeep
Moitra&lt;/a&gt; and &lt;a class="reference external" href="http://twitter.com/ogrisel"&gt;Olivier
Grisel&lt;/a&gt;). As a side note, under Python
3.3 on Windows, we have found that &lt;em&gt;np.load&lt;/em&gt; can trigger segfaults,
which makes our test suite crash. The tests that do not rely on
&lt;em&gt;np.load&lt;/em&gt; pass.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="major-api-changes"&gt;
&lt;h2&gt;Major API changes&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;The scoring parameter&lt;/strong&gt; One of the benefits of scikit-learn over
other learning packages is that it can set parameters by maximizing a
prediction score. However, the score that one wants to
optimize may depend on the application. Also, some scores can only
be computed with specific estimators, for instance because they
require probabilistic predictions. &lt;a class="reference external" href="http://peekaboo-vision.blogspot.com"&gt;Andreas
Mueller&lt;/a&gt; and &lt;a class="reference external" href="https://github.com/larsmans"&gt;Lars
Buitinck&lt;/a&gt; came up with &lt;a class="reference external" href="http://scikit-learn.org/dev/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules"&gt;a new
API&lt;/a&gt;
to specify the scoring strategy, which is versatile and hides
complexity from the user. This replaces the &lt;em&gt;score_func&lt;/em&gt; argument.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;em&gt;sklearn.test()&lt;/em&gt;&lt;/strong&gt; is deprecated and no longer runs the test suite.
Please use &lt;em&gt;nosetests sklearn&lt;/em&gt; from the command line.&lt;/li&gt;
&lt;/ul&gt;
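&lt;p&gt;The idea behind a pluggable scoring strategy can be sketched in a few lines of plain Python. Everything here is hypothetical: the toy &lt;em&gt;ThresholdClassifier&lt;/em&gt;, the scorer names and &lt;em&gt;select_threshold&lt;/em&gt;. The only borrowed convention is that a scorer is a callable taking (model, X, y) and returning a number where greater is better, which is how scikit-learn calls its scorer objects. For simplicity, the score is computed on a single validation set rather than by cross-validation.&lt;/p&gt;

```python
class ThresholdClassifier:
    """Hypothetical toy model: predicts 1 when x is above `threshold`."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [1 if x > self.threshold else 0 for x in X]


def accuracy_scorer(model, X, y):
    """Fraction of correct predictions (greater is better)."""
    pred = model.predict(X)
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def recall_scorer(model, X, y):
    """Fraction of true positives that are detected (greater is better)."""
    pred = model.predict(X)
    detected = [p for p, t in zip(pred, y) if t == 1]
    return sum(detected) / len(detected)


def select_threshold(thresholds, X, y, scoring):
    """Return the threshold with the highest validation score.

    `scoring` is any callable (model, X, y) -> float, so the selection
    logic does not need to know which metric is being optimized.
    """
    return max(thresholds, key=lambda t: scoring(ThresholdClassifier(t), X, y))


X = [0.1, 0.4, 0.8, 1.2, 1.9, 2.3, 2.7, 3.1]
y = [0, 0, 1, 0, 0, 1, 1, 1]
thresholds = [0.0, 0.5, 1.0, 2.0, 3.0]
print(select_threshold(thresholds, X, y, accuracy_scorer))
print(select_threshold(thresholds, X, y, recall_scorer))
```

&lt;p&gt;On this toy data, accuracy selects a threshold of 2.0. Recall alone, however, is maximized by the lowest threshold, which labels everything as positive. This is why the metric to optimize must be chosen with the application in mind.&lt;/p&gt;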
&lt;p&gt;The full list of API changes can be found on the &lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html"&gt;change
log&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="performance-improvements"&gt;
&lt;h2&gt;Performance improvements&lt;/h2&gt;
&lt;p&gt;Many parts of the codebase were sped up, with a focus on making
&lt;strong&gt;scikit-learn more scalable for bigger data&lt;/strong&gt;.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The trees (random forests and extra-trees) were massively sped up by
&lt;a class="reference external" href="http://www.montefiore.ulg.ac.be/~glouppe/"&gt;Gilles Louppe&lt;/a&gt;,
bringing them on par with the fastest libraries (see benchmarks
below)&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.astro.washington.edu/users/vanderplas/"&gt;Jake
Vanderplas&lt;/a&gt;
improved the BallTree and implemented fast KDTrees for
nearest-neighbor search (benchmarks below).&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://github.com/cleverless"&gt;“cleverless”&lt;/a&gt; made the DBSCAN
implementation scale to a large number of samples by relying on
KDTree and BallTree for neighbor search.&lt;/li&gt;
&lt;li&gt;KMeans is much faster on sparse data (&lt;a class="reference external" href="https://github.com/larsmans"&gt;Lars
Buitinck&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;For text vectorization: much faster CountVectorizer and
TfidfVectorizer, with lower memory consumption (Jochen Wersdorfer and
Roman Sinayev)&lt;/li&gt;
&lt;li&gt;Out-of-core learning for discrete naive Bayes classifiers by &lt;a class="reference external" href="http://twitter.com/ogrisel"&gt;Olivier
Grisel&lt;/a&gt;. Estimators that implement a
&lt;em&gt;partial_fit&lt;/em&gt; method can be used to fit the model with an
out-of-core strategy, as illustrated by the &lt;a class="reference external" href="http://scikit-learn.org/dev/auto_examples/applications/plot_out_of_core_classification.html"&gt;out-of-core
classification
example&lt;/a&gt;.
This setting is well suited to very big data.&lt;/li&gt;
&lt;li&gt;FastICA: lower memory consumption and slightly faster code (&lt;a class="reference external" href="https://github.com/dengemann"&gt;Denis
Engemann&lt;/a&gt; and &lt;a class="reference external" href="http://alexandre.gramfort.net"&gt;Alexandre
Gramfort&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Faster IsotonicRegression (&lt;a class="reference external" href="https://github.com/nellev"&gt;Nelle
Varoquaux&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;OrthogonalMatchingPursuitCV by &lt;a class="reference external" href="http://alexandre.gramfort.net"&gt;Alexandre
Gramfort&lt;/a&gt; and &lt;a class="reference external" href="http://vene.ro"&gt;Vlad
Niculae&lt;/a&gt;: while strictly-speaking not a speedup of
an existing estimator, this new estimator means that OMP parameters
can be set much faster.&lt;/li&gt;
&lt;/ul&gt;
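&lt;p&gt;A small sketch shows why counting-based models such as discrete naive Bayes lend themselves to out-of-core learning. The model only keeps per-class counts, so feeding the data in chunks through a &lt;em&gt;partial_fit&lt;/em&gt; method gives exactly the same model as fitting it all at once. The class below is a hypothetical plain-Python stand-in written for illustration, not scikit-learn’s &lt;em&gt;MultinomialNB&lt;/em&gt;, and the toy word counts are made up.&lt;/p&gt;

```python
import math
from collections import defaultdict


class StreamingMultinomialNB:
    """Hypothetical minimal multinomial naive Bayes with partial_fit.

    Only per-class counts are stored, so memory does not grow with the
    number of samples seen: data can be streamed in mini-batches.
    """

    def __init__(self, n_features, alpha=1.0):
        self.n_features = n_features
        self.alpha = alpha  # additive (Laplace) smoothing
        self.class_count = defaultdict(int)
        self.feature_count = defaultdict(lambda: [0.0] * n_features)

    def partial_fit(self, X, y):
        # Update the sufficient statistics with one mini-batch.
        for row, label in zip(X, y):
            self.class_count[label] += 1
            counts = self.feature_count[label]
            for j, value in enumerate(row):
                counts[j] += value
        return self

    def _log_posterior(self, row, label):
        # Unnormalized log P(label | row): log prior + sum of log likelihoods.
        n_seen = sum(self.class_count.values())
        counts = self.feature_count[label]
        total = sum(counts) + self.alpha * self.n_features
        log_prior = math.log(self.class_count[label] / n_seen)
        return log_prior + sum(
            value * math.log((counts[j] + self.alpha) / total)
            for j, value in enumerate(row)
        )

    def predict(self, X):
        labels = sorted(self.class_count)
        return [max(labels, key=lambda c: self._log_posterior(row, c)) for row in X]


def iter_minibatches(X, y, batch_size):
    """Yield successive chunks, standing in for reading data from disk."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# Toy word counts for the vocabulary ["ball", "goal", "vote", "law"].
X = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2],
     [1, 0, 2, 3], [4, 1, 0, 0], [0, 1, 3, 3]]
y = ["sport", "sport", "politics", "politics", "sport", "politics"]

streamed = StreamingMultinomialNB(n_features=4)
for X_batch, y_batch in iter_minibatches(X, y, batch_size=2):
    streamed.partial_fit(X_batch, y_batch)

in_one_go = StreamingMultinomialNB(n_features=4).partial_fit(X, y)
print(streamed.predict([[0, 0, 2, 1]]))
```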
&lt;/div&gt;
&lt;div class="section" id="we-are-faster-lies-damn-lies-and-benchmarks"&gt;
&lt;h2&gt;We are faster: lies, damn lies and benchmarks&lt;/h2&gt;
&lt;blockquote class="epigraph"&gt;
&lt;p&gt;“There are three kinds of lies: lies, damned lies and statistics.” —&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Mark Twain’s Own Autobiography: The Chapters from the North
American Review&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I claim we have gotten faster at certain things. Other libraries, such
as &lt;a class="reference external" href="http://docs.wise.io/"&gt;WiseRF&lt;/a&gt;, make performance claims relative
to us. It turns out that benchmarking statistical learning code is very
hard, because speed depends a lot on the properties of the data.&lt;/p&gt;
&lt;div class="section" id="fast-neighbor-searches-good-kdtrees-beat-balltrees"&gt;
&lt;h3&gt;Fast neighbor searches: good KDTrees beat BallTrees&lt;/h3&gt;
&lt;p&gt;A good example of interplay between properties of the data and
computational speed is the nearest neighbor search. In general, finding
the nearest neighbor to a point out of &lt;em&gt;n&lt;/em&gt; other points will cost you
&lt;em&gt;n&lt;/em&gt; operations, as you have to compute the distance to each of these
points. However, building a tree-like data structure ahead of time can
make this query typically cost only &lt;em&gt;log n&lt;/em&gt;. If these points are in 1D, &lt;em&gt;ie&lt;/em&gt;
simple scalars, this can be achieved by sorting them. In higher
dimensions that can be achieved by building a &lt;em&gt;KDTree&lt;/em&gt;, made of planes
dividing the space in half-spaces, or a &lt;em&gt;BallTree&lt;/em&gt;, made of nested
balls.&lt;/p&gt;
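&lt;p&gt;The 1D case can be sketched in a few lines of plain Python. Sort the points once, answer each query by binary search with the standard &lt;em&gt;bisect&lt;/em&gt; module, and check the answer against the brute-force scan. The function names are hypothetical.&lt;/p&gt;

```python
import bisect
import random


def build_index(points):
    # Sorting once costs O(n log n); it is the 1D analogue of building a tree.
    return sorted(points)


def nearest_sorted(sorted_points, query):
    """O(log n) query: binary search for the insertion point."""
    i = bisect.bisect_left(sorted_points, query)
    # The nearest neighbor is the value just before or just after that point.
    candidates = sorted_points[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - query))


def nearest_brute_force(points, query):
    """O(n) query: compute the distance to every point."""
    return min(points, key=lambda p: abs(p - query))


rng = random.Random(0)
points = [rng.uniform(0, 100) for _ in range(1000)]
index = build_index(points)
query = 42.0
print(nearest_sorted(index, query), nearest_brute_force(points, query))
```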
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="http://www.astroml.org/_images/fig_kdtree_example_1.png" style="width: 60%;" /&gt;
&lt;p class="caption"&gt;&lt;strong&gt;KD Tree&lt;/strong&gt; Image from &lt;a class="reference external" href="http://www.astroml.org/index.html"&gt;AstroML’s documentation&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="http://www.astroml.org/_images/fig_balltree_example_1.png" style="width: 60%;" /&gt;
&lt;p class="caption"&gt;&lt;strong&gt;Ball tree&lt;/strong&gt; Image from &lt;a class="reference external" href="http://www.astroml.org/index.html"&gt;AstroML’s documentation&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Popular wisdom in machine learning is that in high dimensions, BallTrees
scale better than KDTrees. The usual explanation is that, as the
dimensionality grows, the number of planes required to break up the
space grows too; in contrast, if the data has structure, BallTrees
can cover this structure more efficiently. I have benched scikit-learn’s
KDTree and BallTree, as well as scipy’s KDTree, which employs a simpler
tree-building strategy, on a variety of datasets, both real-life and
artificial. Below is a summary plot giving the relative performance of
neighbor searches:&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/sklearn_0.14.X_speed/nn_trees.png" style="width: 60%;" /&gt;
&lt;p class="caption"&gt;&lt;em&gt;n&lt;/em&gt; is the number of data points, and &lt;em&gt;p&lt;/em&gt; the dimensionality.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;We can see that no approach wins on all counts. That said, it came as a
surprise to me that even in high dimension, &lt;strong&gt;scikit-learn’s
KDTree outperformed the BallTrees&lt;/strong&gt;. The explanation is that these
datasets do not have a strongly structured, low intrinsic dimensionality. On
highly-structured synthetic data, the benefit of BallTree can clearly
stand out, as shown by Jake
&lt;a class="reference external" href="http://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python"&gt;here&lt;/a&gt;.
However, on most datasets people encounter, it seems that this is not the
case. Note also that &lt;strong&gt;scikit-learn’s KDTree tends to scale better in
high dimension than scipy’s&lt;/strong&gt;. This is due to its more elaborate choice
of cutting planes. This choice also has a cost and may backfire: on
some datasets scikit-learn is slower than scipy.&lt;/p&gt;
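&lt;p&gt;To make the cutting-plane idea concrete, here is a minimal KD-tree in plain Python, checked against a brute-force search. Each node splits at the median of the coordinate with the largest spread. This is one common rule, assumed here for illustration; it is not necessarily the exact rule used by scikit-learn or scipy, and all names are hypothetical.&lt;/p&gt;

```python
import math
import random


def build_kdtree(points):
    """Split at the median of the coordinate with the largest spread."""
    if not points:
        return None
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid]),       # coordinates at or below the split
        "right": build_kdtree(points[mid + 1:]),  # coordinates at or above the split
    }


def nearest(node, query, best=None):
    """Depth-first search, pruning subtrees that cannot hold a closer point."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    if best is None or math.dist(best, query) > math.dist(point, query):
        best = point
    diff = query[axis] - point[axis]
    near, far = (node["right"], node["left"]) if diff >= 0 else (node["left"], node["right"])
    best = nearest(near, query, best)
    # Points on the far side are at least |diff| away: skip them unless the
    # splitting plane is closer than the best match found so far.
    if math.dist(best, query) > abs(diff):
        best = nearest(far, query, best)
    return best


def brute_force(points, query):
    return min(points, key=lambda p: math.dist(p, query))


rng = random.Random(42)
points = [tuple(rng.gauss(0, 1) for _ in range(5)) for _ in range(500)]
tree = build_kdtree(points)
query = tuple(rng.gauss(0, 1) for _ in range(5))
print(nearest(tree, query) == brute_force(points, query))
```

&lt;p&gt;The pruning test is where dimensionality bites. In high dimension, the distance to the best match tends to be larger than the distance to the splitting planes, so fewer subtrees get skipped.&lt;/p&gt;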
&lt;p&gt;Overall, the new KDTree in scikit-learn seems to offer an excellent
compromise. Congratulations
&lt;a class="reference external" href="http://www.astro.washington.edu/users/vanderplas/"&gt;Jake&lt;/a&gt;!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="dbscan-is-faster-with-trees"&gt;
&lt;h3&gt;DBSCAN is faster with trees&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/clustering.html#dbscan"&gt;DBSCAN&lt;/a&gt;
is a clustering algorithm that relies heavily on the local neighborhood
structure. The implementation in scikit-learn 0.13 computed the complete
&lt;em&gt;n&lt;/em&gt; by &lt;em&gt;n&lt;/em&gt; matrix of distances between observations, which means that if
you had a lot of data, you would run out of memory. In the 0.14 release,
DBSCAN uses the BallTree, and as a result scales to much larger datasets
and brings speed benefits. Here is a comparison between the 0.13 and 0.14
implementations (I couldn’t use datasets as large as I wanted because the
0.13 code would run out of memory):&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="53%" /&gt;
&lt;col width="23%" /&gt;
&lt;col width="24%" /&gt;
&lt;/colgroup&gt;
&lt;thead valign="bottom"&gt;
&lt;tr&gt;&lt;th class="head"&gt;Dataset&lt;/th&gt;
&lt;th class="head"&gt;time with 0.13&lt;/th&gt;
&lt;th class="head"&gt;time with 0.14&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;“lfw”: 13233 samples, 5 features&lt;/td&gt;
&lt;td&gt;6.57 seconds&lt;/td&gt;
&lt;td&gt;3.59 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;“make_blobs”: 30000 samples, 10 features&lt;/td&gt;
&lt;td&gt;33.50 seconds&lt;/td&gt;
&lt;td&gt;12.87 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Importantly, the scaling is different: while the 0.13 code scales as &lt;em&gt;n&lt;sup&gt;2&lt;/sup&gt;&lt;/em&gt;,
the 0.14 code scales roughly as &lt;em&gt;n log n&lt;/em&gt;. This means that the benefit is
bigger for larger datasets.&lt;/p&gt;
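&lt;p&gt;DBSCAN only ever needs, for each point, the list of its neighbors within a radius &lt;em&gt;eps&lt;/em&gt;. This is why a spatial index removes the need for the full distance matrix. The sketch below is a plain-Python DBSCAN for 2D points. To keep it short, it swaps in a uniform grid of cells of side &lt;em&gt;eps&lt;/em&gt; for the BallTree mentioned above, and the function names are hypothetical.&lt;/p&gt;

```python
import math
from collections import defaultdict, deque


def dbscan(points, eps, min_samples):
    """Return one label per 2D point: a cluster index, or -1 for noise.

    Neighbors are found through a grid of square cells of side eps, so no
    n-by-n distance matrix is ever built.
    """

    def cell(p):
        return (math.floor(p[0] / eps), math.floor(p[1] / eps))

    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[cell(p)].append(i)

    def region_query(i):
        # Any point within eps lies in the 3 by 3 block of cells around i.
        cx, cy = cell(points[i])
        return [
            j
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for j in grid.get((cx + dx, cy + dy), [])
            if eps >= math.dist(points[i], points[j])
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if min_samples > len(neighbors):
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: grow a new cluster from it
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
print(dbscan(points, eps=0.5, min_samples=3))
```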
&lt;/div&gt;
&lt;div class="section" id="scikit-learn-0-14-s-random-forests-are-fast"&gt;
&lt;h3&gt;Scikit-learn 0.14’s random forests are fast&lt;/h3&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.montefiore.ulg.ac.be/~glouppe/"&gt;Gilles Louppe&lt;/a&gt; has made
the random forests significantly faster in the 0.14 release. Let us
bench them against WiseIO’s
&lt;a class="reference external" href="http://docs.wise.io/"&gt;WiseRF&lt;/a&gt;, a proprietary package that only does
random forests and whose main selling point is that it is
significantly faster than scikit-learn. Let us also bench
&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees"&gt;ExtraTrees&lt;/a&gt;,
a tree-based model that is very similar to random forests, but that in
our experience can be implemented a bit faster, and tends to work
better.&lt;/p&gt;
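&lt;p&gt;The difference between the two models lies in how a node picks its threshold on a candidate feature. A random forest searches for the best threshold, which requires sorting the feature values. Extremely randomized trees instead draw the threshold at random between the feature’s minimum and maximum, then keep the best of these random splits across candidate features. The plain-Python sketch below shows the per-feature step only. The function names are hypothetical, the impurity evaluation is deliberately naive, and this is not scikit-learn’s tree code.&lt;/p&gt;

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(x, y, threshold):
    """Weighted Gini impurity of the two children of the split x > threshold."""
    left = [yi for xi, yi in zip(x, y) if threshold >= xi]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(x, y):
    """Random-forest style: try every midpoint and keep the purest split."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(x, y, t))


def random_threshold(x, rng):
    """Extra-trees style: draw a single threshold uniformly in [min, max]."""
    return rng.uniform(min(x), max(x))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
rng = random.Random(0)
print(best_threshold(x, y), random_threshold(x, rng))
```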
&lt;p&gt;&lt;strong&gt;On the digits dataset (1797 samples, 64 features):&lt;/strong&gt;&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="33%" /&gt;
&lt;col width="19%" /&gt;
&lt;col width="17%" /&gt;
&lt;col width="31%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Forest implementation&lt;/td&gt;
&lt;td&gt;train time&lt;/td&gt;
&lt;td&gt;test time&lt;/td&gt;
&lt;td&gt;prediction accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn ExtraTrees&lt;/td&gt;
&lt;td&gt;2.641s&lt;/td&gt;
&lt;td&gt;0.082s&lt;/td&gt;
&lt;td&gt;0.986&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn RandomForest&lt;/td&gt;
&lt;td&gt;5.074s&lt;/td&gt;
&lt;td&gt;0.088s&lt;/td&gt;
&lt;td&gt;0.981&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;WiseRF&lt;/td&gt;
&lt;td&gt;5.665s&lt;/td&gt;
&lt;td&gt;0.108s&lt;/td&gt;
&lt;td&gt;0.979&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;So we see that on a mid-sized dataset, scikit-learn is faster than
WiseRF, and ExtraTrees trains almost twice as fast as RandomForest, with
slightly better accuracy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;On the MNIST dataset (70000 samples, 784 features):&lt;/strong&gt;&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="33%" /&gt;
&lt;col width="19%" /&gt;
&lt;col width="17%" /&gt;
&lt;col width="31%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Forest implementation&lt;/td&gt;
&lt;td&gt;train time&lt;/td&gt;
&lt;td&gt;test time&lt;/td&gt;
&lt;td&gt;prediction accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn ExtraTrees&lt;/td&gt;
&lt;td&gt;1378.141s&lt;/td&gt;
&lt;td&gt;4.768s&lt;/td&gt;
&lt;td&gt;0.976&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn RandomForest&lt;/td&gt;
&lt;td&gt;1639.866s&lt;/td&gt;
&lt;td&gt;4.132s&lt;/td&gt;
&lt;td&gt;0.972&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;WiseRF&lt;/td&gt;
&lt;td&gt;1102.465s&lt;/td&gt;
&lt;td&gt;14.542s&lt;/td&gt;
&lt;td&gt;0.972&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
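&lt;p&gt;On a big dataset, WiseRF takes the lead for training, but not by a large
factor (about 1.25 times faster than ExtraTrees), while its prediction is
about three times slower.&lt;/p&gt;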
&lt;p&gt;&lt;strong&gt;Using 2 CPUs (n_jobs=2) on the digits dataset:&lt;/strong&gt;&lt;/p&gt;
&lt;table border="1" class="docutils"&gt;
&lt;colgroup&gt;
&lt;col width="33%" /&gt;
&lt;col width="19%" /&gt;
&lt;col width="17%" /&gt;
&lt;col width="31%" /&gt;
&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td&gt;Forest implementation&lt;/td&gt;
&lt;td&gt;train time&lt;/td&gt;
&lt;td&gt;test time&lt;/td&gt;
&lt;td&gt;prediction accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn ExtraTrees&lt;/td&gt;
&lt;td&gt;4.874s&lt;/td&gt;
&lt;td&gt;1.478s&lt;/td&gt;
&lt;td&gt;0.986&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Sklearn RandomForest&lt;/td&gt;
&lt;td&gt;5.716s&lt;/td&gt;
&lt;td&gt;1.349s&lt;/td&gt;
&lt;td&gt;0.978&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;WiseRF&lt;/td&gt;
&lt;td&gt;3.264s&lt;/td&gt;
&lt;td&gt;0.104s&lt;/td&gt;
&lt;td&gt;0.979&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Both scikit-learn and WiseRF can use several CPUs. However, the Python
parallel execution model via multiple processes has an overhead in terms
of computing time and of memory usage: on this small dataset, scikit-learn
with 2 jobs is actually slower than with a single one. The internals of WiseRF are coded
in C++, and thus it is not limited by this overhead. Also, because of
the memory duplication across multiple processes in scikit-learn, I could
not run it on MNIST with 2 jobs. The next release will address these issues,
partly by using memmapped arrays to share memory between processes.&lt;/p&gt;
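&lt;p&gt;The memory-mapping idea can be illustrated with the standard library alone. An array stored in a file is mapped into memory by each process that needs it, and the operating system shares the underlying pages instead of giving every process its own copy. To keep the sketch self-contained, an independent second mapping of the same file stands in for a worker process. The file layout and names are assumptions, and this is not claimed to be the mechanism scikit-learn uses.&lt;/p&gt;

```python
import mmap
import os
import struct
import tempfile

N_VALUES = 1000
ITEM_SIZE = struct.calcsize("d")  # bytes per float64

# A file-backed array of N_VALUES doubles: 0.0, 1.0, 2.0, ...
path = os.path.join(tempfile.mkdtemp(), "shared_array.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"{N_VALUES}d", *range(N_VALUES)))


def read_value(path, index):
    """Stand-in for a worker: map the file read-only and read one value.

    The operating system pages the data in on demand; the array is never
    copied into the worker as a whole.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("d", m, index * ITEM_SIZE)[0]


# The "parent" writes through its own shared mapping of the same file...
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as shared:
        struct.pack_into("d", shared, 10 * ITEM_SIZE, 42.5)
        # ...and an independent mapping sees the update without any copy.
        seen_by_reader = read_value(path, 10)

print(seen_by_reader)
```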
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="we-make-good-use-of-funding-the-paris-sprint"&gt;
&lt;h2&gt;We make good use of funding: the Paris sprint&lt;/h2&gt;
&lt;p&gt;A couple of weeks ago, we had a coding sprint in Paris. We were able to
bring in a lot of core developers from all over Europe thanks to our
sponsors: &lt;a class="reference external" href="http://www.frs-fnrs.be/%20"&gt;FNRS&lt;/a&gt;,
&lt;a class="reference external" href="http://www.afpy.org"&gt;AFPy&lt;/a&gt;, &lt;a class="reference external" href="http://www.telecom-paristech.fr/"&gt;Telecom
Paristech&lt;/a&gt;, and &lt;a class="reference external" href="http://www.svi.cnrs-bellevue.fr"&gt;Saint-Gobain
Recherche&lt;/a&gt;. The total budget,
including accommodation and travel, was a couple thousand euros, thanks
to &lt;a class="reference external" href="http://www.telecom-paristech.fr/"&gt;Telecom Paristech&lt;/a&gt; and
&lt;a class="reference external" href="http://www.tinyclues.com"&gt;tinyclues&lt;/a&gt; helping us with accommodation
and hosting the sprint.&lt;/p&gt;
&lt;p&gt;The productivity of such a sprint is huge, both because we get together
and work efficiently, and because we go back home and keep working
(I have been sleep deprived because of late-night hacking ever since the
sprint). As an illustration, here is the commit graph as shown on
GitHub. The huge spike corresponds to the second international
sprint: Paris 2013.&lt;/p&gt;
&lt;div class="figure align-center"&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/sklearn_0.14.X_speed/commit_graph.png" style="width: 100%;" /&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;We now have a “donate” button&lt;/strong&gt; on the
&lt;a class="reference external" href="http://scikit-learn.org/stable"&gt;website&lt;/a&gt;. I can assure you that
your donations are well spent and turned into code.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="machine learning"></category></entry><entry><title>RIP John Hunter: the loss of a great man</title><link href="https://gael-varoquaux.info/programming/rip-john-hunter-the-loss-of-a-great-man.html" rel="alternate"></link><published>2012-08-30T10:21:00+02:00</published><updated>2012-08-30T10:21:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-08-30:/programming/rip-john-hunter-the-loss-of-a-great-man.html</id><summary type="html">&lt;p&gt;John Hunter, the author of &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt; passed away yesterday after a
short battle against cancer. John gave the keynote at the scipy 2012
conference a few weeks ago, and was diagnosed with cancer just on his
return from the conference. It is a shock to me that a friend …&lt;/p&gt;</summary><content type="html">&lt;p&gt;John Hunter, the author of &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt; passed away yesterday after a
short battle against cancer. John gave the keynote at the scipy 2012
conference a few weeks ago, and was diagnosed with cancer just on his
return from the conference. It is a shock to me that a friend can
disappear so quickly. To learn more about John, please read the &lt;a class="reference external" href="https://groups.google.com/forum/#!msg/pydata/FpwXp3sX6N8/mxopkZ1PkBQJ"&gt;announcement&lt;/a&gt; by &lt;a class="reference external" href="http://fperez.org/"&gt;Fernando
Perez&lt;/a&gt;, who supported John during his last weeks.&lt;/p&gt;
&lt;div class="section" id="a-man-who-gave-a-lot-not-asking-for-anything-in-return"&gt;
&lt;h2&gt;A man who gave a lot, not asking for anything in return&lt;/h2&gt;
&lt;p&gt;Many have benefited from the silent efforts of John, and are not fully
aware of how he generously invested his time and talent for the benefit
of others. Matplotlib, the Python plotting library that he created in
2002, has propelled Python as a major tool for scientific research and
engineering. The impact of John’s efforts goes well beyond Matplotlib.
Early on, John had the vision of Python as an interactive scientific
environment. He promoted this vision, teaming up with Fernando Perez to
develop the fantastic &lt;a class="reference external" href="http://ipython.org/"&gt;ipython&lt;/a&gt;/&lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt; tandem, solving many
technical challenges. But he also invested a lot of energy in teaching
workshops that helped change the way people compute, as well as writing
didactic documentation and articles. He was a friendly and active leader
of an online community, open and helpful to newcomers.&lt;/p&gt;
&lt;p&gt;As Travis Oliphant said on John’s numfocus &lt;a class="reference external" href="http://numfocus.org/johnhunter/"&gt;memorial webpage&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
Those who contribute much to open source, as John did, do so at the
expense of something - often it is time with family.&lt;/blockquote&gt;
&lt;p&gt;I cannot stress enough how true this is. The entire open-source software ecosystem, which
nowadays supports our economy, our education, and our research, is built
on the shoulders of a fairly small number of generous people who spend
their energy on making better software rather than on building personal wealth.&lt;/p&gt;
&lt;p&gt;John was a humble man. He did not have a blog or a Twitter account, and did
not seek fame or money. For this reason, I feel that his contributions
are unknown and undervalued by many. In my eyes, he is an unknown
soldier of our modern times. I hope that I am not being too emphatic,
but this is how I feel.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;John passed away at 44, leaving behind a wife and 3 daughters. Please
do consider supporting them:&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote class="last"&gt;
&lt;a class="reference external" href="http://numfocus.org/johnhunter"&gt;http://numfocus.org/johnhunter&lt;/a&gt;&lt;/blockquote&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="personnal"></category><category term="community"></category></entry><entry><title>A journal promoting high-quality research code: dream and reality</title><link href="https://gael-varoquaux.info/programming/a-journal-promoting-high-quality-research-code-dream-and-reality.html" rel="alternate"></link><published>2012-06-04T21:39:00+02:00</published><updated>2012-06-04T21:39:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-06-04:/programming/a-journal-promoting-high-quality-research-code-dream-and-reality.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.openresearchcomputation.com/"&gt;Open research computation (ORC)&lt;/a&gt; was an attempt to create a scientific
publication promoting &lt;strong&gt;high-quality and open source scientific code&lt;/strong&gt;.
The project went public in fall 2010, but last month, facing the low
volume of submissions, the editorial board &lt;a class="reference external" href="http://blogs.openaccesscentral.com/blogs/bmcblog/entry/open_research_computation_thematic_series"&gt;chose to reorient it&lt;/a&gt; as a
special track of an existing journal …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.openresearchcomputation.com/"&gt;Open research computation (ORC)&lt;/a&gt; was an attempt to create a scientific
publication promoting &lt;strong&gt;high-quality and open source scientific code&lt;/strong&gt;.
The project went public in fall 2010, but last month, facing the low
volume of submissions, the editorial board &lt;a class="reference external" href="http://blogs.openaccesscentral.com/blogs/bmcblog/entry/open_research_computation_thematic_series"&gt;chose to reorient it&lt;/a&gt; as a
special track of an existing journal.&lt;/p&gt;
&lt;p&gt;The challenges that we face are discussed in our editorial:&lt;/p&gt;
&lt;blockquote&gt;
&lt;a class="reference external" href="http://www.scfbm.org/content/7/1/2/abstract"&gt;Changing computational research. The challenges ahead.&lt;/a&gt; C Neylon,
J Aerts, CT Brown, D Lemire, J Millman, P Murray-Rust, F Perez, N
Saunders, A Smith, G Varoquaux and E Willighagen, &lt;em&gt;Source Code for
Biology and Medicine&lt;/em&gt; 2012, 7:20&lt;/blockquote&gt;
&lt;p&gt;Here is my own personal take on the rise and fall of this ideal.&lt;/p&gt;
&lt;div class="section" id="my-story-with-orc"&gt;
&lt;h2&gt;My story with ORC&lt;/h2&gt;
&lt;img alt="" class="align-right" src="http://www.rcac.net.au/images/Publications1.jpg" style="width: 40%;" /&gt;
&lt;p&gt;&lt;strong&gt;From pipe dream to journal -&lt;/strong&gt; My involvement with ORC started long
before there was such a thing as ORC. In fall 2008, I had a discussion
with a friend working in the publication industry, telling her how I
believed that the publication system is broken, because it promotes new
results without any interest in whether these can be exported outside
the lab that produced them: &lt;strong&gt;it is currently easier to publish a minor
but novel result than a tool enabling the routine reproduction of
previous results&lt;/strong&gt;. This seemed particularly marked in the scientific
software world, as software tools are becoming central to the scientific
workflow, and cost nothing to duplicate when produced under an open-source
license. To my surprise, she took me seriously, and asked me to write my
ideas down in an email that she would forward to her colleagues in the
publication industry.&lt;/p&gt;
&lt;p&gt;Looking back at the email that I sent, my concerns were, back then, to
promote:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;quality and openness of scientific software&lt;/li&gt;
&lt;li&gt;basic tools shared across communities&lt;/li&gt;
&lt;li&gt;recognition of software development as a challenging and worthwhile
task in academic research&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Shaping the idea -&lt;/strong&gt; In the year that followed, I had a few
discussions with staff from &lt;a class="reference external" href="http://www.biomedcentral.com"&gt;BioMedCentral&lt;/a&gt;, an open-access publisher
in biology and medicine that was looking into expanding into the physics
and math related fields. Eventually, my contact there told me that they
had other similar requests and were launching a journal that would be
led by Cameron Neylon, a British biophysicist and strong advocate of
openness and reproducibility in science. This was the start of ORC, and
for me the chance to meet other people sharing my concerns, some new and
some &lt;a class="reference external" href="http://fperez.org/"&gt;already&lt;/a&gt; &lt;a class="reference external" href="http://jarrodmillman.com/"&gt;old&lt;/a&gt; &lt;a class="reference external" href="http://ivory.idyll.org"&gt;friends&lt;/a&gt;.&lt;/p&gt;
&lt;div class="figure align-right"&gt;
&lt;img alt="" src="http://www.salinafbc.com/Websites/fbcsalina/images/nerd_computer.gif" style="width: 230px;" /&gt;
&lt;p class="caption"&gt;ORC editor&lt;/p&gt;
&lt;/div&gt;
&lt;div class="figure align-left"&gt;
&lt;img alt="" src="http://researchsupportgroup.files.wordpress.com/2011/11/kayla1.jpg" style="width: 150px;" /&gt;
&lt;p class="caption"&gt;Conventional editor&lt;/p&gt;
&lt;/div&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;strong&gt;Setting up the journal -&lt;/strong&gt; BioMedCentral was instrumental in setting
up the journal project. I quickly learned that, no surprise, a journal
is a product, like anything else, and it must find customers. Here, as
we were launching an open-access journal, the customers were authors.
This is where a journal faces a chicken-and-egg problem: to be
recognised, it needs high-visibility publications, but authors will
submit only to journals that they know. The main tools to overcome this
challenge are communication and advocacy. I then realized that these
really weren’t my strong points. Cameron Neylon absolutely shone on
this side, with very enthusiastic &lt;a class="reference external" href="http://cameronneylon.net/blog/open-research-computation-an-ordinary-journal-with-extraordinary-aims/"&gt;communications&lt;/a&gt; and an incredibly
active &lt;a class="reference external" href="https://twitter.com/#!/CameronNeylon"&gt;twitter account&lt;/a&gt;. On my side, I am a slow writer, and I tend to
speak Python better than English, which is not a great
asset for a journal editor.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wild editorial discussions -&lt;/strong&gt; The discussions in the editorial board
really thrilled me because they were centered on how to set standards to
improve the quality of code published. Looking in my mailbox, I see
discussions about code repositories, software testing, documentation or
licensing issues. This is not that surprising, given that a lot of the
editors were actually contributors to major software projects. It made
me very happy, as I have the feeling that, so far, most committees or
decision makers are clueless about software.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sand-in-the-gears-the-lack-of-uptake"&gt;
&lt;h2&gt;Sand in the gears: the lack of uptake&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;A false start -&lt;/strong&gt; So ORC was launched in late 2010 and we had fantastic
feedback. I had the feeling that people were &lt;a class="reference external" href="http://neuralensemble.blogspot.fr/2010/12/open-research-computation-new-journal.html"&gt;genuinely&lt;/a&gt; &lt;a class="reference external" href="https://twitter.com/vaguery/status/15402390589018112"&gt;excited&lt;/a&gt;
about our program: changing the way computational science worked from
the inside, through the review process. The idea was that we had opened
a pre-submission call, and were waiting for a few good papers to be
submitted to launch the journal. However, it turned out that the papers
were slow to come. It took me a while to realize that there was
something wrong. But slowly we had to face the truth: many people were
excited about the journal, but most were sending their papers elsewhere.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What went wrong? -&lt;/strong&gt; If I really knew what went wrong, I would
probably not be discussing it here, but rather changing the world.
However, I can come up with a few guesses:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Working across communities is harder.&lt;/strong&gt; From the beginning we had
wanted to position the journal across communities, in order to foster
the sharing of tools for a greater good. The challenge is that a
central role of publication is nowadays to provide recognition. It is
much easier to achieve recognition in a given community than across
communities, and authors always preferred submitting their work to a
non-software-oriented journal in their field. We could not fight both
the battle for software quality and the battle for
inter-community work at the same time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Setting the bar too high.&lt;/strong&gt; Many felt that the submission
requirements were too demanding; as one researcher put it on a neuroimaging
forum: &lt;a class="reference external" href="http://www.nitrc.org/forum/message.php?msg_id=3674"&gt;“I think it’s setting the bar
unrealistically high for most neuroimaging software”&lt;/a&gt;. While we had
originally shot for a very high test coverage (probably too high), we
had scaled it back quickly, simply stressing that editors and
reviewers would be looking closely at test coverage, documentation
and ease of installation. That said, the average researcher did not
share our ideals of raising the quality of scientific software.
Trying to ask only for excellent publications in a new and unproven
journal was probably unrealistic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Editors not willing to game the system.&lt;/strong&gt; I have watched a few
journal launches, and it seems to me that a common trick is to line
up articles that are created by the editors and their friends
specifically for the new journal. People come up with &lt;em&gt;opinion
papers&lt;/em&gt;, &lt;em&gt;reviews&lt;/em&gt;, &lt;em&gt;commentaries&lt;/em&gt; that only serve to give the
journal an identity. This did not happen for ORC, and I believe
that it is because &lt;a class="reference external" href="http://cameronneylon.net/blog/open-research-computation-an-ordinary-journal-with-extraordinary-aims"&gt;the editors themselves&lt;/a&gt; were not huge fans of
the low signal-to-noise ratio in modern scientific publishing
practice.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="the-times-they-are-a-changing"&gt;
&lt;h2&gt;The times they are a changing&lt;/h2&gt;
&lt;img alt="" class="align-right" src="http://www.pictures88.com/p/success/success_005.jpg" style="width: 35%;" /&gt;
&lt;p&gt;&lt;strong&gt;ORC is dead, long live ORC -&lt;/strong&gt; We did get a few submissions. ORC is
not coming to an end; it is morphing into a special thematic series in
&lt;a class="reference external" href="http://www.scfbm.org/"&gt;Source Code for Biology and Medicine&lt;/a&gt;. This solution is not completely
satisfactory, as it pushes what should have been a forum for exposing
good practices and good software into a smaller community. But at least
there is now a venue in which people can publish a paper about software
that they have been improving and maintaining, and not only about a new
algorithm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Changing practices across the board -&lt;/strong&gt; One reason we
had a hard time making a breakthrough is that authors were sending
their software papers to other journals, in particular journals not
specialized in software. While these papers are not getting the
attention of a review and editorial team expert in software development,
as we were setting up with ORC, this is still a good thing. Indeed, it
shows that the times are changing and that recognition of software as a
scientific work is improving. I have been impressed to see that many
high-profile journals have changed their editorial policies to
specifically accept software papers, or have created tracks dedicated to
software.&lt;/p&gt;
&lt;p&gt;Software is being slowly recognized as a pillar of modern scientific
research. We need to keep pushing to make sure that quality standards
are set and that open-source scientific software grows into a mature
ecosystem focused on problem solving.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="publishing"></category><category term="science"></category><category term="computational science"></category><category term="programming"></category><category term="python"></category><category term="scientific computing"></category></entry><entry><title>Update on scikit-learn: recent developments for machine learning in Python</title><link href="https://gael-varoquaux.info/programming/update-on-scikit-learn-recent-developments-for-machine-learning-in-python.html" rel="alternate"></link><published>2012-05-09T00:12:00+02:00</published><updated>2012-05-09T00:12:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-05-09:/programming/update-on-scikit-learn-recent-developments-for-machine-learning-in-python.html</id><summary type="html">&lt;p&gt;Yesterday, we released version 0.11 of the &lt;a class="reference external" href="http://scikit-learn"&gt;scikit-learn&lt;/a&gt; toolkit for
machine learning in Python, and there was much rejoicing.&lt;/p&gt;
&lt;div class="section" id="major-features-gained-in-the-last-releases"&gt;
&lt;h2&gt;Major features gained in the last releases&lt;/h2&gt;
&lt;p&gt;In the last 6 months, there have been many things happening with the
scikit-learn. While I do not wish to give an exhaustive …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;Yesterday, we released version 0.11 of the &lt;a class="reference external" href="http://scikit-learn"&gt;scikit-learn&lt;/a&gt; toolkit for
machine learning in Python, and there was much rejoicing.&lt;/p&gt;
&lt;div class="section" id="major-features-gained-in-the-last-releases"&gt;
&lt;h2&gt;Major features gained in the last releases&lt;/h2&gt;
&lt;p&gt;In the last 6 months, there have been many things happening with the
scikit-learn. While I do not wish to give an exhaustive summary of
features added (it can be found &lt;a class="reference external" href="http://scikit-learn.org/stable/whats_new.html"&gt;here&lt;/a&gt;), let me list a few of the
additions that I personally find exciting.&lt;/p&gt;
&lt;div class="section" id="non-linear-prediction-models"&gt;
&lt;h3&gt;Non-linear prediction models&lt;/h3&gt;
&lt;p&gt;For complex prediction problems where there is no simple model
available, as in computer vision, non-linear models are handy. Good
examples of such models are those based on decision trees and model
averaging. For instance, random forests are used in the Kinect to locate
body parts. As they are intrinsically complex, they may need a large
amount of training data. For this reason, they have been implemented in
the scikit-learn with special attention to computational efficiency.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees"&gt;Random forests and extra-trees&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/ensemble.html#gradient-tree-boosting"&gt;Gradient boosted regression trees&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="dealing-with-unlabeled-instances"&gt;
&lt;h3&gt;Dealing with unlabeled instances&lt;/h3&gt;
&lt;p&gt;It is often easier to gather unlabeled observations than labeled
ones. While predicting a quantity of interest is then harder
or simply impossible, mining this data can still be useful.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/label_propagation.html"&gt;Semi-supervised learning&lt;/a&gt;: using unlabeled observations together with
labeled ones for better prediction.&lt;/p&gt;
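&lt;p&gt;To illustrate how unlabeled observations can help, here is a minimal NumPy sketch of label propagation, one classic semi-supervised method. It is a sketch under stated assumptions rather than scikit-learn’s implementation (which is sklearn.semi_supervised.LabelPropagation): it builds a dense RBF (Gaussian) affinity between all observations, marks unlabeled points with -1 as scikit-learn does, and repeatedly lets each point average its neighbors’ label distributions while clamping the known labels. The helper label_propagation and its gamma and n_iter defaults are hypothetical choices.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def label_propagation(X, y, gamma=20.0, n_iter=200):
    """Minimal label propagation sketch (hypothetical helper).

    X: (n_samples, n_features) observations.
    y: (n_samples,) integer labels, with -1 marking unlabeled points.
    Returns a predicted label for every observation.
    """
    labeled = y != -1
    classes = np.unique(y[labeled])
    # Dense RBF affinity: nearby observations are strongly connected
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-gamma * sq_dists)
    # Row-normalize so that each row is a distribution over neighbors
    P = W / W.sum(axis=1, keepdims=True)
    # One-hot label distributions for the labeled points
    Y_known = (y[labeled][:, None] == classes[None, :]).astype(float)
    F = np.zeros((len(X), len(classes)))
    F[labeled] = Y_known
    for _ in range(n_iter):
        F = P @ F  # each point averages its neighbors' label distributions
        F[labeled] = Y_known  # clamp the labels we already know
    return classes[F.argmax(axis=1)]


# Usage: two well-separated blobs, with a single labeled point in each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(3.0, 0.1, size=(20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
pred = label_propagation(X, y)
```
&lt;/pre&gt;
&lt;p&gt;The dense affinity matrix makes this sketch quadratic in memory, so it is only meant for small illustrative examples.&lt;/p&gt;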
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/outlier_detection.html"&gt;Outlier/novelty detection&lt;/a&gt;: detect deviant observations.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/manifold.html"&gt;Manifold learning&lt;/a&gt;: discover a non-linear low-dimensional structure in
the data.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/clustering.html"&gt;Clustering&lt;/a&gt; with &lt;a class="reference external" href="http://scikit-learn.org/stable/modules/clustering.html#mini-batch-k-means"&gt;an algorithm&lt;/a&gt; that can scale to really large
datasets using an online approach: fitting small portions of the data one
after the other (mini-batch k-means).&lt;/p&gt;
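&lt;p&gt;The online update can be sketched in a few lines of NumPy. This is a minimal sketch, not scikit-learn’s MiniBatchKMeans: it assumes centers initialized on random observations, and moves each center towards every point assigned to it with a step size of 1/(number of points that center has seen so far), so that each center tracks the running mean of its points. The helper mini_batch_kmeans and its parameters are hypothetical.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def mini_batch_kmeans(X, n_clusters, batch_size=10, n_iter=100, seed=0):
    """Minimal mini-batch k-means sketch (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers on randomly chosen observations
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    counts = np.zeros(n_clusters)  # points seen so far by each center
    for _ in range(n_iter):
        # Only a small random portion of the data is used at each step
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        # Assign each point of the batch to its nearest center
        sq_dists = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Move each assigned center towards the point, with a step size
        # that shrinks as the center accumulates observations
        for x, k in zip(batch, labels):
            counts[k] += 1
            eta = 1.0 / counts[k]
            centers[k] = (1.0 - eta) * centers[k] + eta * x
    return centers


# Usage: two well-separated blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, size=(100, 2)),
               rng.normal(5.0, 0.1, size=(100, 2))])
centers = mini_batch_kmeans(X, n_clusters=2)
```
&lt;/pre&gt;
&lt;p&gt;Each iteration only looks at batch_size observations, so the cost of a step does not grow with the size of the dataset.&lt;/p&gt;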
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/decomposition.html#dictionarylearning"&gt;Dictionary learning&lt;/a&gt;: learning patterns in the data that represent it
sparsely: each observation is a combination of a small number of patterns.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="sparse-models-when-very-few-descriptors-are-relevant"&gt;
&lt;h3&gt;Sparse models: when very few descriptors are relevant&lt;/h3&gt;
&lt;p&gt;In general, finding which descriptors are useful when there are many of
them is like finding a needle in a haystack: it is a very hard problem.
However, if you know that only a few of these descriptors actually carry
information, you are in a so-called &lt;em&gt;sparse&lt;/em&gt; problem, for which specific
approaches can work well.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/linear_model.html#orthogonal-matching-pursuit-omp"&gt;Orthogonal matching pursuit&lt;/a&gt;: a greedy and fast algorithm for very
sparse linear models.&lt;/p&gt;
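&lt;p&gt;The greedy idea fits in a short NumPy sketch, assuming a design matrix with unit-norm columns and a known target number of non-zero coefficients. At each step, the feature most correlated with the current residual is added, then all selected features are refitted jointly by least squares (the “orthogonal” part). The helper omp is hypothetical; scikit-learn’s own estimator is sklearn.linear_model.OrthogonalMatchingPursuit.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def omp(X, y, n_nonzero):
    """Minimal orthogonal matching pursuit sketch (hypothetical helper).

    X: (n_samples, n_features) design matrix with unit-norm columns.
    y: (n_samples,) target. Returns a coefficient vector with at most
    n_nonzero non-zero entries.
    """
    residual = y.astype(float)
    support = []
    coef = np.zeros(X.shape[1])
    for _ in range(n_nonzero):
        if np.allclose(residual, 0.0):
            break  # the target is already explained exactly
        # Greedy step: the feature most correlated with the residual
        k = int(np.argmax(np.abs(X.T @ residual)))
        if k in support:
            break  # no new feature improves the fit
        support.append(k)
        # Orthogonal step: refit all selected features by least squares
        sol, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef


# Usage: recover a 3-sparse coefficient vector from noise-free data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
X /= np.linalg.norm(X, axis=0)  # unit-norm columns
w_true = np.zeros(200)
w_true[[3, 50, 120]] = [3.0, -2.0, 1.0]
y = X @ w_true
w = omp(X, y, n_nonzero=3)
```
&lt;/pre&gt;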
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/feature_selection.html#randomized-sparse-models"&gt;Randomized sparsity (randomized Lasso)&lt;/a&gt;: selecting the relevant
descriptors in noisy high-dimensional observations.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.org/stable/modules/generated/sklearn.covariance.GraphLasso.html#sklearn.covariance.GraphLasso"&gt;Sparse inverse covariance&lt;/a&gt;: learning graphs of connectivity from
correlations in the data.&lt;/p&gt;
&lt;div class="section" id="getting-developpers-together-the-granada-sprint"&gt;
&lt;h4&gt;Getting developers together: the Granada sprint&lt;/h4&gt;
&lt;p&gt;Of course, such developments happen only because we have a great team of
&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/graphs/contributors"&gt;dedicated coders&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Getting along and working together is a critical part of the project. In
December 2011, we held the first international &lt;a class="reference external" href="http://scikit-learn"&gt;scikit-learn&lt;/a&gt; sprint in
Granada, alongside the &lt;a class="reference external" href="http://nips.cc"&gt;NIPS conference&lt;/a&gt;. That was a while ago,
and I haven’t found time to blog about it, maybe because I was too busy
merging in the code produced :). Here is a small report from my point of
view. Better late than never.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="participants-from-all-over-the-globe"&gt;
&lt;h2&gt;Participants from all over the globe&lt;/h2&gt;
&lt;p&gt;This sprint was a big deal for us, because for the first time, thanks to
sponsor money, we were able to fly contributors from overseas and meet
the team in person. For the first time I was able to see the faces
behind many of the fantastic people that I knew only from the mailing
list.&lt;/p&gt;
&lt;p&gt;I really think that we must thank our sponsors, &lt;strong&gt;Google&lt;/strong&gt; and
&lt;strong&gt;tinyclues&lt;/strong&gt;, but also the PSF, in particular Jesse Noller and
especially &lt;strong&gt;Steve Holden&lt;/strong&gt;, whose help was absolutely instrumental in
getting sponsor money. This money is what made it possible to unite a
good fraction of the team, and it opened the door to great moments of
coding, and more.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="producing-code-lines-and-friendship"&gt;
&lt;h2&gt;Producing code lines and friendship&lt;/h2&gt;
&lt;p&gt;An important aspect of the sprint for me was that I really felt the team
being united. Granada is a great city and we spent fantastic moments
together. Now when I review code, I can often put a face on the author
of that code and remember a walk below the Alhambra or an evening in a
bar. I am sure it helps reviewing code!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="was-it-worth-the-money"&gt;
&lt;h2&gt;Was it worth the money?&lt;/h2&gt;
&lt;img alt="" src="attachments/skl_activity.png" style="width: 90%;" /&gt;
&lt;p&gt;I really appreciate that the sponsors did not ask for specific returns on
investment beyond acknowledgments, but I think that it is useful for us
to ask the question: was it worth the money? After all, we got around
$5000, and that’s a lot of money. First of all, as a side effect of the
sprint, people who had invested a huge amount of time in a machine
learning toolkit without asking anything in return got help to go to a
major machine learning conference.&lt;/p&gt;
&lt;p&gt;But was there a return on investment in terms of code? If you look at
the number of lines of code modified weekly (figure above), there
is a big spike in December 2011. That’s our sprint! Importantly, there
is still a lot of activity in the months following the sprint. This is
actually unusual, as active development tends to happen during the
summer break rather than during the winter, when our developers are busy
working on papers or teaching.&lt;/p&gt;
&lt;p&gt;The explanation is simple: we were thrilled by the sprint. Overall, it
was incredibly beneficial to the project. I am looking forward to the
next ones.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="machine learning"></category><category term="python"></category><category term="science"></category><category term="scikit-learn"></category><category term="sprint"></category></entry><entry><title>3 Google summer of code for scikit-learn and more…</title><link href="https://gael-varoquaux.info/programming/3-google-summer-of-code-for-scikit-learn-and-more.html" rel="alternate"></link><published>2012-04-23T22:25:00+02:00</published><updated>2012-04-23T22:25:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-04-23:/programming/3-google-summer-of-code-for-scikit-learn-and-more.html</id><summary type="html">&lt;p&gt;The &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; got 3 students accepted for the Google summer of
code.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://ibayer.blogspot.fr/"&gt;Imanuel Bayer&lt;/a&gt; will work on making our sparse linear models, for
regression and classification, faster. His proposal: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/ibayer/11001"&gt;Optimizing
sparse linear models using coordinate descent and strong rules&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.davidmarek.cz/"&gt;David Marek&lt;/a&gt; will implement multi-layer perceptrons for the scikit.
His proposal …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;The &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; got 3 students accepted for the Google summer of
code.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://ibayer.blogspot.fr/"&gt;Imanuel Bayer&lt;/a&gt; will work on making our sparse linear models, for
regression and classification, faster. His proposal: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/ibayer/11001"&gt;Optimizing
sparse linear models using coordinate descent and strong rules&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.davidmarek.cz/"&gt;David Marek&lt;/a&gt; will implement multi-layer perceptrons for the scikit.
His proposal: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/h4wk_cz/24001"&gt;Multilayer Perceptron&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://blog.vene.ro/"&gt;Vlad Niculae&lt;/a&gt; will work on speeding up the library in general,
catching all the low hanging fruits, and the ones a bit higher. His
proposal: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/vladn/26002"&gt;Need for scikit-learn speed&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition, related projects also have exciting summer projects, for instance
&lt;a class="reference external" href="http://statsmodels.sourceforge.net/"&gt;&lt;strong&gt;statsmodels&lt;/strong&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Divyanshu Bandil: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/divyanshu/34002"&gt;Extension of Linear to Non Linear Models in
Statsmodels Python module&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Alexandre Crayssac: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/alexandreyc/8001"&gt;estimating system of equations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Justin Grana: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/j_grana/8001"&gt;empirical Likelihood in Statsmodels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Georgi Panterov: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/gpanterov/7001"&gt;nonparametric estimation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and &lt;a class="reference external" href="http://www.cython.org"&gt;Cython&lt;/a&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Philip Herron: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/redbrain1123/28002"&gt;pxd generation using gcc-python-plugin&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Mark Florisson: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/markflorisson88/30002"&gt;Fast Numerical Computing with Cython&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and finally, &lt;a class="reference external" href="http://pandas.pydata.org/"&gt;Pandas&lt;/a&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Vytautas Jancauskas: &lt;a class="reference external" href="http://www.google-melange.com/gsoc/project/google/gsoc2012/bucket_brigade/42002"&gt;Plots in pandas&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Congratulations to all of the students. This is going to be an exciting
summer.&lt;/p&gt;
</content><category term="programming"></category><category term="machine learning"></category><category term="programming"></category><category term="scipy"></category><category term="scikit-learn"></category></entry><entry><title>Want features? Just code</title><link href="https://gael-varoquaux.info/programming/want-features-just-code.html" rel="alternate"></link><published>2012-03-08T22:46:00+01:00</published><updated>2012-03-08T22:46:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-03-08:/programming/want-features-just-code.html</id><summary type="html">&lt;p&gt;Somebody just sent an email on a user’s mailing list for an open-source
scientific package entitled &lt;strong&gt;“Feature foo: why is package bar
not&amp;nbsp;up to the task?”&lt;/strong&gt;. To quote him:&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
Is there ANY plan for having such a module in &lt;em&gt;package bar&lt;/em&gt;?? I
think&amp;nbsp;(personally) that this is a …&lt;/blockquote&gt;</summary><content type="html">&lt;p&gt;Somebody just sent an email to a user mailing list for an open-source
scientific package entitled &lt;strong&gt;“Feature foo: why is package bar
not&amp;nbsp;up to the task?”&lt;/strong&gt;. To quote him:&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
Is there ANY plan for having such a module in &lt;em&gt;package bar&lt;/em&gt;?? I
think&amp;nbsp;(personally) that this is a MUST DO. This is typically the
type of&amp;nbsp;routines that I hear people use in e.g., idl etc. If this
could be an&amp;nbsp;optimised, fast (and easy to use) routine, all the
better.&lt;/blockquote&gt;
&lt;p&gt;As someone who spends a fair amount of time working on open
source software, I hear such remarks quite often. I am finding it harder
and harder not to react negatively to these emails. Now, I cannot
consider myself a contributor to &lt;em&gt;package bar&lt;/em&gt;, and thus I can claim
that I am not taking your comment personally.&lt;/p&gt;
&lt;p&gt;Why aren’t packages up to the task? Well, the answer is quite
simple: because they are developed by volunteers who do it in their
spare time, too often late at night, or by companies that invest some of their
profits in open source rather than in locking down a market. 90% of the time,
the reason the feature isn’t as good as you would want it to be is
lack of time.&lt;/p&gt;
&lt;p&gt;I personally find that suggesting that somebody else should put more
of the time and money they are already giving away into improving a
feature that you need is almost insulting.&lt;/p&gt;
&lt;p&gt;I am aware that people do not realize how small the group of people
that develop and maintain their toys is. Borrowing the figure below from
&lt;a class="reference external" href="http://www.euroscipy.org/file/6459?vid=download"&gt;Fernando Perez’s talk at Euroscipy&lt;/a&gt;, the number of people that do 90%
of the grunt work to get the core scientific Python ecosystem going is
around two handfuls:&lt;/p&gt;
&lt;img alt="" src="attachments/fperez_euroscipy_2011_contributors.jpg" style="width: 70%;" /&gt;
&lt;p&gt;I’d like to think that this recruitment problem is a lack of skill set:
users who have the ability to contribute are just too rare. This is not
entirely true: there are scores of skilled people on the mailing lists.
The poster himself mentioned in his email that he was developing a package.
I personally started contributing without knowing anything about software
development. I struggled, and did the grunt work like maintaining wikis,
answering questions on mailing lists, and writing documentation. These
easier tasks were useful to the community, I think, but most
importantly, they taught me a lot because I was investing energy in
them.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;&lt;strong&gt;If people want things to improve, they will have more success
sending in pull requests than sending messages to mailing lists that sound
condescending to my ears.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;I hope that I haven’t overreacted too badly :), that email got me worked up.
That said, I am not sure that people realize how much they owe to the
open source developers breaking their backs on the packages they use.&lt;/p&gt;
&lt;img alt="" src="attachments/fperez_euroscipy_2011_i_want_you.jpg" style="width: 50%;" /&gt;
&lt;p&gt;All credit for images goes to &lt;a class="reference external" href="http://fperez.org/"&gt;Fernando Perez&lt;/a&gt;.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="community"></category></entry><entry><title>Book review: NumPy 1.5 Beginner’s guide</title><link href="https://gael-varoquaux.info/programming/book-review-numpy-15-beginners-guide.html" rel="alternate"></link><published>2012-01-10T08:57:00+01:00</published><updated>2012-01-10T08:57:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-01-10:/programming/book-review-numpy-15-beginners-guide.html</id><summary type="html">&lt;p&gt;Packt publishing sent me a copy of &lt;a class="reference external" href="http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/Book"&gt;NumPy 1.5 Beginner’s guide&lt;/a&gt; by Ivan
Idris.&lt;/p&gt;
&lt;p&gt;The book actually covers more than just &lt;a class="reference external" href="http://numpy.scipy.org/"&gt;numpy&lt;/a&gt;: it is a full
introduction to numerical computing with Python. The &lt;a class="reference external" href="http://www.packtpub.com/toc/numpy-15-beginners-guide-table-contents"&gt;table of
contents&lt;/a&gt; is the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;NumPy Quick Start&lt;/li&gt;
&lt;li&gt;Beginning with NumPy Fundamentals&lt;/li&gt;
&lt;li&gt;Get into …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;Packt publishing sent me a copy of &lt;a class="reference external" href="http://www.packtpub.com/numpy-1-5-using-real-world-examples-beginners-guide/Book"&gt;NumPy 1.5 Beginner’s guide&lt;/a&gt; by Ivan
Idris.&lt;/p&gt;
&lt;p&gt;The book actually covers more than just &lt;a class="reference external" href="http://numpy.scipy.org/"&gt;numpy&lt;/a&gt;: it is a full
introduction to numerical computing with Python. The &lt;a class="reference external" href="http://www.packtpub.com/toc/numpy-15-beginners-guide-table-contents"&gt;table of
contents&lt;/a&gt; is the following:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;NumPy Quick Start&lt;/li&gt;
&lt;li&gt;Beginning with NumPy Fundamentals&lt;/li&gt;
&lt;li&gt;Get into Terms with Commonly Used Functions&lt;/li&gt;
&lt;li&gt;Convenience Functions for Your Convenience&lt;/li&gt;
&lt;li&gt;Working with Matrices and ufuncs&lt;/li&gt;
&lt;li&gt;Move Further with NumPy Modules&lt;/li&gt;
&lt;li&gt;Peeking Into Special Routines&lt;/li&gt;
&lt;li&gt;Assure Quality with Testing&lt;/li&gt;
&lt;li&gt;Plotting with Matplotlib&lt;/li&gt;
&lt;li&gt;When NumPy is Not Enough: SciPy and Beyond&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The book is easy to read, as it requires no specific expertise other
than knowing basic Python programming. It is full of examples and
exercises, which is really great for learning. I find the style of the
author, Ivan Idris, particularly amusing and relaxing, engaging the
reader with questions, challenges, or even jokes (&lt;em&gt;“Have a go hero”&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;With regard to the formatting and the print, the book is written in
large fonts, with sectioning information, tips and exercises clearly
standing out.&lt;/p&gt;
&lt;p&gt;It is full of practical information, such as how to install the
software, or where to get help. Finally, one thing that I appreciated
is that the examples are typed in &lt;a class="reference external" href="http://ipython.org/"&gt;IPython&lt;/a&gt;. Each time I teach, I like
to use IPython, because it is full of features to help plotting,
debugging and profiling numerical code. The book even has a little
introduction to some useful IPython features.&lt;/p&gt;
&lt;p&gt;After an introduction to the workflow, the book explores array
manipulation, such as creation or reshaping, followed by some simple
numerics and the battery of array-based operations on functions and
polynomials. Then it presents linear algebra and signal processing
basics (FFT). It also covers the financial functions present in
numpy and mentions testing, which is essential for achieving quality
code. The book finishes with matplotlib and scipy, two packages that are
important to know to go further.&lt;/p&gt;
&lt;p&gt;The examples are mostly drawn from statistics or financial applications,
such as computing running averages on stock quotes. Basic math
explanations, such as the definition of the Moore-Penrose
pseudo-inverse, are given when needed.&lt;/p&gt;
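&lt;p&gt;To give a flavor of this kind of example, here is a minimal sketch of a simple moving average on a price series, written with &lt;em&gt;numpy.convolve&lt;/em&gt; and a uniform window. The prices and the window length are made-up values for illustration; this is not code taken from the book.&lt;/p&gt;

```python
import numpy as np


def moving_average(values, window):
    """Simple moving average of a 1D sequence, using a uniform kernel."""
    weights = np.ones(window) / window
    # mode="valid" keeps only the positions where the window fully overlaps the data
    return np.convolve(values, weights, mode="valid")


prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0])  # hypothetical stock quotes
print(moving_average(prices, 3))
```

&lt;p&gt;With &lt;em&gt;mode="valid"&lt;/em&gt;, the result has &lt;em&gt;len(values) - window + 1&lt;/em&gt; entries, one per full window.&lt;/p&gt;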
&lt;p&gt;To conclude, I enjoyed this book and I think that it is a nice addition
to my library. It lives up to its title: it is well-suited for
beginners wanting to learn numpy. On the other hand, I would not
recommend it as reference material, or as a book to learn more general
scientific or numerical computing with Python.&lt;/p&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="python"></category><category term="scientific computing"></category><category term="books"></category></entry><entry><title>Joblib beta release: fast compressed persistence + Python 3</title><link href="https://gael-varoquaux.info/programming/joblib-beta-release-fast-compressed-persistence-python-3.html" rel="alternate"></link><published>2012-01-07T19:27:00+01:00</published><updated>2012-01-07T19:27:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2012-01-07:/programming/joblib-beta-release-fast-compressed-persistence-python-3.html</id><summary type="html">&lt;div class="section" id="joblib-0-6-better-i-o-and-python-3-support"&gt;
&lt;h2&gt;Joblib 0.6: better I/O and Python 3 support&lt;/h2&gt;
&lt;p&gt;Happy new year, everyone. I have just released &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Out-of-core_algorithm"&gt;Joblib&lt;/a&gt; 0.6.0 beta.
The highlights of the 0.6 release are a reworked enhanced pickler, and
Python 3 support.&lt;/p&gt;
&lt;p&gt;Many thanks go to the contributors to the 0.5 …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="joblib-0-6-better-i-o-and-python-3-support"&gt;
&lt;h2&gt;Joblib 0.6: better I/O and Python 3 support&lt;/h2&gt;
&lt;p&gt;Happy new year, everyone. I have just released &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Out-of-core_algorithm"&gt;Joblib&lt;/a&gt; 0.6.0 beta.
The highlights of the 0.6 release are a reworked enhanced pickler, and
Python 3 support.&lt;/p&gt;
&lt;p&gt;Many thanks go to the contributors to the 0.5.X series (Fabian
Pedregosa, Yaroslav Halchenko, Kenneth C. Arnold, Alexandre Gramfort,
Lars Buitinck, Bala Subrahmanyam Varanasi, Olivier Grisel, Ralf Gommers,
Juan Manuel Caicedo Carvajal, and myself). In particular Fabian made
sure that Joblib worked under Python 3.&lt;/p&gt;
&lt;p&gt;In this blog post, I’d like to discuss the compressed persistence
engine in more detail, as it nicely illustrates key factors in
implementing and using compressed serialization.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="fast-compressed-persistence"&gt;
&lt;h2&gt;Fast compressed persistence&lt;/h2&gt;
&lt;p&gt;One of the key components of joblib is its ability to persist arbitrary
Python objects, and read them back very quickly. It is particularly
efficient for &lt;strong&gt;containers that do their heavy lifting with numpy
arrays&lt;/strong&gt;. The trick to achieving great speed has been to save the
numpy arrays in separate files, and load them via &lt;strong&gt;memmapping&lt;/strong&gt;.&lt;/p&gt;
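&lt;p&gt;The idea of storing each array in its own file and memory-mapping it back can be sketched with plain numpy, using &lt;em&gt;numpy.save&lt;/em&gt; and &lt;em&gt;numpy.load&lt;/em&gt; with &lt;em&gt;mmap_mode&lt;/em&gt;. This is only an illustration of the principle on a single, made-up array; it is not joblib’s own code, which also handles arrays nested inside arbitrary Python objects.&lt;/p&gt;

```python
import os
import tempfile

import numpy as np

# A made-up array standing in for the heavy part of a container
data = np.arange(1000000, dtype=np.float64).reshape(1000, 1000)

# Store the array in its own .npy file (left in a temporary directory)
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "array.npy")
np.save(path, data)

# Memory-map it back: only the file header is parsed at this point
loaded = np.load(path, mmap_mode="r")
print(type(loaded).__name__, loaded.shape)

# Accessing elements is what actually pages the data in from disk
print(loaded[10, 20])
```

&lt;p&gt;Loading returns a &lt;em&gt;numpy.memmap&lt;/em&gt; almost immediately; the cost of reading from disk is paid only when elements are accessed.&lt;/p&gt;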
&lt;p&gt;However, one drawback of joblib is that the caching mechanism may end
up using a lot of disk space. As a result, there is strong interest in
having &lt;strong&gt;compressed storage&lt;/strong&gt;, provided it doesn’t slow down the library
too much. Another use case that I have in mind for fast compressed
persistence is implementing &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Out-of-core_algorithm"&gt;out of core computation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are some great compressed I/O libraries for Python, for instance
&lt;a class="reference external" href="http://pytables.github.com/index.html"&gt;Pytables&lt;/a&gt;. You may wonder why I needed to code yet another one. The
answer is that joblib is &lt;strong&gt;pure Python, depending only on the standard
library&lt;/strong&gt; (numpy is optional), but also that the goal here is
&lt;strong&gt;black-box persistence of arbitrary objects&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="section" id="comparing-i-o-speed-and-compression-to-other-libraries"&gt;
&lt;h3&gt;Comparing I/O speed and compression to other libraries&lt;/h3&gt;
&lt;p&gt;Implementing efficient compressed storage was a bit of a struggle and I
learned a lot. Rather than going into the details straight away, let me
first discuss a few benchmarks of the resulting code. Benchmarking such a
feature is very hard: first because you are fighting with the disk
cache, second because the performance depends very much on the data at
hand (some data compress better than others), and last because there are
three interesting metrics: disk space used, write speed, and read speed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dataset used&lt;/strong&gt; - I chose to compare the different strategies on some
datasets that I work with, namely the probabilistic brain atlases MNI
1mm (62Mb uncompressed) and Juelich 2mm (105Mb uncompressed). Whether
the data is represented as a Fortran-ordered or a C-ordered array
matters for I/O performance. This data is normally stored to
disk compressed using the domain-specific Nifti format (&lt;em&gt;.nii&lt;/em&gt; files),
accessed in Python with the &lt;a class="reference external" href="http://nipy.sourceforge.net/nibabel/"&gt;Nibabel&lt;/a&gt; library.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Libraries used&lt;/strong&gt; - I benched different compression strategies in
joblib against Nibabel’s Nifti I/O, compressed or not, and against using
Pytables to store the data buffer (without the metadata).
Pytables exposes a variety of compression strategies, with different
speed compromises. In addition, I benched numpy’s builtin
&lt;em&gt;savez_compressed&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I would like to stress that I am comparing a general purpose persistence
engine (joblib) to specific I/O libraries either optimized for the data
(Nifti), or requiring some massaging to enable persistence (pytables).&lt;/p&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/disk.png" style="width: 66%;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/write.png" style="width: 66%;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/read.png" style="width: 66%;" /&gt;
&lt;p&gt;&lt;em&gt;Comparing to other libraries&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Actual numbers can be found &lt;a class="reference external" href="attachments/joblib_rel_0.6_speed/results_nii.csv"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Take home messages&lt;/strong&gt; - The graphs are not crystal-clear, but a few
tendencies appear:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Pytables with LZO or blosc compression is the king of the hill for
read and write speed.&lt;/li&gt;
&lt;li&gt;I/O of compressed data is often faster than with uncompressed data
for a good compression algorithm.&lt;/li&gt;
&lt;li&gt;Joblib with Zlib compression level 1 performs honorably in terms of
speed with only the Python standard library and no compiled code.&lt;/li&gt;
&lt;li&gt;Read time with memmapping (with nibabel or joblib) is negligible (it
is tiny on the graphs); however, the loading cost appears when you
start accessing the data.&lt;/li&gt;
&lt;li&gt;Passing in arrays with a memory layout (Fortran versus C order) that
the I/O library doesn’t expect can really slow down writing.&lt;/li&gt;
&lt;li&gt;Compressing with Zlib compression-level 1 gets you most of the disk
space gains for a reasonable cost in write/read speed.&lt;/li&gt;
&lt;li&gt;Compressing with Zlib compression-level 9 (not shown on the figures)
doesn’t buy you much in disk space, but costs a lot in writing time.&lt;/li&gt;
&lt;/ul&gt;
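&lt;p&gt;To get a feel for the level 1 versus level 9 trade-off on your own data, a quick measurement with the standard library’s &lt;em&gt;zlib&lt;/em&gt; module is enough. The sketch below uses two synthetic payloads chosen for illustration (a repetitive byte string and random bytes); the timings it prints depend on the machine and are not the numbers behind the figures above.&lt;/p&gt;

```python
import os
import time
import zlib


def bench(payload, level):
    """Return (compressed size in bytes, compression time in seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    # Sanity check: the round trip must give back the original bytes
    assert zlib.decompress(compressed) == payload
    return len(compressed), elapsed


# Two synthetic 2 MB payloads: one highly redundant, one with no redundancy
redundant = b"0123456789abcdef" * 125000
random_bytes = os.urandom(len(redundant))

for name, payload in [("redundant", redundant), ("random", random_bytes)]:
    for level in (1, 9):
        size, elapsed = bench(payload, level)
        ratio = size / len(payload)
        print(f"{name:9s} level {level}: ratio {ratio:.3f}, {elapsed:.4f} s")
```

&lt;p&gt;Expect the random payload to barely shrink at either level, since it has no redundancy to exploit.&lt;/p&gt;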
&lt;/div&gt;
&lt;div class="section" id="benching-datasets-richer-than-pure-arrays"&gt;
&lt;h3&gt;Benching datasets richer than pure arrays&lt;/h3&gt;
&lt;p&gt;The datasets used so far are essentially composed of one big array, a 4D
smooth spatial map. I wanted to test on more datasets, to see how the
performance varies with data type and richness. For this, I used the
datasets of &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;, real-life data of various kinds,
described &lt;a class="reference external" href="http://scikit-learn.org/stable/datasets/index.html"&gt;here&lt;/a&gt;:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;20 news&lt;/strong&gt; - 20 Usenet newsgroups: this data mainly consists of
text, and not numpy arrays.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LFW people&lt;/strong&gt; - Labeled faces in the wild, many pictures of
different people’s faces.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LFW pairs&lt;/strong&gt; - Labeled faces in the wild, pairs of pictures for each
individual. This is a high-entropy dataset: it does not have much
redundant information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Olivetti&lt;/strong&gt; - Olivetti dataset: centered pictures of faces.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Juelich(F)&lt;/strong&gt; - Our previous Juelich atlas&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Big people&lt;/strong&gt; - The LFW people dataset, but repeated 4 times, to put
a strain on memory resources.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MNI(F)&lt;/strong&gt; - Our previous MNI atlas&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Species&lt;/strong&gt; - Occurrence of species measured in Latin America, with a
lot of missing data.&lt;/li&gt;
&lt;/ul&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/joblib_disk.png" style="width: 50%;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/joblib_write.png" style="width: 50%;" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;br /&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;img alt="" class="align-center" src="attachments/joblib_rel_0.6_speed/joblib_read.png" style="width: 50%;" /&gt;
&lt;p&gt;Actual numbers can be found
&lt;a class="reference external" href="attachments/joblib_rel_0.6_speed/joblib_results.csv"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What this tells us&lt;/strong&gt; - The main message from these benchmarks is that
datasets with redundant information, i.e. that compress well, give fast
I/O. This is not surprising. In particular, good compression can give
good I/O on text (20 news). Another result, more of a sanity check, is
that compressed I/O on big data (Big people) works as well as on
smaller data, whereas earlier code would start to swap. Finally, I conclude from
these graphs that compression levels from 1 to 3 buy you most of the
gains for reasonable costs, and that going up to 9 is not recommended,
unless you know that your data can be compressed a lot (species).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="lessons-learned"&gt;
&lt;h3&gt;Lessons learned&lt;/h3&gt;
&lt;p&gt;I’ll keep this section short, because the information is really in
&lt;a class="reference external" href="https://github.com/joblib/joblib/blob/0.5.X/joblib/numpy_pickle.py"&gt;joblib’s code and comments&lt;/a&gt;. Don’t hesitate to have a look: it’s
BSD-licensed, so you are free to borrow what you please.&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Memory copies, not only of arrays but also of strings and byte
streams, can really slow you down with big data.&lt;/li&gt;
&lt;li&gt;To avoid copies with numpy arrays, fully embrace numpy’s strided
memory model. For instance, you do not need to save arrays in C
order, if they are given to you in a different order. Accessing the
memory in the wrong striding direction explains the poor write
performance of pytables on Fortran-ordered Juelich.&lt;/li&gt;
&lt;li&gt;When dealing with the file system, the OS performs so much magic (e.g.
prefetching) that clever hacks tend not to work: always benchmark.&lt;/li&gt;
&lt;li&gt;Depending on the size of the data, it may be more efficient to store
subsets in different files: this introduces ‘chunks’ that avoid filling
the memory too much (parameter &lt;em&gt;cache_size&lt;/em&gt; in joblib’s code). In
addition, data of the same nature tends to compress better.&lt;/li&gt;
&lt;li&gt;The I/O stream or file object interfaces are abstractions that can
hide the data movement and the creation of large temporaries. After
experiments with GzipFile and StringIO/BytesIO, I found it more
efficient to fall back to passing around big buffer objects: numpy
arrays, or strings.&lt;/li&gt;
&lt;li&gt;For reasons 4 and 5, I ended up avoiding the gzip module: raw access
to zlib with buffers gives more control. This explains a good
part of the differences in read speed for pure arrays with numpy’s
&lt;em&gt;savez_compressed&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;
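&lt;p&gt;Points 4 to 6 can be illustrated with a small sketch that compresses a large buffer chunk by chunk with raw &lt;em&gt;zlib&lt;/em&gt; calls, slicing a &lt;em&gt;memoryview&lt;/em&gt; so that the chunks are not copied out of the original buffer before compression. The chunk size and the helper names are hypothetical choices for this illustration; they are not joblib’s actual code or parameters.&lt;/p&gt;

```python
import zlib

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size: 1 MiB


def compress_chunks(buffer, level=1, chunk_size=CHUNK_SIZE):
    """Compress a bytes-like object chunk by chunk with raw zlib.

    Slicing a memoryview does not copy the underlying data, so each
    chunk is handed to zlib directly from the original buffer.
    """
    view = memoryview(buffer)
    return [zlib.compress(view[start:start + chunk_size], level)
            for start in range(0, len(view), chunk_size)]


def decompress_chunks(chunks):
    """Decompress the chunks and join them back into a single bytes object."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


data = bytes(range(256)) * 20000  # about 5 MB of made-up data
chunks = compress_chunks(data)
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "compressed bytes")

# Round-trip check
assert decompress_chunks(chunks) == data
```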
&lt;p&gt;One of my conclusions for joblib is that I’ll probably use Pytables as
an optional backend for persistence in a future release.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="details-on-the-benchmarks"&gt;
&lt;h3&gt;Details on the benchmarks&lt;/h3&gt;
&lt;p&gt;These benchmarks were run on a Dell Latitude D630 laptop. That’s a
dual-core Intel Core2 Duo box, with 2 MB of CPU cache.&lt;/p&gt;
&lt;p&gt;The code for these benchmarks can be found on &lt;a class="reference external" href="https://gist.github.com/1551250"&gt;a gist&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="thanks"&gt;
&lt;h3&gt;Thanks&lt;/h3&gt;
&lt;p&gt;I’d like to thank Francesc Alted for the very useful feedback he gave on this
topic. In particular, the &lt;a class="reference external" href="http://sourceforge.net/mailarchive/message.php?msg_id=28609087"&gt;following thread&lt;/a&gt; on the pytables
mailing-list may be of interest to the reader.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="joblib"></category><category term="python"></category><category term="scientific computing"></category><category term="scipy"></category><category term="scikit-learn"></category></entry><entry><title>Scikit-learn NIPS 2011 sprint: international thanks to our sponsors</title><link href="https://gael-varoquaux.info/programming/scikit-learn-nips-2011-sprint-international-thanks-to-our-sponsors.html" rel="alternate"></link><published>2011-11-18T14:47:00+01:00</published><updated>2011-11-18T14:47:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-11-18:/programming/scikit-learn-nips-2011-sprint-international-thanks-to-our-sponsors.html</id><summary type="html">&lt;p&gt;&lt;strong&gt;The NIPS conference: time for a sprint.&lt;/strong&gt; The &lt;a class="reference external" href="http://nips.cc/"&gt;NIPS conference&lt;/a&gt;, one
of the major conferences in machine learning, is hosted in Granada this
year. I believe that it is the first time that it is hosted in Europe.
As many of the &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; developers are part of the wider NIPS …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;The NIPS conference: time for a sprint.&lt;/strong&gt; The &lt;a class="reference external" href="http://nips.cc/"&gt;NIPS conference&lt;/a&gt;, one
of the major conferences in machine learning, is hosted in Granada this
year. I believe that it is the first time that it is hosted in Europe.
As many of the &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt; developers are part of the wider NIPS
community, and many also live in Europe, we jumped at the opportunity to
organize a truly international sprint: the &lt;a class="reference external" href="http://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events"&gt;NIPS 2011 scikit-learn
sprint&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finding money.&lt;/strong&gt; As often with open source development, a lot of our
contributors are young people, investing their free time outside of any
request from their hierarchy. In such a situation, it can be hard to
find travel money. So we started looking for sponsors. We needed to find
a decent sum of money, as we were flying people in from places such as
the West coast of the US, or even Japan. The good news is that we found
money, and between supervisors pitching in, universities giving travel
grants, and our generous sponsors, there will be an impressive list of
contributors from all over the world at the sprint.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thanks to our sponsors.&lt;/strong&gt; The first people that we need to thank are
Google, who gave us a sizable sponsorship, and the &lt;a class="reference external" href="http://www.python.org/psf/"&gt;PSF&lt;/a&gt;, who made
Google’s sponsorship possible through their accounting and sprints
programs. We also need to thank our other sponsor,
&lt;a class="reference external" href="http://www.tinyclues.com/"&gt;Tinyclues&lt;/a&gt;. Thanks to these sponsors, and additional investment from
many universities and research groups, we have been able to gather a
total of 12 contributors in Granada, a handful coming from overseas.
Also, we are indebted to the &lt;a class="reference external" href="http://www.ugr.es/"&gt;University of Granada&lt;/a&gt;, and the Gnu/Linux
Granada Group (GGG), who are providing hosting for the sprint, as well
as Régine Bricquet, from INRIA, who did a lot of the trip planning for
the sponsored people.&lt;/p&gt;
&lt;p&gt;I am very much looking forward to the sprint. It will be the first time
that I meet many of the contributors in real life, and judging by the
warmth of the online exchanges, it will be a great moment. Besides,
Granada is known to be a lively and historic city.&lt;/p&gt;
&lt;p&gt;If you are around and want to join us, to work on Python in machine
learning, send an email to the &lt;a class="reference external" href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general"&gt;mailing list&lt;/a&gt;.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scikit-learn"></category><category term="scipy"></category><category term="conferences"></category><category term="sprint"></category></entry><entry><title>Cython example of exposing C-computed arrays in Python without data copies</title><link href="https://gael-varoquaux.info/programming/cython-example-of-exposing-c-computed-arrays-in-python-without-data-copies.html" rel="alternate"></link><published>2011-09-28T23:42:00+02:00</published><updated>2011-09-28T23:42:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-09-28:/programming/cython-example-of-exposing-c-computed-arrays-in-python-without-data-copies.html</id><summary type="html">&lt;p&gt;Some advice on passing arrays from C to Python avoiding copies. I use
Cython as I have found the code to be more maintainable than hand-written
Python C-API code.&lt;/p&gt;
&lt;p&gt;I found out that there was no self-contained example of creating numpy
arrays from existing data in Cython. Thus I created …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Some advice on passing arrays from C to Python avoiding copies. I use
Cython as I have found the code to be more maintainable than hand-written
Python C-API code.&lt;/p&gt;
&lt;p&gt;I found out that there was no self-contained example of creating numpy
arrays from existing data in Cython. Thus I created my own. The full code,
with a README, build and demo scripts, is available on a &lt;a class="reference external" href="https://gist.github.com/1249305"&gt;gist&lt;/a&gt;. Here I only
give an executive summary.&lt;/p&gt;
&lt;p&gt;The core functionality is implemented by the
&lt;a class="reference external" href="http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#PyArray_SimpleNewFromData"&gt;PyArray_SimpleNewFromData&lt;/a&gt; function of the C API of numpy that can
create an ndarray from a pointer to the data, a simple data type, and
the shape of the data. The Cython file just builds around that function:&lt;/p&gt;
&lt;p&gt;
&lt;script src="https://gist.github.com/1249305.js?file=cython_wrapper.pyx"&gt;&lt;/script&gt;
&lt;/p&gt;</content><category term="programming"></category><category term="scipy"></category><category term="cython"></category><category term="python"></category><category term="scientific computing"></category><category term="selected"></category></entry><entry><title>Python at scientific conferences</title><link href="https://gael-varoquaux.info/programming/python-at-scientific-conferences.html" rel="alternate"></link><published>2011-09-11T15:52:00+02:00</published><updated>2011-09-11T15:52:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-09-11:/programming/python-at-scientific-conferences.html</id><summary type="html">&lt;p&gt;Top notch scientific conferences are starting to add Python tracks to
their program. This is good news. Indeed, it scientific Python
conferences (namely &lt;a class="reference external" href="http://conference.scipy.org/scipy2011/"&gt;Scipy&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/"&gt;EuroSciPy&lt;/a&gt; and &lt;a class="reference external" href="http://scipy.in/scipyin/2011/"&gt;Scipy India&lt;/a&gt;) are doing
great to get together people who have already heard about Python for
science, but we need to reach out to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Top notch scientific conferences are starting to add Python tracks to
their program. This is good news. Indeed, scientific Python
conferences (namely &lt;a class="reference external" href="http://conference.scipy.org/scipy2011/"&gt;Scipy&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/"&gt;EuroSciPy&lt;/a&gt; and &lt;a class="reference external" href="http://scipy.in/scipyin/2011/"&gt;Scipy India&lt;/a&gt;) do a
great job of bringing together people who have already heard about Python for
science, but we need to reach out to specific Python communities to
maximize impact.&lt;/p&gt;
&lt;div class="section" id="esco-2012-european-seminar-on-coupled-problems"&gt;
&lt;h2&gt;ESCO 2012 - European Seminar on Coupled Problems&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://esco2012.femhub.com/"&gt;ESCO 2012&lt;/a&gt; is the 3rd event in a series of interdisciplineary meetings
dedicated to computational science challenges in multi-physics and PDEs.&lt;/p&gt;
&lt;p&gt;I was invited to ESCO last year. It was an absolute pleasure, because it
is a small conference that is very focused on discussions. I learned a
lot and could sit down with people who code top-notch PDE libraries such
as FEniCS and have technical discussions. Besides, it is hosted in the
historic brewery where Pilsner beer was invented. Plenty of great beer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Application areas&lt;/strong&gt; Theoretical results as well as applications are
welcome. Application areas include, but are not limited to:
Computational electromagnetics, Civil engineering, Nuclear engineering,
Mechanical engineering, Computational fluid dynamics, Computational
geophysics, Geomechanics and rock mechanics, Computational hydrology,
Subsurface modeling, Biomechanics, Computational chemistry, Climate and
weather modeling, Wave propagation, Acoustics, Stochastic differential
equations, and Uncertainty quantification.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Minisymposia&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Multiphysics and Multiscale Problems in Civil Engineering&lt;/li&gt;
&lt;li&gt;Modern Numerical Methods for ODE&lt;/li&gt;
&lt;li&gt;Porous Media Hydrodynamics&lt;/li&gt;
&lt;li&gt;Nuclear Fuel Recycling Simulations&lt;/li&gt;
&lt;li&gt;Adaptive Methods for Eigenproblems&lt;/li&gt;
&lt;li&gt;Discontinuous Galerkin Methods for Electromagnetics&lt;/li&gt;
&lt;li&gt;Undergraduate Projects in Technical Computing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Software afternoon&lt;/strong&gt; An important part of each ESCO conference is a
software afternoon featuring software projects by participants.
Any computational software that has reached a certain level of maturity
can be presented, i.e., it is used outside of the author’s institution,
and it has a web page and user documentation. If you would like to
present your software project, let us know soon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Proceedings&lt;/strong&gt; For each ESCO we strive to reserve a special issue of an
international journal with impact factor. Proceedings of ESCO 2008
appeared in Math. Comput. Simul., proceedings of ESCO 2010 in CiCP and
Appl. Math. Comput. Proceedings of ESCO 2012 will appear in Computing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important Dates&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;December 15, 2011: Abstract submission deadline.&lt;/li&gt;
&lt;li&gt;December 15, 2011: Minisymposia proposals.&lt;/li&gt;
&lt;li&gt;January 15, 2012: Notification of acceptance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="pyhpc-python-for-high-performance-computing"&gt;
&lt;h2&gt;PyHPC: Python for High performance computing&lt;/h2&gt;
&lt;p&gt;If you are doing super computing, &lt;a class="reference external" href="http://sc11.supercomputing.org/"&gt;SC11, the Super Computing
conference&lt;/a&gt; is &lt;em&gt;the&lt;/em&gt; reference conference. This year there will be a
workshop on high performance computing with Python: &lt;a class="reference external" href="http://www.dlr.de/sc/desktopdefault.aspx/tabid-1183/1638_read-31733/"&gt;PyHPC&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At the scipy conference, I was having a discussion with some of the
attendees on how people often still do process management and I/O with
Fortran in big computing environments. This is counterproductive.
However, as long as success stories of supercomputing folks using high-level
languages are not advertised, this is bound to stay. Come and tell us
how you use Python for high performance computing!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Topics&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Python-based scientific applications and libraries&lt;/li&gt;
&lt;li&gt;High performance computing&lt;/li&gt;
&lt;li&gt;Parallel Python-based programming languages&lt;/li&gt;
&lt;li&gt;Scientific visualization&lt;/li&gt;
&lt;li&gt;Scientific computing education&lt;/li&gt;
&lt;li&gt;Python performance and language issues&lt;/li&gt;
&lt;li&gt;Problem solving environments with Python&lt;/li&gt;
&lt;li&gt;Performance analysis tools for Python applications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Papers&lt;/strong&gt; We invite you to submit a paper of up to 10 pages via the
submission site. Authors are encouraged to use IEEE two column format.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important Dates&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Full paper submission: September 19, 2011&lt;/li&gt;
&lt;li&gt;Notification of acceptance: October 7, 2011&lt;/li&gt;
&lt;li&gt;Camera-ready papers: October 31, 2011&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="conferences"></category><category term="python"></category><category term="scipy"></category><category term="science"></category><category term="scientific computing"></category></entry><entry><title>Hiring a junior developer on the scikit-learn</title><link href="https://gael-varoquaux.info/programming/hiring-a-junior-developer-on-the-scikit-learn.html" rel="alternate"></link><published>2011-09-03T07:26:00+02:00</published><updated>2011-09-03T07:26:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-09-03:/programming/hiring-a-junior-developer-on-the-scikit-learn.html</id><summary type="html">&lt;p&gt;Once again, we are looking for a junior developer to work on the
scikit-learn. Below is the official job posting. As a personal remark, I
would like to stress that this is a unique opportunity to be payed for
two years to work on learning and improving the scientific Python …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Once again, we are looking for a junior developer to work on the
scikit-learn. Below is the official job posting. As a personal remark, I
would like to stress that this is a unique opportunity to be paid for
two years to work on learning and improving the scientific Python
toolstack.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="section" id="job-description"&gt;
&lt;h2&gt;Job Description&lt;/h2&gt;
&lt;p&gt;INRIA is looking to hire a young graduate on a 2-year position to help
with the community-driven development of scikit-learn, the open source
machine-learning library in Python. Scikit-learn is one of the
major machine-learning libraries in Python. It aims to be
state-of-the-art on mid-size to large datasets by harnessing the power
of the scientific Python toolstack.&lt;/p&gt;
&lt;p&gt;Speaking French is not a requirement, as it is an international team.&lt;/p&gt;
&lt;div class="section" id="requirements"&gt;
&lt;h3&gt;Requirements&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Programming skills in Python and C/C++&lt;/li&gt;
&lt;li&gt;Understanding of quality assurance in software development:
test-driven programming, version control, technical documentation.&lt;/li&gt;
&lt;li&gt;Some knowledge of Linux/Unix&lt;/li&gt;
&lt;li&gt;Software design skills&lt;/li&gt;
&lt;li&gt;Knowledge of open-source development and community-driven
environments&lt;/li&gt;
&lt;li&gt;Good technical English level&lt;/li&gt;
&lt;li&gt;Experience in statistical learning or a mathematically oriented
mindset is a plus&lt;/li&gt;
&lt;li&gt;We can only hire a young graduate who has received a master’s or
equivalent degree at most a year ago.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="about-inria"&gt;
&lt;h2&gt;About INRIA&lt;/h2&gt;
&lt;p&gt;INRIA is the French computer science research institute. It is recognized
worldwide as one of the leading research institutions and has a strong
expertise in machine learning. You will be working in the &lt;a class="reference external" href="https://parietal.saclay.inria.fr"&gt;Parietal
team&lt;/a&gt; that makes a heavy use of Python for brain imaging analysis.&lt;/p&gt;
&lt;p&gt;Parietal is a small research team (around 10 people) with an excellent
technical knowledge of scientific and numerical computing in Python as
well as a fine understanding of algorithmic issues in machine learning
and statistics. Parietal is committed to investing in scikit-learn.&lt;/p&gt;
&lt;p&gt;Working at Parietal is a unique opportunity to improve your skills in
machine learning and numerical computing in Python. In addition, working
full time on the scikit-learn, a very active open-source project, will
give you premium experience of open source community management and
collaborative project development.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="contact-info"&gt;
&lt;h2&gt;Contact Info:&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Technical Contact&lt;/strong&gt;: Bertrand Thirion&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;E-mail contact&lt;/strong&gt;: bertrand dotnospam thirion atnospam inria
dotnospam fr&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HR Contact&lt;/strong&gt;: Marie Domingues&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;E-mail Contact&lt;/strong&gt;: marie dotnospam domingues atnospam inria
dotnospam fr&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No telecommuting&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="machine learning"></category><category term="python"></category><category term="science"></category><category term="jobs"></category><category term="scikit-learn"></category></entry><entry><title>Euroscipy 2011: early bird deadline soon</title><link href="https://gael-varoquaux.info/programming/euroscipy-2011-early-bird-deadline-soon.html" rel="alternate"></link><published>2011-07-22T00:44:00+02:00</published><updated>2011-07-22T00:44:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-07-22:/programming/euroscipy-2011-early-bird-deadline-soon.html</id><summary type="html">&lt;div class="section" id="euroscipy-2011-register-now-for-early-bird-prices"&gt;
&lt;h2&gt;Euroscipy 2011: register now for early bird prices&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The deadline for early-bird registration at the Euroscipy conference
is this Sunday&lt;/strong&gt;. Beyond this deadline prices will double. &lt;strong&gt;Register
now to get a great deal.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To register, simply go to &lt;a class="reference external" href="http://www.euroscipy.org"&gt;www.euroscipy.org&lt;/a&gt;, log in using the link on
the top right …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="euroscipy-2011-register-now-for-early-bird-prices"&gt;
&lt;h2&gt;Euroscipy 2011: register now for early bird prices&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;The deadline for early-bird registration at the Euroscipy conference
is this Sunday&lt;/strong&gt;. Beyond this deadline prices will double. &lt;strong&gt;Register
now to get a great deal.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To register, simply go to &lt;a class="reference external" href="http://www.euroscipy.org"&gt;www.euroscipy.org&lt;/a&gt;, log in using the link on
the top right, and follow the &lt;em&gt;‘Register now for the conference’&lt;/em&gt; link
on the top left.&lt;/p&gt;
&lt;p&gt;The conference is a great opportunity to learn the intricacies of
numerical and scientific computing in Python. You can register for the
tutorials in an &lt;a class="reference external" href="http://www.euroscipy.org/track/4010?vid=tracktalkslist"&gt;intro track&lt;/a&gt;, which will take you from beginner to fully
autonomous user, or in an &lt;a class="reference external" href="http://www.euroscipy.org/track/4011?vid=tracktalkslist"&gt;advanced track&lt;/a&gt;, to learn from the experts about
topics such as image processing, GPU computing, machine learning or
optimization. The tutorials are a fairly unique occasion to improve your
skills, as you will seldom get such a concentration of experts.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="some-program-highlights"&gt;
&lt;h2&gt;Some program highlights&lt;/h2&gt;
&lt;p&gt;After the 2 days of tutorials, the conference itself will host 2 keynotes:
one by &lt;a class="reference external" href="http://mcs.open.ac.uk/mp8/"&gt;Marian Petre&lt;/a&gt;, of the Open University, well known for her
empirical studies of software development, and another one by &lt;a class="reference external" href="http://fperez.org/"&gt;Fernando
Perez&lt;/a&gt;, a pioneer in scientific computing in Python and the original
author of IPython.&lt;/p&gt;
&lt;p&gt;Glancing at the &lt;a class="reference external" href="http://www.euroscipy.org/track/3992?vid=tracktalkslist"&gt;program&lt;/a&gt;, we can see that a wide range of topics is
covered:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;pure computer-science topics, such as &lt;a class="reference external" href="http://www.euroscipy.org/talk/4186"&gt;concurrent programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;traditional &lt;em&gt;hard&lt;/em&gt; sciences, such as &lt;a class="reference external" href="http://www.euroscipy.org/talk/4201"&gt;multi-physics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;simulation of complex systems, for instance &lt;a class="reference external" href="http://www.euroscipy.org/talk/4219"&gt;network modeling in
epidemiology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;or novel applications of quantitative large-data processing, as in
&lt;a class="reference external" href="http://www.euroscipy.org/talk/4182"&gt;legal research&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The variety of the topics illustrates what is for me one of the greatest
benefits of the scipy conferences: they form a forum to exchange ideas
and techniques to find new solutions to scientific, numerical and data
analysis problems. Unlike pure computer-science conferences, they sit
at the frontier between applications and bleeding-edge computing developments,
&lt;strong&gt;because the attendees really use the tools presented to solve their
problems&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In addition to this rich program, we will have 2 days of &lt;a class="reference external" href="http://www.euroscipy.org/track/5201"&gt;sprints&lt;/a&gt;
before the conference as well as 2-day-long satellite conferences on
Python in &lt;a class="reference external" href="http://www.euroscipy.org/card/pyphy2011"&gt;Physics&lt;/a&gt; and &lt;a class="reference external" href="http://pythonneuro.sciencesconf.org/"&gt;NeuroScience&lt;/a&gt; after the conference. This is
how what used to be a small conference has become a full 8-day event, if
you sign up for all the extras.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="conferences"></category></entry><entry><title>Hiring a junior engineer on the scikit-learn</title><link href="https://gael-varoquaux.info/programming/hiring-a-junior-engineer-on-the-scikit-learn.html" rel="alternate"></link><published>2011-05-14T19:10:00+02:00</published><updated>2011-05-14T19:10:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-05-14:/programming/hiring-a-junior-engineer-on-the-scikit-learn.html</id><summary type="html">&lt;p&gt;The &lt;a class="reference external" href="http://www.scikit-learn.org"&gt;scikit-learn&lt;/a&gt; is a Python module for machine learning. The
project builds on the scientific and numerical tools of the &lt;a class="reference external" href="http://www.scipy.org"&gt;scipy
community&lt;/a&gt; to provide state-of-the-art data analysis tools. It is
developed by a community of open source developers to which my research
team (&lt;a class="reference external" href="https://parietal.saclay.inria.fr/"&gt;Parietal&lt;/a&gt;, &lt;a class="reference external" href="http://www.inria.fr/"&gt;INRIA&lt;/a&gt;) contributes a lot and is …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The &lt;a class="reference external" href="http://www.scikit-learn.org"&gt;scikit-learn&lt;/a&gt; is a Python module for machine learning. The
project builds on the scientific and numerical tools of the &lt;a class="reference external" href="http://www.scipy.org"&gt;scipy
community&lt;/a&gt; to provide state-of-the-art data analysis tools. It is
developed by a community of open source developers to which my research
team (&lt;a class="reference external" href="https://parietal.saclay.inria.fr/"&gt;Parietal&lt;/a&gt;, &lt;a class="reference external" href="http://www.inria.fr/"&gt;INRIA&lt;/a&gt;) contributes a lot and is a &lt;a class="reference external" href="http://github.com/scikit-learn/scikit-learn"&gt;striving
project&lt;/a&gt;. Its mailing list fosters many discussions on code and machine
learning topics, it has a &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/user_guide.html"&gt;a very detailed documentation&lt;/a&gt;, and &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/whats_new.html"&gt;a tight
release cycle&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Although scikits.learn is mostly developed by volunteers, INRIA has
funded a two-year position for a junior engineer (currently &lt;a class="reference external" href="http://fseoane.net/blog/"&gt;Fabian
Pedregosa&lt;/a&gt;) to help with the core management and integration of the
project. This funding is coming to an end in fall 2011 &lt;a class="reference external" href="#footnote"&gt;[*]&lt;/a&gt;. The
good news is that we have been allocated new funding to hire an engineer
on the scikit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We are thus looking to hire a junior engineer for a 2-year position to
work on the scikits.learn at INRIA in Saclay, near Paris&lt;/strong&gt;. The position
is only available to candidates who received a &lt;strong&gt;master’s or
equivalent degree at most a year ago&lt;/strong&gt;. This is non-negotiable: we
cannot hire more senior candidates.&lt;/p&gt;
&lt;p&gt;We are looking for a developer with good open-source project management
skills: the successful candidate will review and merge patches, ensure
the quality of the scikit, make releases, coordinate development on the
mailing list and on GitHub. Good knowledge of Python and its scientific
ecosystem is expected. A mathematical or computer-science oriented
mindset is a plus, as the project involves working with machine learning
algorithms.&lt;/p&gt;
&lt;p&gt;The candidate should be willing to relocate to work daily in the
&lt;a class="reference external" href="http://www-dsv.cea.fr/en/instituts/institut-d-imagerie-biomedicale-i2bm/services/neurospin-neurospin"&gt;Neurospin brain research institute&lt;/a&gt; in which the Parietal is located.
Knowledge of French is not required, as the team and the institute are
very international. Non-EU candidates are welcome, but the hiring
process will take longer.&lt;/p&gt;
&lt;p&gt;You will be working in a very stimulating environment. You will be
employed by INRIA, the French computer science research institute. As
such, you will benefit from the expertise of the institute’s researchers
and engineers. Team members contribute to various scientific Python
libraries (in addition to scikits.learn, &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi&lt;/a&gt;, &lt;a class="reference external" href="http://nipy.org"&gt;nipy&lt;/a&gt;, &lt;a class="reference external" href="http://packages.python.org/joblib/"&gt;joblib&lt;/a&gt;).
In addition, you will be working in a brain research institute, in
collaboration with leading &lt;a class="reference external" href="http://lnao.fr"&gt;methods researchers&lt;/a&gt; and &lt;a class="reference external" href="http://www.unicog.org/pm/pmwiki.php"&gt;neuroscientists&lt;/a&gt;
that use machine learning to gain new insights on brain processes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To apply:&lt;/strong&gt; you need to prepare a CV and a motivation
letter. The deadline for applications is mid-June, but we will be
selecting candidates and conducting interviews before then. &lt;strong&gt;Don’t send me
CVs&lt;/strong&gt;. The formal job description, as well as instructions to apply can
be found on this &lt;a class="reference external" href="http://en.inria.fr/institute/recruitment/offers/young-graduate-engineers/%28view%29/details.html?id=PNGFK026203F3VBQB6G68LOE1&amp;amp;LOV5=4510&amp;amp;ContractType=4545&amp;amp;LG=EN&amp;amp;Resultsperpage=20&amp;amp;nPostingID=5534&amp;amp;nPostingTargetID=10628&amp;amp;option=52&amp;amp;sort=DESC&amp;amp;nDepartmentID=10"&gt;page&lt;/a&gt;. The page is mostly in French, sorry; use
Google Translate if needed. At the bottom of the page you
will find a link to apply.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;strong&gt;[*]&lt;/strong&gt; Fabian will most probably stay with us to do a PhD on
&lt;a class="reference external" href="https://parietal.saclay.inria.fr/research"&gt;analysis of large brain functional imaging datasets&lt;/a&gt;.&lt;/p&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="jobs"></category><category term="machine learning"></category><category term="scipy"></category><category term="science"></category></entry><entry><title>EuroScipy: the program is filling up, and the submission deadline nearing</title><link href="https://gael-varoquaux.info/programming/euroscipy-the-program-is-filling-up-and-the-submission-deadline-nearing.html" rel="alternate"></link><published>2011-04-30T17:21:00+02:00</published><updated>2011-04-30T17:21:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-04-30:/programming/euroscipy-the-program-is-filling-up-and-the-submission-deadline-nearing.html</id><summary type="html">&lt;div class="section" id="submission-deadline-may-8th"&gt;
&lt;h2&gt;Submission deadline May 8th&lt;/h2&gt;
&lt;p&gt;The deadline for the call for presentations for the EuroScipy conference
is on &lt;strong&gt;May 8th&lt;/strong&gt;. There is only a week and a half left.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.euroscipy.org/"&gt;EuroScipy&lt;/a&gt; will be held in &lt;strong&gt;Paris, August 25-28&lt;/strong&gt;. It is the European
meeting for users of Python in scientific and numerical-intensive
applications …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="submission-deadline-may-8th"&gt;
&lt;h2&gt;Submission deadline May 8th&lt;/h2&gt;
&lt;p&gt;The deadline for the call for presentations for the EuroScipy conference
is on &lt;strong&gt;May 8th&lt;/strong&gt;. There is only a week and a half left.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.euroscipy.org/"&gt;EuroScipy&lt;/a&gt; will be held in &lt;strong&gt;Paris, August 25-28&lt;/strong&gt;. It is the European
meeting for users of Python in scientific and numerical-intensive
applications. It strives to bring together both users and developers of
scientific and numerical tools, as well as academic research and
state-of-the-art industry. The conference will host 2 days of tutorials and 2
days of technical presentations.&lt;/p&gt;
&lt;p&gt;Lately, numerical computing in Python has started reaching a much wider
audience than the traditional academic-oriented audience. This is partly
because Python is making its way into major engineering companies, but
also because more and more industries are processing large amounts of
data, and are finding valuable &lt;strong&gt;data analytics tools&lt;/strong&gt; in the &lt;a class="reference external" href="http://www.scipy.org"&gt;Scipy&lt;/a&gt;
community. In this spirit, this year there will be a &lt;a class="reference external" href="http://www.euroscipy.org/talk/4061"&gt;tutorial on
machine learning with Python&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="poster-session"&gt;
&lt;h2&gt;Poster session&lt;/h2&gt;
&lt;p&gt;Last year, the organizing committee had to refuse a large fraction of
the proposals, because there were not enough slots available. We had
considered organizing a poster session, but the logistics were too
challenging for our limited resources. Indeed, EuroSciPy still tries to
be organized as a hackers’ and coders’ conference, rather than an
industry-level one. For instance, we keep the prices to a minimum, in
order to make it easy for young people traveling on their own budget to
join us. Getting 200 attendees, as we did last year, strained our small
organizing committee.&lt;/p&gt;
&lt;p&gt;This year, we received unexpected backing from the &lt;a class="reference external" href="http://www.phys.ens.fr/"&gt;physics department&lt;/a&gt; of
the &lt;a class="reference external" href="http://www.ens.fr/?lang=en"&gt;ENS&lt;/a&gt;. They were extremely enthusiastic about Python, which they now
use for teaching and research. This made me really happy, as this is
where I studied. They offered help, in particular with the local
organization.&lt;/p&gt;
&lt;p&gt;Thus I am able to announce that thanks to the physics department of the
ENS, we will be able to host a poster session!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="an-exciting-program-shaping-up"&gt;
&lt;h2&gt;An exciting program shaping up&lt;/h2&gt;
&lt;p&gt;The program is starting to shape up, and it is looking really good, in
my eyes.&lt;/p&gt;
&lt;div class="section" id="keynotes"&gt;
&lt;h3&gt;Keynotes&lt;/h3&gt;
&lt;p&gt;We will be having two keynote speakers, one directly from the SciPy
community, Fernando Perez, and one probably less known to this
community, Marian Petre.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://mcs.open.ac.uk/mp8/"&gt;Marian Petre&lt;/a&gt;: Marian is the director of the &lt;a class="reference external" href="http://crc.open.ac.uk/"&gt;Center for Research
in Computing&lt;/a&gt;, at the &lt;a class="reference external" href="http://www.open.ac.uk/"&gt;Open University&lt;/a&gt;. She is interested in
empirical studies of software development. I am very excited to hear
a bit more about the often-forgotten human factor that goes behind
every coding job, big or small. In my experience, scientific computing
and the computational sciences pay a hefty price because they don’t
acknowledge well enough the gap between good ideas and tractable
code.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fperez.org/"&gt;Fernando Perez&lt;/a&gt;: Fernando is a research scientist in
neuroscience at &lt;a class="reference external" href="http://neuroscience.berkeley.edu/"&gt;UC Berkeley&lt;/a&gt;. Before that, he was successively a
physicist and a mathematician. He has been an early advocate of the
scientific Python ecosystem, in addition to being the creator of
IPython. His vision has always been oriented toward finding a
computing environment that makes scientific creativity easier.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="tutorials"&gt;
&lt;h3&gt;Tutorials&lt;/h3&gt;
&lt;p&gt;The tutorial program is now final, and can be seen on the &lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2011"&gt;schedule&lt;/a&gt;.
Like last year, we will have two tracks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/4010"&gt;An introductory track&lt;/a&gt;, designed as a two-day course addressing
the different aspects of the Python language and the scientific
computing modules, to bring beginners up to full speed. At the end of
the two days, attendees should be able to solve simple computational
problems on their own using Python.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/4011"&gt;An advanced track&lt;/a&gt;, in which experts of various aspects of
scientific and numerical computing in Python share their knowledge in
2-hour-long tutorials.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="python-in-neuroscience-satellite"&gt;
&lt;h2&gt;Python in NeuroScience satellite&lt;/h2&gt;
&lt;p&gt;In the two days following the conference, there will be &lt;a class="reference external" href="http://pythonneuro.sciencesconf.org/"&gt;a satellite
meeting on the use of Python in neuroscience&lt;/a&gt;. It will be a smaller and more
focused event, in which neuroscientists will be able to discuss
technical aspects of computation and data management in Python.
Hopefully it will foster interesting discussions and collaborations. If you
are interested, you can submit a talk proposal for this satellite
meeting &lt;a class="reference external" href="http://pythonneuro.sciencesconf.org/"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;img alt="" class="align-center" src="http://farm5.static.flickr.com/4143/4780097256_14c99f3b32.jpg" style="width: 60%;" /&gt;
&lt;p&gt;&lt;strong&gt;Come and join us at EuroScipy in Paris, August 25-28. Paris is a great
city. The SciPy community is a friendly one.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="conferences"></category><category term="scipy"></category><category term="python"></category><category term="science"></category></entry><entry><title>Scikit-learn sprint on April 1st</title><link href="https://gael-varoquaux.info/programming/scikit-learn-sprint-on-april-1st.html" rel="alternate"></link><published>2011-03-26T13:27:00+01:00</published><updated>2011-03-26T13:27:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-03-26:/programming/scikit-learn-sprint-on-april-1st.html</id><summary type="html">&lt;a class="reference external image-reference" href="http://scikit-learn.sourceforge.net/"&gt;&lt;img alt="" src="http://scikit-learn.sourceforge.net/stable/_static/scikit-learn-logo-small.png" /&gt;&lt;/a&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt; team is organizing a sprint on April 1st (that next
Friday). Join us in &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events"&gt;Paris, Boston, or on IRC&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;With the rise of the data sciences, the scikit-learn, a &lt;strong&gt;BSD-licensed
Python package for machine learning&lt;/strong&gt;, is becoming an asset for more and
more endeavors. Machine learning has traditionally …&lt;/p&gt;</summary><content type="html">&lt;a class="reference external image-reference" href="http://scikit-learn.sourceforge.net/"&gt;&lt;img alt="" src="http://scikit-learn.sourceforge.net/stable/_static/scikit-learn-logo-small.png" /&gt;&lt;/a&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt; team is organizing a sprint on April 1st (that next
Friday). Join us in &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events"&gt;Paris, Boston, or on IRC&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;With the rise of the data sciences, the scikit-learn, a &lt;strong&gt;BSD-licensed
Python package for machine learning&lt;/strong&gt;, is becoming an asset for more and
more endeavors. Machine learning has traditionally been considered
very technical and inaccessible to non-mathematicians. We are aiming
to break down this barrier.&lt;/p&gt;
&lt;p&gt;The sprint will be focused on pragmatic down-to-earth improvements in
the scikit. Our goal is to make it easy for people to contribute. A list
of tasks and organization details can be found on the &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/wiki/Upcoming-events"&gt;sprint planning&lt;/a&gt;
wiki page. Amongst other things, we’ll be working on:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;integrating new learning algorithms&lt;/strong&gt;, in particular merging in the
many excellent pull requests that we have: &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/103"&gt;hierarchical
clustering&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/103"&gt;data transformation using linear discriminant
analysis&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/107"&gt;multinomial naive Bayes classifier&lt;/a&gt; …&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;testing and logging framework&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://github.com/scikit-learn/scikit-learn/pull/94"&gt;**better parallel computing support**&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;and many other itches to scratch, as it is a community-driven event.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Come and join us. It will be fun, and it’s an occasion to learn new
tricks.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external image-reference" href="http://farm5.static.flickr.com/4067/4405351641_5675ba000c.jpg"&gt;&lt;img alt="image1" src="http://farm5.static.flickr.com/4067/4405351641_5675ba000c.jpg" style="width: 20%;" /&gt;&lt;/a&gt; &lt;a class="reference external image-reference" href="http://farm6.static.flickr.com/5249/5265835075_ea0b41019c.jpg"&gt;&lt;img alt="image2" src="http://farm6.static.flickr.com/5249/5265835075_ea0b41019c.jpg" style="width: 20%;" /&gt;&lt;/a&gt; &lt;a class="reference external image-reference" href="http://farm5.static.flickr.com/4135/4974339970_566424185f.jpg"&gt;&lt;img alt="image3" src="http://farm5.static.flickr.com/4135/4974339970_566424185f.jpg" style="width: 20%;" /&gt;&lt;/a&gt; &lt;a class="reference external image-reference" href="http://farm6.static.flickr.com/5294/5425114531_6eec316967.jpg"&gt;&lt;img alt="image4" src="http://farm6.static.flickr.com/5294/5425114531_6eec316967.jpg" style="width: 20%;" /&gt;&lt;/a&gt;&lt;/p&gt;
</content><category term="programming"></category><category term="sprint"></category><category term="machine learning"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="scikit-learn"></category></entry><entry><title>Windows binaries for the scientific Python ecosystem</title><link href="https://gael-varoquaux.info/programming/windows-binaries-for-the-scientific-python-ecosystem.html" rel="alternate"></link><published>2011-02-15T09:02:00+01:00</published><updated>2011-02-15T09:02:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-02-15:/programming/windows-binaries-for-the-scientific-python-ecosystem.html</id><summary type="html">&lt;p&gt;I just realized yesterday that Christoph Gohlke has &lt;a class="reference external" href="http://www.lfd.uci.edu/~gohlke/pythonlibs/"&gt;a repository of
binary installers&lt;/a&gt; (&lt;em&gt;.exe&lt;/em&gt;) for Windows 32 and 64bit with almost all
the scientific Python packages that you can dream of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://numpy.scipy.org"&gt;numpy&lt;/a&gt;, &lt;a class="reference external" href="http://www.scipy.org/"&gt;scipy&lt;/a&gt; and &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt;, of course (compiled
with the MKL)&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://cython.org/"&gt;cython&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;the &lt;a class="reference external" href="http://enthought.github.com/"&gt;ETS&lt;/a&gt;, including &lt;a class="reference external" href="http://enthought.github.com/mayavi/mayavi/"&gt;Mayavi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VTK&lt;/strong&gt;, with the Python …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;I just realized yesterday that Christoph Gohlke has &lt;a class="reference external" href="http://www.lfd.uci.edu/~gohlke/pythonlibs/"&gt;a repository of
binary installers&lt;/a&gt; (&lt;em&gt;.exe&lt;/em&gt;) for Windows 32 and 64bit with almost all
the scientific Python packages that you can dream of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://numpy.scipy.org"&gt;numpy&lt;/a&gt;, &lt;a class="reference external" href="http://www.scipy.org/"&gt;scipy&lt;/a&gt; and &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt;, of course (compiled
with the MKL)&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://cython.org/"&gt;cython&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;the &lt;a class="reference external" href="http://enthought.github.com/"&gt;ETS&lt;/a&gt;, including &lt;a class="reference external" href="http://enthought.github.com/mayavi/mayavi/"&gt;Mayavi&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VTK&lt;/strong&gt;, with the Python bindings&lt;/li&gt;
&lt;li&gt;a variety of &lt;a class="reference external" href="http://scikits.appspot.com/"&gt;scikits&lt;/a&gt; (including the &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt;,
hurray!)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These binaries are incredibly useful, as building all these packages
under Windows requires some skill, and a compiler. They complement
fully-fledged scientific Python distributions such as EPD or
Python(x,y) very well, as they can be installed on top of an existing Python
installation.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;I should say that I discovered this thanks to a long email discussion in
which Christoph Gohlke and Yakub Nowacki helped me debug a nasty Mayavi
bug on 64-bit Windows that I couldn’t reproduce, as I don’t have a 64-bit
Windows machine available. That was particularly helpful.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="mayavi"></category></entry><entry><title>Interested in parallel computing and statistics? We are looking for a post-doc</title><link href="https://gael-varoquaux.info/programming/interested-in-parallel-computing-and-statistics-we-are-looking-for-a-post-doc.html" rel="alternate"></link><published>2011-01-30T22:30:00+01:00</published><updated>2011-01-30T22:30:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-01-30:/programming/interested-in-parallel-computing-and-statistics-we-are-looking-for-a-post-doc.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="https://parietal.saclay.inria.fr/"&gt;My research group&lt;/a&gt; is kick starting a new project, called
&lt;strong&gt;AzureBrain&lt;/strong&gt; to do computational analysis of large brain imaging and
genetics population-wise data. One of the goals of the project is to
harness the power of grid computing to do statistical learning on fMRI
data, finding features in an individual’s …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="https://parietal.saclay.inria.fr/"&gt;My research group&lt;/a&gt; is kick-starting a new project, called
&lt;strong&gt;AzureBrain&lt;/strong&gt; to do computational analysis of large brain imaging and
genetics population-wide data. One of the goals of the project is to
harness the power of grid computing to do statistical learning on fMRI
data, finding features in an individual’s brain images that can be
predicted from their genome. The medical applications cover the wide scope
of genetically related brain pathologies, such as autism.&lt;/p&gt;
&lt;p&gt;Want to work in a dynamic and exciting environment, using Python to solve
challenging data-analysis problems? We are looking for a post-doctoral fellow to
start in the spring or early summer. The ideal candidate would have a
strong background in computational statistics or machine learning, as
well as parallel computing; however, we will consider any candidate with
good experience in one or the other and a strong desire to learn.&lt;/p&gt;
&lt;p&gt;You would be employed by &lt;a class="reference external" href="http://www.inria.fr"&gt;INRIA&lt;/a&gt;, the leading computer-science research institute
in France. We are a team of computer scientists specialized in image
processing and statistical data analysis, integrated in one of the top
French brain research centers, &lt;a class="reference external" href="http://www-dsv.cea.fr/en/instituts/institut-d-imagerie-biomedicale-i2bm/services/neurospin-d.-le-bihan"&gt;NeuroSpin&lt;/a&gt;, south of Paris. We work
mostly in Python. The team includes core contributors to the
&lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn project&lt;/a&gt;, for machine learning in Python, and the &lt;a class="reference external" href="http://nipy.sourceforge.net/"&gt;nipy
project&lt;/a&gt;, for NeuroImaging in Python.&lt;/p&gt;
&lt;p&gt;Below follows a summary of &lt;a class="reference external" href="http://parietal.saclay.inria.fr/open-positions/azure-brain-post-doc-proposal"&gt;the official job announcement&lt;/a&gt;. Please
contact Bertrand Thirion (first name _dot_ last name _at_ inria
_dot_ fr) if you are interested, referencing the AzureBrain project.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="section" id="introduction"&gt;
&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Imaging genetics studies linking functional MRI data and Single
Nucleotide Polymorphism (SNP) data face a dire multiple-comparisons
issue. In the genome dimension, genotyping DNA chips record
several hundred thousand values per subject, while in the imaging
dimension a brain image may contain 100k-1M voxels. Finding the brain
and genome regions that may be involved in this link entails a huge
number of hypotheses, hence a drastic correction of the statistical
significance of pairwise relationships, which in turn severely reduces
the sensitivity of the statistical procedures that aim at detecting the
association. It is therefore desirable to set up techniques that are as
sensitive as possible to explore where in the brain and where in the genome a
significant link can be detected, while correcting for family-wise
multiple comparisons (controlling the family-wise error rate). Another
issue is the computational cost of these procedures, which needs to be
addressed with adequate algorithmic and computational tools.&lt;/p&gt;
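&lt;p&gt;To make the family-wise correction concrete, here is a minimal NumPy sketch of one standard approach, the max-statistic permutation test, on a toy problem with a single genetic variable and many imaging features. It is an illustration under assumptions, not the project’s pipeline: the helper name max_stat_permutation, the use of Pearson correlation as the test statistic, and the simulated data are all hypothetical. Comparing each observed statistic with the permutation distribution of the maximum statistic over all features controls the family-wise error rate without assuming independence between voxels; it only relies on subjects being exchangeable under the null hypothesis.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def max_stat_permutation(X, y, n_perm=1000, seed=0):
    """Hypothetical helper: mass-univariate correlation test with
    family-wise error (FWE) control via the max-statistic permutation method.

    X: (n_subjects, n_features) array, e.g. voxel values.
    y: (n_subjects,) array, e.g. the genotype at one SNP.
    Returns the observed |r| per feature and FWE-corrected p-values.
    """
    rng = np.random.default_rng(seed)
    n_subjects = X.shape[0]
    # Standardize features once, so that a dot product gives Pearson's r
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    def abs_corr(target):
        t = (target - target.mean()) / target.std()
        return np.abs(Xs.T @ t) / n_subjects

    observed = abs_corr(y)
    # Null distribution of the maximum statistic over all features
    max_null = np.array([abs_corr(rng.permutation(y)).max()
                         for _ in range(n_perm)])
    # FWE p-value: how often the permuted maximum reaches each observed value
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    p_fwe = (1 + exceed) / (1 + n_perm)
    return observed, p_fwe


# Toy data: only feature 0 is truly associated with y
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 500))
y = X[:, 0] + 0.5 * rng.standard_normal(100)
observed, p_fwe = max_stat_permutation(X, y, n_perm=200)
print(observed.argmax(), p_fwe[0])
```
&lt;/pre&gt;
&lt;p&gt;Each permutation costs one pass over all features, which is why this kind of procedure becomes expensive at the scale of a full genome times a full brain.&lt;/p&gt;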
&lt;/div&gt;
&lt;div class="section" id="objectives"&gt;
&lt;h2&gt;Objectives&lt;/h2&gt;
&lt;p&gt;In this project, we will consider a unique dataset acquired in the
&lt;a class="reference external" href="http://www.imagen-europe.com"&gt;Imagen project&lt;/a&gt;, an FP6 project that aims at investigating factors of
addiction in a population of adolescents; Imagen’s database contains
multi-modal neuroimaging as well as genetics and psychological data on
about 2000 subjects. This database is hosted and processed at NeuroSpin
and is available for research purposes. The candidate will be in charge
of:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Setting up an analysis pipeline (based on code already available to
analyze neuroimaging/genetics datasets) to extract and pre-process
the relevant data for statistical analysis.&lt;/li&gt;
&lt;li&gt;Performing statistical analysis on simulated datasets and sub-parts
of the whole database in order to set up the computational
framework. These procedures will include mass-univariate linear
modeling (with peak- and cluster-level tests), regularized multiple
regression and a permutation-based assessment framework.&lt;/li&gt;
&lt;li&gt;Launching data analysis on a large-scale grid and cloud environment,
with the help of the KerData researchers (see below).&lt;/li&gt;
&lt;li&gt;Building the post-analytic framework to ease the interpretation of the
results in both the neuroimaging and genetics domains.&lt;/li&gt;
&lt;/ul&gt;
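&lt;p&gt;As an illustration of the regularized multiple regression step, here is a minimal NumPy sketch of ridge regression, written in its dual form because in imaging genetics the number of features (SNPs or voxels) typically far exceeds the number of subjects. The helper name ridge_dual, the choice of ridge as the regularizer, the penalty value and the toy data are assumptions made for illustration; in practice, scikit-learn’s Ridge estimator (sklearn.linear_model.Ridge) provides this.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def ridge_dual(X, y, alpha=1.0):
    """Hypothetical helper: ridge regression, min ||y - Xw||^2 + alpha ||w||^2.

    Uses the dual (kernel) form w = X^T (X X^T + alpha I)^-1 y on centered
    data, which only needs an (n_subjects x n_subjects) solve: cheap when
    features vastly outnumber subjects.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_subjects = X.shape[0]
    dual_coef = np.linalg.solve(Xc @ Xc.T + alpha * np.eye(n_subjects), yc)
    coef = Xc.T @ dual_coef
    # The intercept makes predictions match the mean of y on the training set
    intercept = y_mean - X_mean @ coef
    return coef, intercept


# Toy data: 80 subjects, 5000 features, only the first 10 carry signal
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 5000))
true_coef = np.zeros(5000)
true_coef[:10] = 1.0
y = X @ true_coef + rng.standard_normal(80)
coef, intercept = ridge_dual(X, y, alpha=10.0)
y_pred = X @ coef + intercept
```
&lt;/pre&gt;
&lt;p&gt;The dual and primal forms give the same coefficients; the dual one simply trades a 5000 x 5000 system for an 80 x 80 one.&lt;/p&gt;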
&lt;p&gt;The analysis framework is based on algorithmic tools developed in
C/Python (numpy, scipy and scikit-learn). The candidate will interact i)
with researchers of the Parietal team on algorithmic aspects, but also
ii) with CEA researchers of NeuroSpin, who will provide expertise in the
genetics domain, and iii) with the KerData team (INRIA Rennes) and the
Joint MSR-INRIA Research Center (Microsoft Research), which will provide
help and massive computation facilities. The project has access to
grid/cloud computing facilities to be used in collaboration with
INRIA/KerData and MSR-INRIA partners.&lt;/p&gt;
&lt;p&gt;The expected result is the discovery of correlations between brain
activation and genetic information.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="required-knowledge-and-background"&gt;
&lt;h2&gt;Required knowledge and background&lt;/h2&gt;
&lt;p&gt;The candidate should have at least a basic knowledge of standard
statistical concepts. He or she should have significant prior
experience in parallel computation and with the Python language. It is
important that he or she has a real interest in genetics and/or brain
imaging in order to interact closely with specialists of these
domains. He or she will benefit from the algorithmic tools developed at
Parietal and from the database setup and data pre-processing tools
developed by NeuroSpin researchers.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="jobs"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category></entry><entry><title>EuroSciPy 2011: the dates are out - Aug 25-28, Paris</title><link href="https://gael-varoquaux.info/programming/euroscipy-2011-the-dates-are-out-aug-25-28-paris.html" rel="alternate"></link><published>2011-01-16T15:57:00+01:00</published><updated>2011-01-16T15:57:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-01-16:/programming/euroscipy-2011-the-dates-are-out-aug-25-28-paris.html</id><summary type="html">&lt;p&gt;We have finally been able to settle on final dates and venue for
&lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy_2011"&gt;EuroSciPy 2011&lt;/a&gt;, the 4th European meeting on Python in Science.&lt;/p&gt;
&lt;p&gt;The conference will be held &lt;strong&gt;from Thursday August 25th, to Sunday
August 28th&lt;/strong&gt;. The &lt;a class="reference external" href="http://www.ens.fr"&gt;ENS&lt;/a&gt; will be hosting the conference once again,
right in the center of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We have finally been able to settle on final dates and venue for
&lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy_2011"&gt;EuroSciPy 2011&lt;/a&gt;, the 4th European meeting on Python in Science.&lt;/p&gt;
&lt;p&gt;The conference will be held &lt;strong&gt;from Thursday August 25th, to Sunday
August 28th&lt;/strong&gt;. The &lt;a class="reference external" href="http://www.ens.fr"&gt;ENS&lt;/a&gt; will be hosting the conference once again,
right in the center of Paris.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="conferences"></category></entry><entry><title>Scientific publication for software development</title><link href="https://gael-varoquaux.info/programming/scientific-publication-for-software-development.html" rel="alternate"></link><published>2011-01-08T22:40:00+01:00</published><updated>2011-01-08T22:40:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2011-01-08:/programming/scientific-publication-for-software-development.html</id><summary type="html">&lt;p&gt;The academic community seems to judge the validity and significance of
any contribution by the number of papers published and the number of
citations they get. To find funding, to get credit, you have to
&lt;strong&gt;publish or perish&lt;/strong&gt;. However, the natural output of software
development tends not to be an …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The academic community seems to judge the validity and significance of
any contribution by the number of papers published and the number of
citations they get. To find funding, to get credit, you have to
&lt;strong&gt;publish or perish&lt;/strong&gt;. However, the natural output of software
development tends not to be an article (people who confuse articles and
documentation do a poor job of both, IMHO).&lt;/p&gt;
&lt;p&gt;While I believe that this policy is harmful for the quality of research,
I also know that I cannot fight it, and chances are that many others are
in my situation. As such, we need to publish scientific papers about the
scientific software that we develop (such as &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi&lt;/a&gt;, or
&lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt;, as far as I am concerned). On the other hand, as an
editor of the &lt;a class="reference external" href="http://conference.scipy.org/proceedings.html"&gt;Scipy conference proceedings&lt;/a&gt;, I have found that the
process of writing a paper on software work and going through peer
review can be greatly beneficial to the software. Indeed, it forces
authors to do a thorough review of the prior work, and to clearly
identify the purpose of the project. Also, such an article is necessarily
much shorter than a user manual, so it forces the authors to identify
the key concepts of their software and explain them clearly. As a
result, it helps find design and usability flaws and gain insight
into how the user manual can be structured.&lt;/p&gt;
&lt;p&gt;A major challenge to publishing is that most of the highly-ranked
journals tend to disregard software works, unless they are very specific
to a scientific problem, which actually makes them less useful to the
complete ecosystem. Deeply rooted in the minds of the editors and the
reviewers, there tends to be the idea that developing software is easy
compared to doing experiments or proofs. In addition, these top-notch
scientists are not always the most qualified to judge the quality of
software, as they have most often never worked on a major software
project. The good news is that this is slowly changing with the
creation of software tracks in specialized journals, and the development
of new journals focused on scientific software.&lt;/p&gt;
&lt;div class="section" id="journals-for-publishing-about-interdisciplinary-scientific-software"&gt;
&lt;h2&gt;Journals for publishing about interdisciplinary scientific software&lt;/h2&gt;
&lt;p&gt;In my opinion, interdisciplinary scientific software such as &lt;a class="reference external" href="http://numpy.scipy.org/"&gt;numpy&lt;/a&gt;,
the &lt;a class="reference external" href="http://www.gnu.org/software/gsl/"&gt;GSL&lt;/a&gt;, &lt;a class="reference external" href="http://www.gnu.org/software/octave/"&gt;octave&lt;/a&gt;, &lt;a class="reference external" href="http://www.scilab.org/"&gt;scilab&lt;/a&gt;, &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt;, or &lt;a class="reference external" href="http://www.fenicsproject.org"&gt;Fenics&lt;/a&gt;, are the
most valuable projects, as they provide foundations to build science in
the open. The challenges that these projects face are not only
algorithmic or computational, but also involve providing good user
interfaces, or developing and catering for very large communities of
users. These problems are considered &lt;em&gt;solved&lt;/em&gt; in a scientific
context, as they have all been solved at least once, often quite
successfully by commercial products such as Matlab. As a result, it is
hard to get funding for these projects unless there is a political
reason behind the funding, and IMHO politics tend to produce bad
software. Publishing high-profile articles on interdisciplinary
scientific software is thus hard, but critical. For this we need
journals that accept software papers, but are not only read by
researchers in CS or IT departments.&lt;/p&gt;
&lt;p&gt;A couple of years ago, some of us made a review of where it was possible
to publish truly wide-scope scientific software, and we found that there
was pretty much no option. It’s crazy to see that things have still not
changed much, and that a lot of major general-purpose widely-used
projects, like the ones I cited above, have never been acknowledged by a
publication.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://cise.aip.org/"&gt;Computing in Science and Engineering&lt;/a&gt;: a joint publication
between the AIP (American Institute of Physics) and the IEEE, it is a
magazine-style journal and it can be seen in many coffee rooms of
computational-science departments. As a result, it is widely read,
but the articles cannot be too technical (which might be a
good thing) and there is room for only a few articles.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.openresearchcomputation.com/"&gt;Open Research Computation (ORC)&lt;/a&gt;: A newly-created journal, with
a focus on making computational research reproducible. As such, it
favors papers about open source scientific software with good
software-engineering. &lt;strong&gt;Open access&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to these software-friendly journals, some large-scope
journals on computational science sometimes accept software papers,
though software production falls outside their scope:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.elsevier.com/locate/jocs/"&gt;Journal of Computational Science&lt;/a&gt;: a very multidisciplinary
journal.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.siam.org/journals/sisc.php"&gt;SIAM Journal on Scientific Computing (SISC)&lt;/a&gt;: a journal of the
SIAM (Society for Industrial and Applied Mathematics), thus with a
focus on engineering-type applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="journals-for-publishing-domain-specific-scientific-software"&gt;
&lt;h2&gt;Journals for publishing domain-specific scientific software&lt;/h2&gt;
&lt;p&gt;It is usually easier to publish a domain-specific software contribution,
as you can claim that you have solved a well-identified scientific
roadblock. Until recently, it was hard to get such papers in the best
journals of a community, but things have been changing.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.elsevier.com/locate/cpc"&gt;Computer Physics Communications&lt;/a&gt;: for algorithms and packages
solving numerical and computational problems related to physics.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://bioinformatics.oxfordjournals.org/"&gt;Bioinformatics&lt;/a&gt;: accepts software papers on biology-related
problems.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://toms.acm.org/"&gt;ACM Transactions On Mathematical Software (TOMS)&lt;/a&gt;: a journal of
the ACM (Association for Computing Machinery), thus with a focus on
algorithms.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.jstatsoft.org/"&gt;Journal of statistical Software&lt;/a&gt;: this journal comes from the
community of people who wrote the R language. They know that open
source scientific software is hard and important. &lt;strong&gt;Open access&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://jmlr.csail.mit.edu/mloss/"&gt;Journal of Machine Learning Research (JMLR), Machine Learning Open
Source (MLOSS) track&lt;/a&gt;: reference journal in the machine learning
community, the MLOSS track cares strongly about documentation,
packaging and usability of the software. &lt;strong&gt;Open access&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.elsevier.com/wps/find/journaldescription.cws_home/398/description#description"&gt;Computers &amp;amp; Geoscience&lt;/a&gt;: computational geoscience journal that
accepts software papers (thanks Michael Aye for the pointer).&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291099-0542"&gt;Computer Applications in Engineering Education&lt;/a&gt;: a journal
about education with computers. AFAIK, no special focus on open
source or software-engineering quality (thanks Doug Holton for the
pointer).&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.springer.com/biomed/neuroscience/journal/12021"&gt;NeuroInformatics&lt;/a&gt; and &lt;a class="reference external" href="http://www.frontiersin.org/neuroinformatics"&gt;Frontiers NeuroInformatics&lt;/a&gt; (&lt;strong&gt;open
access&lt;/strong&gt;): two journals on computer-related issues in neuroscience
that accept software papers. I have the feeling that the latter is a
bit warmer to open source than the former (thanks Andrew Davison for
the pointer).&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.elsevier.com/wps/find/journaldescription.cws_home/503304/description#description"&gt;Computers and Electronics in Agriculture&lt;/a&gt;: for publishing
agriculture-related software (thanks John B. Cole for the pointer).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I should stress that, in my opinion, journals such as &lt;a class="reference external" href="http://www.ploscompbiol.org"&gt;PLOS
Computational Biology&lt;/a&gt; or the &lt;a class="reference external" href="http://www.elsevier.com/wps/find/journaldescription.cws_home/622866/description#description"&gt;Journal of Computational Physics&lt;/a&gt;
are not great venues for software papers, as they tend to emphasize what
I would call &lt;em&gt;proof of principle&lt;/em&gt;, and not packaged and maintained
software.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I have the feeling that there is a need for more communication on
scientific software. The list above is, of course, incomplete. If you
have extra ideas, please do not hesitate to contact me.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As a conclusion, I would like to point out that conferences are also a
good way to advertise scientific software. You may even get approached
by the editor of a journal to open the door for a journal article. Last
year I was at &lt;a class="reference external" href="http://hpfem.org/events/esco-2010/"&gt;ESCO&lt;/a&gt;, a coupled problems conference, and there was a
track on Python in science. All in all the conference was a huge amount
of fun, and I learned a lot about practical aspects of numerical methods,
given the number of numerical computing geeks that were around. The same
community is organizing &lt;a class="reference external" href="http://hpfem.org/events/femtec-2011/"&gt;FEMTEC&lt;/a&gt; in Lake Tahoe (California) this year.
If you are in any field related to FEM or multiphysics, you should
consider it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update: added links suggested by Doug Holton, Michael Aye, Andrew
Davison, and John B. Cole&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="publishing"></category></entry><entry><title>ICA versus PCA in the scikit-learn: the value of code over pictures</title><link href="https://gael-varoquaux.info/programming/ica-versus-pca-in-the-scikit-learn-the-value-of-code-over-pictures.html" rel="alternate"></link><published>2010-11-20T16:12:00+01:00</published><updated>2010-11-20T16:12:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-11-20:/programming/ica-versus-pca-in-the-scikit-learn-the-value-of-code-over-pictures.html</id><summary type="html">&lt;p&gt;When I was trying to get an intuitive feeling of the difference between
&lt;strong&gt;Independent Component Analysis&lt;/strong&gt; (ICA) and &lt;strong&gt;Principal Component
Analysis&lt;/strong&gt; (PCA), I wrote a few Python scripts producing &lt;a class="reference external" href="http://gael-varoquaux.info/scientific_computing/ica_pca/index.html"&gt;some
visualizations explaining the difference&lt;/a&gt; that have had a bit of
success.&lt;/p&gt;
&lt;p&gt;During the last sprint on &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;, a machine learning …&lt;/p&gt;</summary><content type="html">&lt;p&gt;When I was trying to get an intuitive feeling of the difference between
&lt;strong&gt;Independent Component Analysis&lt;/strong&gt; (ICA) and &lt;strong&gt;Principal Component
Analysis&lt;/strong&gt; (PCA), I wrote a few Python scripts producing &lt;a class="reference external" href="http://gael-varoquaux.info/scientific_computing/ica_pca/index.html"&gt;some
visualizations explaining the difference&lt;/a&gt; that have had a bit of
success.&lt;/p&gt;
&lt;p&gt;During the last sprint on &lt;a class="reference external" href="http://scikit-learn.org"&gt;scikit-learn&lt;/a&gt;, a machine learning
toolkit in Python, we cleaned up the ICA code that I had been using, and
we added it to the scikit, along with &lt;a class="reference external" href="http://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_vs_pca.html"&gt;an example&lt;/a&gt; inspired by this
earlier toy problem.&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://scikit-learn.org/stable/auto_examples/decomposition/plot_ica_vs_pca.html"&gt;&lt;img alt="" class="align-center" src="http://scikit-learn.org/stable/_images/sphx_glr_plot_ica_vs_pca_001.png" /&gt;&lt;/a&gt;
&lt;p&gt;While the pictures are not as pretty as the initial ones I had done
(because we wanted to keep the example as simple as possible), I am very
happy that this discussion is now more than a set of static pictures:
it comes with runnable code.&lt;/p&gt;
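&lt;p&gt;In that spirit, here is a minimal NumPy sketch of the same comparison that readers can poke at. It is not the scikit-learn code (which lives in sklearn.decomposition as FastICA and PCA) but a from-scratch illustration, assuming two independent Laplace-distributed sources mixed by a hypothetical matrix A, PCA computed by SVD, and the symmetric FastICA fixed-point iteration with a tanh nonlinearity. PCA only finds orthogonal directions of maximal variance, whereas ICA looks for statistically independent, maximally non-Gaussian directions, so only ICA should recover the sources (up to sign, scale and order).&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


def pca(X, n_components):
    """Project centered data onto its top principal axes (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def whiten(X):
    """Center X and linearly transform it to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    return Xc @ (eigvec / np.sqrt(eigval))


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, so that the rows of W are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W


def fastica(X, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with g = tanh; returns the estimated sources."""
    Z = whiten(X)
    n_samples, n_components = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update for every row w: E[z g(w^T z)] - E[g'(w^T z)] w
        W_new = G.T @ Z / n_samples - (1 - G ** 2).mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is (anti)parallel to the previous one
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1))
        W = W_new
        if tol > change:
            break
    return Z @ W.T


def best_abs_corr(estimated, sources):
    """For each true source, the best |correlation| with any estimated component."""
    k = sources.shape[1]
    c = np.corrcoef(np.c_[sources, estimated].T)[:k, k:]
    return np.abs(c).max(axis=1)


# Two independent, non-Gaussian sources, linearly mixed
rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))
A = np.array([[1.0, 0.6], [0.6, 1.0]])  # hypothetical mixing matrix
X = S @ A.T

print("ICA:", best_abs_corr(fastica(X), S))
print("PCA:", best_abs_corr(pca(X, 2), S))
```
&lt;/pre&gt;
&lt;p&gt;With this symmetric mixing, the principal axes are the sum and difference of the two observed signals, so each PCA component stays a blend of both sources, while the ICA components line up with the individual sources.&lt;/p&gt;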
&lt;p&gt;This illustrates very well my feelings on the future of scientific code
and scientific research: papers, books and teaching materials on numerical
methods or computational science are greatly enhanced when they come
with highly-readable code that illustrates their purpose, because the
reader can start asking questions of the algorithm. Hopefully, &lt;strong&gt;the
documentation of scientific programming toolkits will become the
textbooks of tomorrow&lt;/strong&gt;. We still have a lot of work to do.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;It’s funny, I just realized that my vision on software might have been
strongly influenced by the fact that my mother, a high-school math
teacher, spent endless nights when I was a teenager working on
&lt;a class="reference external" href="http://fr.wikipedia.org/wiki/G%C3%A9oplan"&gt;Geoplan&lt;/a&gt;, a software for teaching geometry by interaction with
figures.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category></entry><entry><title>Multitouch with VTK (and MedINRIA and Mayavi)</title><link href="https://gael-varoquaux.info/programming/multitouch-with-vtk-and-medinria-and-mayavi.html" rel="alternate"></link><published>2010-09-18T09:40:00+02:00</published><updated>2010-09-18T09:40:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-09-18:/programming/multitouch-with-vtk-and-medinria-and-mayavi.html</id><summary type="html">&lt;p&gt;If the videos on this post are not showing, click through to see
them.&lt;/p&gt;
&lt;p&gt;A colleague of mine, &lt;a class="reference external" href="http://sites.google.com/site/pierrefillard/"&gt;Pierre Fillard&lt;/a&gt;, has just integrated multitouch
in the next generation of the VTK-based medical imaging software
&lt;a class="reference external" href="http://www-sop.inria.fr/asclepios/software/MedINRIA/"&gt;MedINRIA&lt;/a&gt;. The nice thing is that it works on an Apple laptop out of
the box …&lt;/p&gt;</summary><content type="html">&lt;p&gt;If the videos on this post are not showing, click through to see
them.&lt;/p&gt;
&lt;p&gt;A colleague of mine, &lt;a class="reference external" href="http://sites.google.com/site/pierrefillard/"&gt;Pierre Fillard&lt;/a&gt;, has just integrated multitouch
in the next generation of the VTK-based medical imaging software
&lt;a class="reference external" href="http://www-sop.inria.fr/asclepios/software/MedINRIA/"&gt;MedINRIA&lt;/a&gt;. The nice thing is that it works on an Apple laptop out of
the box.&lt;/p&gt;
&lt;p&gt;
&lt;object width="640" height="385"&gt;
&lt;embed src="http://www.youtube.com/v/UyO4KRnYreU?fs=1&amp;amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"&gt;
&lt;/embed&gt;
&lt;/object&gt;
&lt;/p&gt;&lt;p&gt;On &lt;a class="reference external" href="https://sites.google.com/site/pierrefillard/coding-blog/multi-touchgesturesinvtk"&gt;his blog&lt;/a&gt;, he explains how he did this (warning: it involves C++ and
VTK programming). &lt;strong&gt;He also gives the code for this!&lt;/strong&gt; Enjoy.&lt;/p&gt;
&lt;p&gt;This reminded me of when the &lt;a class="reference external" href="http://www.enthought.com/"&gt;Enthought guys&lt;/a&gt; had rigged up a large
multitouch screen and wired it in &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi&lt;/a&gt; for 3D plotting, and in
&lt;a class="reference external" href="http://code.enthought.com/projects/chaco/"&gt;chaco&lt;/a&gt; for 2D plotting, using only a web-cam, a video projector, and
pure Python image-analysis code:&lt;/p&gt;
&lt;p&gt;
&lt;object width="480" height="385"&gt;
&lt;embed src="http://www.youtube.com/v/bEf3nGjOgpU?fs=1&amp;amp;hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"&gt;
&lt;/embed&gt;
&lt;/object&gt;
&lt;/p&gt;</content><category term="programming"></category><category term="mayavi"></category><category term="python"></category><category term="scientific computing"></category></entry><entry><title>Scikit Learn coding sprint</title><link href="https://gael-varoquaux.info/programming/scikit-learn-coding-sprint.html" rel="alternate"></link><published>2010-09-04T17:43:00+02:00</published><updated>2010-09-04T17:43:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-09-04:/programming/scikit-learn-coding-sprint.html</id><summary type="html">&lt;p&gt;We have been really crap at communicating the next &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt;
coding sprint. It’s next week!&lt;/p&gt;
&lt;p&gt;The coding sprint will take place on 8 and 9 September at &lt;a class="reference external" href="http://maps.google.fr/maps/place?oe=utf-8&amp;amp;rls=com.mandriva:en-US:official&amp;amp;client=firefox-a&amp;amp;um=1&amp;amp;ie=UTF-8&amp;amp;q=inria+saclay&amp;amp;fb=1&amp;amp;gl=fr&amp;amp;hq=inria&amp;amp;hnear=Saclay&amp;amp;cid=14838681423181723946"&gt;INRIA
Saclay&lt;/a&gt;, near Paris, in the room K110 (building K).&lt;/p&gt;
&lt;p&gt;For those who cannot make it, it will be possible to participate …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We have been really crap at communicating the next &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt;
coding sprint. It’s next week!&lt;/p&gt;
&lt;p&gt;The coding sprint will take place on 8 and 9 September at &lt;a class="reference external" href="http://maps.google.fr/maps/place?oe=utf-8&amp;amp;rls=com.mandriva:en-US:official&amp;amp;client=firefox-a&amp;amp;um=1&amp;amp;ie=UTF-8&amp;amp;q=inria+saclay&amp;amp;fb=1&amp;amp;gl=fr&amp;amp;hq=inria&amp;amp;hnear=Saclay&amp;amp;cid=14838681423181723946"&gt;INRIA
Saclay&lt;/a&gt;, near Paris, in the room K110 (building K).&lt;/p&gt;
&lt;p&gt;For those who cannot make it, it will be possible to participate using
the IRC channel (#scikit-learn on irc.freenode.net).&lt;/p&gt;
&lt;p&gt;We will start at 9am (Paris time), and a sketch of the planning can be
found &lt;a class="reference external" href="http://sourceforge.net/apps/trac/scikit-learn/wiki/SprintPlanning"&gt;here&lt;/a&gt;. In particular:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;More docs! We still need tutorials: feature selection, model
selection, cross-validation, etc.&lt;/li&gt;
&lt;li&gt;Make the &lt;a class="reference external" href="http://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/pipeline.py"&gt;pipeline object&lt;/a&gt; really work, and illustrate it in different
contexts.&lt;/li&gt;
&lt;li&gt;Clean up and document the Bayesian approaches.&lt;/li&gt;
&lt;li&gt;Implementation of PCA (fit + transform).&lt;/li&gt;
&lt;li&gt;FastICA (adapt the &lt;a class="reference external" href="http://github.com/GaelVaroquaux/canica/blob/master/canica/algorithms/fastica.py"&gt;CanICA&lt;/a&gt; code)&lt;/li&gt;
&lt;li&gt;LDA: covariance estimators (Ledoit-Wolf) and a transform method.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/preprocessing.py"&gt;Preprocessing routines&lt;/a&gt; (center, standardize) with fit/transform.&lt;/li&gt;
&lt;li&gt;Anything that you have a particular interest in.&lt;/li&gt;
&lt;/ul&gt;
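&lt;p&gt;To illustrate the fit/transform convention mentioned in the list above, here is a minimal sketch of a centering-and-scaling step written in that style. The class name Standardize and its attributes are hypothetical, not the scikit-learn API (today’s scikit-learn offers StandardScaler in sklearn.preprocessing for this). The point of the split is that statistics are learned once on training data by fit, then applied unchanged to new data by transform, which is what makes such steps chainable in a pipeline.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
```python
import numpy as np


class Standardize:
    """Hypothetical preprocessing step following the fit/transform convention."""

    def fit(self, X):
        # Learn per-feature statistics from the training data only
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        # Avoid dividing by zero for constant features
        self.std_[self.std_ == 0] = 1.0
        return self  # returning self allows Standardize().fit(X).transform(X)

    def transform(self, X):
        # Apply the statistics learned during fit, unchanged
        return (X - self.mean_) / self.std_

    def fit_transform(self, X):
        return self.fit(X).transform(X)


rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

scaler = Standardize()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)  # uses the training mean and std
```
&lt;/pre&gt;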
&lt;p&gt;Do not hesitate to send advice on this (incomplete…) list to the
&lt;a class="reference external" href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general"&gt;mailing list&lt;/a&gt;, and see you next week!&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;&lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt; is a Python module for efficient and easy machine
learning using scipy and numpy.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="scikit-learn"></category></entry><entry><title>Software design for maintainability</title><link href="https://gael-varoquaux.info/programming/software-design-for-maintainability.html" rel="alternate"></link><published>2010-08-01T23:47:00+02:00</published><updated>2010-08-01T23:47:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-08-01:/programming/software-design-for-maintainability.html</id><summary type="html">&lt;p&gt;I have just spent the best part of my Sunday fixing a bug that turned
out to be a &lt;a class="reference external" href="https://svn.enthought.com/enthought/changeset/25699/"&gt;seemingly-trivial two-liner&lt;/a&gt;. Such unpleasant experiences
are all too frequent, and weigh heavily on my view of code design.&lt;/p&gt;
&lt;div class="section" id="my-stance-on-code-design"&gt;
&lt;h2&gt;My stance on code design&lt;/h2&gt;
&lt;img alt="" class="align-right" src="https://gael-varoquaux.info/programming/attachments/software_design_for_maintainability/cool-car-drawing-5.jpg" style="width: 30%;" /&gt;
&lt;p&gt;I call &lt;em&gt;code design&lt;/em&gt; the process of designing …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;I have just spent the best part of my Sunday fixing a bug that turned
out to be a &lt;a class="reference external" href="https://svn.enthought.com/enthought/changeset/25699/"&gt;seemingly-trivial two-liner&lt;/a&gt;. Such unpleasant experiences
are all too frequent, and weigh heavily on my view of code design.&lt;/p&gt;
&lt;div class="section" id="my-stance-on-code-design"&gt;
&lt;h2&gt;My stance on code design&lt;/h2&gt;
&lt;img alt="" class="align-right" src="https://gael-varoquaux.info/programming/attachments/software_design_for_maintainability/cool-car-drawing-5.jpg" style="width: 30%;" /&gt;
&lt;p&gt;I call &lt;em&gt;code design&lt;/em&gt; the process of designing the architecture of a
piece of software: what are the objects it uses? how do they interact?
how is the information passed around?…&lt;/p&gt;
&lt;p&gt;My view of code design and software engineering has progressively
evolved to favor &lt;strong&gt;extreme simplicity&lt;/strong&gt; over sophistication. I believe
that a good programmer should know &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29"&gt;design patterns&lt;/a&gt;, &lt;a class="reference external" href="http://gael-varoquaux.info/computers/python_advanced/index.html"&gt;powerful
language features&lt;/a&gt;, &lt;a class="reference external" href="http://scipy2010.blogspot.com/2010/06/tutorials-day-1-advanced-numpy.html"&gt;libraries’ dark corners&lt;/a&gt;, and &lt;em&gt;not use them unless
absolutely necessary&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="some-rules-of-thumb"&gt;
&lt;h2&gt;Some rules of thumb&lt;/h2&gt;
&lt;p&gt;Here are some rules that I apply nowadays when writing code that I would
like to last (I am aware that some of them go against well-advertised
best practices).&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Keep it as simple as possible, really!&lt;/strong&gt; Experimental results have
shown that the tractability of a code base decreases with the square of
the number of interactions, and thus much faster than the number of
lines in a project grows. Each time you add a line, think about it: can you
make it simpler? If not, you will have to find resources to maintain your
project, as fixing bugs or adding features will grow harder.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Design for the 80% use cases.&lt;/strong&gt; In the same vein, a small decrease
in the requirements can make your project much simpler
&lt;a class="reference external" href="http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F32%2F35909%2F01702600.pdf%3Farnumber%3D1702600&amp;amp;authDecision=-203"&gt;[Woodfield1979]&lt;/a&gt;. Corner cases and minor use cases should not make
the whole project complex and hard to maintain. If you can, give up
on what is bringing in complexity. If you cannot, isolate it, and
don’t let it sit at the core of your design.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Don’t design for the future.&lt;/strong&gt; Again the same core idea: don’t
start planning for all the use cases and all the difficulties that you
haven’t encountered: you will most certainly design wrong, and
chances are that you’ll add complexity that you do not use. Design
simple, design cleanly and refactor as you go, based on concrete
problems. This is known as the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/You_aren't_gonna_need_it"&gt;“YAGNI principle”&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
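&lt;p&gt;To make the quadratic growth concrete, here is a back-of-the-envelope sketch (an illustration, not the experimental result mentioned above), assuming that each of &lt;em&gt;n&lt;/em&gt; components may interact with every other one, so that there are &lt;em&gt;n(n - 1)/2&lt;/em&gt; possible pairwise interactions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;def pairwise_interactions(n_components):
    # Each unordered pair of components is one potential interaction
    return n_components * (n_components - 1) // 2


for n in (10, 20, 40):
    print(n, 'components:', pairwise_interactions(n), 'possible interactions')
# prints:
# 10 components: 45 possible interactions
# 20 components: 190 possible interactions
# 40 components: 780 possible interactions
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under this assumption, doubling the number of components roughly quadruples the number of interactions a maintainer may have to keep in mind.&lt;/p&gt;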
&lt;img alt="" class="align-center" src="https://gael-varoquaux.info/programming/attachments/software_design_for_maintainability/howtobuildmvp.gif" style="width: 60%;" /&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Don’t be clever.&lt;/strong&gt; Each time you use a clever trick, whoever has to
read and maintain this code will have to understand it (that person
may be you, in a few years). Chances are that they’ll get it wrong
and start by losing a lot of time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeating yourself may actually be OK.&lt;/strong&gt; This is a case of
&lt;em&gt;practicality beats purity&lt;/em&gt;. Repeating code is really a bad thing in
software design, because it leads to an increased number of lines to
debug, and tends to hinder reusability. However, adding complexity in
order to save a few lines of duplicated code will cost you more in
the long run.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use objects sparingly.&lt;/strong&gt; Objects are great, but are they always
needed? An object with a single method &lt;em&gt;eval&lt;/em&gt; can probably simply be
implemented as a function. The limitation of objects is that each one
has its own behavior. As a result, the users and maintainers of
your codebase will first have to understand how all your classes
interact before understanding your code. This also means that there
is a lot of benefit in giving different classes the
same interface.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avoid abstractions and levels of indirection.&lt;/strong&gt; The more levels of
code are piled on top of one another, the more layers your maintainer
will have to inspect to find where the bug might be. An
abstraction hides another object or algorithm; to debug code, chances
are that all the black boxes will first have to be opened.&lt;/li&gt;
&lt;/ul&gt;
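&lt;p&gt;As an illustration of the point about single-method objects, here is a minimal sketch; the names &lt;em&gt;Scaler&lt;/em&gt; and &lt;em&gt;scale&lt;/em&gt; are hypothetical. The class stores state only to hand it to its one method, whereas the function exposes the same computation directly in its signature:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;# An object whose only purpose is to be evaluated...
class Scaler(object):
    def __init__(self, factor):
        self.factor = factor

    def eval(self, x):
        return self.factor * x


# ...can usually be replaced by a plain function: no state to track,
# and nothing to learn beyond its arguments.
def scale(x, factor):
    return factor * x


print(Scaler(3).eval(2), scale(2, 3))  # prints: 6 6
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If a fixed setting must be carried around, &lt;em&gt;functools.partial(scale, factor=3)&lt;/em&gt; gives a callable without defining a new class.&lt;/p&gt;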
&lt;/div&gt;
&lt;div class="section" id="coding-for-others-to-debug"&gt;
&lt;h2&gt;Coding for others to debug&lt;/h2&gt;
&lt;blockquote class="epigraph"&gt;
“Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it.” - Brian W. Kernighan&lt;/blockquote&gt;
&lt;img alt="" class="align-right" src="https://gael-varoquaux.info/programming/attachments/software_design_for_maintainability/auto-graveyard-1.jpg" style="width: 40%;" /&gt;
&lt;p&gt;You may think that I am overemphasizing simplicity at the cost of
functionality. Well, think about the future of your code. The net is
full of unmaintained and abandoned code. If you want your project to
grow and have a future, you will probably need people to help you. For a
given purpose, the easier the code is to read and debug, the better your
chances of gaining momentum.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Some external references I like (about software engineering, rather than
debugging):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Edmond Lau: &lt;a class="reference external" href="http://www.theeffectiveengineer.com/blog/hidden-costs-that-engineers-ignore"&gt;Hidden costs that engineers ignore&lt;/a&gt;
(&lt;strong&gt;Read this&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;Titus Brown: &lt;a class="reference external" href="http://ivory.idyll.org/blog/sep-07/not-sucking-v2"&gt;Writing (Python) Code that Doesn’t Suck&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Peter Norvig: &lt;a class="reference external" href="http://norvig.com/21-days.html"&gt;Teach yourself programming in 10 years&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Paul Stachour and David Collier-Brown: &lt;a class="reference external" href="http://cacm.acm.org/magazines/2009/11/48444-you-dont-know-jack-about-software-maintenance/fulltext"&gt;You Don’t Know Jack About
Software Maintenance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Greg Wilson: &lt;a class="reference external" href="http://software-carpentry.org/"&gt;Software carpentry: a course in software engineering&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="software engineering"></category><category term="software architecture"></category><category term="python"></category><category term="selected"></category></entry><entry><title>Sprint Scikit learn in Paris</title><link href="https://gael-varoquaux.info/programming/sprint-scikit-learn-in-paris.html" rel="alternate"></link><published>2010-07-23T14:31:00+02:00</published><updated>2010-07-23T14:31:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-07-23:/programming/sprint-scikit-learn-in-paris.html</id><summary type="html">&lt;p&gt;We are organizing a coding sprint in Paris on &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit learn&lt;/a&gt;,
&lt;strong&gt;machine learning in Python&lt;/strong&gt;. The goal of this sprint is to set the
API and the general coding guidelines of the scikit to be able to tackle
many different statistical learning problems in a consistent framework.&lt;/p&gt;
&lt;p&gt;This is why …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We are organizing a coding sprint in Paris on &lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit learn&lt;/a&gt;,
&lt;strong&gt;machine learning in Python&lt;/strong&gt;. The goal of this sprint is to set the
API and the general coding guidelines of the scikit to be able to tackle
many different statistical learning problems in a consistent framework.&lt;/p&gt;
&lt;p&gt;This is why we would like to have people with different problems,
applications, and backgrounds to pitch in.&lt;/p&gt;
&lt;p&gt;It will be a two-day sprint. Everyone is welcome: just fill in the
&lt;a class="reference external" href="http://www.doodle.com/4cqxnhuq5rr4qzn5"&gt;doodle&lt;/a&gt; so that we can choose the date.&lt;/p&gt;
&lt;p&gt;And do not hesitate to suggest some topics that you would like to be
addressed during the sprint, and to discuss them on the &lt;a class="reference external" href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general"&gt;mailing-list&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://parietal.saclay.inria.fr/Members/vincent-michel"&gt;Vincent Michel&lt;/a&gt; is organizing the sprint. If you have questions about
the sprint, you are welcome to contact me, but please do put him in Cc
too.&lt;/p&gt;
</content><category term="programming"></category><category term="scikit-learn"></category><category term="scipy"></category><category term="scientific computing"></category><category term="sprint"></category><category term="conferences"></category></entry><entry><title>Simple object signatures</title><link href="https://gael-varoquaux.info/programming/simple-object-signatures.html" rel="alternate"></link><published>2010-07-16T23:31:00+02:00</published><updated>2010-07-16T23:31:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-07-16:/programming/simple-object-signatures.html</id><summary type="html">&lt;div class="section" id="a-signature-pattern"&gt;
&lt;h2&gt;A &lt;em&gt;signature&lt;/em&gt; pattern&lt;/h2&gt;
&lt;p&gt;There are many libraries around to specify what I call a &lt;em&gt;‘signature’&lt;/em&gt;
for an object, in other words a list of attributes that define its
parameter set. I have heavily used &lt;a class="reference external" href="http://code.enthought.com/projects/traits"&gt;Enthought’s Traits library&lt;/a&gt; for
this purpose, but the concept is fairly general and can be …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="a-signature-pattern"&gt;
&lt;h2&gt;A &lt;em&gt;signature&lt;/em&gt; pattern&lt;/h2&gt;
&lt;p&gt;There are many libraries around to specify what I call a &lt;em&gt;‘signature’&lt;/em&gt;
for an object, in other words a list of attributes that define its
parameter set. I have heavily used &lt;a class="reference external" href="http://code.enthought.com/projects/traits"&gt;Enthought’s Traits library&lt;/a&gt; for
this purpose, but the concept is fairly general and can be found, &lt;em&gt;e.g.&lt;/em&gt;, in
ORMs (Object Relational Mappers) or web frameworks.&lt;/p&gt;
&lt;p&gt;Specifying this interface of parameters can answer a
variety of needs:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Typing&lt;/strong&gt;: in the case of an ORM, to generate UIs, or for better
error management, it may be desirable to have some control over the
types of certain attributes of an object. In this case, specifying
the signature corresponds to laying out a &lt;strong&gt;data model&lt;/strong&gt; for the
object.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reactive programming&lt;/strong&gt;: using properties to react to changes to
attributes, one can fully specify the API of an object in terms of
these attributes. This gives a message-passing-like programming style
that can be very well suited to parallel computing, in particular
because it can easily be made thread-safe.&lt;/li&gt;
&lt;/ul&gt;
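&lt;p&gt;Both needs can be illustrated with plain Python properties, without any signature library. The following minimal sketch (the &lt;em&gt;Model&lt;/em&gt; class, its &lt;em&gt;alpha&lt;/em&gt; attribute and the &lt;em&gt;changes&lt;/em&gt; log are hypothetical) checks the type of an attribute when it is set, and reacts to the change by recording it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;class Model(object):
    def __init__(self, alpha=1.0):
        self.changes = []   # log of attribute changes, for illustration
        self.alpha = alpha  # goes through the property setter below

    @property
    def alpha(self):
        return self._alpha

    @alpha.setter
    def alpha(self, value):
        # Typing: reject values of the wrong type early
        if not isinstance(value, (int, float)):
            raise TypeError('alpha must be a number, got %r' % (value,))
        self._alpha = value
        # Reactive programming: respond to the change
        self.changes.append(('alpha', value))


m = Model()
m.alpha = 0.5
print(m.changes)  # prints: [('alpha', 1.0), ('alpha', 0.5)]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Libraries such as Traits let one declare such attributes once, at the class level; the sketch spells out by hand what they automate.&lt;/p&gt;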
&lt;/div&gt;
&lt;div class="section" id="signatures-for-statistical-learning-objects"&gt;
&lt;h2&gt;Signatures for statistical learning objects&lt;/h2&gt;
&lt;p&gt;Recently, I considered the &lt;em&gt;signature&lt;/em&gt; pattern in a new context. In the
&lt;a class="reference external" href="http://scikit-learn.sourceforge.net/"&gt;scikit-learn&lt;/a&gt;, we are interested in statistical learning. This entails
fitting models to data and often tuning parameters to select a model
that fits best (a problem called &lt;em&gt;model selection&lt;/em&gt;). Each of our models
is an object that implements a couple of key methods to fit to the data
and to apply to new data (&lt;em&gt;fit&lt;/em&gt; and &lt;em&gt;predict&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;The approach that we are currently taking for model selection is (more
or less) to generate a list of models with different parameters and fit
and test them on the data.&lt;/p&gt;
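&lt;p&gt;This generate-fit-test loop can be sketched as follows. The toy estimator &lt;em&gt;RidgeThroughOrigin&lt;/em&gt; (a one-dimensional ridge regression without intercept), its &lt;em&gt;alpha&lt;/em&gt; parameter, the &lt;em&gt;squared_error&lt;/em&gt; helper and the data are all hypothetical, chosen only to show the &lt;em&gt;fit&lt;/em&gt;/&lt;em&gt;predict&lt;/em&gt; interface at work:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;class RidgeThroughOrigin(object):
    """Toy estimator: 1D ridge regression without intercept."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        # Closed-form slope: sum(x * y) / (sum(x * x) + alpha)
        num = sum(x * t for x, t in zip(X, y))
        den = sum(x * x for x in X) + self.alpha
        self.coef_ = num / den
        return self

    def predict(self, X):
        return [self.coef_ * x for x in X]


def squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


X_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
X_test, y_test = [4.0, 5.0], [8.0, 10.0]

# One model per candidate parameter, fitted and tested on the data
models = [RidgeThroughOrigin(alpha=a) for a in (0.0, 1.0, 10.0)]
scores = [squared_error(y_test, m.fit(X_train, y_train).predict(X_test))
          for m in models]
best = models[scores.index(min(scores))]
print(best.alpha)  # prints: 0.0
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here the list of candidate values is written by hand; finding it by inspecting the objects is what the rest of this post is about.&lt;/p&gt;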
&lt;p&gt;A very nice feature would be to find out the parameters to vary simply
by inspecting the objects, and such a desire recently got us
&lt;a class="reference external" href="http://sourceforge.net/mailarchive/forum.php?thread_name=201007050958.16199.matthieu.perrot%40cea.fr&amp;amp;forum_name=scikit-learn-general"&gt;discussing&lt;/a&gt; the definition of &lt;em&gt;signatures&lt;/em&gt; for our objects. I must confess
that I am a bit wary, as this means either depending on a signature
library or building one. We don’t want to grow our dependencies, and
most signature-definition code that I know involves meta-programming
tricks to avoid code duplication.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="solving-the-simple-problem-avoiding-type-checking"&gt;
&lt;h2&gt;Solving the simple problem: avoiding type checking&lt;/h2&gt;
&lt;p&gt;Today, I had to bite the bullet, because we were in a situation in which
we had to instantiate new models from an existing one during model
selection. For technical reasons, using &lt;em&gt;copy.copy&lt;/em&gt; to create these
new models was not a great idea, and it was better to have the minimal
list of parameters required. Here come signatures again.&lt;/p&gt;
&lt;p&gt;After a bit of messing around with the code, I realized that typing
information was useless, and most probably harmful, to our immediate
goals and that I just needed the names of the relevant attributes. I
finally settled down to the following solution (which might still
change):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;All parameters need to be specified as keyword arguments of the
&lt;em&gt;__init__&lt;/em&gt;. The &lt;em&gt;__init__&lt;/em&gt; may not have positional arguments
or ‘*’ arguments. Attributes on the objects have the same names as
the &lt;em&gt;__init__&lt;/em&gt; parameters.&lt;/li&gt;
&lt;li&gt;A simple base class, with a couple of methods relying on a simple use
of the &lt;em&gt;inspect&lt;/em&gt; module to find the signature of the &lt;em&gt;__init__&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;inspect&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseEstimator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_param_names&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# getargspec was removed in Python 3.11; getfullargspec replaces it&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;varargs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inspect&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getfullargspec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;cls&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;varargs&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s1"&gt;&amp;#39;scikit learn estimators should always specify their &amp;#39;&lt;/span&gt;
            &lt;span class="s1"&gt;&amp;#39;parameters in the signature of their init (no varargs).&amp;#39;&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Remove &amp;#39;self&amp;#39;&lt;/span&gt;
        &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_param_names&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_set_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;valid_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_get_param_names&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;valid_params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Invalid parameter &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s1"&gt; &amp;#39;&lt;/span&gt;
                &lt;span class="s1"&gt;&amp;#39;for estimator &lt;/span&gt;&lt;span class="si"&gt;%s&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__class__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="vm"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="nb"&gt;setattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The full code can be seen &lt;a class="reference external" href="attachments/base_estimator.py"&gt;here&lt;/a&gt; and adds a few more features, such as
a clever &lt;em&gt;__repr__&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;What I like about this solution is that it (almost) does not use
metaprogramming, and avoids code duplication without forcing any specific
pattern on the developer subclassing &lt;em&gt;BaseEstimator&lt;/em&gt;.&lt;/p&gt;
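&lt;p&gt;In recent Python 3 versions, the same idea can also be written with &lt;em&gt;inspect.signature&lt;/em&gt;. The sketch below is a minimal variant, not the code linked above: it raises a &lt;em&gt;TypeError&lt;/em&gt; instead of asserting, and adds a hypothetical &lt;em&gt;clone&lt;/em&gt; helper showing how the parameter names serve the original need of building a fresh model from an existing one without &lt;em&gt;copy.copy&lt;/em&gt;. &lt;em&gt;ToyLasso&lt;/em&gt; is a made-up estimator:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import inspect


class BaseEstimator(object):
    @classmethod
    def _get_param_names(cls):
        # Parameters of __init__, minus 'self'
        params = list(inspect.signature(cls.__init__).parameters.values())[1:]
        for p in params:
            if p.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
                raise TypeError('%s.__init__ must not take *args or **kwargs'
                                % cls.__name__)
        return [p.name for p in params]

    def _get_params(self):
        return dict((key, getattr(self, key))
                    for key in self._get_param_names())


def clone(estimator):
    # Hypothetical helper: a new, unfitted instance with the same parameters
    return estimator.__class__(**estimator._get_params())


class ToyLasso(BaseEstimator):
    # Made-up estimator: stores its parameters under the same names
    def __init__(self, l1=.5, fit_intercept=True):
        self.l1 = l1
        self.fit_intercept = fit_intercept


model = ToyLasso(l1=.1)
new_model = clone(model)
print(new_model._get_params())  # prints: {'l1': 0.1, 'fit_intercept': True}
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The only contract imposed on subclasses is the naming convention between &lt;em&gt;__init__&lt;/em&gt; arguments and attributes; everything else is read from the signature.&lt;/p&gt;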
&lt;/div&gt;
&lt;div class="section" id="the-next-step"&gt;
&lt;h2&gt;The next step&lt;/h2&gt;
&lt;p&gt;This approach solves my immediate problem, but not the bigger one of
finding what values the different parameters can take when varied for
model selection. Of course this second problem is much more complicated,
and maybe it is not worth solving: the framework could very easily
bring in more problems than it solves.&lt;/p&gt;
&lt;p&gt;However, it seems that a fairly easy way of specifying possible values
for parameters would be to decorate the &lt;em&gt;__init__&lt;/em&gt;, giving the
possible parameter values to be tested during model selection:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@cv_params&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;logspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# 10 values from 1e-4 to 1&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;l1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fit_intercept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All the decorator has to do is to store the information in an attribute
attached to the &lt;em&gt;__init__&lt;/em&gt; (and probably to check that the
parameters it was given are valid arguments, in order to raise errors
early). Methods on the class can later inspect this information for
model selection, or GUI building (data-model specification will probably
require some typing language, rather than a simple list of possible
parameters).&lt;/p&gt;
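&lt;p&gt;A minimal sketch of such a decorator follows. The names &lt;em&gt;cv_params&lt;/em&gt; and &lt;em&gt;candidate_models&lt;/em&gt; are hypothetical, candidate values are given as plain lists, and every combination of candidate values is tried. The decorator checks the names against the signature of the &lt;em&gt;__init__&lt;/em&gt;, so that a typo fails as soon as the class is defined, and stores the candidate values as an attribute of the function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;import inspect
import itertools


def cv_params(**grid):
    """Record candidate parameter values on the decorated __init__."""
    def decorator(init):
        valid = list(inspect.signature(init).parameters)[1:]  # drop 'self'
        for name in grid:
            if name not in valid:
                raise ValueError('%r is not a parameter of %s'
                                 % (name, init.__qualname__))
        init.cv_params = grid
        return init
    return decorator


def candidate_models(cls):
    """One instance of cls per combination of the recorded values."""
    grid = getattr(cls.__init__, 'cv_params', {})
    names = list(grid)
    return [cls(**dict(zip(names, values)))
            for values in itertools.product(*(grid[n] for n in names))]


class ToyLasso(object):
    # Made-up estimator, for illustration only
    @cv_params(l1=[1e-4, 1e-3, 1e-2, 1e-1, 1.])
    def __init__(self, l1=.5, fit_intercept=True):
        self.l1 = l1
        self.fit_intercept = fit_intercept


print([m.l1 for m in candidate_models(ToyLasso)])
# prints: [0.0001, 0.001, 0.01, 0.1, 1.0]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Plain lists keep the sketch dependency-free; a NumPy array such as the output of &lt;em&gt;np.logspace&lt;/em&gt; would work the same way, since &lt;em&gt;itertools.product&lt;/em&gt; accepts any iterable.&lt;/p&gt;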
&lt;p&gt;Once again, here we would be avoiding the difficulty of specifying type
information in a non-restrictive way, but avoiding a problem that we
don’t have to solve is probably a good idea.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="software engineering"></category><category term="software architecture"></category><category term="design patterns"></category><category term="scientific computing"></category><category term="selected"></category></entry><entry><title>Euroscipy 2010: code, science, and a lot of fun</title><link href="https://gael-varoquaux.info/programming/euroscipy-2010-code-science-and-a-lot-of-fun.html" rel="alternate"></link><published>2010-07-13T17:31:00+02:00</published><updated>2010-07-13T17:31:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-07-13:/programming/euroscipy-2010-code-science-and-a-lot-of-fun.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2010"&gt;Euroscipy 2010&lt;/a&gt;, the third European conference for the use of Python in
science, is just over, and I think it was a great success.&lt;/p&gt;
&lt;div class="section" id="euroscipy-in-numbers"&gt;
&lt;h2&gt;Euroscipy in numbers&lt;/h2&gt;
&lt;p&gt;&lt;img alt="image0" src="http://farm5.static.flickr.com/4118/4779625445_0e783484cd_m_d.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The attendance this year was huge: a grand total of 160 people
came to EuroScipy, with 140 attending …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2010"&gt;Euroscipy 2010&lt;/a&gt;, the third European conference for the use of Python in
science, is just over, and I think it was a great success.&lt;/p&gt;
&lt;div class="section" id="euroscipy-in-numbers"&gt;
&lt;h2&gt;Euroscipy in numbers&lt;/h2&gt;
&lt;p&gt;&lt;img alt="image0" src="http://farm5.static.flickr.com/4118/4779625445_0e783484cd_m_d.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;The attendance this year was huge: a grand total of 160 people
came to EuroScipy, with 140 attending the tutorials and 130 the
conference. This is up by almost a factor of 3 compared to last year’s
EuroScipy, more than last year’s SciPy conference in Pasadena, and
almost as many as this year’s SciPy conference in Austin, which hosted 180
people. We had people coming from 16 countries, from as far as New Zealand,
the US, or Turkey. Research labs, education, and industry (small to large
companies) were all well represented, with approximately a third of the
delegates coming from industry. Similarly, many different scientific
fields were discussed, ranging from landscape ecology to pure math.&lt;/p&gt;
&lt;p&gt;There were 2 tutorial tracks with 10 tutorial slots in each track. We
had 2 keynotes from Hans Petter Langtangen and Konrad Hinsen. With
regard to the contributed talks, the conference this year was highly
selective. We received 52 proposals and unfortunately could accept
only 30 of them, which corresponds to an acceptance rate of 58%.
Finally, we had 18 &lt;a class="reference external" href="http://www.euroscipy.org/talk/937"&gt;lightning talks&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="a-warm-and-friendly-atmosphere"&gt;
&lt;h2&gt;A warm and friendly atmosphere&lt;/h2&gt;
&lt;p&gt;&lt;img alt="image1" src="http://farm5.static.flickr.com/4097/4774499149_5dda469dc2_m.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;As an organizer, I was really pleased to find out how relaxed and
friendly people were. This certainly facilitates discussions during the
breaks. And the ambiance was undoubtedly warm: 140 people with laptops
in a room without air conditioning in the Paris summer :).&lt;/p&gt;
&lt;p&gt;Of course during the evenings, many people met to continue the
passionate discussions in restaurants and bars.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="trends-i-noticed"&gt;
&lt;h2&gt;Trends I noticed&lt;/h2&gt;
&lt;p&gt;What one remembers from a conference is obviously biased by personal
interests. With that disclaimer, here are the recurrent and important
topics that I noticed, both in the talks and in the coffee-break
discussions:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Parallel computing&lt;/strong&gt;, in particular making it easy to do parallel
computing. &lt;a class="reference external" href="http://www.euroscipy.org/talk/2011"&gt;Konrad’s keynote&lt;/a&gt; had many interesting directions to
explore. (talks: &lt;a class="reference external" href="http://www.euroscipy.org/talk/2009"&gt;Playdoh&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/talk/1686"&gt;DANA&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code generation&lt;/strong&gt;. In the various conferences I have been to
recently, I heard a lot of talk about symbolic manipulation of
numerical problems to generate optimal computing kernels (talks:
&lt;a class="reference external" href="http://www.euroscipy.org/talk/1657"&gt;Efficient computation tutorial&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/talk/1666"&gt;Theano&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/talk/2045"&gt;Algorithmic
Differentiation&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data management&lt;/strong&gt;, with problems such as provenance tracking for
reproducibility (talks: &lt;a class="reference external" href="http://www.euroscipy.org/talk/1960"&gt;Sumatra&lt;/a&gt;, &lt;a class="reference external" href="http://www.euroscipy.org/talk/880"&gt;Knowledge management
tutorial&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, installation problems of scientific tools were the subject of
many discussions, as every year. One thing that I did notice is that
people stopped simply blaming each other and acknowledged that nobody
knew how to fix the problem. Somebody even pointed out that installing
any major scientific code was not a piece of cake. Hans Petter and
others said that they had solved the problem by relying on a virtual
machine and Ubuntu.&lt;/p&gt;
&lt;p&gt;Konrad has also &lt;a class="reference external" href="http://khinsen.wordpress.com/2010/07/12/euroscipy-2010/"&gt;blogged&lt;/a&gt;, giving his own view of the conference.&lt;/p&gt;
&lt;p&gt;&lt;img alt="image2" src="http://farm5.static.flickr.com/4097/4778812305_9217c5d3c2_m.jpg" /&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="thanks"&gt;
&lt;h2&gt;Thanks&lt;/h2&gt;
&lt;p&gt;The conference could happen only because of the help of many people.
First we need to thank our sponsors: &lt;a class="reference external" href="http://www.enthought.com"&gt;Enthought&lt;/a&gt;, &lt;a class="reference external" href="http://www.python-academy.com/"&gt;Python Academy&lt;/a&gt;,
&lt;a class="reference external" href="http://www.pytables.org"&gt;Pytables&lt;/a&gt;, and especially our host &lt;a class="reference external" href="http://www.ens.fr"&gt;Ecole Normale Supérieure&lt;/a&gt;, which
not only provided us with the rooms, but also made sure that everything
was going well with the sound system, the projection, or the access to
the building. With regard to organization and planning, Nicolas and I
received a lot of help from &lt;a class="reference external" href="http://www.saint-gobain-recherche.com/svi/en/emmanuelle_gouillart.html"&gt;Emmanuelle Gouillart&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>Personal views on scientific computing</title><link href="https://gael-varoquaux.info/programming/view_on_scientific_computing.html" rel="alternate"></link><published>2010-05-20T00:00:00+02:00</published><updated>2010-05-20T00:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-05-20:/programming/view_on_scientific_computing.html</id><summary type="html">&lt;p&gt;My contributions to the scientific computing software ecosystem are
motivated by my vision of computational science.&lt;/p&gt;
&lt;p&gt;Scientific research relies more and more on computing. However, most of
the researchers are not software engineers, and as computing is becoming
ubiquitous, the limiting factor becomes more and more the &lt;strong&gt;human
factor&lt;/strong&gt; &lt;a class="reference external" href="http://software-carpentry.org/articles/amsci-swc-2006.pdf"&gt;[G …&lt;/a&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;My contributions to the scientific computing software ecosystem are
motivated by my vision of computational science.&lt;/p&gt;
&lt;p&gt;Scientific research relies more and more on computing. However, most of
the researchers are not software engineers, and as computing is becoming
ubiquitous, the limiting factor becomes more and more the &lt;strong&gt;human
factor&lt;/strong&gt; &lt;a class="reference external" href="http://software-carpentry.org/articles/amsci-swc-2006.pdf"&gt;[G. Wilson, 2006]&lt;/a&gt; &lt;a class="reference external" href="http://download.on9pc.com/ebook/programing/Teach%20Yourself%20Programming%20in%20Ten%20Years.pdf"&gt;[P.
Norvig, 2009]&lt;/a&gt;.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;To address the needs of computing across scientific fields, I believe
that we need a &lt;strong&gt;general-purpose&lt;/strong&gt;, &lt;strong&gt;high-level&lt;/strong&gt;, &lt;strong&gt;interactive&lt;/strong&gt;, and
&lt;strong&gt;highly-readable&lt;/strong&gt; language and set of tools for scientific computing.&lt;/p&gt;
&lt;/div&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;C does not answer my needs: does a molecular biologist know about
pointers? Should she?&lt;/li&gt;
&lt;li&gt;Matlab does not answer my needs either: scientific work with computers
is not only about numerical computation. Have you tried writing
experiment-control software in Matlab? What about file management? Or
embedding the algorithms in a web server?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We need better teaching material that sits at the interface between software
engineering and general science. Most top-notch tools and libraries are
full of domain-specific jargon and conventions.&lt;/p&gt;
&lt;p&gt;For reproducible science, we need the code to be readable and to reflect
the corresponding scientific operation. We need it to be unit-tested to
ensure its correctness.&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;p class="first admonition-title"&gt;Note&lt;/p&gt;
&lt;p class="last"&gt;We need to consider scientific libraries as end-result of our
research with the same importance than articles &lt;a class="reference external" href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6201"&gt;[J. Buckheit and D.
Donoho. 1995]&lt;/a&gt;.
They need to convey a scientific message, to be &lt;strong&gt;understandable&lt;/strong&gt; and
&lt;strong&gt;refutable&lt;/strong&gt;. New results should be &lt;strong&gt;reproducible&lt;/strong&gt; via published code
&lt;a class="reference external" href="http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.14"&gt;[CISE Jan. 2009]&lt;/a&gt;. As
for established algorithms, scientific libraries with their
&lt;strong&gt;documentation&lt;/strong&gt; and &lt;strong&gt;examples&lt;/strong&gt; should be the textbooks of tomorrow.&lt;/p&gt;
&lt;/div&gt;
&lt;hr class="docutils" /&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Scientific software should be as reusable as possible&lt;/strong&gt;, to enable the
advancement of Science via software, year after year. This means that
we need to build general-purpose libraries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code quality and documentation are crucial&lt;/strong&gt;, as human factors are
often the limitation. As a corollary, scientific code should be
unit-tested to ensure correctness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Core scientific software should be open source&lt;/strong&gt;, as scientific work
cannot build on black boxes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Algorithms should be written as simply as possible&lt;/strong&gt;. A high level of
sophistication in software engineering should not be a requirement for
all scientists.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer high-level languages&lt;/strong&gt;. The code should be written at the right
level of abstraction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;We need to build common and shared tools&lt;/strong&gt;. Scientific software
shouldn’t be ‘owned’ by a lab.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The source code should be a deliverable of the research&lt;/strong&gt;. As a result, it
should read clearly and be understandable to all.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Documentation and examples are the textbooks of tomorrow&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Publications should be reproducible&lt;/strong&gt;. Ideally they should become an
example of the library. This should be tempered by the fact that code
maintenance is costly, and achieving good code takes more work than
publishing. Focus should be on publications that will give rise to reference
results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Academia needs to value software maintenance&lt;/strong&gt;. It is hard and costly,
but it determines our future.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tools that develop the environment, rather than a specific algorithm or
scientific field, are crucial&lt;/strong&gt; (one example is IPython).&lt;/li&gt;
&lt;/ul&gt;
&lt;!-- Cite V Stodden --&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Further reading:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Open source Machine Learning software &lt;a class="reference external" href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.77.5605&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;[S. Sonnenburg et al. 2007]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Open source mathematical software &lt;a class="reference external" href="http://www.ams.org/notices/200710/tx071001279p.pdf"&gt;[D. Joyner and W. Stein 2007]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Licensing, intellectual property in scientific work
&lt;a class="reference external" href="http://jolt.unc.edu/sites/default/files/7_nc_jl_tech_321.pdf"&gt;[A. Gonzalez 2006]&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Scientific software development best practices
&lt;a class="reference external" href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020087"&gt;[S. Baxter et al. 2006]&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
</content><category term="programming"></category><category term="science"></category><category term="academia"></category><category term="scientific computing"></category><category term="selected"></category><category term="scientific software"></category></entry><entry><title>EuroScipy abstract submission deadline extended</title><link href="https://gael-varoquaux.info/programming/euroscipy-abstract-submission-deadline-extended.html" rel="alternate"></link><published>2010-05-15T23:36:00+02:00</published><updated>2010-05-15T23:36:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-05-15:/programming/euroscipy-abstract-submission-deadline-extended.html</id><summary type="html">&lt;p&gt;Given that we have been able to turn on registration only very late, the
&lt;a class="reference external" href="http://www.euroscipy.org"&gt;EuroScipy&lt;/a&gt; conference committee is extending the deadline for abstract
submission for the 2010 EuroScipy conference.&lt;/p&gt;
&lt;p&gt;On Thursday May 20th, at midnight Samoa time, we will turn off the
abstract submission on the conference site. Up to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Given that we have been able to turn on registration only very late, the
&lt;a class="reference external" href="http://www.euroscipy.org"&gt;EuroScipy&lt;/a&gt; conference committee is extending the deadline for abstract
submission for the 2010 EuroScipy conference.&lt;/p&gt;
&lt;p&gt;On Thursday May 20th, at midnight Samoa time, we will turn off the
abstract submission on the conference site. Until then, you can modify
already-submitted abstracts or submit new ones.&lt;/p&gt;
&lt;p&gt;We are very much looking forward to your submissions to the conference.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;Gaël Varoquaux&lt;/div&gt;
&lt;div class="line"&gt;Nicolas Chauvat&lt;/div&gt;
&lt;/div&gt;
&lt;hr class="docutils" /&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;EuroScipy 2010 is the annual European conference for scientists using Python. It will be held July 8-11 2010, in ENS, Paris, France.&lt;/div&gt;
&lt;div class="line"&gt;&lt;strong&gt;Links: `Conference website`_,&amp;nbsp; `Call for papers`_,&amp;nbsp; `Practical information`_&lt;/strong&gt;&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="conferences"></category><category term="science"></category></entry><entry><title>EuroScipy is finally open for registration</title><link href="https://gael-varoquaux.info/programming/euroscipy-is-finally-open-for-registration.html" rel="alternate"></link><published>2010-05-13T13:23:00+02:00</published><updated>2010-05-13T13:23:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-05-13:/programming/euroscipy-is-finally-open-for-registration.html</id><summary type="html">&lt;a class="reference external image-reference" href="attachments/poster_euroscipy_2010.pdf"&gt;&lt;img alt="" src="attachments/poster_euroscipy_2010.jpg" /&gt;&lt;/a&gt;
&lt;div class="section" id="the-registration-for-euroscipy-is-finally-open"&gt;
&lt;h2&gt;The registration for &lt;a class="reference external" href="http://www.euroscipy.org//conference/euroscipy2010"&gt;EuroScipy&lt;/a&gt; is finally open.&lt;/h2&gt;
&lt;p&gt;To register, go to the &lt;a class="reference external" href="http://www.euroscipy.org//conference/euroscipy2010"&gt;website&lt;/a&gt;, create an account, and you will see a
&lt;em&gt;‘register to the conference’&lt;/em&gt; button on the left. Follow it to a page
which presents a &lt;em&gt;‘shopping cart’&lt;/em&gt;. Simply submitting this information
registers you to the conference, and on …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;a class="reference external image-reference" href="attachments/poster_euroscipy_2010.pdf"&gt;&lt;img alt="" src="attachments/poster_euroscipy_2010.jpg" /&gt;&lt;/a&gt;
&lt;div class="section" id="the-registration-for-euroscipy-is-finally-open"&gt;
&lt;h2&gt;The registration for &lt;a class="reference external" href="http://www.euroscipy.org//conference/euroscipy2010"&gt;EuroScipy&lt;/a&gt; is finally open.&lt;/h2&gt;
&lt;p&gt;To register, go to the &lt;a class="reference external" href="http://www.euroscipy.org//conference/euroscipy2010"&gt;website&lt;/a&gt;, create an account, and you will see a
&lt;em&gt;‘register to the conference’&lt;/em&gt; button on the left. Follow it to a page
which presents a &lt;em&gt;‘shopping cart’&lt;/em&gt;. Simply submitting this information
registers you for the conference, and on the left of the website, the
button will now display &lt;em&gt;‘You are registered for the conference’&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The registration fee is 50 euros for the conference, and 50 euros for
the tutorials. Right now there is no payment system: you will be
contacted in a week with instructions for paying.&lt;/p&gt;
&lt;p&gt;We apologize for setting this up so late. We realize it has been an
inconvenience.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Do not wait to register: the number of people we can host is
limited.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="an-exciting-program"&gt;
&lt;h2&gt;An exciting program&lt;/h2&gt;
&lt;div class="section" id="tutorials-from-beginners-to-experts"&gt;
&lt;h3&gt;Tutorials: from beginners to experts&lt;/h3&gt;
&lt;p&gt;We have two tutorial tracks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/871"&gt;**Introductory tutorial**&lt;/a&gt;: to get you to speed on scientific
programming with Python.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/872"&gt;**Advanced tutorial**&lt;/a&gt;: experts sharing their knowledge on specific
techniques and libraries.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="scientific-track-doing-new-science-in-python"&gt;
&lt;h3&gt;Scientific track: doing new science in Python&lt;/h3&gt;
&lt;p&gt;Although abstract submission is not yet closed, judging from the current
submissions, I can say that we are going to have a rich set of talks.
In addition to the contributed talks, we have:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2010"&gt;**Keynote speakers**&lt;/a&gt;: Hans Petter Langtangen and Konrard Hinsen,
two major player of scientific computing in Python.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/talk/937"&gt;**Lightning talks**&lt;/a&gt;: one hour will be open for people to come up
and present in a flash an interesting project.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="publishing-papers"&gt;
&lt;h3&gt;Publishing papers&lt;/h3&gt;
&lt;p&gt;We are talking with the editors of a major scientific computing journal,
and the odds are quite high that we will be able to publish a special
issue on scientific computing in Python based on the proceedings of the
conference. The papers will undergo peer-review independently from the
conference, to ensure high quality of the final publication.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="call-for-papers"&gt;
&lt;h2&gt;Call for papers&lt;/h2&gt;
&lt;p&gt;Abstract submission is still open, though not for long. We are
soliciting contributions on scientific libraries and tools developed
with Python and on scientific or engineering achievements using Python.
These include applications, teaching, future development directions, and
current research. See the &lt;a class="reference external" href="http://www.euroscipy.org/card/euroscipy2010_call_for_papers"&gt;call for papers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I am very much looking forward to passionate discussions about
Python in science in Paris.&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category></entry><entry><title>Status of the EuroScipy registration</title><link href="https://gael-varoquaux.info/programming/status-of-the-euroscipy-registration.html" rel="alternate"></link><published>2010-05-02T22:57:00+02:00</published><updated>2010-05-02T22:57:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-05-02:/programming/status-of-the-euroscipy-registration.html</id><summary type="html">&lt;p&gt;It is still not possible to register for the &lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2010"&gt;Euroscipy conference&lt;/a&gt;: we
are having difficulties with payment for the registration, and we are
still not sure that we will be able to actually charge money!&lt;/p&gt;
&lt;p&gt;This might not be bad news, because it might mean that the conference
will …&lt;/p&gt;</summary><content type="html">&lt;p&gt;It is still not possible to register for the &lt;a class="reference external" href="http://www.euroscipy.org/conference/euroscipy2010"&gt;Euroscipy conference&lt;/a&gt;: we
are having difficulties with payment for the registration, and we are
still not sure that we will be able to actually charge money!&lt;/p&gt;
&lt;p&gt;This might not be bad news, because it might mean that the conference
will be completely free. This would mean that we would not be able to
provide lunch, which is a pity, as there is nothing like eating with a
bunch of passionate experts to learn new tricks, but it would not hamper
the conference in any other way, as the rooms are already booked and
various little expenses covered.&lt;/p&gt;
&lt;p&gt;If we manage to sort out payments in the next weeks, the fee should be
50 euros for the 2 days of tutorial, and between 50 and 100 euros for
the full conference, depending on exactly what catering we offer.&lt;/p&gt;
&lt;p&gt;Anyhow, we should open registration very soon, with or without
payment. We need some formal registration, as the number of
people that can fit in the rooms is limited.&lt;/p&gt;
&lt;p&gt;All in all, with or without registration fees, it should be possible to
make it to Euroscipy while keeping expenses low: we have listed a few cheap
accommodation options on the &lt;a class="reference external" href="http://www.euroscipy.org/card/euroscipy2010_practical_information"&gt;practical details page&lt;/a&gt;, and it is easy to get
good food for a good price in the area.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;I am very excited about this conference. We have two keynotes that I am
really looking forward to hearing, and I can say that we have been
getting pretty good submissions for presentations. Also, chances are
that we will be able to publish proceedings in a peer-reviewed
journal, although I can’t say more about that right now.&lt;/p&gt;
&lt;p&gt;Also, even if you are not interested in scientific research done using
Python, the tutorials are a unique opportunity: we are having top-notch
experts presenting with two tracks, &lt;a class="reference external" href="http://www.euroscipy.org/track/871"&gt;one&lt;/a&gt; to get beginners up to speed
and efficient in a couple of days, and the &lt;a class="reference external" href="http://www.euroscipy.org/track/872"&gt;other&lt;/a&gt; for exploring
advanced subjects. I know the speakers, and I can tell you that I won’t
be talking in the corridor, but sitting with my laptop and listening to
them. People usually pay large sums of money for such training.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>Mayavi: Representing an additional scalar on surfaces</title><link href="https://gael-varoquaux.info/programming/mayavi-representing-an-additional-scalar-on-surfaces.html" rel="alternate"></link><published>2010-04-05T00:30:00+02:00</published><updated>2010-04-05T00:30:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-04-05:/programming/mayavi-representing-an-additional-scalar-on-surfaces.html</id><summary type="html">&lt;p&gt;We have been getting a few questions on the &lt;a class="reference external" href="https://mail.enthought.com/mailman/listinfo/enthought-dev"&gt;enthought-dev&lt;/a&gt;
mailing-list on how to represent additional information on a surface
with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi"&gt;Mayavi&lt;/a&gt;, using a color that encodes something other than the elevation. A &lt;a class="reference external" href="http://dpinte.wordpress.com/2010/03/30/4d-surface-plots-in-mayavi/"&gt;recent
post&lt;/a&gt; by Didrik Pinte on his blog shows the problem quite well:&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://dpinte.wordpress.com/2010/03/30/4d-surface-plots-in-mayavi/"&gt;&lt;img alt="" src="http://dpinte.files.wordpress.com/2010/03/option_valuation_3d.png" /&gt;&lt;/a&gt;
&lt;p&gt;This problem can be seen …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We have been getting a few questions on the &lt;a class="reference external" href="https://mail.enthought.com/mailman/listinfo/enthought-dev"&gt;enthought-dev&lt;/a&gt;
mailing-list on how to represent additional information on a surface
with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi"&gt;Mayavi&lt;/a&gt;, using a color that encodes something other than the elevation. A &lt;a class="reference external" href="http://dpinte.wordpress.com/2010/03/30/4d-surface-plots-in-mayavi/"&gt;recent
post&lt;/a&gt; by Didrik Pinte on his blog shows the problem quite well:&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://dpinte.wordpress.com/2010/03/30/4d-surface-plots-in-mayavi/"&gt;&lt;img alt="" src="http://dpinte.files.wordpress.com/2010/03/option_valuation_3d.png" /&gt;&lt;/a&gt;
&lt;p&gt;This problem can be seen as taking a standard &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.surf"&gt;surf&lt;/a&gt; plot:&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.surf"&gt;&lt;img alt="" src="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/_images/enthought_mayavi_mlab_surf.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;but coloring it with a different scalar than the elevation.&lt;/p&gt;
&lt;p&gt;I would like to present two ways of solving this problem: first, a very
simple way specific to this exact problem; second, a more complicated but
quite generic approach.&lt;/p&gt;
&lt;div class="section" id="representing-surfaces-more-complex-than-an-elevation-map"&gt;
&lt;h2&gt;Representing surfaces more complex than an elevation map&lt;/h2&gt;
&lt;p&gt;The first option is simply to use the &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#d-data"&gt;tools&lt;/a&gt; that Mayavi’s &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab&lt;/a&gt;
interface provides to represent surfaces that are not the particular case
of an elevation plot. In our case, it is very easy to use the &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.mesh"&gt;mesh
function&lt;/a&gt;, which takes the x, y, z positions of a grid describing the
surface, and also an additional scalar value at each of these positions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;# Create some data&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mgrid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arctan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Visualize it&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;enthought.mayavi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mesh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;.05&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scalars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Finally, add a few decorations.&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;177&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;img alt="" src="attachments/mesh_example.png" /&gt;
&lt;p&gt;As you can see, this solution is really simple, and solves the problem.&lt;/p&gt;
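&lt;p&gt;A side note on the data: the grid starts at 0, so computing the angle ‘w’ as np.arctan(x/y) divides by zero wherever y is 0. This yields infinities, and a NaN at the origin, along with numpy warnings. np.arctan2(x, y) computes the same angle without dividing. Here is a quick check, assuming only numpy:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np

# Same grid as in the example above: the first column has y == 0
x, y = np.mgrid[0:10:100j, 0:10:100j]

# Dividing first: x/y is inf where y == 0, and 0/0 is nan at the origin.
# errstate only silences numpy's divide-by-zero and invalid-value warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    w_ratio = np.arctan(x / y)

# arctan2 computes the same angle without dividing,
# and is defined everywhere, including at the origin (where it returns 0)
w_angle = np.arctan2(x, y)

print("NaN values with arctan(x/y):", np.isnan(w_ratio).sum())
print("NaN values with arctan2(x, y):", np.isnan(w_angle).sum())

# Wherever arctan(x/y) is defined, the two formulas agree
defined = ~np.isnan(w_ratio)
print("Agree elsewhere:", np.allclose(w_ratio[defined], w_angle[defined]))
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Along the y = 0 edge, np.arctan already returns π/2 for the infinite ratios, so the switch only removes the warnings and the single NaN at the origin.&lt;/p&gt;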
&lt;/div&gt;
&lt;div class="section" id="a-generic-way-of-representing-several-scalar-attributes-with-one-visualization"&gt;
&lt;h2&gt;A generic way of representing several scalar attributes with one visualization&lt;/h2&gt;
&lt;p&gt;If we think of the visualization problem as a way of representing two
scalar values, ‘z’ and ‘w’, as functions of two others, ‘x’ and ‘y’,
the above solution is not really satisfactory: the surf function
turns the scalar value ‘z’ into elevation (using a WarpScalar filter), and we
would like to add an additional scalar value ‘w’ and turn it
into color, just like ‘z’ is turned into elevation. The pipeline that is
created by the surf function is the following:&lt;/p&gt;
&lt;img alt="" src="attachments/surf_pipeline.png" /&gt;
&lt;p&gt;The first element of the pipeline after the scene is the data source
created for us by the surf function: it is a 2D array that contains the
‘z’ value as a scalar value. The ‘WarpScalar’ filter is applied, and
transforms that value into elevation. After that, a ‘PolyDataNormals’
filter is used to calculate normals, so as to have a smooth rendering,
and finally, a ‘Surface’ module is applied to display the resulting
elevation map as a surface, with a color reflecting the scalar value.&lt;/p&gt;
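&lt;p&gt;To make the role of the ‘WarpScalar’ filter more concrete, here is a simplified numpy sketch of what a scalar warp does: each point is moved along its normal by warp_scale times its scalar value. This is only an illustration under simplifying assumptions: it uses a flat grid whose normals all point along z, and a hypothetical helper, warp_scalar. It is not Mayavi’s or VTK’s implementation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np


def warp_scalar(points, normals, scalars, warp_scale=1.0):
    """Hypothetical helper: move each point along its normal by
    warp_scale * scalar (a simplified scalar warp, for illustration)."""
    return points + warp_scale * scalars[:, np.newaxis] * normals


# A flat 2D grid in the z = 0 plane, stored as an (N, 3) array of points
x, y = np.mgrid[0:10:5j, 0:10:5j]
z = x**2 + y**2
points = np.column_stack([x.ravel(), y.ravel(), np.zeros(x.size)])

# For a flat grid lying in the z = 0 plane, every normal points along +z
normals = np.tile([0.0, 0.0, 1.0], (x.size, 1))

warped = warp_scalar(points, normals, z.ravel(), warp_scale=0.5)

# x and y are untouched; the new z coordinate is warp_scale * z
print(np.allclose(warped[:, :2], points[:, :2]))
print(np.allclose(warped[:, 2], 0.5 * z.ravel()))
```
&lt;/pre&gt;&lt;/div&gt;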
&lt;p&gt;To turn two scalar values into elevation and color successively, we can
embed both of them, ‘z’ and ‘w’, in the dataset, and use a
‘SetActiveAttribute’ filter to control which one the ‘Surface’ module is
applied to. This approach is much more powerful, because we can tweak the
pipeline ourselves and use any filter in place of the WarpScalar to
display the ‘z’ information (more on that
below).&lt;/p&gt;
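&lt;p&gt;Embedding ‘w’ in the dataset means handing VTK a flat array whose order matches the order of the dataset’s points, which is why the code below passes w.T.ravel(). The numpy sketch here assumes, as for VTK image data, that the point at grid index (i, j) is numbered i + j * nx, with the x index varying fastest. It checks that w.T.ravel() follows that order:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np

# A small grid, so that the ordering is easy to check
nx, ny = 3, 4
x, y = np.mgrid[0:1:nx * 1j, 0:1:ny * 1j]
w = np.arctan2(x, y)

# Flatten 'w' the way the Mayavi example does
flat = w.T.ravel()

# Transposing then flattening is the same as flattening in
# Fortran (column-major) order: the first axis varies fastest
print(np.array_equal(flat, w.ravel(order="F")))

# Assumed point numbering: grid index (i, j) is point i + j * nx
for j in range(ny):
    for i in range(nx):
        assert flat[i + j * nx] == w[i, j]
print("w.T.ravel() matches the assumed point order")
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Under that assumption, a plain w.ravel(), where the y index varies fastest, would attach the values to the wrong points.&lt;/p&gt;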
&lt;p&gt;Here is how to achieve a visualization that looks similar to the one above,
but with two scalar values transformed successively into elevation and
color:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;###############################################################&lt;/span&gt;
&lt;span class="c1"&gt;# Create some data&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mgrid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arctan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;###############################################################&lt;/span&gt;
&lt;span class="c1"&gt;# Visualize the data&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;enthought.mayavi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;

&lt;span class="c1"&gt;# Create the data source&lt;/span&gt;
&lt;span class="n"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array2d_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add the additional scalar information &amp;#39;w&amp;#39;, this is where we need to be a bit careful,&lt;/span&gt;
&lt;span class="c1"&gt;# see&lt;/span&gt;
&lt;span class="c1"&gt;# http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/example_atomic_orbital.html&lt;/span&gt;
&lt;span class="c1"&gt;# and&lt;/span&gt;
&lt;span class="c1"&gt;# http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/data.html&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mlab_source&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dataset&lt;/span&gt;
&lt;span class="n"&gt;array_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;point_data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ravel&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;point_data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;point_data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Here, we build the very exact pipeline of surf, but add a&lt;/span&gt;
&lt;span class="c1"&gt;# set_active_attribute filter to switch the color, this is code very&lt;/span&gt;
&lt;span class="c1"&gt;# similar to the code introduced in:&lt;/span&gt;
&lt;span class="c1"&gt;# http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#assembling-pipelines-with-mlab&lt;/span&gt;
&lt;span class="n"&gt;warp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;warp_scalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;warp_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;normals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;poly_data_normals&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;warp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;active_attr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set_active_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;normals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                            &lt;span class="n"&gt;point_scalars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;color&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;surf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;surface&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;active_attr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Finally, add a few decorations.&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;177&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;82&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The pipeline that is created is the following:&lt;/p&gt;
&lt;img alt="" src="attachments/complex_pipeline.png" /&gt;
&lt;p&gt;In the first part of the pipeline, the ‘WarpScalar’ filter is applied to
the ‘z’ scalar value, whereas, due to the ‘SetActiveAttribute’ filter,
the ‘Surface’ module uses the ‘w’ scalar value to display the color.&lt;/p&gt;
&lt;p&gt;This pattern is very powerful, and can be used with other sets of
filters or modules. The example of this pattern that we use in the
Mayavi documentation is the following:&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/example_atomic_orbital.html"&gt;&lt;img alt="" src="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/_images/example_atomic_orbital.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;We use a ‘Contour’ filter to contour on the amplitude of a complex
field defined in the volume, and then switch to the phase to display the
color. See the &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/example_atomic_orbital.html"&gt;atomic orbital example&lt;/a&gt; in the Mayavi documentation for
more details.&lt;/p&gt;
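&lt;p&gt;The data side of this pattern can be sketched with numpy alone. This is a minimal illustration under stated assumptions, not the code of the atomic orbital example: the test field and the helper name ‘amplitude_and_phase’ are hypothetical. The amplitude is the scalar the ‘Contour’ filter would work on, while the phase only sets the color. Both arrays are flattened with ‘.T.ravel()’, as for the ‘w’ array added to the dataset in the code above, so either could be attached to a Mayavi source in the same way.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import numpy as np


def amplitude_and_phase(field):
    """Split a complex field into two real, flattened scalar arrays.

    Hypothetical helper: the amplitude is what would be contoured, the
    phase what would be used for the color. ``.T.ravel()`` flattens in
    Fortran order (first axis varying fastest), the same ordering as
    ``w.T.ravel()`` in the surf example.
    """
    amplitude = np.abs(field)
    phase = np.angle(field)  # radians, in the interval (-pi, pi]
    return amplitude.T.ravel(), phase.T.ravel()


# Hypothetical test field on a 40x40x40 grid: a Gaussian envelope times
# (x + iy), so the phase winds once around the z axis.
x, y, z = np.mgrid[-3:3:40j, -3:3:40j, -3:3:40j]
psi = (x + 1j * y) * np.exp(-(x**2 + y**2 + z**2))

amp, phase = amplitude_and_phase(psi)
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keeping the two quantities as separate arrays of one dataset is what lets a ‘SetActiveAttribute’ filter switch from one to the other partway down the pipeline.&lt;/p&gt;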
&lt;/div&gt;
</content><category term="programming"></category><category term="mayavi"></category><category term="scipy"></category><category term="scientific computing"></category></entry><entry><title>Book review: Matplotlib for Python Developpers</title><link href="https://gael-varoquaux.info/programming/book-review-matplotlib-for-python-developpers.html" rel="alternate"></link><published>2010-03-26T10:49:00+01:00</published><updated>2010-03-26T10:49:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-03-26:/programming/book-review-matplotlib-for-python-developpers.html</id><summary type="html">&lt;p&gt;&lt;em&gt;Packt publishing&lt;/em&gt; sent me a copy of Sandro Tosi’s book &lt;a class="reference external" href="http://www.packtpub.com/matplotlib-python-development/book"&gt;Matplotlib for
Python Developers&lt;/a&gt; a while ago. Unfortunately, it arrived after I had
left for the Christmas break, and I couldn’t find time to review it for
a while (I am terribly bad at time-management, and I do …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;em&gt;Packt publishing&lt;/em&gt; sent me a copy of Sandro Tosi’s book &lt;a class="reference external" href="http://www.packtpub.com/matplotlib-python-development/book"&gt;Matplotlib for
Python Developers&lt;/a&gt; a while ago. Unfortunately, it arrived after I had
left for the Christmas break, and I couldn’t find time to review it for
a while (I am terribly bad at time management, and I do too many things;
as a result, I am always overworked). Three months later, I have finally
found time to read it and post a review.&lt;/p&gt;
&lt;div class="section" id="content"&gt;
&lt;h2&gt;Content&lt;/h2&gt;
&lt;p&gt;The book introduces &lt;a class="reference external" href="http://matplotlib.sourceforge.net/"&gt;matplotlib&lt;/a&gt;, which is, for those who don’t know, a
truly fantastic library for scientific plotting in Python. Matplotlib is
great because it is really easy to pick up and can be used to produce
very high-quality plots.&lt;/p&gt;
&lt;p&gt;The book starts by progressively introducing the simple, imperative API
of matplotlib, with a focus on getting the user plotting data
immediately. It then moves on to a review of matplotlib’s plotting
functionality and of its object-oriented usage. Finally, Sandro
shows us how to embed matplotlib in various environments, such as GUI
toolkits or web development tools.&lt;/p&gt;
&lt;p&gt;The last part of the book is, in my opinion, the most original and
valuable, as these subjects are less well known and less documented in the
classical references accessible to people with a scientific computing
background.&lt;/p&gt;
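&lt;p&gt;For readers new to matplotlib, the difference between the imperative API and the object-oriented usage covered by the book can be sketched as follows. This is a minimal illustration, not an example from the book; it assumes a reasonably recent matplotlib and selects the non-interactive ‘Agg’ backend so that it runs without a display.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Imperative (pylab-style) API: each call acts on the implicit
# "current" figure and axes.
plt.figure()
plt.plot(x, np.sin(x))
plt.title("imperative")

# Object-oriented API: explicit Figure and Axes objects, which are what
# you hold on to when embedding matplotlib in a GUI or a web application.
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.set_title("object-oriented")

# Render the figure to an in-memory PNG, e.g. to serve it from a web app.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The explicit ‘fig’ and ‘ax’ objects are what an application keeps hold of; the in-memory PNG at the end is one simple way a web application could serve a plot.&lt;/p&gt;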
&lt;/div&gt;
&lt;div class="section" id="target-audience"&gt;
&lt;h2&gt;Target audience&lt;/h2&gt;
&lt;p&gt;The book can pretty much be picked up by a scientific Python beginner. It
does require some knowledge of the Python language, but if the reader
has programmed in another language, I don’t see this as a big problem.
In this regard, the book is especially interesting, as it can lead a
scientist from newbie to writing simple end-user programs. There is
currently a clear need for more documents of this kind.&lt;/p&gt;
&lt;p&gt;The book will also be useful for experienced Python developers
looking to pick up matplotlib quickly.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="personal-comments-on-the-book"&gt;
&lt;h2&gt;Personal comments on the book&lt;/h2&gt;
&lt;p&gt;In my experience, presenting a tool such as matplotlib is a challenge:
everybody has different plotting needs, and there is an infinite
variety of ways to use a powerful library like matplotlib.
Thus, Sandro’s exposition of matplotlib will not suffice on its own: people should
absolutely read more, and I cannot stress enough that the matplotlib
documentation is excellent and well worth reading.&lt;/p&gt;
&lt;p&gt;In general, I found that the book reads fairly well. Of course, I am
not the best judge of readability, as I know matplotlib very
well. I do find that the book lacks a &lt;em&gt;personal touch&lt;/em&gt;, such as
interesting examples or profound insights on specific problems. Nothing
in the book got me excited (again, maybe that is because I already
know its content quite well).&lt;/p&gt;
&lt;p&gt;Once again, in my eyes, the biggest contribution of this book is to put
together an introduction to matplotlib and examples of building
applications with matplotlib. I would especially recommend the book to
people wanting to build simple data visualization GUIs.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;Finally, with regard to interactive data visualization, in my
experience, scientific programmers are more productive when they
avoid working at the widget level and use an abstraction library instead. I
strongly recommend looking at &lt;a class="reference external" href="http://code.enthought.com/projects/traits/docs/html/"&gt;TraitsUI&lt;/a&gt; for this purpose. You can find
a tutorial &lt;a class="reference external" href="http://gael-varoquaux.info/computers/traits_tutorial/index.html"&gt;here&lt;/a&gt; (disclaimer: I wrote that tutorial).&lt;/p&gt;
&lt;p&gt;Also, if you are going to write a data visualization program that is
interactive in the sense that it enables the user to interact with the
data, using &lt;a class="reference external" href="http://code.enthought.com/chaco/"&gt;Chaco&lt;/a&gt; instead of matplotlib may make your life easier.
Chaco is not as polished or as well documented as matplotlib, and I would
never use it for quick scripting work, but it has a strong focus on
data interaction. As such, it makes it really easy to build very
responsive user interfaces, because it is very fast and has a clear
object-oriented API.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="books"></category></entry><entry><title>New Mayavi release</title><link href="https://gael-varoquaux.info/programming/new-mayavi-release.html" rel="alternate"></link><published>2010-03-14T12:58:00+01:00</published><updated>2010-03-14T12:58:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-03-14:/programming/new-mayavi-release.html</id><summary type="html">&lt;p&gt;A week ago, Peter Wang released a new version of the &lt;a class="reference external" href="http://code.enthought.com/"&gt;Enthought Tool
Suite (ETS)&lt;/a&gt;. With it came a new version of &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Prabhu and I have been horribly busy with real life, and I had the bad
feeling that we were not giving enough love to Mayavi. I …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A week ago, Peter Wang released a new version of the &lt;a class="reference external" href="http://code.enthought.com/"&gt;Enthought Tool
Suite (ETS)&lt;/a&gt;. With it came a new version of &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi2&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Prabhu and I have been horribly busy with real life, and I had the bad
feeling that we were not giving enough love to Mayavi. I was pleasantly surprised
when I put together the list of features and bug fixes that went into
Mayavi over the last two releases. The full list can be found &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/changes.html"&gt;in the
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="contributors"&gt;
&lt;h2&gt;Contributors&lt;/h2&gt;
&lt;p&gt;We are not terribly good at tracking external ideas and patches,
so I hope that I haven’t forgotten anybody, but I am very happy to say
that Prabhu and I have received a fair amount of help from non-core
contributors:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Chris Colbert&lt;/li&gt;
&lt;li&gt;Darren Dale&lt;/li&gt;
&lt;li&gt;Dave Martin&lt;/li&gt;
&lt;li&gt;Dave Peterson&lt;/li&gt;
&lt;li&gt;Emmanuelle Gouillart&lt;/li&gt;
&lt;li&gt;Erik Tollerud&lt;/li&gt;
&lt;li&gt;Evan Patterson&lt;/li&gt;
&lt;li&gt;Gary Ruben&lt;/li&gt;
&lt;li&gt;Kyle Mandli&lt;/li&gt;
&lt;li&gt;Michele Mattioni&lt;/li&gt;
&lt;li&gt;Ondrej Certik&lt;/li&gt;
&lt;li&gt;Ram Rachum&lt;/li&gt;
&lt;li&gt;Robert Kern&lt;/li&gt;
&lt;li&gt;Scott Warts&lt;/li&gt;
&lt;li&gt;Suyog Jain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to these people, I wish to thank those who make sure that the
Mayavi packages are available in the different Linux distributions:
Varun Hiremath, Lev Givon, Andrea Colangelo, Rakesh Pandit, as well as
Pierre Raybault for integrating Mayavi in &lt;a class="reference external" href="http://pythonxy.com"&gt;Pythonxy&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="important-features-added-in-3-3-0"&gt;
&lt;h2&gt;Important features added in 3.3.0&lt;/h2&gt;
&lt;p&gt;3.3.0 was released last fall. We had not compiled the list of changes at
the time, so I am giving it here:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;An &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/examples.html"&gt;example gallery&lt;/a&gt; in the documentation.&lt;/li&gt;
&lt;li&gt;A &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_figure.html#sync-camera"&gt;sync_camera&lt;/a&gt; helper function to synchronize camera between two
scenes.&lt;/li&gt;
&lt;li&gt;A &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_other_functions.html#text3d"&gt;text3d&lt;/a&gt; module, for position text in 3D that is scaled and hidden
like a data object.&lt;/li&gt;
&lt;li&gt;A &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_figure.html#close"&gt;close&lt;/a&gt; function to close scenes, similar to that in pylab or
matlab.&lt;/li&gt;
&lt;li&gt;A new filter to crop datasets: &lt;em&gt;DataSet Clipper&lt;/em&gt;. This filter is
terribly useful.&lt;/li&gt;
&lt;li&gt;All the &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab_pipeline_reference.html"&gt;mlab.pipeline&lt;/a&gt; functions now take a &lt;em&gt;figure=&lt;/em&gt; keyword
argument. This is very useful when coding with several figures
embedded in GUIs, as in a GUI you can’t rely on an implicit current figure. This is
illustrated in this &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/example_multiple_mlab_scene_models.html"&gt;example&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="important-features-added-in-3-3-1"&gt;
&lt;h2&gt;Important features added in 3.3.1&lt;/h2&gt;
&lt;p&gt;In the latest release, the following important features were added:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_figure.html#savefig"&gt;mlab.savefig&lt;/a&gt; can now reliably save images of a size larger than
the window.&lt;/li&gt;
&lt;li&gt;The interactive VTK documentation browser is now available in the
GUI.&lt;/li&gt;
&lt;li&gt;New functions added to &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab&lt;/a&gt; to control position of the camera:
&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#move"&gt;move&lt;/a&gt;, &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#yaw"&gt;yaw&lt;/a&gt;, and &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#pitch"&gt;pitch&lt;/a&gt;. These complement the existing &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#view"&gt;view&lt;/a&gt;
and &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#roll"&gt;roll&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Make the lines smoother when using &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_helper_functions.html#enthought.mayavi.mlab.plot3d"&gt;mlab.plot3d&lt;/a&gt; (use a VTK Stripper
filter).&lt;/li&gt;
&lt;li&gt;Add a &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_figure.html#enthought.mayavi.mlab.screenshot"&gt;screenshot&lt;/a&gt; function to mlab for easy screen capture as a
numpy array. This is very useful when creating figures that combine
3D using Mayavi and 2D using pylab. I use it all the time.&lt;/li&gt;
&lt;li&gt;Add a &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_pipeline_data.html#probe-data"&gt;probe_data&lt;/a&gt; function to return the data values of Mayavi
objects at given locations as numpy arrays. This is very useful to
combine numerics with Mayavi.&lt;/li&gt;
&lt;li&gt;Add an auto mode to &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/auto/mlab_camera.html#view"&gt;mlab.view&lt;/a&gt; to compute position and distance
based on the objects in the scene.&lt;/li&gt;
&lt;li&gt;Add a helper function to easily interact with the data: a callback
can be registered for picking data with the mouse. &lt;a class="reference external" href="https://svn.enthought.com/enthought/browser/Mayavi/trunk/examples/mayavi/data_interaction/"&gt;Two
examples&lt;/a&gt; illustrate this new functionality. This is a major step
forward in making life easier for people using Mayavi to build custom
interfaces.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="mayavi"></category></entry><entry><title>Using Python, Scipy, ETS, … to implement art</title><link href="https://gael-varoquaux.info/programming/using-python-scipy-ets-to-implement-art.html" rel="alternate"></link><published>2010-02-14T14:14:00+01:00</published><updated>2010-02-14T14:14:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-02-14:/programming/using-python-scipy-ets-to-implement-art.html</id><summary type="html">&lt;p&gt;The &lt;a class="reference external" href="http://sites.google.com/site/aikonproject/"&gt;Aikon project&lt;/a&gt; has just been slashdotted.&lt;/p&gt;
&lt;p&gt;The project is about implementing a robotic artist, with a special
artistic touch:&lt;/p&gt;
&lt;img alt="" src="http://lh5.ggpht.com/_MJv2VVPLQaA/SUTeNkUi3zI/AAAAAAAAADA/aJkTS88XqGo/s512/0046.jpg" /&gt;
&lt;p&gt;The co-principal investigator, Patrick Tresset, gave a talk at the
French Pycon this year and I was simply flabbergasted by the project. It
is amazing to mix together art and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The &lt;a class="reference external" href="http://sites.google.com/site/aikonproject/"&gt;Aikon project&lt;/a&gt; has just been slashdotted.&lt;/p&gt;
&lt;p&gt;The project is about implementing a robotic artist, with a special
artistic touch:&lt;/p&gt;
&lt;img alt="" src="http://lh5.ggpht.com/_MJv2VVPLQaA/SUTeNkUi3zI/AAAAAAAAADA/aJkTS88XqGo/s512/0046.jpg" /&gt;
&lt;p&gt;The co-principal investigator, Patrick Tresset, gave a talk at the
French Pycon this year, and I was simply flabbergasted by the project. It
is amazing to see art and technology mixed together in such a way; you should
really have a look at the &lt;a class="reference external" href="http://sites.google.com/site/aikonproject/"&gt;videos of the robotic arm making sketches of
people.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;But I was even more startled when I discovered that the project was
using &lt;a class="reference external" href="http://www.scipy.org"&gt;scipy&lt;/a&gt; and my whole beloved stack for scientific computing in
Python, including the &lt;a class="reference external" href="http://code.enthought.com/"&gt;Enthought Tool Suite&lt;/a&gt;: &lt;a class="reference external" href="http://sites.google.com/site/aikonproject/isle"&gt;check it out&lt;/a&gt;. I really
want scientific computing software to be a tool that opens up new ideas and new
research. This research goes beyond my dreams.&lt;/p&gt;
&lt;p&gt;
&lt;object width="425" height="344"&gt;
&lt;embed src="http://www.youtube.com/v/AOtQAhblRps&amp;amp;hl=en_US&amp;amp;fs=1&amp;amp;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"&gt;
&lt;/embed&gt;
&lt;/object&gt;
&lt;/p&gt;</content><category term="programming"></category><category term="mayavi"></category><category term="scipy"></category><category term="python"></category><category term="scientific computing"></category><category term="art"></category></entry><entry><title>EuroScipy 2010, Paris July 8-11. Save the date!</title><link href="https://gael-varoquaux.info/programming/euroscipy-2010-paris-july-8-11-save-the-date.html" rel="alternate"></link><published>2010-02-14T00:02:00+01:00</published><updated>2010-02-14T00:02:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2010-02-14:/programming/euroscipy-2010-paris-july-8-11-save-the-date.html</id><summary type="html">&lt;p&gt;&lt;strong&gt;EuroScipy 2010&lt;/strong&gt;, the 3rd European meeting on Python in Science, will
be held July 8-11 in the center of Paris, at the &lt;a class="reference external" href="http://www.ens.fr/?lang=en"&gt;Ecole Normale
Supérieure&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We have made good progress in the organization, and we already have &lt;a class="reference external" href="http://www.euroscipy.org/conference/867?vid=primary"&gt;an
exciting program&lt;/a&gt; although paper submission is not yet even open.&lt;/p&gt;
&lt;div class="section" id="tutorial-tracks"&gt;
&lt;h2&gt;Tutorial tracks …&lt;/h2&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;&lt;strong&gt;EuroScipy 2010&lt;/strong&gt;, the 3rd European meeting on Python in Science, will
be held July 8-11 in the center of Paris, at the &lt;a class="reference external" href="http://www.ens.fr/?lang=en"&gt;Ecole Normale
Supérieure&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We have made good progress in the organization, and we already have &lt;a class="reference external" href="http://www.euroscipy.org/conference/867?vid=primary"&gt;an
exciting program&lt;/a&gt; although paper submission is not yet even open.&lt;/p&gt;
&lt;div class="section" id="tutorial-tracks"&gt;
&lt;h2&gt;Tutorial tracks&lt;/h2&gt;
&lt;p&gt;There will be two tutorial tracks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/871"&gt;An introductory track&lt;/a&gt;, to bring attendees up to speed with Python
in science. Even if you are a complete beginner, after these two
days, you should be able to be efficient using Python for scientific
purposes.&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.euroscipy.org/track/872"&gt;An advanced tutorial track&lt;/a&gt;, covering in-depth specific tools and
projects, aimed at experienced users and presented by leading experts
of the topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will soon be requesting feedback from you to help us choose between
the many exciting tutorial proposals that we have for these
tracks. More on that later…&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="keynote-speakers"&gt;
&lt;h2&gt;Keynote speakers&lt;/h2&gt;
&lt;div class="section" id="hans-petter-langtangen"&gt;
&lt;h3&gt;Hans Petter Langtangen&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://simula.no/"&gt;Simula laboratory&lt;/a&gt;, Oslo, director of scientific computing and
bio-medical research&lt;/li&gt;
&lt;li&gt;Author of the famous book &lt;a class="reference external" href="http://www.springer.com/mathematics/numerical+and+computational+mathematics/book/978-3-540-73915-9"&gt;Python scripting for computational
science&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="konrad-hinsen"&gt;
&lt;h3&gt;Konrad Hinsen&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://www.synchrotron-soleil.fr/"&gt;Synchrotron SOLEIL&lt;/a&gt; and &lt;a class="reference external" href="http://dirac.cnrs-orleans.fr/plone"&gt;Centre de Biophysique Moléculaire&lt;/a&gt;
(Orléans)&lt;/li&gt;
&lt;li&gt;One of the fathers of Numeric, and developer of &lt;a class="reference external" href="http://dirac.cnrs-orleans.fr/plone/software/scientificpython/"&gt;Scientific Python&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="help-us-spread-the-word"&gt;
&lt;h2&gt;Help us spread the word&lt;/h2&gt;
&lt;p&gt;The poster of the conference can be downloaded:&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/poster_euroscipy_2010.pdf"&gt;&lt;img alt="" src="attachments/poster_euroscipy_2010.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;Help us spread the word: print it and post it at your workplace!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-exciting-city-of-paris"&gt;
&lt;h2&gt;The exciting city of Paris&lt;/h2&gt;
&lt;p&gt;The conference will take place in the center of Paris, in the very
lively “quartier latin”, in the prestigious and historical ‘Ecole
Normale Supérieure’. In the morning, on your way to ENS, drop by a café
for a French croissant, served by a French waiter with a typical French
accent in English. In the evenings, walk one block to enjoy the nightlife
on “rue Mouffetard”, or venture further to stroll along the banks
of the Seine, where people dance to street music.&lt;/p&gt;
&lt;img alt="" src="attachments/banner_euroscipy.jpg" /&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>The SciPy 2009 proceedings are online</title><link href="https://gael-varoquaux.info/programming/the-scipy-2009-proceedings-are-online.html" rel="alternate"></link><published>2009-12-20T18:49:00+01:00</published><updated>2009-12-20T18:49:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-12-20:/programming/the-scipy-2009-proceedings-are-online.html</id><summary type="html">&lt;p&gt;We are finally announcing the online edition of SciPy proceedings:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://conference.scipy.org/proceedings/SciPy2009/"&gt;http://conference.scipy.org/proceedings/SciPy2009/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This year, we tried to raise the bar in terms of article quality. This
involved a stricter review process, and we must warmly thank
all the reviewers. I have the feeling …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We are finally announcing the online edition of the SciPy proceedings:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://conference.scipy.org/proceedings/SciPy2009/"&gt;http://conference.scipy.org/proceedings/SciPy2009/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This year, we tried to raise the bar in terms of article quality. This
involved a stricter review process, and we must warmly thank all
the reviewers. I have the feeling it did improve the quality of the
final papers. Actually, I must say that there are some really nice
papers in the proceedings. I am not going to list them here (you can
have a glance at the contents), but they range from fairly technical
papers on tool development, closer to the software engineering
and computer science fields, to application papers demonstrating how the
tools can be used.&lt;/p&gt;
&lt;p&gt;I must apologize for the time it took to publish the proceedings. All
this was actually a lot of work, and it has taken me a lot of energy. I
hope that you will find it was worth it.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="publishing"></category></entry><entry><title>Announcing EuroScipy 2010</title><link href="https://gael-varoquaux.info/programming/announcing-euroscipy-2010.html" rel="alternate"></link><published>2009-12-14T01:01:00+01:00</published><updated>2009-12-14T01:01:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-12-14:/programming/announcing-euroscipy-2010.html</id><summary type="html">&lt;div class="section" id="the-3rd-european-meeting-on-python-in-science"&gt;
&lt;h2&gt;The 3rd European meeting on Python in Science&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Paris, Ecole Normale Supérieure, July 8-11 2010&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We are happy to announce the 3rd EuroScipy meeting, in Paris, July 2010.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;The EuroSciPy meeting is a cross-disciplinary gathering focused on the&lt;/div&gt;
&lt;div class="line"&gt;use and development of the Python language in scientific research. This&lt;/div&gt;
&lt;div class="line"&gt;event …&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</summary><content type="html">&lt;div class="section" id="the-3rd-european-meeting-on-python-in-science"&gt;
&lt;h2&gt;The 3rd European meeting on Python in Science&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Paris, Ecole Normale Supérieure, July 8-11 2010&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We are happy to announce the 3rd EuroScipy meeting, in Paris, July 2010.&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;The EuroSciPy meeting is a cross-disciplinary gathering focused on the&lt;/div&gt;
&lt;div class="line"&gt;use and development of the Python language in scientific research. This&lt;/div&gt;
&lt;div class="line"&gt;event strives to bring together both users and developers of&lt;/div&gt;
&lt;div class="line"&gt;scientific tools, as well as academic research and state of the art&lt;/div&gt;
&lt;div class="line"&gt;industry.&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="important-dates"&gt;
&lt;h3&gt;Important dates&lt;/h3&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;strong&gt;Registration opens&lt;/strong&gt;: Sunday March 29&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;&lt;strong&gt;Paper submission deadline&lt;/strong&gt;: Sunday May 9&lt;/div&gt;
&lt;div class="line"&gt;&lt;strong&gt;Program announced&lt;/strong&gt;: Sunday May 22&lt;/div&gt;
&lt;div class="line"&gt;&lt;strong&gt;Tutorials tracks&lt;/strong&gt;: Thursday July 8 - Friday July 9&lt;/div&gt;
&lt;div class="line"&gt;&lt;strong&gt;Conference track&lt;/strong&gt;: Saturday July 10 - Sunday July 11&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="tutorial"&gt;
&lt;h3&gt;Tutorial&lt;/h3&gt;
&lt;p&gt;There will be two tutorial tracks at the conference: an introductory one,
to bring attendees up to speed with the Python language as a scientific tool, and
an advanced track, during which experts in the field will lecture on
specific advanced topics such as advanced use of numpy, scientific
visualization, software engineering…&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="main-conference-topics"&gt;
&lt;h3&gt;Main conference topics&lt;/h3&gt;
&lt;p&gt;We will be soliciting talks on the following topics:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Presentations of scientific tools and libraries using the Python
language, including but not limited to:&lt;ul&gt;
&lt;li&gt;Vector and array manipulation&lt;/li&gt;
&lt;li&gt;Parallel computing&lt;/li&gt;
&lt;li&gt;Scientific visualization&lt;/li&gt;
&lt;li&gt;Scientific data flow and persistence&lt;/li&gt;
&lt;li&gt;Algorithms implemented or exposed in Python&lt;/li&gt;
&lt;li&gt;Web applications and portals for science and engineering&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reports on the use of Python in scientific achievements or ongoing
projects.&lt;/li&gt;
&lt;li&gt;General-purpose Python tools that can be of special interest to the
scientific community.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Keynote Speaker: Hans Petter Langtangen&lt;/p&gt;
&lt;p&gt;We are excited to welcome Hans Petter Langtangen as our keynote speaker.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Director of scientific computing and bio-medical research at Simula
labs, Oslo&lt;/li&gt;
&lt;li&gt;Author of the famous book &lt;em&gt;Python Scripting for Computational Science&lt;/em&gt;
&lt;a class="reference external" href="http://www.springer.com/math/cse/book/978-3-540-73915-9"&gt;http://www.springer.com/math/cse/book/978-3-540-73915-9&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="section" id="the-organizers"&gt;
&lt;h4&gt;The organizers:&lt;/h4&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;Gaël Varoquaux (INRIA Saclay, Parietal), conference co-chair&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;Nicolas Chauvat (Logilab), conference co-chair&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="program-committee"&gt;
&lt;h4&gt;Program committee&lt;/h4&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;Romain Brette (ENS Paris, DEC)&lt;/div&gt;
&lt;div class="line"&gt;Mike Müller (Python Academy)&lt;/div&gt;
&lt;div class="line"&gt;Christophe Pradal (CIRAD/INRIA, DigiPlantes team)&lt;/div&gt;
&lt;div class="line"&gt;Pierre Raybault (CEA, DAM)&lt;/div&gt;
&lt;div class="line"&gt;Jarrod Millman (UC Berkeley, Helen Wills NeuroScience institute)&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>Decoration in Python done right: Decorating and pickling</title><link href="https://gael-varoquaux.info/programming/decoration-in-python-done-right-decorating-and-pickling.html" rel="alternate"></link><published>2009-11-13T00:14:00+01:00</published><updated>2009-11-13T00:14:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-11-13:/programming/decoration-in-python-done-right-decorating-and-pickling.html</id><summary type="html">&lt;p&gt;Decoration is a fantastic pattern in Python that allows for very
light-weight metaprogramming with functions rather than objects (see
&lt;a class="reference external" href="http://www.ibm.com/developerworks/linux/library/l-cpdecor.html"&gt;this article&lt;/a&gt; for an in-depth discussion). However, when decorating,
it is very easy to break another great feature of the language: its
reflectivity and its ability to do static representations of …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Decoration is a fantastic pattern in Python that allows for very
light-weight metaprogramming with functions rather than objects (see
&lt;a class="reference external" href="http://www.ibm.com/developerworks/linux/library/l-cpdecor.html"&gt;this article&lt;/a&gt; for an in-depth discussion). However, when decorating,
it is very easy to break another great feature of the language: its
reflectivity and its ability to produce static representations of its
internal objects: pickling.&lt;/p&gt;
&lt;p&gt;In this blog post, I’d like to revisit a post I made on the IPython
mailing list a month ago, summing up the few things to keep in mind when
decorating a function.&lt;/p&gt;
&lt;div class="section" id="a-pattern-to-avoid"&gt;
&lt;h2&gt;A pattern to avoid?&lt;/h2&gt;
&lt;p&gt;I have recently been revisiting my decoration code to fix a common
mistake I had been making, partly due to my heavy use of a simplified
decoration pattern:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;with_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Decorate a function to print its arguments.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;my_func&lt;/span&gt;

&lt;span class="nd"&gt;@with_print&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;f called&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The nice thing about this pattern is that it is quite easy to type and
to read.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="why-it-is-harmful"&gt;
&lt;h2&gt;Why it is harmful&lt;/h2&gt;
&lt;p&gt;The decorated function is actually the inner function ‘my_func’,
which holds a reference to the original function ‘func’. That reference
lives in the scope of the decorator ‘with_print’, and is thus captured in
the closure of ‘my_func’.&lt;/p&gt;
&lt;p&gt;The problem is this closure. The undecorated function becomes hard to
get to, and the decorated function is not picklable: pickle stores a
function by its module and name, and the decorated function carries the
name ‘my_func’, which cannot be looked up in its module. Picklability is
more and more important to me, e.g. for parallel computing.&lt;/p&gt;
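&lt;p&gt;The failure can be checked with a short sketch. This sketch uses Python 3, while the snippets in this post use Python 2 print statements. The exact exception raised depends on the Python version, so both candidates are caught. The names f and with_print mirror the example above:&lt;/p&gt;

```python
import pickle


def with_print(func):
    """Decorate a function to print its arguments (no functools.wraps)."""
    def my_func(*args, **kwargs):
        print(args, kwargs)
        return func(*args, **kwargs)
    return my_func


@with_print
def f(x):
    return 2 * x


# The module-level name "f" now refers to the inner function, which
# still carries its own name.
print(f.__name__)  # my_func

try:
    pickle.dumps(f)
except (pickle.PicklingError, AttributeError) as exc:
    # Depending on the Python version, pickle raises PicklingError or
    # AttributeError for such a nested ("local") function.
    print("pickling failed:", exc)
```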
&lt;/div&gt;
&lt;div class="section" id="some-solutions"&gt;
&lt;h2&gt;Some solutions&lt;/h2&gt;
&lt;div class="section" id="avoiding-the-closure"&gt;
&lt;h3&gt;Avoiding the closure&lt;/h3&gt;
&lt;p&gt;Use objects as a scope, rather than a closure:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WithPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="fm"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="fm"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This solution is not enough: the following code won’t pickle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nd"&gt;@WithPrint&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;g called&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The reason this won’t pickle is that we have a name collision: the code
above expands to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;g called&amp;#39;&lt;/span&gt;

&lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WithPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and trying to pickle raises the following PicklingError:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;Can&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;t pickle &amp;lt;function g at 0x6ed2a8&amp;gt;: it&amp;#39;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;same&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;__main__&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If we do:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;g called&amp;#39;&lt;/span&gt;

&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WithPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;we can pickle h, hurray!&lt;/p&gt;
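&lt;p&gt;Both cases can be checked with a short Python 3 sketch. The function names g, h and k are illustrative. The exception is caught broadly because its exact type may vary between Python versions:&lt;/p&gt;

```python
import pickle


class WithPrint(object):
    """Decorator that keeps the decorated function as an attribute."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        print(args, kwargs)
        return self.func(*args, **kwargs)


def g(x):
    return x + 1


# Binding the wrapper to a *different* name leaves "g" pointing at the
# original function, so pickle can store self.func by reference.
h = WithPrint(g)
h_copy = pickle.loads(pickle.dumps(h))
print(h_copy(1))  # prints the arguments, then 2


def k(x):
    return x - 1


# This is what "@WithPrint" does: the name "k" now refers to the wrapper,
# so the original function stored in self.func cannot be found as k.
k = WithPrint(k)
try:
    pickle.dumps(k)
except (pickle.PicklingError, AttributeError) as exc:
    print("pickling failed:", exc)
```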
&lt;/div&gt;
&lt;div class="section" id="using-functools-wraps"&gt;
&lt;h3&gt;Using functools.wraps&lt;/h3&gt;
&lt;p&gt;However, Python comes with the answer in the standard library:
functools.wraps copies the name (and docstring) of the original function
onto the wrapper.&lt;/p&gt;
&lt;p&gt;Thus the following code produces a picklable f:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;with_print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Decorate a function to print its arguments.&lt;/span&gt;
&lt;span class="sd"&gt;    &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;my_func&lt;/span&gt;

&lt;span class="nd"&gt;@with_print&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;f called&amp;#39;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;The pickling works simply because using functools.wraps resets the&lt;/div&gt;
&lt;div class="line"&gt;.func_name attribute of f to have a well-defined import path. Thus&lt;/div&gt;
&lt;div class="line"&gt;pickling works, simply by storing the import path, as all pickling of&lt;/div&gt;
&lt;div class="line"&gt;functions.&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Notice that there is only a one-line difference with the original code!&lt;/p&gt;
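&lt;p&gt;Here is a Python 3 sketch of the round trip. It assumes the code runs as a script, so that the decorated function lives in __main__. It also shows __wrapped__, an attribute that functools.wraps sets in Python 3.2 and later and that gives back the undecorated function:&lt;/p&gt;

```python
import pickle
from functools import wraps


def with_print(func):
    """Decorate a function to print its arguments."""
    @wraps(func)  # copies __module__, __name__, __qualname__, __doc__, ...
    def my_func(*args, **kwargs):
        print(args, kwargs)
        return func(*args, **kwargs)
    return my_func


@with_print
def f(x):
    """Double x."""
    return 2 * x


print(f.__name__, f.__doc__)  # f Double x.
# Functions are pickled by reference: "__main__.f" now resolves to the
# decorated function itself, so the round trip returns the same object.
print(pickle.loads(pickle.dumps(f)) is f)  # True
# Python 3 also records the undecorated function as __wrapped__.
print(f.__wrapped__(3))  # 6
```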
&lt;p&gt;I actually tend to use a combination of both solutions (an object,
combined with functools.wraps), to keep a reference to the undecorated
functions.&lt;/p&gt;
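&lt;p&gt;The post does not show the code for this combination, so what follows is only one possible sketch. It assumes Python 3 and a decorator applied at module level. The __reduce__ method is my own addition: without it, pickle tries to store self.func under a name that now points to the wrapper, and fails. functools.update_wrapper is the function behind functools.wraps:&lt;/p&gt;

```python
import functools
import pickle


class WithPrint(object):
    """Object-based decorator that keeps a reference to the original function."""

    def __init__(self, func):
        self.func = func
        # Copy __module__, __name__, __qualname__, __doc__ from func and
        # set __wrapped__, as functools.wraps does for functions.
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        print(args, kwargs)
        return self.func(*args, **kwargs)

    def __reduce__(self):
        # Returning a string asks pickle to store this object by reference,
        # as the global of that name in self.__module__, like a function.
        # Otherwise pickle would try to save self.func as "__main__.g",
        # a name that now refers to this wrapper, and fail.
        return self.__qualname__


@WithPrint
def g(x):
    """Add one to x."""
    return x + 1


print(g.__name__, g.__doc__)               # g Add one to x.
print(g.func(1))                           # 2 (undecorated call, no printing)
print(pickle.loads(pickle.dumps(g)) is g)  # True
```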
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Demo code of this blog post can be found &lt;a class="reference external" href="attachments/pickling_tests_py"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="take-home-messages-for-pickling"&gt;
&lt;h2&gt;Take home messages for pickling&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Decorators can be more clever than you think, and might not return
objects as simple as you think&lt;/li&gt;
&lt;li&gt;Think about pickling, or you’ll get bitten at some point (for
instance when doing parallel computing).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;and most important:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Use functools.wraps&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="a-remark-about-object-oriented-programming"&gt;
&lt;h2&gt;A remark about object-oriented programming&lt;/h2&gt;
&lt;p&gt;To jump on the band-wagon behind &lt;a class="reference external" href="http://blog.enthought.com/?p=223"&gt;Travis&lt;/a&gt;, I believe that this
discussion teaches us a bit about object-oriented programming. When
decorating, we are really taking a callable object, and redefining how
the call is handled. If we do this the naive way, we lose
introspection (there is no straightforward way to access the original
callable from Python), and as a result pickling, and many of the nice
features that come with reflection in Python. This is because we trapped
information in a scope that is not accessible by normal Python code
(without playing at the frame level). If, on the other hand, we accept
that what we have behind all this are nested scopes with a control of
lookups, and we create a full-blown object, we get the benefits of the
black box, and the benefits of reflection.&lt;/p&gt;
&lt;p&gt;But this is not the point I want to make. The point I want to make is
that, by decorating, we are piggy-backing on an absolutely universal
object/interface: the callable. Everybody knows what a callable is, and
knows how to employ it. From a pure object-oriented point of view,
decorating is simply some kind of proxy design pattern. But, to stress
Travis’s point, introducing new objects that have their own behavior
puts cognitive load on the programmer. The real value of decoration is
that it is object-oriented programming without adding any new or
surprising interface. You don’t really have to care what is going on,
you can still use the resulting ‘proxied’ function as the original
function: a simple function.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="design patterns"></category><category term="software architecture"></category><category term="selected"></category></entry><entry><title>Writing parallel code in a readable way</title><link href="https://gael-varoquaux.info/programming/writing-parallel-code-in-a-readable-way.html" rel="alternate"></link><published>2009-11-09T00:10:00+01:00</published><updated>2009-11-09T00:10:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-11-09:/programming/writing-parallel-code-in-a-readable-way.html</id><summary type="html">&lt;p&gt;Although I often have embarrassingly parallel problems (data parallel),
and I have an 8-CPU box at work, I used to frown on writing parallel
computing code when doing exploratory coding. We now have fantastic
parallel computing facilities in Python (amongst others,
&lt;a class="reference external" href="http://docs.python.org/library/multiprocessing.html"&gt;multiprocessing&lt;/a&gt;, &lt;a class="reference external" href="http://ipython.scipy.org/doc/rel-0.9.1/html/parallel/index.html"&gt;IPython&lt;/a&gt;, and &lt;a class="reference external" href="http://www.parallelpython.com/"&gt;parallel Python&lt;/a&gt;). However, in my
opinion …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Although I often have embarrassingly parallel problems (data parallel),
and I have an 8-CPU box at work, I used to frown on writing parallel
computing code when doing exploratory coding. We now have fantastic
parallel computing facilities in Python (amongst others,
&lt;a class="reference external" href="http://docs.python.org/library/multiprocessing.html"&gt;multiprocessing&lt;/a&gt;, &lt;a class="reference external" href="http://ipython.scipy.org/doc/rel-0.9.1/html/parallel/index.html"&gt;IPython&lt;/a&gt;, and &lt;a class="reference external" href="http://www.parallelpython.com/"&gt;parallel Python&lt;/a&gt;). However, in my
opinion, there are two reasons to hesitate to use them, especially when
the code is very immature (which is almost always my case, in research
settings):&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;It makes the code look less like the ideas it is trying to express.
Peter Norvig made &lt;a class="reference external" href="http://www.archive.org/details/scipy09_day1_03-Peter_Norvig"&gt;a pretty convincing case&lt;/a&gt; for scientific code
reading like math at &lt;a class="reference external" href="http://conference.scipy.org/"&gt;SciPy2009&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Because parallel computing in Python happens out of process, it is simply
harder to debug (though I hear that the IPython guys are working on that).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I have progressively developed a tiny tool to address both problems, at
least for my embarrassingly parallel problems. I address the second
problem by having a trivial switch to run my code without importing any
fancy parallel computing tools. And I address the first problem using
syntactic sugar to be able to type in map/reduce code that actually
looks like standard procedural code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Parallel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;
            &lt;span class="n"&gt;delayed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_calculation&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;data1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                    &lt;span class="n"&gt;parameter1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parameter2&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;data1&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;store1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;data2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;store2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There are several tricks here:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;I use a ‘&lt;em&gt;delayed&lt;/em&gt;’ decorator that creates the argument list and
keyword argument dictionary for me so that I can type something that
really looks like a function call. Also, the decorator checks to see
if the function and the arguments can be pickled, because if not the
parallel computing libraries will raise errors, sometimes with
hard-to-understand messages.&lt;/li&gt;
&lt;li&gt;I use a comprehension (here a generator expression, the lazy form of a
list comprehension) to create the list to apply the map/reduce onto.
Comprehensions are really readable, and very powerful.&lt;/li&gt;
&lt;li&gt;The ‘&lt;em&gt;Parallel&lt;/em&gt;’ object hides all the cleverness. If the
‘&lt;em&gt;n_jobs&lt;/em&gt;’ parameter is set to 1, it does not call any parallel
computing library. If it is set to -1, all the CPUs are used. The
object creates the parallel computing context, and destroys it when done.
While this is inefficient, it is great for catching errors early.
And finally, while I have implemented this only for the
multiprocessing module, any fork/join-based parallel computing
library could be encapsulated the same way, thus providing a uniform
API to do multi-node parallel computing or single-computer shared
memory (as multiprocessing uses the Unix fork call, and all modern
Unices implement copy-on-write of memory pages, you get some shared
memory for free without worrying about race conditions).&lt;/li&gt;
&lt;/ol&gt;
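&lt;p&gt;The tricks listed above can be illustrated with a minimal sketch of a Parallel/delayed pair built on multiprocessing.Pool. The helper _run_task and all implementation details are assumptions made for illustration. This is neither the original tool nor the joblib code, which exposes the same Parallel and delayed names with a much richer implementation:&lt;/p&gt;

```python
import multiprocessing
import pickle


def delayed(function):
    """Turn ``function(*args, **kwargs)`` into a (function, args, kwargs) task."""
    def make_task(*args, **kwargs):
        # Check picklability here, where the error is easy to read,
        # rather than deep inside the worker pool.
        pickle.dumps((function, args, kwargs))
        return function, args, kwargs
    return make_task


def _run_task(task):
    # Module-level helper, so that multiprocessing can pickle it.
    function, args, kwargs = task
    return function(*args, **kwargs)


class Parallel(object):
    """Run a sequence of delayed tasks, optionally on several processes."""

    def __init__(self, n_jobs=1):
        self.n_jobs = n_jobs

    def __call__(self, tasks):
        tasks = list(tasks)
        if self.n_jobs == 1:
            # No parallel library involved: a plain loop, easy to debug.
            return [_run_task(task) for task in tasks]
        # n_jobs=-1 means "use all CPUs", which Pool(processes=None) does.
        processes = None if self.n_jobs == -1 else self.n_jobs
        # The pool is created and destroyed on every call.
        with multiprocessing.Pool(processes=processes) as pool:
            return pool.map(_run_task, tasks)


results = Parallel(n_jobs=1)(delayed(pow)(x, 2) for x in range(4))
print(results)  # [0, 1, 4, 9]
```

&lt;p&gt;With n_jobs=2 or n_jobs=-1 the same call goes through multiprocessing.Pool. On platforms where workers are started with "spawn" (Windows, and macOS by default), that call has to sit under an if __name__ == "__main__": guard.&lt;/p&gt;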
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Update&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This pattern has since evolved into the &lt;a class="reference external" href="https://pythonhosted.org/joblib/"&gt;joblib project&lt;/a&gt;,
which provides a lot of cleverness under the hood.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="scipy"></category><category term="software engineering"></category><category term="joblib"></category></entry><entry><title>EuroScipy 2010 in Paris</title><link href="https://gael-varoquaux.info/programming/euroscipy-2010-in-paris.html" rel="alternate"></link><published>2009-10-27T23:22:00+01:00</published><updated>2009-10-27T23:22:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-10-27:/programming/euroscipy-2010-in-paris.html</id><summary type="html">&lt;p&gt;Next year’s EuroScipy will be in Paris, as Nicolas Chauvat and myself
announced in Leipzig this summer. We are still busy organizing, but we
have pretty much settled on the dates: &lt;strong&gt;July 8th - July 11th&lt;/strong&gt;. So
mark those dates, and get ready to come to Paris for a …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Next year’s EuroScipy will be in Paris, as Nicolas Chauvat and myself
announced in Leipzig this summer. We are still busy organizing, but we
have pretty much settled on the dates: &lt;strong&gt;July 8th - July 11th&lt;/strong&gt;. So
mark those dates, and get ready to come to Paris for a fantastic event
where Science meets computing thanks to Python.&lt;/p&gt;
&lt;p&gt;On the Thursday and Friday, we will have 2 days of optional tutorials:
introductory ones to get up to speed with Python, and advanced ones,
where experts explain the tools they know best. On the Saturday and
Sunday, the main conference will be held, and if it is anything like
last year’s, we will be hearing thrilling discussions on topics
ranging from the latest libraries for better scientific computing to how
Python was used in top-notch scientific achievements.&lt;/p&gt;
</content><category term="programming"></category><category term="conferences"></category><category term="python"></category><category term="scientific computing"></category></entry><entry><title>Useful trick for functions and tests using np.random</title><link href="https://gael-varoquaux.info/programming/useful-trick-for-functions-and-tests-using-nprandom.html" rel="alternate"></link><published>2009-08-29T16:00:00+02:00</published><updated>2009-08-29T16:00:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-08-29:/programming/useful-trick-for-functions-and-tests-using-nprandom.html</id><summary type="html">&lt;p class="first last"&gt;How to test functions that use the numpy random number generator&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Recently, listening to Robert Kern taught a new useful trick to use when
you write functions that use the numpy random number generator:&lt;/p&gt;
&lt;p&gt;As always, when using random number generation in code, the problem is
to get ‘repeatable results’. Of course, you want only repeatable
statistics, and with statistics, the problem is to define what
repeatable means. Anyhow, for various reasons, it is useful to be able to
reproduce runs exactly, for instance when testing, fine tuning, or
debugging. That is why you would like to be able to control the seed of
your random number generation. Robert Kern showed us (at the SciPy
conference tutorial) a way to control the pseudo random number generator
(PRNG) in a function, without affecting the rest of the code. This does
not involve setting the seed of the global PRNG, which is evil
because it has global effects. The idea is to pass your functions
a PRNG instance (by default the global one):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prng&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nb"&gt;print&lt;/span&gt; &lt;span class="n"&gt;pnrg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you want to use your function with a controlled PRNG, you can
instantiate one with a specific seed:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;prng&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RandomState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and pass it to your function.&lt;/p&gt;
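&lt;p&gt;Here is a small sketch of the pattern, using a hypothetical function noisy_samples. It checks that two identically seeded RandomState instances give the same draws, and that the global PRNG state is left alone:&lt;/p&gt;

```python
import numpy as np


def noisy_samples(n_samples=10, prng=np.random):
    """Draw n_samples uniform numbers in [0, 1) from ``prng``.

    ``prng`` defaults to the global numpy PRNG (the np.random module);
    pass a seeded np.random.RandomState for reproducible output.
    """
    return prng.rand(n_samples)


# Same seed, same draws, independently of anything else in the program.
a = noisy_samples(prng=np.random.RandomState(seed=0))
b = noisy_samples(prng=np.random.RandomState(seed=0))
print(np.array_equal(a, b))  # True

# The global PRNG is left untouched by the seeded calls.
state_before = np.random.get_state()
noisy_samples(prng=np.random.RandomState(seed=0))
state_after = np.random.get_state()
print(np.array_equal(state_before[1], state_after[1]))  # True
```

&lt;p&gt;Since NumPy 1.17, the recommended entry point is np.random.default_rng(seed), which returns a Generator. Its method for uniform draws is random(n) rather than rand(n), so a function written for Generators would call prng.random(n_samples).&lt;/p&gt;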
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="software engineering"></category><category term="design patterns"></category><category term="selected"></category></entry><entry><title>SciPy 2009 is over!</title><link href="https://gael-varoquaux.info/programming/scipy-2009-is-over.html" rel="alternate"></link><published>2009-08-29T12:21:00+02:00</published><updated>2009-08-29T12:21:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-08-29:/programming/scipy-2009-is-over.html</id><summary type="html">&lt;p&gt;The week is over, and I am finally catching up with things, back here in
France.&lt;/p&gt;
&lt;p&gt;The SciPy conference was exciting and fun as usual. It was great to meet
old friends and put faces on names on the mailing list.&lt;/p&gt;
&lt;p&gt;The turnout was very good: we had 150 …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The week is over, and I am finally catching up with things, back here in
France.&lt;/p&gt;
&lt;p&gt;The SciPy conference was exciting and fun as usual. It was great to meet
old friends and put faces on names on the mailing list.&lt;/p&gt;
&lt;p&gt;The turnout was very good: we had 150 people in total, more than
last year (125). This shows that there is high interest, given that
most institutions have travel restrictions due to this year’s low
budgets.&lt;/p&gt;
&lt;p&gt;This year, the conference was very international. I was really happy
that, partly thanks to the &lt;a class="reference external" href="http://conference.scipy.org/student-funding"&gt;PSF contribution&lt;/a&gt;, we had visits from
young contributors coming from far away, such as David Cournapeau
(Japan), Dag Sverre Seljebotn (Norway), Pauli Virtanen (Finland), and
Stéfan van der Walt (South Africa). For me, living in France, it was
also great to have people from major European institutions, such as the
&lt;a class="reference external" href="http://conference.scipy.org/abstract?id=7"&gt;ESRF&lt;/a&gt; (European Synchrotron Radiation Facility), the &lt;a class="reference external" href="http://conference.scipy.org/abstract?id=14"&gt;Fraunhofer
Institute&lt;/a&gt;, and the &lt;a class="reference external" href="http://conference.scipy.org/abstract?id=14"&gt;Max Planck Institute&lt;/a&gt;. These people not only
contributed important projects to the scientific Python tools, but made
the effort of coming all the way to California to talk about them, which
is no small matter given the cost of the trip from Europe. To
me this is important because it means that we are getting more
interaction worldwide, and thus the tools are more likely to converge to
something of general use. Also, for the first time ever, one of my bosses
came to the conference. It is fantastic to be working with great
scientists who actually understand that technology is important to do
good science, and that programming is actually hard, and a matter of
interest per se.&lt;/p&gt;
&lt;p&gt;On the other hand, I was disappointed that we had no presentations from
industry. There were a lot of industry people in the audience, and
it is always fun to hear what they use Python for.&lt;/p&gt;
&lt;p&gt;I really enjoyed the keynote by Peter Norvig because Peter talked about
the importance of having a clear language to expose and formulate
scientific ideas. This is something that is very dear to me, and I must
say that the code snippets he presented were crystal clear: they
involved non-trivial maths, explained in a way that made it look
simple, similar to his famous blog post on &lt;a class="reference external" href="http://norvig.com/spell-correct.html"&gt;spell checking&lt;/a&gt;. It was
really inspiring for me, and is pushing me to write even
clearer and simpler code.&lt;/p&gt;
&lt;p&gt;The technical keynote by Jon Guyer also hit a soft spot for me, not only
because the physics presented was very beautiful, but also because my
partner is doing research in similar fields (with Python, of course),
and Jon made an excellent argument for using Python, which is not always
easy when you are discussing heavily computational problems.&lt;/p&gt;
&lt;p&gt;For my personal work, SciPy was very exciting, because I had so many
discussions with different people on how we could share efforts, by
tweaking a data structure in an existing package, or simply having a
look at a package I wasn’t aware of. The machine learning BoF was
extremely enthusiastic, and I am really looking forward to October, when
we will be able to start working on that. If only half of the things we
talked about ever get done, I will be thrilled.&lt;/p&gt;
&lt;p&gt;I should point out that, thanks to hard work by Jeff Teeters and Kilian
Koepsell from Berkeley, the videos of every talk are on the &lt;a class="reference external" href="http://www.archive.org/search.php?query="&gt;web&lt;/a&gt; for
the first time.&lt;/p&gt;
&lt;p&gt;Also, we have a nice &lt;a class="reference external" href="http://www.flickr.com/photos/irees/sets/72157622006161097/"&gt;photo gallery&lt;/a&gt; with a group picture.&lt;/p&gt;
&lt;p&gt;We have so many people to thank. I think special thanks go to Leah
Jones, at Enthought, and Julie Ponce, at CACR, Caltech. They made sure
that the organization committee didn’t forget anything important and did
a lot of the grunt work. Thanks also to Enthought and CACR, and many of
their staff, for their support in the organization. The PSF funded students,
and that is a big deal. We should thank all the tutorial presenters: it
takes a lot of work to put together the material. We are very grateful
to the &lt;a class="reference external" href="http://conference.scipy.org/organizers"&gt;program committee&lt;/a&gt; for reviewing the papers. Also thanks to all
the speakers, and to all the attendees. The SciPy conference is a bit
special to me, because it is very laid back, and I can trust that it
will be great almost by self-organization: you put together nice and
clever people, and they find ways of discussing interesting things
with enthusiasm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; That blog post feels way too ‘political’. I dislike sales
pitches, and it does feel like one. But how else can I sum up some important
contributions and thank the people who helped out? I made a point of
always wearing a T-shirt of the conference, rather than a shirt, but I
guess that there is a point at which trying to dodge etiquette with a
T-shirt and a pony tail is just another cliché[*] and formalism.&lt;/p&gt;
&lt;p&gt;[*] &lt;em&gt;cliché&lt;/em&gt; is a French word for cliché.&lt;/p&gt;
</content><category term="programming"></category></entry><entry><title>Mayavi: 2 videos of tutorial-like presentation</title><link href="https://gael-varoquaux.info/programming/mayavi-2-videos-of-tutorial-like-presentation.html" rel="alternate"></link><published>2009-07-16T23:35:00+02:00</published><updated>2009-07-16T23:35:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-07-16:/programming/mayavi-2-videos-of-tutorial-like-presentation.html</id><summary type="html">&lt;p&gt;I gave a presentation on Mayavi in the Python for science seminar
organised by Fernando Perez at Berkeley. I was loud-mouthed and obnoxious
as usual, and unfortunately for me, I was recorded.&lt;/p&gt;
&lt;p&gt;More seriously, Jeff Teeters has filmed the presentation and recorded
the sound with a microphone I was wearing …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I gave a presentation on Mayavi in the Python for science seminar
organised by Fernando Perez at Berkeley. I was loud-mouthed and obnoxious
as usual, and unfortunately for me, I was recorded.&lt;/p&gt;
&lt;p&gt;More seriously, Jeff Teeters has filmed the presentation and recorded
the sound with a microphone I was wearing. I find that he did a really
excellent job. Getting a good recording is &lt;strong&gt;hard&lt;/strong&gt;, and he really got
good audio and good framing. I am amazed and I don’t know how to thank
him enough.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.archive.org/details/ucb_py4science_2009_07_14_Gael_Varoquaux"&gt;http://www.archive.org/details/ucb_py4science_2009_07_14_Gael_Varoquaux&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Also, Stéfan van der Walt gave a talk on Mayavi in South Africa that
was recorded. It is another very useful resource for learning Mayavi:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www.archive.org/details/ctpug-2008-09-mayavi"&gt;http://www.archive.org/details/ctpug-2008-09-mayavi&lt;/a&gt;&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="science"></category><category term="mayavi"></category></entry><entry><title>Announcing the SciPy conference schedule</title><link href="https://gael-varoquaux.info/programming/announcing-the-scipy-conference-schedule.html" rel="alternate"></link><published>2009-07-16T03:02:00+02:00</published><updated>2009-07-16T03:02:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-07-16:/programming/announcing-the-scipy-conference-schedule.html</id><summary type="html">&lt;p&gt;The SciPy conference committee is pleased to announce the schedule of
the conference:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://conference.scipy.org/schedule"&gt;http://conference.scipy.org/schedule&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This year’s program is very rich. In order to limit the number of
interesting talks that we had to turn down, we decided to reduce the
length of talks. Although this …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The SciPy conference committee is pleased to announce the schedule of
the conference:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://conference.scipy.org/schedule"&gt;http://conference.scipy.org/schedule&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This year’s program is very rich. In order to limit the number of
interesting talks that we had to turn down, we decided to reduce the
length of talks. Although this results in many short talks, we hope that
it will foster discussions and spark new ideas. Many subjects are
covered, spanning a variety of technical topics across the scientific computing
spectrum as well as many different research areas.&lt;/p&gt;
&lt;p&gt;I would personally like to thank the members of the program committee,
who spent time reviewing the proposed abstracts and giving the chairs
feedback.&lt;/p&gt;
&lt;p&gt;Fernando Perez and the tutorial presenters are hard at work finalizing
all the details of the two-day tutorial session that will
precede the conference. An introductory tutorial track and an advanced
tutorial track, both covering various aspects of scientific computing in
Python and presented by experts in the field, should help many people
get up to speed on the amazing technology driving this community.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The SciPy 2009 program committee&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;strong&gt;Co-Chair&lt;/strong&gt; Gaël Varoquaux, Applied Mathematics and Neuroscience,
Neurospin, CEA - INRIA Saclay (France)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Co-Chair&lt;/strong&gt; Stéfan van der Walt, Applied Mathematics, University of
Stellenbosch (South Africa)&lt;/li&gt;
&lt;li&gt;Michael Aivazis, Center for Advanced Computing Research, California
Institute of Technology (USA)&lt;/li&gt;
&lt;li&gt;Brian Granger, Physics Department, California Polytechnic State
University, San Luis Obispo (USA)&lt;/li&gt;
&lt;li&gt;Aric Hagberg, Theoretical Division, Los Alamos National Laboratory
(USA)&lt;/li&gt;
&lt;li&gt;Konrad Hinsen, Centre de Biophysique Moléculaire, CNRS Orléans
(France)&lt;/li&gt;
&lt;li&gt;Randall LeVeque, Mathematics, University of Washington, Seattle (USA)&lt;/li&gt;
&lt;li&gt;Travis Oliphant, Enthought (USA)&lt;/li&gt;
&lt;li&gt;Prabhu Ramachandran, Department of Aerospace Engineering, IIT Bombay
(India)&lt;/li&gt;
&lt;li&gt;Raphael Ritz, International Neuroinformatics Coordinating Facility
(Sweden)&lt;/li&gt;
&lt;li&gt;William Stein, Mathematics, University of Washington, Seattle (USA)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Conference Chair&lt;/strong&gt;: Jarrod Millman, Neuroscience Institute, UC
Berkeley (USA)&lt;/p&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="science"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>My article on scientific computing with Python</title><link href="https://gael-varoquaux.info/programming/my-article-on-scientific-computing-with-python.html" rel="alternate"></link><published>2009-07-13T03:23:00+02:00</published><updated>2009-07-13T03:23:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-07-13:/programming/my-article-on-scientific-computing-with-python.html</id><content type="html">&lt;p&gt;I have never sold the rights to the article I published in LinuxMagazine
France on scientific computing with Python. So I am uploading it to the
net, under a CC-by-SA license : &lt;a class="reference external" href="http://hal.inria.fr/hal-00776672/"&gt;http://hal.inria.fr/hal-00776672/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It is in French, so it restricts the audience.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="scientific computing"></category><category term="publishing"></category></entry><entry><title>Tutorial on scientific use of Python</title><link href="https://gael-varoquaux.info/programming/tutorial-on-scientific-use-of-python.html" rel="alternate"></link><published>2009-07-08T19:38:00+02:00</published><updated>2009-07-08T19:38:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-07-08:/programming/tutorial-on-scientific-use-of-python.html</id><content type="html">&lt;p&gt;The notes of the tutorial I gave on scientific use of Python at PyconFR
are online. They are in French, but I am giving the link here, just in
case it is needed:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://dl.afpy.org/pycon-fr-09/python_scientifique/index.html"&gt;http://dl.afpy.org/pycon-fr-09/python_scientifique/index.html&lt;/a&gt;&lt;/p&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="python"></category><category term="scientific computing"></category><category term="teaching"></category></entry><entry><title>Object-oriented design: framework objects versus data containers</title><link href="https://gael-varoquaux.info/programming/object-oriented-design-framework-objects-versus-data-containers.html" rel="alternate"></link><published>2009-07-01T05:13:00+02:00</published><updated>2009-07-01T05:13:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-07-01:/programming/object-oriented-design-framework-objects-versus-data-containers.html</id><summary type="html">&lt;p&gt;I find that in object oriented design, there are two kinds of objects:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The first kind is the object encoding logic. This is an object for
which clever and complex design will hold together the logic of a
stateful application. It can often be part of a forest of objects …&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;I find that in object-oriented design, there are two kinds of objects:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The first kind is the object encoding logic. This is an object for
which clever and complex design will hold together the logic of a
stateful application. It can often be part of a forest of objects
that are linked together via design patterns. The interfaces of these
objects are driven by their active role in the application. These
objects are prominent in interactive applications. They are mostly
specific to an application or a framework, and are mostly
implementation-defined.&lt;/li&gt;
&lt;li&gt;The second type of object is a data container. It strives to expose a
data model that can be of use in various situations, as it is the
link between different parts of the code that do not talk to each
other except through data. It is responsible for loose coupling
(something that is very important to achieve a maintainable code
base) by having a light and shallow interface. It must be
interface-designed, rather than implementation-designed. One should
very easily get a grasp, an almost physical feeling, for the object
by simple interaction with it. I have what I call the ‘explaining
test’ for these objects: can I explain fully and completely to
somebody what the object does, and any possible caveat, without being
sidetracked into special discussions? If not, back to the drawing
board: the object will not gain acceptance. In my experience, only
the objects of the second kind can easily be shared between different
projects.&lt;/li&gt;
&lt;/ul&gt;
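&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch that does not come from any real project. The class names Image and ViewerController are hypothetical and chosen only for illustration. Image is a data container with a light, shallow interface that passes the ‘explaining test’ in one sentence. ViewerController is a framework object: it holds the state of an application and wires listeners together through a simple observer pattern.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    """Data container (hypothetical): pixel values plus their spacing.

    The whole interface fits in one sentence, which is the point.
    """
    data: tuple  # rows of pixel values, e.g. ((0, 1), (2, 3))
    spacing: float = 1.0

    @property
    def shape(self):
        # Number of rows and columns, derived from the data alone
        n_rows = len(self.data)
        n_cols = len(self.data[0]) if n_rows else 0
        return (n_rows, n_cols)


class ViewerController:
    """Framework object (hypothetical): holds the state of an interactive
    application and notifies listeners (observer pattern) when it changes."""

    def __init__(self):
        self._listeners = []
        self.current = None

    def subscribe(self, callback):
        # Register a function to call whenever a new image is loaded
        self._listeners.append(callback)

    def load(self, image):
        # Change the application state, then push the data to listeners
        self.current = image
        for callback in self._listeners:
            callback(image)


# Usage: the controller only passes data around; listeners see an Image
img = Image(data=((0, 1), (2, 3)), spacing=0.5)
seen = []
controller = ViewerController()
controller.subscribe(lambda image: seen.append(image.shape))
controller.load(img)
print(seen)  # [(2, 2)]
```

&lt;p&gt;The data container can be handed to any other piece of code without dragging the controller along. That is why it is the kind of object that travels between projects. The controller only makes sense inside this one application.&lt;/p&gt;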
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="software engineering"></category><category term="software architecture"></category></entry><entry><title>SciPy abstract submission deadline extended</title><link href="https://gael-varoquaux.info/programming/scipy-abstract-submission-deadline-extended.html" rel="alternate"></link><published>2009-06-27T08:14:00+02:00</published><updated>2009-06-27T08:14:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-06-27:/programming/scipy-abstract-submission-deadline-extended.html</id><summary type="html">&lt;p&gt;Greetings,&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;The conference committee is extending the deadline for abstract&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;submission for the Scipy conference 2009 one week.&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;On Friday July 3th, at midnight Pacific, we will turn off the abstract&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;submission on the conference site. Up to then, you can modify the&lt;/div&gt;
&lt;div class="line"&gt;already-submitted abstract, or submit new abstracts.&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;The …&lt;/strong&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;Greetings,&lt;/p&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;The conference committee is extending the deadline for abstract&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;submission for the Scipy conference 2009 one week.&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;On Friday July 3th, at midnight Pacific, we will turn off the abstract&lt;/div&gt;
&lt;div class="line-block"&gt;
&lt;div class="line"&gt;submission on the conference site. Up to then, you can modify the&lt;/div&gt;
&lt;div class="line"&gt;already-submitted abstract, or submit new abstracts.&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;The SciPy 2009 executive committee&lt;/strong&gt;&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Jarrod Millman, UC Berkeley, USA (Conference Chair)&lt;/li&gt;
&lt;li&gt;Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)&lt;/li&gt;
&lt;li&gt;Stéfan van der Walt, University of Stellenbosch, South Africa
(Program Co-Chair)&lt;/li&gt;
&lt;li&gt;Fernando Pérez, UC Berkeley, USA (Tutorial Chair)&lt;/li&gt;
&lt;/ul&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="scientific computing"></category><category term="conferences"></category></entry><entry><title>SciPy 2009 conference opened up for registration</title><link href="https://gael-varoquaux.info/programming/scipy-2009-conference-opened-up-for-registration.html" rel="alternate"></link><published>2009-06-19T14:53:00+02:00</published><updated>2009-06-19T14:53:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-06-19:/programming/scipy-2009-conference-opened-up-for-registration.html</id><summary type="html">&lt;p&gt;We are finally opening the registration for the SciPy 2009 conference.
It took us time, but the reason is that we made careful budget
estimations to bring the registration cost down.&lt;/p&gt;
&lt;p&gt;We are very happy to announce that this year registration to the
conference will be only $150, tutorial $100 …&lt;/p&gt;</summary><content type="html">&lt;p&gt;We are finally opening the registration for the SciPy 2009 conference.
It took us time, but the reason is that we made careful budget
estimations to bring the registration cost down.&lt;/p&gt;
&lt;p&gt;We are very happy to announce that this year registration to the
conference will be only $150, tutorial $100, and students get half
price! We made this effort because we hope it will open up the
conference to more people, especially students, who often have to
finance such a trip on a small budget. As a consequence, however,
lunch is not included.&lt;/p&gt;
&lt;p&gt;This does not mean that we are getting a reduced conference. Quite
the contrary: this year we have two keynote speakers. And what speakers:
Peter Norvig and Jon Guyer! Peter Norvig is the director of research at
Google and Jon Guyer is a research scientist at NIST, in the
Thermodynamics and Kinetics Group, where he leads FiPy, a finite
volume PDE solver written in Python.&lt;/p&gt;
&lt;div class="section" id="the-scipy-2009-conference"&gt;
&lt;h2&gt;The SciPy 2009 Conference&lt;/h2&gt;
&lt;p&gt;SciPy 2009, the &lt;a class="reference external" href="http://conference.scipy.org/"&gt;8th Python in Science conference&lt;/a&gt;, will be held from
August 18-23, 2009 at Caltech in Pasadena, CA, USA.&lt;/p&gt;
&lt;p&gt;Each year SciPy attracts leading figures in research and scientific
software development with Python from a wide range of scientific and
engineering disciplines. The focus of the conference is both on
scientific libraries and tools developed with Python and on scientific
or engineering achievements using Python.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="call-for-papers"&gt;
&lt;h2&gt;Call for Papers&lt;/h2&gt;
&lt;p&gt;We welcome contributions from industry as well as the academic
world. Indeed, industrial research and development as well as academic
research face the challenge of mastering IT tools for exploration,
modeling and analysis.&lt;/p&gt;
&lt;p&gt;We look forward to hearing your recent breakthroughs using Python!
Please read the &lt;a class="reference external" href="http://conference.scipy.org/call_for_papers"&gt;full call for papers&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="important-dates"&gt;
&lt;h2&gt;Important Dates&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Friday, June 26: Abstracts Due&lt;/li&gt;
&lt;li&gt;Saturday, July 4: Announce accepted talks, post schedule&lt;/li&gt;
&lt;li&gt;Friday, July 10: Early Registration ends&lt;/li&gt;
&lt;li&gt;Tuesday-Wednesday, August 18-19: Tutorials&lt;/li&gt;
&lt;li&gt;Thursday-Friday, August 20-21: Conference&lt;/li&gt;
&lt;li&gt;Saturday-Sunday, August 22-23: Sprints&lt;/li&gt;
&lt;li&gt;Friday, September 4: Papers for proceedings due&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="section" id="the-scipy-2009-executive-committee"&gt;
&lt;h3&gt;The SciPy 2009 executive committee&lt;/h3&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Jarrod Millman, UC Berkeley, USA (Conference Chair)&lt;/li&gt;
&lt;li&gt;Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)&lt;/li&gt;
&lt;li&gt;Stéfan van der Walt, University of Stellenbosch, South Africa
(Program Co-Chair)&lt;/li&gt;
&lt;li&gt;Fernando Pérez, UC Berkeley, USA (Tutorial Chair)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I corrected a typo in the original blog post: the sprints
are free, and the tutorials are $100.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scientific computing"></category><category term="scipy"></category><category term="conferences"></category></entry><entry><title>Fuzzy on OOP and the French</title><link href="https://gael-varoquaux.info/programming/fuzzy-on-oop-and-the-french.html" rel="alternate"></link><published>2009-06-14T10:38:00+02:00</published><updated>2009-06-14T10:38:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-06-14:/programming/fuzzy-on-oop-and-the-french.html</id><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.voidspace.org.uk/python/weblog/arch_d7_2009_06_13.shtml#e1097"&gt;Fantastic&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote class="epigraph"&gt;
Haha - I shake my fuzzywuzzy beard at you in bewilderment. Do you people
dislike OOP, the class statement is mere boilerplate to you, I mumble
incoherent French obscenities in your general direction. (Did you know
the French acronym for object-oriented programming is POO?).&lt;/blockquote&gt;
</content><category term="programming"></category><category term="python"></category><category term="humor"></category></entry><entry><title>Job offering for junior Python developer</title><link href="https://gael-varoquaux.info/programming/job-offering-for-junior-python-developer.html" rel="alternate"></link><published>2009-06-07T19:53:00+02:00</published><updated>2009-06-07T19:53:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-06-07:/programming/job-offering-for-junior-python-developer.html</id><summary type="html">&lt;p&gt;Our lab is seeking to hire an engineer to work on porting our machine
learning code to the scikit learn, adding tests and documentation and
packaging it.&lt;/p&gt;
&lt;p&gt;We are looking for someone motivated by quality in software and open
source. No prior scientific computing experience is required. You will
be …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Our lab is seeking to hire an engineer to work on porting our machine
learning code to the scikit learn, adding tests and documentation and
packaging it.&lt;/p&gt;
&lt;p&gt;We are looking for someone motivated by quality in software and open
source. No prior scientific computing experience is required. You will
be working in a highly stimulating research environment (&lt;a class="reference external" href="http://www-dsv.cea.fr/neurospin/"&gt;Neurospin&lt;/a&gt;),
near Paris and employed by the French research institute in computer
science and applied math (&lt;a class="reference external" href="http://www.inria.fr/saclay/home/view?set_language=en"&gt;INRIA&lt;/a&gt;), a prestigious institution.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://www-dsv.cea.fr/neurospin/"&gt;Neurospin&lt;/a&gt; is a research institute dedicated to the understanding of
the brain. You will be working in the computer-assisted neurology
laboratory, the image-analysis branch of Neurospin, in the small
‘Parietal’ INRIA team embedded in NeuroSpin and dedicated to statistical
modeling.&lt;/p&gt;
&lt;p&gt;Over the years, the lab has developed a set of tools for machine
learning and statistical analysis in Python (with some C). There are
some tools for this purpose available in the open-source world
(BSD-licensed) in the scikit learn. We want to extract the good and
unique parts of our internal library, and release it in the open source
world through the scikit learn. Our code is fully Python code, using
scipy and matlab, with some bindings to R. As we want the code to be
BSD-licensed, we will remove the bindings to R, and replace them when
possible. The job does not involve developing new algorithms, but
testing, improving, and documenting the existing ones. There is a lot of
quality assurance work to be done. The code needs to be brought up to the right
coding standards; APIs should be cleaned; tests added. Dead code should
be deleted. There is some optimization work to be done. Also, if there is
any functionality duplicated with the scikit learn, you should analyse
both implementations and determine which one to keep. The job also involves working
with the community, documenting the code, and releasing the project,
including binary packages. And finally, all the original authors of the
algorithms, and experts in the field, are in the lab. So you will be
able to learn from them and pester them if there is a problem with the
code.&lt;/p&gt;
&lt;p&gt;In a word, this is about transforming an internal project into a
leading open source project that will rock and live on!&lt;/p&gt;
&lt;p&gt;The job description is available &lt;a class="reference external" href="http://www.inria.fr/travailler/mrted/fr/jd/details.html?id=PGTFK026203F3VBQB6G68LONZ&amp;amp;LOV5=4510&amp;amp;ContractType=4545&amp;amp;LG=FR&amp;amp;Resultsperpage=20&amp;amp;nPostingID=3487&amp;amp;nPostingTargetID=7675&amp;amp;option=52&amp;amp;sort=DESC&amp;amp;nDepartmentID=10"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;There are two caveats: first, it is a 2-year position. Second, you need to
have graduated recently (how recently I don’t know exactly, but I will
inquire).&lt;/p&gt;
&lt;p&gt;If you are interested, or just want to ask questions, don’t hesitate to
send me an e-mail: I am &lt;em&gt;really&lt;/em&gt; looking forward to collaborating with
someone motivated on this project.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UPDATE:&lt;/strong&gt; I have more details on the restrictions of the job offering:
you need to have graduated in 2008 or 2009. This is a very hard
restriction, and I am receiving many excellent CVs that I cannot even
consider because of this restriction. I am sorry, I cannot do anything about
it.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="scientific computing"></category><category term="jobs"></category></entry><entry><title>Pycon FR: presentations and tutorials</title><link href="https://gael-varoquaux.info/programming/pycon-fr-presentations-and-tutorials.html" rel="alternate"></link><published>2009-05-16T16:25:00+02:00</published><updated>2009-05-16T16:25:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-05-16:/programming/pycon-fr-presentations-and-tutorials.html</id><summary type="html">&lt;p&gt;May 30th and 31st the French Python conference, &lt;a class="reference external" href="http://fr.pycon.org"&gt;Pycon FR&lt;/a&gt;, will be
held at ‘la Cité des sciences’, la Villette, in Paris.&lt;/p&gt;
&lt;p&gt;The first day, I will be giving a one-hour-long tutorial (in French) on
numpy, scipy, and all the Python for Science jazz. On the following day,
I will …&lt;/p&gt;</summary><content type="html">&lt;p&gt;May 30th and 31st the French Python conference, &lt;a class="reference external" href="http://fr.pycon.org"&gt;Pycon FR&lt;/a&gt;, will be
held at ‘la Cité des sciences’, la Villette, in Paris.&lt;/p&gt;
&lt;p&gt;The first day, I will be giving a one-hour-long tutorial (in French) on
numpy, scipy, and all the Python for Science jazz. On the following day,
I will be giving a half-hour-long talk to illustrate the use of Python in
my current work: statistical analysis and modelling of brain activity.&lt;/p&gt;
&lt;p&gt;I’ll be giving my tutorial in one room, while David Larlet (the famous
Biologeek) will be giving one on Django in another room. Tough
competition :-P .&lt;/p&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://fr.pycon.org/sessions"&gt;program&lt;/a&gt; of the conference is very eclectic, ranging from general
programming talks to GUIs and web development. While this might deter
the pure scientific computing folks, I strongly encourage you to attend.
Indeed, a lot of the development, packaging, quality assurance, …
problems encountered in scientific computing are universal in computing.&lt;/p&gt;
&lt;p&gt;You might think that you are only interested in writing algorithms, or
processing data, but this code will have to live on. My experience is
that it is terribly hard to have code in a lab that can be somewhat
shared and live on when people move away to another lab, or stop having
time to maintain the code. Talks like&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/correction_d'un_bug_et_naissance_d'une_nouvelle_fonctionnalité_dans_cpython"&gt;Correction d’un bug et naissance d’une nouvelle fonctionnalité dans
CPython&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/contrôle_de_versions_de_source%3A_pourquoi%3F_comment%3F"&gt;Contrôle de versions de source: pourquoi? comment?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/de_la_qualité_dans_un_projet_en_python"&gt;De la qualité dans un projet en Python&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/amusez-vous_à_tester_avec_les_objets_farceurs_(mock)"&gt;Amusez-vous à tester avec les objets farceurs (mock)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;can probably be of some use.&lt;/p&gt;
&lt;p&gt;Also, don’t underestimate the fact that some other communities might
have solved some of the issues you struggle with. When dealing with
real-world problems, and not only developing algorithms on a small set of
test data, a large fraction of the code lines are related to IO,
interfaces, data massaging… Two years ago, I remember that I was not
terribly interested in the web-development talks. I tried to be
open-minded and listen to them, but… Now I have done a bit of web
development myself, and I have played with some of the famous ‘web
frameworks’. I can tell you, there are some really interesting concepts
there. The web guys have managed to extract a set of patterns from the
problems they face and provide excellent abstractions for data handling and
display. Can we learn from them? I am especially interested in getting
more insight from things like ORMs (object relational mappers), and
understanding better the web frameworks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/django-roa_pour_une_architecture_orient%C3%A9e_ressources"&gt;Django-ROA pour une architecture orientée ressources&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/django-roa_pour_une_architecture_orient%C3%A9e_ressources"&gt;PyQt4: Un exemple de sur-mesure en Model/View/Delegate&lt;/a&gt; (this is
not about web, but MVC/MVD pattern has been used in web a lot and is
universal and very important, IMHO).&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/oubliez_le_sql_avec_sqlalchemy"&gt;Oubliez le sql avec SQLAlchemy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/developpement_d'applications_maintenables_avec_django"&gt;Developpement d’applications maintenables avec Django&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/turbogears_2%2C_présentation_et_introduction_(tutoriel)"&gt;Turbogears 2, présentation et introduction (tutoriel)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/programmer_couchdb_avec_couchdbkit"&gt;Programmer CouchDB avec couchdbkit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/réflexion_sur_l'utilisation_de_python_pour_le_développement_d'une_plateforme_web_d'annotation_génomique"&gt;Réflexion sur l’utilisation de python pour le développement d’une
plateforme web d’annotation génomique&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/django_par_la_pratique%2C_premiers_pas"&gt;Django par la pratique&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://fr.pycon.org/sessions/seances/python_et_les_bases_de_données_non_sql."&gt;Python et les bases de données non sql.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And finally, one more reason to come: it is so nice to actually
meet people in real life and have a chat.&lt;/p&gt;
&lt;p&gt;So, to those of you who live in France: see you there.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="conferences"></category><category term="scientific computing"></category><category term="software engineering"></category></entry><entry><title>Minimum spanning tree</title><link href="https://gael-varoquaux.info/programming/minimum-spanning-tree.html" rel="alternate"></link><published>2009-05-10T23:52:00+02:00</published><updated>2009-05-10T23:52:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-05-10:/programming/minimum-spanning-tree.html</id><summary type="html">&lt;p&gt;Gary Ruben came up with the excellent idea of visualizing the minimum
spanning tree of a Delaunay tessellation in addition to the Delaunay
tessellation itself. After he sent me his code, I spent some time
playing with it, because I found out that, with the right choice of
visualization parameters, it …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Gary Ruben came up with the excellent idea of visualizing the minimum
spanning tree of a Delaunay tessellation in addition to the Delaunay
tessellation itself. After he sent me his code, I spent some time
playing with it, because I found out that, with the right choice of
visualization parameters, it gave me a nice understanding of what a
minimum spanning tree is: a tree structure of minimal total length
connecting all the vertices of the graph, and embedded in the graph. In
the visualization, the Delaunay graph is displayed in grey, and the
minimum spanning tree with thick, colored lines.&lt;/p&gt;
&lt;img alt="" src="attachments/mst.jpg" /&gt;
&lt;p&gt;The minimum spanning tree is calculated using Prim’s algorithm, on the
fully connected, distance-weighted graph of all points. One can clearly
see that it is embedded in the Delaunay graph. In fact, I have tested
that calculating a minimum spanning tree on the Delaunay graph, or on
the complete graph, gives the same result. This is expected: for points
in general position, the Euclidean minimum spanning tree is a subgraph
of the Delaunay triangulation.&lt;/p&gt;
&lt;p&gt;The code to create this picture can be found &lt;a class="reference external" href="attachments/mst_py"&gt;here&lt;/a&gt;.&lt;/p&gt;
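&lt;p&gt;For readers curious about Prim’s algorithm on the fully connected graph, here is a minimal NumPy sketch. It is not the attached script, and the helper name prim_mst is hypothetical. It uses the dense variant of the algorithm, which suits a complete distance matrix: start the tree from one vertex, then repeatedly add the vertex that is closest to the tree.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import numpy as np


def prim_mst(points):
    """Edges (i, j) of the minimum spanning tree of the complete
    Euclidean graph on points, using the dense form of Prim's
    algorithm (hypothetical helper, for illustration)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Weights of the fully connected graph: all pairwise distances
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                  # grow the tree from vertex 0
    best = dist[0].copy()              # distance from each vertex to the tree
    parent = np.zeros(n, dtype=int)    # tree vertex achieving that distance
    edges = []
    for _ in range(n - 1):
        # Closest vertex that is not yet in the tree
        k = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[k]), k))
        in_tree[k] = True
        # Vertex k may now be the closest tree vertex for the others
        closer = np.less(dist[k], best)
        best = np.where(closer, dist[k], best)
        parent = np.where(closer, k, parent)
    return edges


# Usage: a tree spanning n points always has n - 1 edges
points = np.random.RandomState(0).rand(30, 3)
edges = prim_mst(points)
total_length = sum(np.linalg.norm(points[i] - points[j]) for i, j in edges)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The dense variant costs O(n²) time and memory, which is fine for a few thousand points. For larger point sets, running the algorithm on the much sparser Delaunay graph should give the same tree, as observed above.&lt;/p&gt;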
</content><category term="programming"></category><category term="python"></category><category term="mayavi"></category><category term="scientific computing"></category><category term="algorithms"></category></entry><entry><title>Extracting the data from the Delaunay triangulation</title><link href="https://gael-varoquaux.info/programming/extracting-the-data-from-the-delaunay-triangulation.html" rel="alternate"></link><published>2009-05-01T16:42:00+02:00</published><updated>2009-05-01T16:42:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-05-01:/programming/extracting-the-data-from-the-delaunay-triangulation.html</id><summary type="html">&lt;p&gt;Gary Ruben just asked me if it was possible to retrieve the
triangulation information from my previous Delaunay example. Actually
the reason I came up with this example is that Emmanuelle Gouillart, my
partner[*], needed to do Delaunay triangulation on some data. She was
kind enough to extract that code …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Gary Ruben just asked me if it was possible to retrieve the
triangulation information from my previous Delaunay example. Actually
the reason I came up with this example is that Emmanuelle Gouillart, my
partner[*], needed to do Delaunay triangulation on some data. She was
kind enough to extract that code from her code base. &lt;a class="reference external" href="attachments/extract_delaunay_edges_py"&gt;Here&lt;/a&gt; it is.&lt;/p&gt;
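&lt;p&gt;The attached script is the reference here. As a hedged, library-agnostic illustration of the idea, suppose the tessellation is available as an array of simplices, with one row of vertex indices per triangle or tetrahedron. Its edges are then the unique pairs of vertices within each simplex. The helper name edges_from_simplices is hypothetical. With SciPy, for instance, scipy.spatial.Delaunay(points).simplices would provide such an array, although the attached script may well proceed differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
from itertools import combinations


def edges_from_simplices(simplices):
    """Unique undirected edges (i, j), with i smaller than j, of a
    triangulation given as simplices: triangles in 2D, tetrahedra in 3D
    (hypothetical helper, for illustration)."""
    edges = set()
    for simplex in simplices:
        # Sorting makes (i, j) and (j, i) the same edge
        vertices = sorted(int(v) for v in simplex)
        # Every pair of vertices of a simplex is an edge of the graph
        edges.update(combinations(vertices, 2))
    return sorted(edges)


# Two triangles sharing the edge (1, 2): 5 distinct edges, not 6
print(edges_from_simplices([[0, 1, 2], [2, 1, 3]]))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
&lt;/pre&gt;&lt;/div&gt;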
&lt;p&gt;[*] Languages do not seem to have evolved quickly enough to
cope with the fact that people can now have a stable long-term
relationship with someone they are not married to. What word should I be
using here: ‘girlfriend’, ‘partner’… ?&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Mayavi image of the … month</title><link href="https://gael-varoquaux.info/programming/mayavi-image-of-the-month.html" rel="alternate"></link><published>2009-04-27T22:42:00+02:00</published><updated>2009-04-27T22:42:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-04-27:/programming/mayavi-image-of-the-month.html</id><summary type="html">&lt;p&gt;Tonight I sat down and played a bit with VTK’s Delaunay tessellation
filter. I wanted to inspect the local structure of a graph created by
Delaunay tessellation of random points. To see the structure better, I
selected a slab of the resulting unstructured grid. I think the image is …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Tonight I sat down and played a bit with VTK’s Delaunay tessellation
filter. I wanted to inspect the local structure of a graph created by
Delaunay tessellation of random points. To see the structure better, I
selected a slab of the resulting unstructured grid. I think the image not
only helps explain what a Delaunay tessellation is, but also
looks pretty cool. Here are the image and the Mayavi &lt;a class="reference external" href="attachments/delaunay_py"&gt;script&lt;/a&gt; that
creates it.&lt;/p&gt;
&lt;img alt="" src="attachments/delaunay_mayavi.jpg" /&gt;
</content><category term="programming"></category><category term="python"></category><category term="scipy"></category><category term="mayavi"></category><category term="scientific computing"></category><category term="art"></category></entry><entry><title>Long sys.path and consequences, one more reason not to use easy_install</title><link href="https://gael-varoquaux.info/programming/long-syspath-and-consequences-one-more-reason-not-to-use-easy_install.html" rel="alternate"></link><published>2009-04-09T08:43:00+02:00</published><updated>2009-04-09T08:43:00+02:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-04-09:/programming/long-syspath-and-consequences-one-more-reason-not-to-use-easy_install.html</id><summary type="html">&lt;p&gt;For those who don’t know, sys.path is the path that the Python
interpreter traverses at each module import, looking for the file of the
imported module.&lt;/p&gt;
&lt;p&gt;This blog post is about the consequences of having a long sys.path. I’ll
try and make it short, but I would …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For those who don’t know, sys.path is the path that the Python
interpreter traverses at each module import, looking for the file of the
imported module.&lt;/p&gt;
&lt;p&gt;This blog post is about the consequences of having a long sys.path. I’ll
try to keep it short, although I would have a lot to say. I am just reacting
to &lt;a class="reference external" href="http://artificialcode.blogspot.com/2009/04/short-circuiting-python-module-lookup.html"&gt;Noah Gift’s post on performance improvement&lt;/a&gt;, not writing a full
essay on why overloading sys.path is considered harmful.&lt;/p&gt;
&lt;p&gt;When using easy_install (or setuptools), each new project is installed
in its own directory, and that directory is added to sys.path at runtime
(the addition at runtime confuses many users who are not aware
of it). As a result, you quickly end up with more than 40 directories on
your sys.path. These directories are ‘stat-ed’ one after the other on
each module import. Thus, with a long sys.path, every import triggers a
large number of system calls to read directories. To check this out, simply
try:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;strace&lt;span class="w"&gt; &lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;-c&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;import foobar&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&amp;gt;&lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;less
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can see the amount of noise created by a simple (failing) import
statement. On a file system with high latency (such as NFS, which we use at
work), this is very costly.&lt;/p&gt;
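&lt;p&gt;The cost model behind this can be sketched in a few lines of Python. The helper below is hypothetical and deliberately simplified. It is not how CPython’s import system is implemented. It checks the entries of the search path one after the other until the module is found, so a failed import probes every single entry. The real import system checks more file suffixes, and Python 3 caches directory listings, so actual system-call counts differ.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import os
import sys


def count_probed_entries(module_name, search_path=None):
    """Simplified model of the path-based module lookup (hypothetical).

    Returns (number of path entries probed, entry where the module was
    found), or (len(search_path), None) when the import fails.
    """
    search_path = sys.path if search_path is None else search_path
    # Simplified list of suffixes; the real import system checks more
    suffixes = ('.py', '.pyc', '.so')
    for n_probed, entry in enumerate(search_path, start=1):
        # An empty entry in sys.path stands for the current directory
        candidate = os.path.join(entry or os.curdir, module_name)
        is_package = os.path.isdir(candidate)
        is_module = any(os.path.exists(candidate + s) for s in suffixes)
        if is_package or is_module:
            return n_probed, entry
    # A failed import has probed every entry of the path
    return len(search_path), None


# Usage: how far down sys.path does the lookup for a module go?
print(count_probed_entries('json'))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With 40 easy_install directories in front of the standard library, each of those lookups walks past all of them first, which is where the strace noise comes from.&lt;/p&gt;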
&lt;p&gt;Noah joyfully reports performance improvements by hijacking the Python
import mechanism. I claim that part of what Noah has done is not really
hijacking the import mechanism, but undoing the hijacking performed by
setuptools.&lt;/p&gt;
&lt;p&gt;I know I am being rude, but many people have raised this point before, and it
has not gained any traction with the setuptools maintainer. I claim that
you should not be using setuptools or easy_install if you want
performance or control. I claim that you should not be using setuptools
unless you understand well what you are doing (which defeats the purpose of
the name easy_install).&lt;/p&gt;
&lt;p&gt;When I want good control over what easy_install does, I first install
packages in a virtual environment to discover the dependencies, and then run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;easy_install&lt;span class="w"&gt; &lt;/span&gt;-Zeab&lt;span class="w"&gt; &lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;package_name
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;to download the source of each required package, and&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;python&lt;span class="w"&gt; &lt;/span&gt;setup.py&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;--single-version-externally-managed&lt;span class="w"&gt; &lt;/span&gt;--record&lt;span class="w"&gt; &lt;/span&gt;./foobar
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;to install it, if the package itself uses setuptools.&lt;/p&gt;
&lt;p&gt;As you can see, setuptools makes it really hard to do a clean install.
It’s a design choice :(.&lt;/p&gt;
&lt;p&gt;Another alternative is to use &lt;a class="reference external" href="http://pypi.python.org/pypi/pip"&gt;pip&lt;/a&gt;, which I strongly encourage.&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category></entry><entry><title>Mayavi documentation: in multiple small pages, or a few long ones</title><link href="https://gael-varoquaux.info/programming/mayavi-documentation-in-multiple-small-pages-or-a-few-long-ones.html" rel="alternate"></link><published>2009-03-15T00:58:00+01:00</published><updated>2009-03-15T00:58:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-03-15:/programming/mayavi-documentation-in-multiple-small-pages-or-a-few-long-ones.html</id><summary type="html">&lt;p&gt;Prabhu and I can’t decide: what is best for the documentation, have more
pages, which are therefore small, or fewer but longer pages? Two specific
examples:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/examples.html"&gt;http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/examples.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;http://code.enthought.com/projects/mayavi/docs …&lt;/a&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;Prabhu and I can’t decide: what is best for the documentation, have more
pages, which are therefore small, or fewer but longer pages? Two specific
examples:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/examples.html"&gt;http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/examples.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Right now, these pages are split into smaller pages. Should all these
smaller pages be folded back into one long page? It would be a long page,
but all the information would be in one place.&lt;/p&gt;
&lt;p&gt;Neither Prabhu nor I want to decide solely based on our personal
preferences. We would like to do what suits users best. This is why we want
FEEDBACK :). Could you please give feedback by mail, or in a comment on
this blog? Thank you!&lt;/p&gt;
</content><category term="programming"></category><category term="python"></category><category term="mayavi"></category></entry><entry><title>Mayavi on the web</title><link href="https://gael-varoquaux.info/programming/mayavi-on-the-web.html" rel="alternate"></link><published>2009-03-07T13:06:00+01:00</published><updated>2009-03-07T13:06:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-03-07:/programming/mayavi-on-the-web.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://ondrejcertik.blogspot.com/"&gt;Ondrej Certik&lt;/a&gt; has installed a &lt;a class="reference external" href="http://www.sagemath.org/"&gt;sage&lt;/a&gt; notebook on a server opened to
the net, with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi&lt;/a&gt; installed on it. The result is that you have a
command line interface on the web, in which you can enter Mayavi
commands, and see the result. You have to be very careful to …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://ondrejcertik.blogspot.com/"&gt;Ondrej Certik&lt;/a&gt; has installed a &lt;a class="reference external" href="http://www.sagemath.org/"&gt;sage&lt;/a&gt; notebook on a server opened to
the net, with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/"&gt;Mayavi&lt;/a&gt; installed on it. The result is that you have a
command line interface on the web, in which you can enter Mayavi
commands, and see the result. You have to be very careful to switch
Mayavi to offscreen mode as soon as you load it. To see the result of a
plot, just save it to a file; the sage notebook will display the image.&lt;/p&gt;
&lt;a class="reference external image-reference" href="http://nb.hpfem.org/home/pub/16/"&gt;&lt;img alt="" src="attachments/mayavi2_in_sage1.png" /&gt;&lt;/a&gt;
&lt;p&gt;I have always had in mind the use of Mayavi as a backend for a
scientific web application, for instance for a neuroimaging database, but
what is really stunning with this implementation is the way you interact
with it: a full-blown Python command line.&lt;/p&gt;
</content><category term="programming"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Mayavi UI issue</title><link href="https://gael-varoquaux.info/programming/mayavi-ui-issue.html" rel="alternate"></link><published>2009-02-18T09:25:00+01:00</published><updated>2009-02-18T09:25:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-02-18:/programming/mayavi-ui-issue.html</id><summary type="html">&lt;p&gt;I have been wanting to change slightly the design of a Mayavi dialog for
a while. Here is the issue: when you create a visualization, e.g. through
the command line in &lt;a class="reference external" href="http://ipython.scipy.org/"&gt;IPython&lt;/a&gt;, with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab&lt;/a&gt;, you get a nice and small
window with only your visualization, and a toolbar. If you …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I have been wanting to change slightly the design of a Mayavi dialog for
a while. Here is the issue: when you create a visualization, e.g. through
the command line in &lt;a class="reference external" href="http://ipython.scipy.org/"&gt;IPython&lt;/a&gt;, with &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab&lt;/a&gt;, you get a nice and small
window with only your visualization, and a toolbar. If you want to
change the properties of the objects in the visualization, or add some
more, you need to click on a button on the toolbar, which displays a
dialog, from which you can open more dialogs to edit the objects:&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/mayavi_old_ui.png"&gt;&lt;img alt="" class="align-center" src="attachments/mayavi_old_ui.png" /&gt;&lt;/a&gt;
&lt;p&gt;I am thinking of changing this to a single dialog:&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/mayavi_new_ui.png"&gt;&lt;img alt="" class="align-center" src="attachments/mayavi_new_ui.png" /&gt;&lt;/a&gt;
&lt;p&gt;The single-object-editing dialogs could still be opened by
double-clicking on the pipeline.&lt;/p&gt;
&lt;p&gt;I am not going to discuss why I believe the new version would be better
than the old one, because I do not want to bias people. However, I would
prefer not to make the decision to change based only on my feelings. So I
ask everybody, users of Mayavi or not: what do you think is better? And
why? I will probably keep an option to have the old behavior anyway,
but the default is very important.&lt;/p&gt;
</content><category term="programming"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Error in my article</title><link href="https://gael-varoquaux.info/programming/error-in-my-article.html" rel="alternate"></link><published>2009-01-27T22:00:00+01:00</published><updated>2009-01-27T22:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-01-27:/programming/error-in-my-article.html</id><summary type="html">&lt;p&gt;There is an error in a code example in my article that just came out in
Linux Magazine France. I am so ashamed. I did test the code, but I
didn’t have automated tests, so I broke it when tweaking it :(. I think
the lesson is that you need …&lt;/p&gt;</summary><content type="html">&lt;p&gt;There is an error in a code example in my article that just came out in
Linux Magazine France. I am so ashamed. I did test the code, but I
didn’t have automated tests, so I broke it when tweaking it :(. I think
the lesson is that you need to do more than doc-testing articles (it was
doc-tested).&lt;/p&gt;
&lt;p&gt;The code example is about calculating the Mandelbrot set. The idea is
that you take a grid of numbers c in the complex plane, and iterate on it
the function f = lambda z: z**2 + c, starting from z = 0. You look at the
divergence of this iteration, and plotting a measure of the divergence
gives you a nice figure. The code I wrote was:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ogrid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;isnan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ones&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;
&lt;span class="n"&gt;threshold_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;threshold_time&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The errors are subtle. First, there is the not-so-subtle mask error: I am
selecting the points that diverge, and iterating them even further. This is
exactly the opposite of what I meant to do. Then there is the more subtle
bug: the line “z[mask] = z[mask]**2 + c[mask]” is an in-place
assignment. As a result, the dtype of z is not modified: z, created as a
float array by zeros, is not magically cast to complex. Thus the imaginary
part coming from c is discarded, and that information is crucial to the
Mandelbrot set.&lt;/p&gt;
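&lt;p&gt;The dtype pitfall is easy to reproduce on a handful of points. In this small sketch, the values of c are arbitrary. After the masked in-place update, the float array keeps only the real parts, while the complex array keeps the full values. NumPy emits a ComplexWarning on the lossy assignment. It is silenced here only to keep the output readable.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import warnings

import numpy as np

c = np.array([0.5 + 1j, -1 + 0.5j, 0.25j])  # arbitrary points of the plane
mask = np.ones(3, dtype=bool)

# Buggy version: zeros() creates a float64 array
z_float = np.zeros(3)
with warnings.catch_warnings():
    # NumPy warns (ComplexWarning) that the imaginary part is discarded
    warnings.simplefilter('ignore')
    z_float[mask] = z_float[mask]**2 + c[mask]   # in place: stays float64

# Corrected version: z is complex from the start
z_complex = np.zeros(3, complex)
z_complex[mask] = z_complex[mask]**2 + c[mask]

print(z_float.dtype, z_float)      # only the real parts of c survive
print(z_complex.dtype, z_complex)  # the full complex values
&lt;/pre&gt;&lt;/div&gt;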
&lt;p&gt;The right code is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ogrid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ones&lt;/span&gt;
&lt;span class="c1"&gt;# Grid of points c in the complex plane (bounds chosen for illustration)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ogrid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;
&lt;span class="n"&gt;threshold_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# z must be complex from the start: in-place assignment keeps its dtype&lt;/span&gt;
&lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;complex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ones&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Only iterate the points that have not diverged yet&lt;/span&gt;
    &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mask&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;mask&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;threshold_time&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;mask&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Plot the threshold_time array with pylab.imshow (from the matplotlib
project) to get a nice figure.&lt;/p&gt;
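&lt;p&gt;Since the lesson here is about automated tests, here is one way the corrected loop could be wrapped in a function and checked with a few assertions. This is a sketch. The function name mandelbrot_counts and its parameters are hypothetical, and np.less is simply NumPy’s element-wise “less than”. Points that stay bounded (c = 0, -1, 1j) must survive all iterations. The real points 2 and 3 escape after a couple of steps. The point 2j escapes only if the imaginary part is kept, so it catches the dtype bug, while c = 0 catches the reversed mask.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import numpy as np


def mandelbrot_counts(c, n_iter=50, radius=10):
    """For each point of c, count the iterations during which abs(z)
    stays below radius (same algorithm as the corrected listing)."""
    c = np.asarray(c, dtype=complex)
    z = np.zeros(c.shape, dtype=complex)   # complex from the start
    threshold_time = np.zeros(c.shape, dtype=int)
    mask = np.ones(c.shape, dtype=bool)
    for _ in range(n_iter):
        # Only iterate the points that have not diverged yet
        z[mask] = z[mask]**2 + c[mask]
        # Element-wise "less than": True where the point is still bounded
        mask = np.less(np.abs(z), radius)
        threshold_time += mask
    return threshold_time


# Checks a test suite could run
counts = mandelbrot_counts([0, -1, 1j, 2, 3, 2j])
assert counts.tolist() == [50, 50, 50, 2, 1, 2]
&lt;/pre&gt;&lt;/div&gt;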
</content><category term="programming"></category><category term="scipy"></category><category term="scientific computing"></category></entry><entry><title>Mayavi image of the fortnight</title><link href="https://gael-varoquaux.info/programming/mayavi-image-of-the-fortnight.html" rel="alternate"></link><published>2009-01-25T19:21:00+01:00</published><updated>2009-01-25T19:21:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-01-25:/programming/mayavi-image-of-the-fortnight.html</id><summary type="html">&lt;p&gt;It’s been two weeks since I posted a ‘Mayavi image of the week’. Prabhu
has made a really cool example of integrating trajectories in a 3D
vector field, using, of course, the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Lorenz_system"&gt;Lorenz equations&lt;/a&gt; for the 3D field.
With nice colors, it makes a fantastic new image:&lt;/p&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/lorentz_mayavi.png" /&gt;
&lt;p&gt;The green …&lt;/p&gt;</summary><content type="html">&lt;p&gt;It’s been two weeks since I posted a ‘Mayavi image of the week’. Prabhu
has made a really cool example of integrating trajectories in a 3D
vector field, using, of course, the &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Lorenz_system"&gt;Lorenz equations&lt;/a&gt; for the 3D field.
With nice colors, it makes a fantastic new image:&lt;/p&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/lorentz_mayavi.png" /&gt;
&lt;p&gt;The green surface is the isosurface where the z component of the
vector field is zero: on this surface, the z component changes sign. This can be
seen from the trajectories, which start going up once they cross the
surface. The script generating this image is checked in as an example:
&lt;a class="reference external" href="https://svn.enthought.com/enthought/browser/Mayavi/trunk/examples/mayavi/lorenz.py"&gt;https://svn.enthought.com/enthought/browser/Mayavi/trunk/examples/mayavi/lorenz.py&lt;/a&gt;&lt;/p&gt;
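&lt;p&gt;To give an idea of what integrating trajectories in this field involves, here is a NumPy-only sketch. It is not the checked-in Mayavi example, which may differ in its details. It assumes the classic parameters σ = 10, ρ = 28, β = 8/3 in the Lorenz equations dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz. It integrates them with a fixed-step fourth-order Runge-Kutta scheme. The z component of the field, xy − βz, vanishes on the green surface, and its sign tells whether a trajectory is going up or down.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic parameters (assumed)


def lorenz(p):
    """Lorenz vector field at the point p = (x, y, z)."""
    x, y, z = p
    return np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])


def integrate(p0, dt=0.01, n_steps=2000):
    """Integrate one trajectory with the classic 4th-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = p0
    for i in range(n_steps):
        p = traj[i]
        k1 = lorenz(p)
        k2 = lorenz(p + 0.5 * dt * k1)
        k3 = lorenz(p + 0.5 * dt * k2)
        k4 = lorenz(p + dt * k3)
        traj[i + 1] = p + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


traj = integrate([1.0, 1.0, 1.0])
# dz/dt along the trajectory: zero on the green isosurface, positive
# while the trajectory is going up, negative while it is going down
dz_dt = traj[:, 0] * traj[:, 1] - BETA * traj[:, 2]
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Several such trajectories could then be handed to Mayavi, for instance with mlab.plot3d, to get a picture in the spirit of the one above.&lt;/p&gt;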
</content><category term="programming"></category><category term="python"></category><category term="science"></category><category term="mayavi"></category></entry><entry><title>LinuxMag special edition on Python</title><link href="https://gael-varoquaux.info/programming/linuxmag-special-edition-on-python.html" rel="alternate"></link><published>2009-01-24T12:42:00+01:00</published><updated>2009-01-24T12:42:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-01-24:/programming/linuxmag-special-edition-on-python.html</id><summary type="html">&lt;p&gt;The French LinuxMag just published a special edition on Python, in which
I authored a 12-page article on scientific computing. The edition is in
French, so if you don’t speak French, it is of limited interest.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;This special issue is an excellent resource for discovering
Python, among other things because …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The French LinuxMag just published a special edition on Python, in which
I authored a 12-page article on scientific computing. The edition is in
French, so if you don’t speak French, it is of limited interest.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;This special issue is an excellent resource for discovering
Python, in particular because it presents Python from many different
angles, and thus shows which advanced tools are available to tackle a
given task.&lt;/p&gt;
&lt;p&gt;Link to the magazine’s presentation, and to where you can buy it:
&lt;a class="reference external" href="http://www.gnulinuxmag.com/index.php/2009/01/23/gnulinux-magazine-hs-n%C2%B040-janvierfevrier-2009-chez-votre-marchand-de-journaux"&gt;http://www.gnulinuxmag.com/index.php/2009/01/23/gnulinux-magazine-hs-n%C2%B040-janvierfevrier-2009-chez-votre-marchand-de-journaux&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Browse it online: &lt;a class="reference external" href="http://ed-diamond.com/feuille_lmhs40/index.html"&gt;http://ed-diamond.com/feuille_lmhs40/index.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It costs only 6.50 euros, and is available at every newsstand in
France (except at the Monoprix next to my place :&amp;lt;). That is not expensive
for some sixty pages of specialized information. Buy it, even if you
already know Python well and will not learn anything new: you can leave it
lying around at work, as passive propaganda :).&lt;/p&gt;
&lt;p&gt;The authors have been working on their articles for more than a year.
I don’t know whether this reflects great perfectionism or great
inefficiency :). In any case, many thanks to Philippe Biondi, who
managed the project and pushed it forward.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;The PDF of the article&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I have put the PDF of the article &lt;a class="reference external" href="./my-article-on-scientific-computing-with-python.html"&gt;online&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="publishing"></category></entry><entry><title>Mayavi image of the week</title><link href="https://gael-varoquaux.info/programming/mayavi-image-of-the-week.html" rel="alternate"></link><published>2009-01-13T00:38:00+01:00</published><updated>2009-01-13T00:38:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2009-01-13:/programming/mayavi-image-of-the-week.html</id><summary type="html">&lt;p&gt;The title of this post is a lure: there won’t be a Mayavi image each
week, because I would run out quickly. But it sounded cool.&lt;/p&gt;
&lt;p&gt;Anyway, here is an image of a graph, visualized with Mayavi. The graph
is actually a protein structure, downloaded from the PDB. The …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The title of this post is a lure: there won’t be a Mayavi image each
week, because I would run out quickly. But it sounded cool.&lt;/p&gt;
&lt;p&gt;Anyway, here is an image of a graph, visualized with Mayavi. The graph
is actually a protein structure, downloaded from the PDB. The Python
script producing this visualization is checked in as a Mayavi example:
&lt;a class="reference external" href="https://svn.enthought.com/enthought/browser/Mayavi/trunk/examples/mayavi/protein.py"&gt;https://svn.enthought.com/enthought/browser/Mayavi/trunk/examples/mayavi/protein.py&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The part of the code that reads the PDB file is actually much longer
than the visualization part.&lt;/p&gt;
&lt;p&gt;I hope this script inspires people trying to visualize graphs. The
combination of the GaussianSplatter filter and the volume rendering to
create a halo renders really well, IMHO.&lt;/p&gt;
&lt;img alt="" src="https://gael-varoquaux.info/programming/attachments/protein_mayavi.png" /&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="python"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Tracking objects in scientific code</title><link href="https://gael-varoquaux.info/programming/tracking-objects-in-scientific-code.html" rel="alternate"></link><published>2008-12-23T01:26:00+01:00</published><updated>2008-12-23T01:26:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-12-23:/programming/tracking-objects-in-scientific-code.html</id><summary type="html">&lt;p&gt;When I started working in my new field (data analysis of functional
brain images), I was surprised to find in our data-analysis scripts what
I thought was a very particular &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Code_smell"&gt;code smell&lt;/a&gt;: the numerical code is
always doing a lot of filename and path manipulation, loading and saving
data even …&lt;/p&gt;</summary><content type="html">&lt;p&gt;When I started working in my new field (data analysis of functional
brain images), I was surprised to find in our data-analysis scripts what
I thought was a very particular &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Code_smell"&gt;code smell&lt;/a&gt;: the numerical code is
always doing a lot of filename and path manipulation, loading and saving
data even in the core routines. I couldn’t pinpoint what seemed wrong
with this, but I was uncomfortable with it.&lt;/p&gt;
&lt;div class="section" id="the-good"&gt;
&lt;h2&gt;The good&lt;/h2&gt;
&lt;div class="section" id="memory-management"&gt;
&lt;h3&gt;Memory management&lt;/h3&gt;
&lt;p&gt;In the data-processing work I am currently doing, we deal with large
objects, mostly huge numpy arrays, though sometimes some domain-specific
data containers creep in. As a result, simple calculations take time (an
SVD takes 10 minutes), and I am always fighting with memory.&lt;/p&gt;
&lt;p&gt;Saving to disk is a handy way of freeing memory. Moreover, with
memmapping, reading only the relevant parts of pre-calculated arrays
becomes very cheap.&lt;/p&gt;
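&lt;p&gt;As a minimal sketch of this save-then-memmap pattern, assuming only numpy and a temporary directory (the array and the file name below are made up for illustration), an intermediate result can be written with numpy.save, dropped from memory, and reopened with numpy.load in memory-mapped mode, so that only the slices actually accessed are read from disk:&lt;/p&gt;

```python
import os
import tempfile

import numpy as np

# Stand-in for a large intermediate result (made-up data)
big = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# Save it to disk, then drop the in-memory copy to free RAM
path = os.path.join(tempfile.mkdtemp(), "intermediate.npy")
np.save(path, big)
del big

# mmap_mode="r" maps the file read-only: pages are read lazily, on access
mapped = np.load(path, mmap_mode="r")
row = np.array(mapped[42])  # copies only this row into memory
print(row[:3])
```

&lt;p&gt;With real data the gain comes from the fact that the full array never has to fit in memory at once.&lt;/p&gt;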
&lt;/div&gt;
&lt;div class="section" id="crash-resistance"&gt;
&lt;h3&gt;Crash-resistance&lt;/h3&gt;
&lt;p&gt;When the simplest operation takes ten minutes, you want to save
intermediate steps, to be able to resume calculations, or to inspect why
the code crashed. And who knows, you might need this intermediate step.&lt;/p&gt;
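&lt;p&gt;A minimal checkpointing sketch, assuming numpy; the checkpointed helper and the cache directory are hypothetical names, not an existing API. Each expensive step is stored under a hand-picked name and reloaded on rerun. Note that this hand-picked name is exactly the kind of file naming that causes the trouble described below:&lt;/p&gt;

```python
import os
import tempfile

import numpy as np

CACHE_DIR = tempfile.mkdtemp()  # hypothetical location for checkpoints


def checkpointed(name, compute):
    """Return the array saved under `name`, computing and saving it if absent.

    If a later step crashes, rerunning the script reloads the finished
    steps from disk instead of recomputing them.
    """
    path = os.path.join(CACHE_DIR, name + ".npy")
    if os.path.exists(path):
        return np.load(path)
    result = compute()
    np.save(path, result)
    return result


n_calls = 0


def slow_step():
    """Stand-in for an expensive computation, e.g. an SVD."""
    global n_calls
    n_calls += 1
    data = np.random.default_rng(0).standard_normal((200, 50))
    return np.linalg.svd(data, compute_uv=False)


s1 = checkpointed("svd_values", slow_step)  # computed and saved
s2 = checkpointed("svd_values", slow_step)  # reloaded from disk
print(n_calls)
```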
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="the-bad"&gt;
&lt;h2&gt;The bad&lt;/h2&gt;
&lt;p&gt;The immediate apparent problem is that your code becomes riddled with
path-management code. We often joke that once we have figured out the
algorithm, the longest surviving piece of code is the path-related junk.&lt;/p&gt;
&lt;p&gt;But I believe this is only the tip of the iceberg, and that this code
smell hints at deeper problems.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-ugly"&gt;
&lt;h2&gt;The ugly&lt;/h2&gt;
&lt;div class="section" id="loss-of-scoping"&gt;
&lt;h3&gt;Loss of scoping&lt;/h3&gt;
&lt;p&gt;When I started working on these problems, I was startled to encounter
basic domain-specific algorithmic functions taking input and output data
filenames. It took me a while to realize that the huge problem with this
is that I lose scoping, or in other words naming locality. Let us
pretend that I have a function ‘foo’ that does basic numerics on large
numpy arrays, but, to save memory, takes as arguments the name of the
file where the input array is stored and the name of the file where
the output array should be stored. So I have some code that looks like
this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_sessions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_files&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;session_file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_file&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;.out&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Saving to files in the loop saves a lot of memory.&lt;/p&gt;
&lt;p&gt;Now I decide I want to add a parameter to foo, and vary this parameter,
with, e.g.:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;param&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;process_sessions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;param&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;My code is hard to refactor, because I need to introduce modifications
deep in all subroutines to make sure they do not save their outputs in
the same file.&lt;/p&gt;
&lt;p&gt;Suppose session_files are actually extracted from an upstream dataset,
and now I want to apply my algorithm to a set of these upstream
datasets, in parallel. Once again I need to generate a whole set of new
filenames and keep track of them. I can use temporary files, but I need
to keep hold of this information too, and I lose most of my
crash-resistance.&lt;/p&gt;
&lt;p&gt;When you think it over, programming languages solve this problem
elegantly through the rules connecting names to objects, and in
particular through scoping: a name corresponds to an object within a
given function. Using files is equivalent to using globals, and we have
to cook up our own scoping rules (which results in a lot of
path-massaging code).&lt;/p&gt;
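&lt;p&gt;One way to cook up such scoping rules, sketched here under the assumption that a result is fully determined by the function name, its input file and its parameters, is to derive the output filename from a hash of all three. The output_path helper and this version of foo are hypothetical illustrations; the joblib library mentioned in the update at the end of this post caches on a broadly similar principle, but this is not its implementation:&lt;/p&gt;

```python
import hashlib
import json
import os
import tempfile

import numpy as np

CACHE_DIR = tempfile.mkdtemp()  # hypothetical location for results


def output_path(func_name, input_path, **params):
    """Build an output filename unique to this (function, input, params) call.

    Two calls that differ in any of these get different files, so they
    cannot overwrite each other's results: a crude file-based scoping rule.
    """
    key = json.dumps(
        {"func": func_name, "input": input_path, "params": params},
        sort_keys=True,
    )
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
    return os.path.join(CACHE_DIR, f"{func_name}_{digest}.npy")


def foo(input_path, scale=1.0):
    """Hypothetical numerical step: loads its input, saves a scaled copy."""
    out = output_path("foo", input_path, scale=scale)
    np.save(out, scale * np.load(input_path))
    return out


session_file = os.path.join(CACHE_DIR, "session1.npy")
np.save(session_file, np.ones(5))

# Varying the parameter no longer clobbers earlier outputs
outputs = [foo(session_file, scale=p) for p in (1.0, 2.0, 3.0)]
print(len(set(outputs)))
```

&lt;p&gt;The loop over parameters no longer requires changes deep in the subroutines. The price is that the filenames become opaque: looking at a file on disk no longer tells you how it was produced.&lt;/p&gt;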
&lt;/div&gt;
&lt;div class="section" id="no-history-tracking"&gt;
&lt;h3&gt;No history tracking&lt;/h3&gt;
&lt;p&gt;When I find a file on the disk, I do not really know how it has been
generated. As a result, the crash-resistance is compromised. Moreover,
when tweaking algorithms, we often try to rerun only the necessary parts
of the algorithms, relying on the precomputed parts saved to the disk.
We comment out code, or exercise different code paths. As a result we
often end up in situations where the full code no longer runs from start
to finish. And once again refactoring is hard, because we have not
expressed the dependency relations between our intermediate results.&lt;/p&gt;
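&lt;p&gt;A minimal sketch of recording that history, assuming a JSON sidecar file next to each saved array (the save_with_history and history helpers are hypothetical): each result stores which step produced it, from which input files and with which parameters, so that its lineage can be reconstructed by walking back through the sidecars:&lt;/p&gt;

```python
import json
import os
import tempfile

import numpy as np

WORK_DIR = tempfile.mkdtemp()  # hypothetical working directory


def save_with_history(name, array, step, inputs=(), **params):
    """Save `array` plus a JSON sidecar recording how it was produced."""
    path = os.path.join(WORK_DIR, name + ".npy")
    np.save(path, array)
    record = {"step": step, "inputs": list(inputs), "params": params}
    with open(path + ".json", "w") as f:
        json.dump(record, f)
    return path


def history(path):
    """Follow the sidecars back to the raw data: the lineage of a file."""
    with open(path + ".json") as f:
        record = json.load(f)
    lineage = [record["step"]]
    for parent in record["inputs"]:
        lineage.extend(history(parent))
    return lineage


raw = save_with_history(
    "raw", np.random.default_rng(0).standard_normal(100), step="load")
smoothed = np.convolve(np.load(raw), np.ones(5) / 5, mode="same")
smooth = save_with_history(
    "smooth", smoothed, step="smooth", inputs=[raw], width=5)
print(history(smooth))  # ['smooth', 'load']
```

&lt;p&gt;The recorded inputs are exactly the dependency relations mentioned above; nothing here checks whether a parent changed after its children were computed.&lt;/p&gt;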
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="doing-better"&gt;
&lt;h2&gt;Doing better?&lt;/h2&gt;
&lt;p&gt;Once again, today I was refactoring my algorithm, or my “pipeline” as we
call it. And once again, I felt the lack of proper tools, of the proper
abstractions and words, to express the problem in the code.
Manipulating files directly seems wrong, for the reasons expressed above.
But can we do better?&lt;/p&gt;
&lt;p&gt;The problem, I believe, is that we need a lightweight persistence
framework adapted to scientific purposes. I remember telling Travis
Vaught a few weeks before beginning my new job that scientists had no
problem with their persistence. Well, I was so wrong.&lt;/p&gt;
&lt;p&gt;By a persistence framework, I do not mean a persistence mechanism, like
numpy.save, or hdf5, or a database. I am interested in the objects with
which we represent it in the code. How do we solve the scoping problem?
And the history problem? Can we implement a “trajectory tracking”, to
reuse the &lt;a class="reference external" href="http://article.gmane.org/gmane.comp.python.french/5423"&gt;words of Alexandre Fayolle&lt;/a&gt;, for our data containers?&lt;/p&gt;
&lt;p&gt;I am thinking about a small set of well-thought-out abstractions, a bit
like the ORMs (object-relational mappers) used in web applications, that
would take care of the mapping from in-memory objects to objects on the
disk for us.&lt;/p&gt;
&lt;p&gt;I am starting to have some ideas. I am thinking in terms of context
objects, with getattr tricks to do the mapping to a database doing the
bookkeeping and the trajectory tracking, and doing the impedance
matching with objects stored as numpy “.npy” files, hdf5 files, nifti
files, or whatever you want. The added value of a database would be that
it would provide robust locking, and possibly a network abstraction, to
allow for crash-safety, and parallel or distributed computing.&lt;/p&gt;
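&lt;p&gt;To make the context-object idea concrete, here is a rough sketch under explicit assumptions: SQLite (from the Python standard library) does the bookkeeping, arrays are stored as “.npy” files, and attribute access is intercepted with __getattr__ and __setattr__. The DataContext class and its child method are hypothetical; the sketch does no trajectory tracking and offers no locking beyond what SQLite itself provides:&lt;/p&gt;

```python
import os
import sqlite3
import tempfile
import time

import numpy as np


class DataContext:
    """Hypothetical context object mapping attributes to files on disk.

    Setting an attribute saves the array to a .npy file named after the
    context and the attribute, and records it in a SQLite table. Getting
    an attribute memory-maps the file back, so only what is used is read.
    """

    def __init__(self, name, root):
        # object.__setattr__ bypasses our own __setattr__ for internal state
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_root", root)
        db = sqlite3.connect(os.path.join(root, "bookkeeping.sqlite"))
        db.execute(
            "CREATE TABLE IF NOT EXISTS arrays (context TEXT, attr TEXT, "
            "path TEXT, written REAL, PRIMARY KEY (context, attr))"
        )
        object.__setattr__(self, "_db", db)

    def __setattr__(self, attr, value):
        path = os.path.join(self._root, f"{self._name}.{attr}.npy")
        np.save(path, np.asarray(value))
        with self._db:  # commits the bookkeeping on success
            self._db.execute(
                "INSERT OR REPLACE INTO arrays VALUES (?, ?, ?, ?)",
                (self._name, attr, path, time.time()),
            )

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for stored arrays
        row = self._db.execute(
            "SELECT path FROM arrays WHERE context = ? AND attr = ?",
            (self._name, attr),
        ).fetchone()
        if row is None:
            raise AttributeError(attr)
        return np.load(row[0], mmap_mode="r")

    def child(self, suffix):
        """A nested scope: same storage, distinct names on disk."""
        return DataContext(f"{self._name}.{suffix}", self._root)


ctx = DataContext("subject01", tempfile.mkdtemp())
ctx.signal = np.arange(10.0)
for scale in (1, 2):
    run = ctx.child(f"scale{scale}")  # one scope per parameter value
    run.filtered = scale * ctx.signal
print(float(ctx.child("scale2").filtered[3]))
```

&lt;p&gt;Child contexts play the role of nested scopes: varying a parameter creates a new scope rather than requiring hand-made filenames deep in the subroutines.&lt;/p&gt;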
&lt;p&gt;This may sound overkill, or overcomplicated. I’ve tried simple things.
They all failed.&lt;/p&gt;
&lt;p&gt;This is a problem that matters a lot to me. I feel I am losing a lot of
time on this. However, I feel that doing something good would take
significant effort. I am also afraid of polluting my numerical code with
unnecessary abstractions. The main problem is that attempting to solve
this problem would require a significant investment in time, and I don’t
really see where I can find this time.&lt;/p&gt;
&lt;p&gt;Have people encountered similar problems? Do you have any suggestions,
any trick to share?&lt;/p&gt;
&lt;p&gt;I’d be very happy to read any comments that can move my thinking
forward, even if they point out problems rather than solutions. I
still think I haven’t identified the problems well.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I have just realized that I will be almost without internet
access for the next week, starting from pretty much now. Looks like it
was a bad moment to start a thrilling discussion. I guess I got carried
away by the discontent of a day doing some bad refactoring. I really
look forward to catching up when I come back. Please forgive me for the
bad timing.&lt;/p&gt;
&lt;div class="topic"&gt;
&lt;p class="topic-title"&gt;&lt;strong&gt;Update&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Patterns derived from this line of thought are now implemented
in the &lt;a class="reference external" href="https://pythonhosted.org/joblib/"&gt;joblib&lt;/a&gt; library.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="python"></category><category term="scientific computing"></category><category term="software engineering"></category><category term="software architecture"></category><category term="selected"></category><category term="joblib"></category></entry><entry><title>What’s new in Mayavi 3.1.0?</title><link href="https://gael-varoquaux.info/programming/whats-new-in-mayavi-310.html" rel="alternate"></link><published>2008-12-11T00:56:00+01:00</published><updated>2008-12-11T00:56:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-12-11:/programming/whats-new-in-mayavi-310.html</id><summary type="html">&lt;p&gt;Mayavi 3.1.0 has just been released, and I think it is a fantastic
version. We are starting to be able to focus on the details. In
addition, we are getting user feedback, which helps identify
the pain points.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automatic scripting&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a huge deal …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Mayavi 3.1.0 has just been released, and I think it is a fantastic
version. We are starting to be able to focus on the details. In
addition, we are getting user feedback, which helps identify
the pain points.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Automatic scripting&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is a huge deal! You now have a &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/automatic_scripting.html%20"&gt;record button&lt;/a&gt; on the pipeline
view. In record mode, the modifications that you make to the objects’
properties are recorded as valid Python lines: Mayavi tells you the
lines of code that modify those properties or create new objects. I use
this a lot: I first build a skeleton of a visualization using &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab&lt;/a&gt;,
but when it comes to tuning parameters, I do it interactively, and
record.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Much more testing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We added a huge amount of testing (many thanks to Suyog, who contributed
quite a few). From a user’s point of view this has two consequences. First,
the code is more robust (for instance the mlab commands are more flexible
on the shape of the arguments passed in). Second, the rendering part of
the Mayavi engine is well separated from the algorithms, which means
that the VTK algorithms &lt;a class="reference external" href="https://mail.enthought.com/pipermail/enthought-dev/2008-December/018935.html"&gt;can now be used&lt;/a&gt; easily to manipulate numpy
arrays through Mayavi.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two new mlab functions: barchart and triangular_mesh&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Mlab has &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#d-data"&gt;two new functions&lt;/a&gt;: one to create nice bar charts, for 2D
histograms displayed in 3D, and one to build meshes defined by their
triangles.&lt;/p&gt;
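&lt;p&gt;To illustrate what “meshes defined by their triangles” means, here is a numpy-only sketch of the kind of input involved: vertex coordinate arrays x, y, z, plus an integer array with one row of three vertex indices per triangle. The grid is made-up example data, and the call to mlab.triangular_mesh is left as a comment so that the snippet runs without Mayavi:&lt;/p&gt;

```python
import numpy as np

# A 4 x 4 grid of vertices in the plane z = 0 (made-up example data)
n = 4
xx, yy = np.mgrid[0:n, 0:n]
x = xx.ravel().astype(float)
y = yy.ravel().astype(float)
z = np.zeros_like(x)

# Each grid cell is split into two triangles; a triangle is a row of three
# indices into the x, y, z vertex arrays.
triangles = []
for i in range(n - 1):
    for j in range(n - 1):
        v0 = i * n + j  # vertex (i, j): ravel is row-major
        v1 = v0 + 1     # vertex (i, j + 1)
        v2 = v0 + n     # vertex (i + 1, j)
        v3 = v2 + 1     # vertex (i + 1, j + 1)
        triangles.append((v0, v1, v3))
        triangles.append((v0, v3, v2))
triangles = np.array(triangles)

print(triangles.shape)  # (18, 3): 9 cells, 2 triangles each
# With Mayavi installed, this could then be displayed with
# mlab.triangular_mesh(x, y, z, triangles)
```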
&lt;p&gt;&lt;strong&gt;Control of the pipeline through mlab is easier and more robust&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As mlab.pipeline gets more usage, it is being ironed out. For
instance, applying a module to a source object (be it a Mayavi
source or a VTK dataset) automatically adds it to the figure if it is
not already in it. Also, when adding an additional module on an existing
source, a new module manager (the object controlling the colormap) is added
automatically if the colormaps or extents differ. Many modules take
keyword arguments to make common operations easier.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IPython in Mayavi&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you have a recent version of IPython installed (&amp;gt; 0.9), Mayavi will
use an IPython widget, instead of the vanilla pyshell.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;mlab.view now has a sensible behavior&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The mlab.view no longer gives a bad roll angle to the camera. This makes
it much easier to do animations during which the camera moves.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Axes and outline extents&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;mlab.axes and mlab.outline now adjust by default to the extents of the
object they are applied on. This removes a bad surprise for people having
tuned the scale of their visualization.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;enthought.tvtk.tools.visual in Mayavi&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;enthought.tvtk.tools.visual can now be used inside Mayavi, to provide a
&lt;a class="reference external" href="https://mail.enthought.com/pipermail/enthought-dev/2008-October/018402.html"&gt;visual-like&lt;/a&gt; API in mayavi.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Documentation has received some love&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Documentation has been added and completed, with a focus on making it
easier for the beginner to discover the features of Mayavi. We try more
and more to walk the user through complete use cases of Mayavi, in a
task-oriented documentation, such as in the introductory examples, or in
&lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html#case-studies-of-some-visualizations"&gt;case-studies&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two new sources&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There are two new sources that do not require data. The first creates
objects, such as an arrow, a cube, or a view of the earth, to be viewed
with a ‘surface’ module. The second creates image data, such as a disk,
or a 2D Gaussian, or (my favorite) the Mandelbrot set. This can be
viewed with an ImageActor, or (even better) with a WarpScalar filter and
a Surface. These sources have been contributed by Suyog.&lt;/p&gt;
&lt;div class="section" id="a-word-of-thanks"&gt;
&lt;h2&gt;A word of thanks&lt;/h2&gt;
&lt;p&gt;I am sure I am going to forget some people here, but I’d like to thank a
lot those who have been helping us with getting Mayavi2 going. First of
all, Dave Peterson, who is doing the release management for ETS. This is
a lot of work, and we would never have frequent releases without him.
I’d also like to thank Suyog Jain, who contributed some code. This is
fantastic, and I am sure we are going to have more people contributing
improvements. Finally, I’d like to thank Pierre Raybault, of
Python(x,y), and Varun Hiremath, of Debian. Packaging is very important
to our users, and it is not a trivial piece of work… Hum, I almost
forgot Chris Casey. Chris has been updating the docs on the net and
making sure the docs build well. This is also very important, as the web
page is a major means of communication with our users.&lt;/p&gt;
&lt;/div&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>120 pages!</title><link href="https://gael-varoquaux.info/programming/120-pages.html" rel="alternate"></link><published>2008-12-09T01:00:00+01:00</published><updated>2008-12-09T01:00:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-12-09:/programming/120-pages.html</id><summary type="html">&lt;p&gt;The mayavi manual in SVN has now 120 pages when compiled to pdf. I know
that this is a stupid metric, and that the quality is more important
than the number of pages, but it does give me a warm and fuzzy feeling.&lt;/p&gt;
&lt;p&gt;More seriously, next release of Mayavi (coming …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The mayavi manual in SVN has now 120 pages when compiled to pdf. I know
that this is a stupid metric, and that the quality is more important
than the number of pages, but it does give me a warm and fuzzy feeling.&lt;/p&gt;
&lt;p&gt;More seriously, the next release of Mayavi (coming soon, we promise) is
going to have a lot of added documentation for casual users. In
particular the &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;mlab section&lt;/a&gt; has been expanded a lot and is starting
to hint at Mayavi’s full power.&lt;/p&gt;
&lt;p&gt;Thanks to Chris Casey, who is making sure that the docs land on the net
as soon as they are written.&lt;/p&gt;
</content><category term="programming"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Using Mayavi to explore a potential field</title><link href="https://gael-varoquaux.info/programming/using-mayavi-to-explore-a-potential-field.html" rel="alternate"></link><published>2008-11-22T15:22:00+01:00</published><updated>2008-11-22T15:22:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-11-22:/programming/using-mayavi-to-explore-a-potential-field.html</id><summary type="html">&lt;p&gt;As promised, here is the sequel to the tutorial I posted yesterday on
using Mayavi with scipy to understand the trajectories of a particle in
a potential. (chances are you are reading this before my previous post.
I suggest you first jump to my previous post, and then come back …&lt;/p&gt;</summary><content type="html">&lt;p&gt;As promised, here is the sequel to the tutorial I posted yesterday on
using Mayavi with scipy to understand the trajectories of a particle in
a potential. (chances are you are reading this before my previous post.
I suggest you first jump to my previous post, and then come back here).&lt;/p&gt;
&lt;p&gt;This tutorial shows you how to use powerful VTK and Mayavi features
to explore the trajectories in the same potential. However, the tools we
are using do not give us as much control over the dynamics of the system,
so this time we do not add damping or oscillation of the potential. In
the end, however, the resulting visualization is much more
interactive. Once again, I would like as much feedback as possible, as
this is intended for the Mayavi User Guide.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;In this example, we create a vector field from the gradient of a scalar
field and explore it interactively. This example shows you how to do
some operations similar to the previous example, but interactively,
using the filters and modules. This approach requires a better knowledge
of Mayavi and the VTK filters, but the big gain is that the resulting
visualization can be explored interactively.&lt;/p&gt;
&lt;p&gt;First, let us create the same scalar field as in the previous example. We
open Mayavi and enter the following code in the Python shell:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;enthought.mayavi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;V&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; A 3D sinusoidal lattice with a parabolic confinement. &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mgrid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contour3d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As in the previous example, we can change the color map and the values
chosen in the isosurfaces.&lt;/p&gt;
&lt;p&gt;We want to take the gradient of the scalar field, to create a vector
field. To do this we are going to use the CellDerivatives filter, which
takes derivatives of the data located in the cells (that is, between the
points, see &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/data.html"&gt;&lt;em&gt;Creating data for Mayavi&lt;/em&gt;&lt;/a&gt;). For this, we first need to
interpolate the data from the points where it is located to the cells,
using a PointToCellData filter. We can then apply our CellDerivatives
filter, and then a CellToPointData filter to get point data back.
(Remark: if you are not using the latest Mayavi from SVN, 3.1.0, you
need to enable the ‘pass data’ option in both the CellToPointData and
PointToCellData filters.)&lt;/p&gt;
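&lt;p&gt;As a cross-check outside of Mayavi, here is a numpy sketch, assuming the same potential V and grid as above, that computes the gradient with numpy.gradient (finite differences at the grid points) and compares it with the analytic derivative. This is a swapped-in technique, not what the VTK filters do internally: it never goes through cell data:&lt;/p&gt;

```python
import numpy as np


def V(x, y, z):
    """A 3D sinusoidal lattice with a parabolic confinement (as above)."""
    return np.cos(10*x) + np.cos(10*y) + np.cos(10*z) + 2*(x**2 + y**2 + z**2)


X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
h = X[1, 0, 0] - X[0, 0, 0]  # grid spacing, the same along all three axes

# Finite differences at the grid points: central inside, one-sided at edges
Vx, Vy, Vz = np.gradient(V(X, Y, Z), h)

# Analytic x-derivative: d/dx [cos(10 x) + 2 x**2] = -10 sin(10 x) + 4 x
Vx_exact = -10*np.sin(10*X) + 4*X

# Compare away from the x boundaries, where central differences apply
err = np.abs(Vx - Vx_exact)[1:-1].max()
print(err)  # small compared with the field magnitude
```

&lt;p&gt;The remaining discrepancy comes from the grid spacing (about 0.04) being coarse relative to the lattice period, which is worth keeping in mind when interpreting the vector field shown by Mayavi.&lt;/p&gt;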
&lt;p&gt;To visualize the vector field, we can use a VectorCutPlane module. The
resulting vectors are too large, so we go to the Glyph tab (and
the Glyph tab within this tab) to reduce the scale factor to 0.2. The
vector field is still too dense, therefore we go to the Masking tab to
enable masking, mask with an on ratio of 6 (only one arrow out of 6 is
displayed) and turn off the random mode.&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/vector_cut_plane2.jpg"&gt;&lt;img alt="" src="attachments/vector_cut_plane2.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;To have nice colors, we also changed the color map of the vector field
by going to the Colors and legend node just above the VectorCutPlane,
and choosing a lookup table &lt;strong&gt;in the VectorLUT&lt;/strong&gt; tab, as there can be
different color maps for vector data and scalar data.&lt;/p&gt;
&lt;p&gt;Unlike the previous example, we can play with all the parameters in the
dialog box, like masking, or select color_by_scalar in the Glyph tab,
to display the value of the potential. We can also move the cut plane
used to display the vectors by dragging it.&lt;/p&gt;
&lt;p&gt;Now that we have a 3D vector field, we can also use Mayavi to integrate
the trajectory of a particle in it. For this we can use the streamline
module. It displays trajectories starting from the vertices of a seed
surface. We choose (in the Seed tab) a Point Widget as a seed. We can
then move the seed point by dragging it along in the 3D scene. This
allows us to explore the trajectories in the potential created by the
initial scalar field. In our case, all the trajectories end up in a
local potential minimum, and moving the seed point around lets us see
into which minimum each point falls, in other words the basin of
attraction of each local minimum.&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/streamline.jpg"&gt;&lt;img alt="" src="attachments/streamline.jpg" /&gt;&lt;/a&gt;
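&lt;p&gt;The same basin-of-attraction exploration can be sketched without Mayavi, assuming overdamped motion along minus the gradient of V, integrated with simple explicit Euler steps (a swapped-in technique, not the integrator used by the Streamline module; follow_field is a hypothetical helper). Since V is a sum over coordinates, each coordinate relaxes independently; two seeds on either side of the barrier near x = 0.65 fall into different local minima:&lt;/p&gt;

```python
import numpy as np


def grad_V(p):
    """Analytic gradient of V = sum_i cos(10 p_i) + 2 |p|**2, per coordinate."""
    return -10*np.sin(10*p) + 4*p


def follow_field(seed, dt=1e-3, n_steps=20000):
    """Integrate dp/dt = -grad V(p) (overdamped motion) with Euler steps."""
    p = np.asarray(seed, dtype=float)
    for _ in range(n_steps):
        p = p - dt * grad_V(p)
    return p


# Seeds on either side of the barrier near x = 0.65 (same y and z)
a = follow_field([0.60, 0.2, -0.2])
b = follow_field([0.70, 0.2, -0.2])
print(a.round(3), b.round(3))  # different local minima along x
```

&lt;p&gt;Scanning the seed along a line and recording the end point gives a crude, non-interactive map of the basins of attraction.&lt;/p&gt;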
</content><category term="programming"></category><category term="mayavi"></category><category term="scipy"></category><category term="scientific computing"></category></entry><entry><title>Using Mayavi with Scipy: a tutorial</title><link href="https://gael-varoquaux.info/programming/using-mayavi-with-scipy-a-tutorial.html" rel="alternate"></link><published>2008-11-22T00:19:00+01:00</published><updated>2008-11-22T00:19:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-11-22:/programming/using-mayavi-with-scipy-a-tutorial.html</id><summary type="html">&lt;p&gt;Many years ago, I was working with a bright undergrad on the
trajectories of atoms in a complex light field created by the
intersection of two laser beams. She had developed a code in C, and I
was starting to discover Python, so we had bound it in …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Many years ago, I was working with a bright undergrad on the
trajectories of atoms in a complex light field created by the
intersection of two laser beams. She had developed a code in C, and I
was starting to discover Python, so we had written Python bindings for
it. We were using the Python bindings, together with ipython and
matplotlib, to explore and debug the code. However, our problem was
really fundamentally 3D, and I didn’t find the state of the 3D plotting
tools in Python satisfactory.&lt;/p&gt;
&lt;p&gt;That use case was very much on my mind while working on Mayavi, as I have
always believed that Mayavi and ipython could make a fantastic steering
and debugging tool for 3D physics codes. I think Mayavi is becoming
quite mature for this set of problems, and as I am improving the docs, I
decided to write a tutorial example on this specific problem. I am
posting it here as a preview. This is going to go in the docs, so
please, if you have any comments that might improve it, fire away.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;p&gt;This tutorial example shows you how you can use Mayavi interactively
to visualize &lt;a class="reference external" href="http://www.scipy.org/"&gt;numpy&lt;/a&gt; arrays while doing numerical work with &lt;a class="reference external" href="http://www.scipy.org/"&gt;scipy&lt;/a&gt;.
It assumes that you are familiar with numerical Python tools, and shows
you how to use Mayavi in combination with these tools.&lt;/p&gt;
&lt;p&gt;Let us study the trajectories of a particle in a potential. This is a
very common problem in physics and engineering, and visualization of the
potential and the trajectories is key to developing an understanding of
the problem.&lt;/p&gt;
&lt;p&gt;The potential we are interested in is a periodic lattice, immersed in a
parabolic confinement. We will shake this potential and see how the
particle jumps from one hole of the lattice to another. The parabolic
confinement is there to limit the excursions of the particle:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;V&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; A 3D sinusoidal lattice with a parabolic confinement. &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
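&lt;p&gt;Since &lt;code&gt;V&lt;/code&gt; is the sum of three copies of the one-dimensional profile f(u) = cos(10u) + 2u², a point is a hole of the lattice when each of its coordinates is a local minimum of f. The sketch below is an illustration that is not part of the original tutorial (the names &lt;code&gt;f&lt;/code&gt;, &lt;code&gt;is_min&lt;/code&gt; and &lt;code&gt;x_min&lt;/code&gt; are ours). It finds these minima on a fine 1D grid: there are six of them in [-2, 2], near ±0.30, ±0.91 and ±1.51, pulled slightly towards 0 from the cosine wells at odd multiples of π/10 by the confinement.&lt;/p&gt;

```python
import numpy as np

def f(u):
    """One-dimensional profile: V(x, y, z) = f(x) + f(y) + f(z)."""
    return np.cos(10 * u) + 2 * u**2

x = np.linspace(-2, 2, 4001)   # 1D grid with a step of 0.001
fx = f(x)

# Interior grid points lower than both of their neighbours are local minima
is_min = np.logical_and(fx[:-2] > fx[1:-1], fx[2:] > fx[1:-1])
x_min = x[1:-1][is_min]

print(x_min)            # six minima, symmetric around 0
print(len(x_min)**3)    # 216 holes of V inside the [-2, 2] box
```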
&lt;p&gt;Now that we have defined the potential, we would like to see what it
looks like in 3D. To do this we can create a 3D grid of points, and
sample the potential on these points:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mgrid&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
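&lt;p&gt;Two details of the snippet above are easy to miss: the imaginary step &lt;code&gt;100j&lt;/code&gt; tells &lt;code&gt;np.mgrid&lt;/code&gt; to produce 100 points with both end points included (as &lt;code&gt;np.linspace&lt;/code&gt; would) rather than a step of 100, and &lt;code&gt;V(X, Y, Z)&lt;/code&gt; evaluates the potential element-wise on the whole grid at once. The sketch below stores the sampled values in a variable we call &lt;code&gt;values&lt;/code&gt; (our name) and uses it to locate the deepest sampled point, one of the eight symmetric holes next to the origin.&lt;/p&gt;

```python
import numpy as np

def V(x, y, z):
    """The potential defined above."""
    return (np.cos(10*x) + np.cos(10*y) + np.cos(10*z)
            + 2*(x**2 + y**2 + z**2))

# 100j is a point count, not a step: 100 points from -2 to 2, ends included
X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
print(X.shape)                   # (100, 100, 100)
print(X[1, 0, 0] - X[0, 0, 0])   # grid spacing, 4/99

# Keep the sampled potential instead of discarding it
values = V(X, Y, Z)

# Grid point with the lowest potential, and its coordinates
i, j, k = np.unravel_index(np.argmin(values), values.shape)
print(X[i, j, k], Y[i, j, k], Z[i, j, k], values[i, j, k])
```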
&lt;p&gt;We are going to use the mlab module (see &lt;a class="reference external" href="http://code.enthought.com/projects/mayavi/docs/development/html/mayavi/mlab.html"&gt;&lt;em&gt;Simple Scripting with
mlab&lt;/em&gt;&lt;/a&gt;) to interactively visualize this volumetric data. For this it is
best to type the commands in an interactive Python shell, either using
the built-in shell of the Mayavi2 application, or in ipython -wthread.
Let us visualize the 3D isosurfaces of the potential:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;enthought.mayavi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contour3d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can interact with the visualization created by the above command by
rotating the view, but to get a good understanding of the structure of
the potential, it is useful to vary the iso-surfaces. We can do this by
double-clicking on the IsoSurface in the Mayavi pipeline tree (if you
are running from ipython, you need to click on the Mayavi icon on the
scene to pop up the pipeline). This opens a dialog which lets us select
the values of the contours used. A good view of the potential can be
achieved by turning off auto contours and choosing -0.5 as a first
contour value (e.g. by entering it in the text box on the right and
pressing Tab). A second contour can be added by clicking on the blue
arrow and selecting “Add after”. Using a value of 15 gives a nice
result.&lt;/p&gt;
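&lt;p&gt;The contour values -0.5 and 15 are easier to choose, or to adapt to another potential, once the range of the sampled data is known. The sketch below, our own addition, recomputes the grid, prints the minimum and maximum of the potential, and for each of the two levels reports the fraction of grid points lying below it, that is, on the low-potential side of the corresponding isosurface.&lt;/p&gt;

```python
import numpy as np

X, Y, Z = np.mgrid[-2:2:100j, -2:2:100j, -2:2:100j]
values = (np.cos(10*X) + np.cos(10*Y) + np.cos(10*Z)
          + 2*(X**2 + Y**2 + Z**2))

# Range of the sampled potential: about -2.43 to 25.24 on this grid
print(values.min(), values.max())

# Fraction of grid points on the low-potential side of each chosen level
fractions = {level: np.mean(level > values) for level in (-0.5, 15)}
print(fractions)
```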
&lt;p&gt;We can now click on the Colors and legends node in the pipeline and change
the colors used by selecting a different LUT (Look Up Table). Let us
select ‘Paired’, as it separates levels well.&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/potential_ipython.jpg"&gt;&lt;img alt="" src="attachments/potential_ipython.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;To get a better view of the potential, we would like to display more
contours, but the problem with this approach is that closed contours
hide their interior. One solution is to use a cut plane. Right-click on
the IsoSurface node and add a ScalarCutPlane through the “Add module”
submenu. You can move the cut plane by clicking on it and dragging.&lt;/p&gt;
&lt;p&gt;To make the link between our numpy arrays and the visualization, we can
use the same menu to add Axes and an Outline. Finally, let us add a
colorbar. We can do this by typing:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;colorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Potential&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;vertical&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Alternatively, we can use the options in the LUT dialog visited earlier.&lt;/p&gt;
&lt;a class="reference external image-reference" href="attachments/potential.jpg"&gt;&lt;img alt="" src="attachments/potential.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;We want to study the motion of a particle in this potential. For this we
need to derive the corresponding force, given by minus the gradient of the
potential, F = -∇V. We create a function computing the gradient by central
finite differences:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gradient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; Return the gradient of f in (x, y, z). &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;fx&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fx_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fy&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fy_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fz&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fz_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fx&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fx_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fy&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fy_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fz&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fz_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
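&lt;p&gt;The function above uses central finite differences with a step &lt;code&gt;d&lt;/code&gt;. Since V is known in closed form, one way to check it, sketched below, is to compare it with the analytical derivative ∂V/∂x = -10 sin(10x) + 4x (and likewise in y and z) at random points; &lt;code&gt;exact_gradient&lt;/code&gt; and the sampling are our own additions. For the cosine term the central difference is exact up to a factor sin(10d)/(10d), about 0.9983 for d = 0.01, while the quadratic term is differentiated exactly, so the error should stay below 10 (1 - sin(0.1)/0.1), roughly 0.017.&lt;/p&gt;

```python
import numpy as np

def V(x, y, z):
    """The potential defined above."""
    return (np.cos(10*x) + np.cos(10*y) + np.cos(10*z)
            + 2*(x**2 + y**2 + z**2))

def gradient(f, x, y, z, d=0.01):
    """Central finite-difference gradient of f at (x, y, z), as in the text."""
    return ((f(x + d, y, z) - f(x - d, y, z)) / (2*d),
            (f(x, y + d, z) - f(x, y - d, z)) / (2*d),
            (f(x, y, z + d) - f(x, y, z - d)) / (2*d))

def exact_gradient(x, y, z):
    """Analytical gradient of V, for comparison."""
    return tuple(-10*np.sin(10*c) + 4*c for c in (x, y, z))

rng = np.random.default_rng(0)
x, y, z = rng.uniform(-2, 2, size=(3, 1000))

# Largest absolute error for each of the three components
errors = [np.max(np.abs(num - ex))
          for num, ex in zip(gradient(V, x, y, z), exact_gradient(x, y, z))]
print(max(errors))   # just under 10 * (1 - sin(0.1) / 0.1), about 0.0167
```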
&lt;p&gt;To check that our gradient function works well, let us visualize the
vector field it creates. To avoid displaying too many vectors, we will
evaluate the gradient only on the cut at index 50 along the first axis
(x ≈ 0), and only at every third point of our grid:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;Vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Vz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gradient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quiver3d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                     &lt;span class="n"&gt;Vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Vz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;scale_factor&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;a class="reference external image-reference" href="attachments/gradient.jpg"&gt;&lt;img alt="" src="attachments/gradient.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;Now we can use scipy to integrate the trajectories. We first have to
define a dynamical flow, the function that returns the derivative of the
different parameters as a function of these parameters and of time. Here
the parameters are the position and the velocity of the particle. The
flow is used by every &lt;a class="reference external" href="http://en.wikipedia.org/wiki/Ordinary_differential_equation"&gt;ODE&lt;/a&gt; (ordinary differential equation) solver; it
gives the dynamics of the system. The dynamics we are interested in are
made of the force deriving from the potential, which we shake in time
along the three directions, and a damping force. The damping
coefficient and the amplitude and frequency of the shaking have been
tuned to give interesting dynamics.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot; The dynamical flow of the system &amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;
    &lt;span class="n"&gt;fx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gradient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;.2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;.2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;.2&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fx&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fy&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;fz&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;vz&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
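&lt;p&gt;The state vector &lt;code&gt;r&lt;/code&gt; packs the position and the velocity, and the flow returns (v, -∇V - 0.3 v), evaluated on the shaken potential. To see the role of the damping term in isolation, the sketch below uses a simplified, hypothetical variant &lt;code&gt;flow_unshaken&lt;/code&gt;, with the shaking switched off and the exact gradient in place of the finite differences. Its mechanical energy E = |v|²/2 + V then obeys dE/dt = v·(-∇V - 0.3 v) + ∇V·v = -0.3 |v|², so it can only decrease: without shaking, the particle settles in a single hole, and it is the shaking that feeds it the energy needed to hop.&lt;/p&gt;

```python
import numpy as np
from scipy.integrate import odeint

def V(x, y, z):
    """The potential defined above."""
    return (np.cos(10*x) + np.cos(10*y) + np.cos(10*z)
            + 2*(x**2 + y**2 + z**2))

def flow_unshaken(r, t):
    """Damped dynamics as in flow(), without shaking, with the exact gradient."""
    x, y, z, vx, vy, vz = r
    fx, fy, fz = (-10*np.sin(10*c) + 4*c for c in (x, y, z))  # gradient of V
    return np.array((vx, vy, vz, -fx - 0.3*vx, -fy - 0.3*vy, -fz - 0.3*vz))

def energy(R):
    """Kinetic plus potential energy for each row of an odeint solution."""
    x, y, z, vx, vy, vz = R.T
    return 0.5*(vx**2 + vy**2 + vz**2) + V(x, y, z)

t = np.linspace(0, 50, 500)
R = odeint(flow_unshaken, (0.5, 0.0, 0.0, 0.0, 0.0, 0.0), t)
E = energy(R)
print(E[0], E[-1])            # the energy has dropped
print(np.max(np.diff(E)))     # no increase, up to integration error
```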
&lt;p&gt;Now we can integrate the trajectory:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;scipy.integrate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;odeint&lt;/span&gt;

&lt;span class="c1"&gt;# Initial conditions&lt;/span&gt;
&lt;span class="n"&gt;R0&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Times at which we want the integrator to return the positions:&lt;/span&gt;
&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;R&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;odeint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
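&lt;p&gt;&lt;code&gt;odeint&lt;/code&gt; calls the flow as &lt;code&gt;flow(r, t)&lt;/code&gt;, with the state first. SciPy’s newer &lt;code&gt;solve_ivp&lt;/code&gt; interface expects &lt;code&gt;fun(t, y)&lt;/code&gt; instead, and stores the solution in &lt;code&gt;sol.y&lt;/code&gt; with time along the second axis, so reusing &lt;code&gt;flow&lt;/code&gt; only takes a small wrapper. The sketch below, our addition rather than part of the original tutorial, repeats the definitions above so that it runs on its own, integrates the first 2 time units both ways with tight tolerances, and compares the results.&lt;/p&gt;

```python
import numpy as np
from scipy.integrate import odeint, solve_ivp

def V(x, y, z):
    """The potential defined above."""
    return (np.cos(10*x) + np.cos(10*y) + np.cos(10*z)
            + 2*(x**2 + y**2 + z**2))

def gradient(f, x, y, z, d=0.01):
    """Central finite-difference gradient, as above."""
    return ((f(x + d, y, z) - f(x - d, y, z)) / (2*d),
            (f(x, y + d, z) - f(x, y - d, z)) / (2*d),
            (f(x, y, z + d) - f(x, y, z - d)) / (2*d))

def flow(r, t):
    """The dynamical flow from the text, with the odeint argument order."""
    x, y, z, vx, vy, vz = r
    fx, fy, fz = gradient(V, x - .2*np.sin(6*t), y - .2*np.sin(6*t + 1),
                          z - .2*np.sin(6*t + 2))
    return np.array((vx, vy, vz, -fx - 0.3*vx, -fy - 0.3*vy, -fz - 0.3*vz))

R0 = np.zeros(6)
t = np.linspace(0, 2, 21)

R_odeint = odeint(flow, R0, t, rtol=1e-10, atol=1e-12)

# solve_ivp calls fun(t, y): a lambda swaps the arguments for flow(r, t)
sol = solve_ivp(lambda time, r: flow(r, time), (t[0], t[-1]), R0,
                t_eval=t, rtol=1e-10, atol=1e-12)
R_ivp = sol.y.T   # sol.y has shape (6, len(t)); transpose to match odeint

print(R_odeint.shape, R_ivp.shape)        # (21, 6) (21, 6)
print(np.max(np.abs(R_odeint - R_ivp)))   # small difference between solvers
```

&lt;p&gt;Over longer times the two solutions may drift apart, since the hopping motion can be sensitive to small numerical differences.&lt;/p&gt;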
&lt;p&gt;And we can now plot the trajectory, after removing the cut plane and
the vector field by right-clicking on the corresponding pipeline nodes
and selecting Delete. We also turn the first colorbar off in the
corresponding Colors and legends node. We plot the trajectory with
extra scalar information attached to it, to display the time via the
colormap:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;
&lt;span class="n"&gt;trajectory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot3d&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;colormap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;hot&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;tube_radius&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlab&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;colorbar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;trajectory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;Time&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orientation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;vertical&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;a class="reference external image-reference" href="attachments/trajectories.jpg"&gt;&lt;img alt="" src="attachments/trajectories.jpg" /&gt;&lt;/a&gt;
&lt;p&gt;If I have time, I’ll show later how some of the operations we have done
with numpy can be done with VTK and Mayavi. This will give us control of
these operations via widgets, and thus more interactivity.&lt;/p&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="mayavi"></category><category term="scientific computing"></category></entry><entry><title>Numpy documentation editor</title><link href="https://gael-varoquaux.info/programming/numpy-documentation-editor.html" rel="alternate"></link><published>2008-10-27T00:50:00+01:00</published><updated>2008-10-27T00:50:00+01:00</updated><author><name>Gaël Varoquaux</name></author><id>tag:gael-varoquaux.info,2008-10-27:/programming/numpy-documentation-editor.html</id><summary type="html">&lt;p&gt;Pauli Virtanen and I have finally finished transferring the numpy
documentation editor to &lt;a class="reference external" href="http://docs.scipy.org"&gt;http://docs.scipy.org&lt;/a&gt;. The documentation editor
is a project that has been mainly championed by Pauli. It allows you to
edit in a wiki-like fashion the documentation for numpy, including the
docstrings. The changes are reviewed …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Pauli Virtanen and I have finally finished transferring the numpy
documentation editor to &lt;a class="reference external" href="http://docs.scipy.org"&gt;http://docs.scipy.org&lt;/a&gt;. The documentation editor
is a project that has been mainly championed by Pauli. It allows you to
edit in a wiki-like fashion the documentation for numpy, including the
docstrings. The changes are reviewed by editors, and eventually merged into
the numpy svn repository. As a result, they are shipped with numpy and end up in
everybody’s install of numpy.&lt;/p&gt;
&lt;p&gt;The documentation editor was deployed during the summer on my
girlfriend’s hosted server, but we were afraid it wouldn’t scale there
(and besides, using my girlfriend’s server was not ideal). The
contributions made through the web portal have already helped improve
the numpy documentation tremendously. It is a pleasure to look at the
docstring of a function and find it actually helpful. Now that it is
hosted on the main scipy servers, we are no longer afraid of publicizing
it as widely as possible. So please, go straight to
&lt;a class="reference external" href="http://docs.scipy.org"&gt;http://docs.scipy.org&lt;/a&gt; and start improving the docs. More seriously, when
you think a feature is poorly documented, or when you have fought for a few
hours to understand how a function works, improve the docs: it is very
easy, and if everybody does this, you’ll save time too.&lt;/p&gt;
&lt;p&gt;In the long run we would like to get scipy itself under the same
mechanism, and I would love to open the service to other major Python
scientific computing libraries that form the scipy ecosystem.&lt;/p&gt;
</content><category term="programming"></category><category term="scipy"></category><category term="publishing"></category></entry></feed>