01 Jul

PhD proposal on material-science modeling with Python

Philippe Baucour from the ‘Université de Franche-Comté’ sent me an email saying that he is looking for the rare PhD candidate who can do numerical modeling and material science on top of high-quality Python coding. I can sympathise with this quest: it is very hard to find someone who codes well, let alone someone who can also do numerical modeling!

His PhD proposal, originally written in French, is below. Contact him for more information. Please don’t contact me, I am drowning in (very interesting) e-mail.

Using parallel computing for the fractal modeling of a PEMFC-type stack.

PhD grant proposal
Theme 1.g: Modeling, simulation and high-performance computing
Theme 2.a: Energy, processes, environmental impacts, energy storage

Supervisors within the ENISYS department of FEMTO-ST, Modeling team
D. Hissel, M.C. Péra, R. Glises, Ph. Baucour.

The phenomena that take place within a PEMFC-type stack are multi-physics and multi-scale in nature. The behavior of a complete stack can therefore only be understood as a whole by bringing together domains as different as:

  • Electrical and electrochemical phenomena,
  • Fluid mechanics,
  • Heat and mass transfer phenomena.

All these disciplines interact at completely different scales: from the catalytic deposit (i.e. ~µm) to the stack (i.e. ~m), that is a scale factor of about 10^6. Moreover, the time constants of the different phenomena are also very different, which adds to the complexity of the problem.

There are a great many studies on fuel-cell modeling, but the difficulties stated above lead to restrictions either on the domain studied (a single cell), on the geometry (1D or 2D, rarely 3D) or on the representation of the phenomena (system-level modeling). Moreover, the computing power needed for this kind of strongly coupled, non-linear problem is not easily accessible.
The planned work consists in developing a complete 3D model of a stack at all scales, in both time and space. The planned approach is to use a fractal model that can be partitioned and adapted to all the scales (time and space) present in a stack. Designing a modular code would eventually make it possible to test certain hypotheses on how PEMFCs operate. Examples include:

  • Water management and gas humidification.
  • Cold start.
  • Operation in degraded mode.
  • Design of the gas supply channels.
  • Study of durability and reliability through numerical cycling.

The laboratory (ENISYS) has recently acquired a computing cluster that makes a complete model conceivable. It is made up of 8 compute nodes with a total of 64 processors, 64 GB of memory and 1 TB of disk space.
The goal of the thesis would be to develop a parallel code that would distribute a complete model over the 64 cores. This model can be seen as the aggregation of models at different scales:

  • Membrane electrode assembly model
  • Model of non-conservative flow in a channel (already developed)
  • Model of the thermal behavior of the bipolar plates
  • Model of the electrical behavior

These models, each relatively simple on its own, will be grouped to form a complete model. The difficulty lies in aggregating the different computations in terms of both time and space; this is referred to as spatial computing, or as parallel computing when a complex problem is distributed over several processors. For the modeling of a PEMFC stack, spatial computing is conceivable for the different spatial domains, but parallel computing will be needed to combine all the models and ensure convergence.
Specifications of the study:
•   Definition of the stack under study, based on the available experimental data.
•   Development of the computational codes, ensuring that they are compatible with running on a cluster.
•   Development of a master model that collects the different models.
•   Definition of the spatial and temporal partitioning.
•   Validation against experimental data available in the laboratory.
Planned hardware and software:

  • Use of the cluster with 32 processors for routine use and 64 processors for intensive use.
  • Python programming of the individual codes and of the master code, making the best use of scientific computing libraries (Scipy, Numpy, FiPy, PyPar). Using proprietary code would entail an exorbitant extra cost in licenses (64 Matlab licenses, for example!).
  • Parallelization will be done with MPI (Message Passing Interface) as implemented in Python.
  • A parallelization solution based on Ipython is also conceivable.
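
To give a feel for the "master model" idea in the proposal, here is a minimal sketch of mine, not code from the project. It splits a stack into contiguous groups of cells, runs a placeholder sub-model on each group, and lets a master function gather the partial results. Everything in it is an assumption made for illustration: the names `partition`, `cell_model`, `worker` and `master` are hypothetical, the "physics" is a made-up formula, and threads from the standard library stand in for the MPI processes that would do this job on the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into n_workers contiguous ranges."""
    base, extra = divmod(n_cells, n_workers)
    ranges, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers each take one additional cell.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def cell_model(index):
    """Hypothetical placeholder for a per-cell sub-model (not real physics)."""
    return 0.7 - 0.001 * index  # made-up cell voltage, in volts


def worker(cell_range):
    """Run the sub-model on one partition and return a partial result."""
    start, stop = cell_range
    return sum(cell_model(i) for i in range(start, stop))


def master(n_cells, n_workers):
    """Master model: distribute the partitions, then gather and aggregate."""
    ranges = partition(n_cells, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(worker, ranges))
    return sum(partial_sums)


stack_voltage = master(n_cells=20, n_workers=4)
```

The scatter/gather shape is the part that would carry over to an MPI implementation; the convergence loop between coupled models that the proposal mentions is deliberately left out.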

Contact:

Daniel Hissel, head of the Modeling team
Dpt-ENISYS (Energie, Ingénierie des Systèmes multiphysiques)
FEMTO-ST (Franche-Comté Electronique Mécanique Thermique et Optique - Sciences et Technologies), UMR CNRS 6174
Techn'Hom, 90010 Belfort Cedex, FRANCE
Phone: 33 (0) 3 84 58 36 21
Fax: 33 (0) 3 84 22 27 22
Email: danieL.hissel@univ-fcomte.fr
01 Jul

Mumbles on object-oriented designs: framework objects and data containers

I recently sent a few thoughts on object-oriented design to a mailing list, so I might as well also be ridiculous on my blog.

I find that in object-oriented design, there are two kinds of objects:

  • The first kind is the object encoding logic. This is an object for which a clever and complex design holds together the logic of a stateful application. It is often part of a forest of objects that are linked together via design patterns. The interfaces of these objects are driven by their active role in the application. These objects are prominently present in interactive applications. They are mostly particular to an application or a framework, and are mostly implementation-defined.
  • The second kind of object is a data container. It strives to expose a data model that can be of use in various situations, as it is the link between different parts of the code that do not talk to each other except through data. It is responsible for loose coupling (something that is very important to achieve a maintainable code base) by having a light and shallow interface. It must be interface-designed, rather than implementation-designed. One should very easily get a grasp, an almost physical feeling, for the object by simple interaction with it. I have what I call the ‘explaining test’ for these objects: can I explain fully and completely to somebody what the object does, and any possible caveat, without being sidetracked into special discussions? If not, back to the drawing board: the object will not gain acceptance. In my experience, only objects of the second kind can easily be shared between different projects.
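
A small sketch may make the distinction more concrete. It is only an illustration under my own assumptions: `Measurement` and `AcquisitionController` are hypothetical names invented for this example, not taken from any real code base. The first is a data container with a shallow interface that passes the ‘explaining test’ in one sentence; the second is a logic object that holds state and wires parts of an application together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Data container: two public fields, no behavior, no hidden state."""
    time: float   # seconds
    value: float  # arbitrary units


class AcquisitionController:
    """Logic object: holds application state and reacts to events."""

    def __init__(self):
        self._listeners = []
        self._running = False

    def subscribe(self, callback):
        """Register a callable that will receive each new Measurement."""
        self._listeners.append(callback)

    def start(self):
        self._running = True

    def push(self, time, value):
        """Wrap raw numbers in a Measurement and hand it to every listener."""
        if not self._running:
            raise RuntimeError("acquisition not started")
        measurement = Measurement(time, value)
        for callback in self._listeners:
            callback(measurement)


received = []
controller = AcquisitionController()
controller.subscribe(received.append)
controller.start()
controller.push(0.0, 1.5)
```

The listener only ever sees `Measurement` objects, so it could be reused in another project without knowing anything about the controller. That is the loose coupling the data container is there to provide.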
27 Jun

SciPy abstract submission deadline extended

Greetings,

The conference committee is extending the deadline for abstract
submission for the SciPy 2009 conference by one week.

On Friday July 3rd, at midnight Pacific, we will turn off abstract
submission on the conference site. Until then, you can modify an
already-submitted abstract, or submit new abstracts.

The SciPy 2009 executive committee

  • Jarrod Millman, UC Berkeley, USA (Conference Chair)
  • Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)
  • Stéfan van der Walt, University of Stellenbosch, South Africa (Program Co-Chair)
  • Fernando Pérez, UC Berkeley, USA (Tutorial Chair)
19 Jun

SciPy 2009 conference opened up for registration

We are finally opening the registration for the SciPy 2009 conference. It took us some time, but the reason is that we made careful budget estimates to bring the registration cost down.

We are very happy to announce that this year registration for the conference will be only $150, tutorials $100, and students get half price! We made this effort because we hope it will open up the conference to more people, especially students, who often have to finance such a trip on a small budget. As a consequence, however, catering at noon is not included.

This does not mean that we are getting a reduced conference. Quite the contrary: this year we have two keynote speakers. And what speakers: Peter Norvig and Jon Guyer! Peter Norvig is the director of research at Google, and Jon Guyer is a research scientist at NIST, in the Thermodynamics and Kinetics Group, where he leads FiPy, a finite-volume PDE solver written in Python.

The SciPy 2009 Conference

SciPy 2009, the 8th Python in Science conference, will be held August 18-23, 2009 at Caltech in Pasadena, CA, USA.

Each year SciPy attracts leading figures in research and scientific software development with Python from a wide range of scientific and engineering disciplines. The focus of the conference is both on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python.

Call for Papers

We welcome contributions from industry as well as the academic world. Indeed, industrial research and development as well as academic research face the challenge of mastering IT tools for exploration, modeling and analysis.

We look forward to hearing about your recent breakthroughs using Python! Please read the full call for papers.

Important Dates

  • Friday, June 26: Abstracts Due
  • Saturday, July 4: Announce accepted talks, post schedule
  • Friday, July 10: Early Registration ends
  • Tuesday-Wednesday, August 18-19: Tutorials
  • Thursday-Friday, August 20-21: Conference
  • Saturday-Sunday, August 22-23: Sprints
  • Friday, September 4: Papers for proceedings due

The SciPy 2009 executive committee

  • Jarrod Millman, UC Berkeley, USA (Conference Chair)
  • Gaël Varoquaux, INRIA Saclay, France (Program Co-Chair)
  • Stéfan van der Walt, University of Stellenbosch, South Africa (Program Co-Chair)
  • Fernando Pérez, UC Berkeley, USA (Tutorial Chair)

Update: I corrected a typo in the original blog post: the sprints are free, the tutorials are $100.

14 Jun

Fuzzy on OOP and the French

Fantastic:

Haha - I shake my fuzzywuzzy beard at you in bewilderment. Do you people dislike OOP, the class statement is mere boilerplate to you, I mumble incoherent French obscenities in your general direction. (Did you know the French acronym for object-oriented programming is POO?).
07 Jun

Job offering for junior Python developer

Our lab is seeking to hire an engineer to work on porting our machine learning code to the scikit learn, adding tests and documentation and packaging it.

We are looking for someone motivated by quality in software and open source. No prior scientific computing experience is required. You will be working in a highly stimulating research environment (Neurospin), near Paris and employed by the French research institute in computer science and applied math (INRIA), a prestigious institution.

NeuroSpin is a research institute dedicated to the understanding of the brain. You will be working with the computer-assisted neurology laboratory, the image-analysis branch of NeuroSpin, in the small ‘Parietal’ INRIA team embedded in NeuroSpin and dedicated to statistical modeling.

Over the years, the lab has developed a set of tools for machine learning and statistical analysis in Python (with some C). There are some tools for this purpose available in the open-source world (BSD-licensed) in the scikit learn. We want to extract the good and unique parts of our internal library, and release them in the open-source world through the scikit learn. Our code is fully Python code, using scipy and matlab, with some bindings to R. As we want the code to be BSD-licensed, we will remove the bindings to R, and replace them when possible. The job does not involve developing new algorithms, but testing, improving, and documenting the existing ones. There is a lot of quality-assurance work to be done. The code needs to be brought up to the right coding standards; APIs should be cleaned; tests added. Dead code should be deleted. There is some optimization work to be done. Also, if there is any functionality duplicated with the scikit learn, you should analyse both codes and determine which one to keep. The job also involves working with the community, documenting the code, and releasing the project, including binary packages. And finally, all the original authors of the algorithms, and experts in the field, are in the lab. So you will be able to learn from them and pester them if there is a problem with the code.

In a word, this is about transforming an internal project into a leading open source project that will rock and live on!

The job description is available here.

There are two caveats: first, it is a two-year position. Second, you need to have graduated recently (how recently I don’t know exactly, but I will inquire).

If you are interested, or just want to ask questions, don’t hesitate to send me an e-mail. I am _really_ looking forward to collaborating with someone motivated on this project.

UPDATE: I have more details on the restrictions of the job offering: you need to have graduated in 2008 or 2009. This is a very hard restriction, and I am receiving many excellent CVs that I cannot even consider because of it. I am sorry, I cannot do anything about it.

16 May

Pycon FR: presentations and tutorials

On May 30th and 31st, the French Python conference, Pycon FR, will be held at the ‘Cité des sciences’, la Villette, in Paris.

On the first day, I will be giving a one-hour-long tutorial (in French) on numpy, scipy, and all the Python for Science jazz. On the following day, I will be giving a half-hour-long talk to illustrate the use of Python in my current work: statistical analysis and modelling of brain activity.

I’ll be giving my tutorial in one room, while David Larlet (the famous Biologeek) will be giving one on Django in another room. Tough competition :-P .

The program of the conference is very eclectic, ranging from general programming talks, to GUIs or web development. While this might deter the pure scientific computing folks, I strongly encourage you to attend. Indeed, a lot of the development, packaging, quality assurance, … problems encountered in scientific computing are universal in computing.

You might think that you are only interested in writing algorithms, or processing data, but this code will have to live on. My experience is that it is terribly hard to have code in a lab that can be somewhat shared and live on when people move away to another lab, or stop having time to maintain the code. Talks like

can probably be of some use.

Also, don’t underestimate the fact that some other communities might have solved some of the issues you struggle with. When dealing with real-world problems, and not only developing algorithms on a few sets of test data, a large fraction of the lines of code are related to IO, interfaces, data massaging… Two years ago, I remember that I was not terribly interested in the web-development talks. I tried to be open-minded and listen to them, but… Now I have done a bit of web development myself, and I have played with some of the famous ‘web frameworks’. I can tell you, there are some really interesting concepts there. The web guys have managed to extract a set of patterns from the problems they face and provide excellent abstractions for data handling and display. Can we learn from them? I am especially interested in getting more insight from things like ORMs (object relational mappers), and understanding better the web frameworks:

And finally, one more reason to come: it is so nice to actually get to meet people in real life, and have a chat.

So, see you there, for those who live in France.

10 May

Minimum spanning tree

Gary Ruben came up with the excellent idea of visualizing the minimum spanning tree of a Delaunay tessellation in addition to the Delaunay tessellation itself. After he sent me his code, I spent some time playing with it, because I found out that, with the right choice of visualization parameters, it gave me a nice understanding of what a minimum spanning tree is: a tree of minimal total length connecting all the vertices of the graph, and embedded in the graph. On the visualization, the Delaunay graph is displayed in grey, and the minimum spanning tree in thick, colored lines.

Minimum spanning tree

The minimum spanning tree is calculated using Prim’s algorithm, on the fully-connected, distance-weighted graph of all points. One can clearly see that it is embedded in the Delaunay graph. In fact, I have tested that calculating a minimum spanning tree on the Delaunay graph or on the complete graph gives the same result.
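
For readers who want to see the idea without the visualization, here is a minimal sketch of Prim’s algorithm on the complete, distance-weighted graph of a set of points. It is not the code behind the picture: the function name `prim_mst` and the four sample points are my own choices for illustration, and it uses only the standard library (`math.dist` needs Python 3.8 or later).

```python
import math


def prim_mst(points):
    """Return the MST edges (i, j) of the complete Euclidean graph on points."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each vertex
    best_from = [-1] * n        # tree vertex that provides this cheapest link
    best_dist[0] = 0.0          # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # The new vertex may offer a shorter link to the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 1.0)]
edges = prim_mst(points)
total_length = sum(math.dist(points[i], points[j]) for i, j in edges)
```

On these four points the tree is the chain 0-1, 1-2, 2-3, with a total length of 6. This dense version runs in O(n²) time, which is a reasonable fit for a complete graph where every pair of points is an edge anyway.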

The code to create this picture can be found here.

01 May

Extracting the data from the Delaunay triangulation

Gary Ruben just asked me if it was possible to retrieve the triangulation information from my previous Delaunay example. Actually, the reason I came up with this example is that Emmanuelle Gouillart, my partner[*], needed to do Delaunay triangulation on some data. She was kind enough to extract that code from her code base. Here it is.

[*] The various languages do not seem to have evolved quickly enough to cope with the fact that people can now have a stable long-term relationship with someone they are not married to. What word should I be using here: ‘girlfriend’, ‘partner’… ?

27 Apr

Mayavi image of the … month

Tonight I sat down and played a bit with VTK’s Delaunay tessellation filter. I wanted to inspect the local structure of a graph created by Delaunay tessellation of random points. To see the structure better, I selected a slab of the resulting unstructured grid. I think the image is not only instructive to explain what a Delaunay tessellation is, it also looks pretty cool. Here is the image and the Mayavi script that creates it.

Delaunay interpolation
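
To spell out what a Delaunay tessellation is, here is a brute-force sketch in 2D. It is not the VTK filter or the Mayavi script used for the image, just an illustration of the defining property: a triangle belongs to the tessellation when no other point lies strictly inside its circumcircle. The function names and the four sample points are hypothetical, chosen by me, and the O(n⁴) loop is only meant for a handful of points.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d is strictly inside the circle through a, b, c.

    Assumes a, b, c are given in counterclockwise order.
    """
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (ax, ay, aw), (bx, by, bw), (cx, cy, cw) = rows
    # Standard 3x3 "in-circle" determinant, expanded along the first row.
    det = (ax * (by * cw - bw * cy)
           - ay * (bx * cw - bw * cx)
           + aw * (bx * cy - by * cx))
    return det > 0


def delaunay_triangles(points):
    """Return index triples of all triangles with an empty circumcircle."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        orient = orientation(a, b, c)
        if orient == 0:
            continue  # three aligned points do not form a triangle
        if orient < 0:
            b, c = c, b  # make the triangle counterclockwise
        others = (points[m] for m in range(len(points)) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            triangles.append((i, j, k))
    return triangles


# A rhombus: the tessellation uses the short diagonal between points 2 and 3.
points = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
triangles = delaunay_triangles(points)
```

For the rhombus, the two triangles kept are (0, 2, 3) and (1, 2, 3): splitting along the long diagonal instead would leave the fourth point inside each circumcircle. Note that when four points lie exactly on one circle, this sketch keeps all the candidate triangles, a degenerate case that real libraries resolve for you.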