Davy Weissenbacher

About Me

Postdoctoral Fellow in the DIEGO Lab, Arizona State University
Department of Biomedical Informatics
Mayo Clinic, Samuel C. Johnson Research Bldg
13212 East Shea Boulevard, Scottsdale, AZ 85259
USA

Associate Member of the RCLN team, LIPN, Université Paris XIII

Research Interests

My main interest is the quality of the annotations produced by NLP systems. My PhD thesis shows that, to date, no NLP system can automatically produce perfect annotations. Consequently, NLP systems should be designed around inference models that can deal with uncertain information.

During my first postdoc, I worked closely with users from different domains. This gave me the opportunity to evaluate the usability of current NLP approaches from the users' point of view. There seems to be a certain threshold beyond which users regard the output of an NLP system as reliable, and current systems have not yet reached it. This is particularly true for systems that produce semantic information (e.g. anaphora resolution or semantic frame extraction): such systems can even be obtrusive when they present noisy and distracting information to the user.

I have recently been working on structured prediction with graphical models and constrained conditional models. These machine learning techniques jointly predict the values of several random variables along with the relations between them. In this expressive framework, linguistic constraints are easily expressed and integrated into the inference model to rule out solutions that are likely but inadequate. I am currently applying these techniques to the extraction of geographical relations from medical texts in support of phylogeography studies.

Education

Project Participation

Publications

  • Tasnia Tahsin, Davy Weissenbacher, Robert Rivera, Rachel Beard, Mari Firago, Garrick Wallstrom, Matthew Scotch, Graciela Gonzalez. 2016. "A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records", Journal of the American Medical Informatics Association (JAMIA) (.pdf)
  • Davy Weissenbacher, Tasnia Tahsin, Rachel Beard, Mari Figaro, Robert Rivera, Matthew Scotch, Graciela Gonzalez. 2015. "Knowledge-driven geospatial location resolution for phylogeographic models of virus migration", Bioinformatics, 31(12): i348-i356 (ISMB/ECCB'15) (.pdf)
  • Davy Weissenbacher and Adeline Nazarenko. 2011. "Comprendre les effets des erreurs d'annotations des plates-formes de TAL", Traitement Automatique des Langues, 52(1) (Varia), pp. 161-185
  • Sophia Ananiadou, Paul Thompson, James Thomas, Tingting Mu, Sandy Oliver, Mark Rickinson, Yutaka Sasaki, Davy Weissenbacher and John McNaught. 2010. "Supporting the Education Evidence Portal via Text Mining", Philosophical Transactions of the Royal Society A, Vol. 368, No. 1925, pp. 3829-3844 (.pdf)
  • Davy Weissenbacher, Abeed Sarker, Tasnia Tahsin, Graciela Gonzalez, Matthew Scotch. 2016. "Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods". In Proceedings of the AMIA Joint Summits on Translational Science, 2017 [long paper, to be published]
  • Davy Weissenbacher, Travis Johnson, Laura Wojtulewicz, Amylou Dueck, Dona Locke, Richard Caselli and Graciela Gonzalez. 2016. "Automatic Prediction of Linguistic Decline in Writings of Patients with Degenerative Dementia". 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics [long paper, .pdf]
  • Davy Weissenbacher, Tasnia Tahsin, Rachel Beard, Mari Figaro, Robert Rivera, Matthew Scotch and Graciela Gonzalez. 2015. "Detection and Disambiguation of Geospatial Locations for Phylogeography". 13th Annual Rocky Mountain Bioinformatics Conference.
  • Davy Weissenbacher, Tasnia Tahsin, Rachel Beard, Mari Figaro, Robert Rivera, Matthew Scotch and Graciela Gonzalez. 2015. "Knowledge-driven geospatial location resolution for phylogeographic models of virus migration". In Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB'15) [long paper, Acceptance rate: 17.4%]
  • Davy Weissenbacher and Christian Raymond. 2015. "Tree-Structured Named Entities Extraction from Competing Speech Transcriptions". In Proceedings of the International Conference on Applications of Natural Language to Information Systems (NLDB'15) [long paper, Acceptance rate: 18%]
  • Matthew Scotch, Robert Rivera, Tasnia Tahsin, Rachel Beard, Mari Firago, Davy Weissenbacher, Garrick Wallstrom and Graciela Gonzalez. 2014. "A Pipeline for Virus Phylogeography that Accounts for Geospatial Observation Error". 12th Annual Rocky Mountain Bioinformatics Conference.
  • Davy Weissenbacher and Yutaka Sasaki. 2013. "Which Factors Contributes to Resolving Coreference Chains with Bayesian Networks?". In Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing'13) (.pdf) [long paper]
  • Yutaka Sasaki and Davy Weissenbacher. 2013. "Large-Scale Hierarchical Text Classification for LSHTC3 Data". Annual Meeting of the Association for Natural Language Processing [short paper]
  • Davy Weissenbacher and Adeline Nazarenko. 2007. "A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem". In Proceedings of Recent Advances in Natural Language Processing (RANLP'07) (.pdf) [Poster]
  • Davy Weissenbacher and Adeline Nazarenko. 2007. "A bayesian classifier for the recognition of the impersonal occurrences of the it pronoun". In Proceedings of the 6th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC'07), pp. 145-150 (.pdf) [long paper]
  • Davy Weissenbacher. 2006. "Bayesian Network, a model for NLP?". In Proceedings of EACL'06, Trento, Italy, pp. 195-198 (.pdf) [Poster, Acceptance rate: 39%]
  • Corpus (.tar.gz)
  • Sharma A., Weissenbacher D., Baral C. and Gonzalez G. 2015. "Generating Semantic Graphs from Image Descriptions for Alzheimer's Disease Detection". 3rd Coherence of Discourse Workshop
  • Sarker A., Nikfarjam A., Weissenbacher D., Gonzalez G. 2015. "DIEGOLab: An Approach for Message-level Sentiment Classification in Twitter". In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 510-514.
  • Tahsin T., Beard R., Rivera R., Lauder R., Weissenbacher D., Wallstrom G., Scotch M., Gonzalez G. 2014. "Natural Language Processing Methods for Enhancing Geographic Metadata for Phylogeography of Zoonotic Viruses". In Proceedings of the 2014 Workshop on Biomedical Natural Language Processing (BioNLP 2014), pp. 1-9
  • Sasaki Y., Weissenbacher D. 2012. "TTI's System for the LSHTC3 Challenge". Proceedings of LSHTC3: ECML/PKDD - PASCAL Discovery Challenge Workshop on Large-Scale Hierarchical Classification [long paper, to be published]
  • Sasaki Y., Ishihara K., Yamamoto Y., Weissenbacher D. 2010. "TTI's Systems for 2010 i2b2/VA Challenge". i2b2 Workshop 2010 [long paper]
  • Aubin S., Deriviere J., Hamon T., Nazarenko A., Poibeau T., Weissenbacher D. 2006. "A robust linguistic infrastructure for efficient web content analysis: the ALVIS project". Symposium on Digital Semantic Content across Cultures.
  • Davy Weissenbacher. 2005. "A Bayesian Network for the resolution of non-anaphoric pronoun it". Workshop on Bayesian Methods for NLP, Neural Information Processing Systems (NIPS'05) (.pdf)
  • Erick Alphonse, Sophie Aubin, Philippe Bessières, Gilles Bisson, Thierry Hamon, Sandrine Lagarrigue, Adeline Nazarenko, Alain-Pierre Manine, Claire Nédellec, Mohamed Ould Abdel Vetah, Thierry Poibeau and Davy Weissenbacher. 2004. "Event-based Information Extraction for the Biomedical Domain: the Caderige Project", Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), pp. 43-49 (.pdf)
  • Weissenbacher D., Pieri E., Ananiadou S., Rea B., Vis F., Lin Y., Procter R., Halfpenny P. 2009. "ASSIST: un moteur de recherche spécialisé pour l'analyse des cadres d'expériences". Proceedings of Traitement Automatique des Langues Naturelles (TALN'09) (.pdf) [Demonstration]
  • Davy Weissenbacher and Adeline Nazarenko. 2007. "Identifier les pronoms anaphoriques et trouver leurs antécédents: l'intérêt de la classification bayésienne". Proceedings of Traitement Automatique des Langues Naturelles (TALN'07) (.pdf) [long paper]
  • Alphonse E., Aubin S., Bessières P., Bisson G., Hamon T., Lagarrigue S., Nazarenko A., Manine A-P., Nédellec C., Ould Abdel Vetah M., Poibeau T., Weissenbacher D. 2004. "Extraction d'Information appliqué au domaine biomédical". Proceedings of CIFT, pp. 7-20 (.pdf) [long paper]
  • Davy Weissenbacher. 2004. "La relation de synonymie en génomique". In Proceedings of RECITAL, Fes, Morocco, pp. 298-303 (.pdf) [Poster]
  • Davy Weissenbacher. 2007. "Les réseaux bayésiens: un formalisme adapté au traitement automatique des langues?". Revue d'Intelligence Artificielle (RIA), special issue "Modèles Graphiques Probabilistes" (Probabilistic Graphical Models), pp. 371-389 [Acceptance rate: 50%]
  • Aubin S., Deriviere J., Hamon T., Nazarenko A., Poibeau T., Weissenbacher D. 2007. "Une infrastructure pour l'annotation linguistique de documents issus du web: le projet ALVIS". Revue Nouvelle des Technologies de l'information d'intelligence Artificielle (RNTI).
  • Rea B., Weissenbacher D., Sasaki Y., Thomas J., Ananiadou S. 2009. "ASSIST: Education Evidence Portal". UK e-Science All Hands Meeting 2009, Oxford, 7-9 Dec. 2009.
  • Weissenbacher D., Rea B., Ananiadou S. 2009. "Text Mining: beyond the CAQDAS tools?". Paper presented at the panel on Innovations in Methods in Media and Communication Studies, Media, Communication and Cultural Studies Association (MeCCSA) 2009, Bradford [Short paper]
  • Weissenbacher D., Rea B., Ananiadou S. 2009. "Are the CAQDAS and the Text Mining Software Competitors?". Fourth International Conference on Interdisciplinary Social Sciences 2009, Athens [Abstract]
  • Ananiadou S., Weissenbacher D., Rea B., Pieri E., Lin Y., Vis F., Procter R., Halfpenny P. 2009. "Supporting Frame Analysis using Text Mining". 5th International Conference on e-Social Science 2009, Cologne [long paper]
  • Davy Weissenbacher. 2008. "Effects of imperfect annotations on Natural Language Processing systems, an applicative case study: the pronominal anaphora resolution". PhD Thesis, Université Paris XIII. Supervised by Professors Christophe Fouqueré and Adeline Nazarenko. (Thesis.pdf, Abstract.pdf)
  • Davy Weissenbacher. 2003. "Etude et reconnaissance automatique des relations de synonymie et de renommage dans les textes de génomique". Master's Thesis, Université Paris XIII (.doc)
  • Project ASSIST:
    • D. Weissenbacher et al. 2009. Final report on ASSIST (.doc)
  • Project ALVIS:
    • A. Nazarenko et al. 2007. Final report on NLP analysis and normalization. Deliverable D5.3 ALVIS
    • A. Nazarenko et al. 2007. Complete document processing prototype. Deliverable D5.4 ALVIS
    • J. Deriviére et al. 2006. Report on NLP normalization options for IR (platform conception). Deliverable D5.2 ALVIS
    • E. Alphonse et al. 2005. Report on method and language for the production of the augmented document representations. Deliverable D5.1 ALVIS
    • C. Nédellec et al. 2006. Prototype and documents for learning and integration of named entities and terminology. Deliverable D6.3 ALVIS
    • E. Alphonse et al. 2004. Requirements for integration of WP6 results into WP5 normalization and representation tasks and into WP9 query refinement task. Deliverable D6.2 ALVIS

Teaching

2015-2016
  • Biomedical Informatics, M1 Students (4th year)
    • Course & Practical work: Foundations of Biomedical Informatics Methods II, NLP & Database Modules (29h)
  • Biomedical Informatics, M2/PhD Students
    • Course: Software Engineering, Problem solving in Biomedical Informatics (29h)
2014-2015
  • Biomedical Informatics, M2/PhD Students
    • Course: Natural Language Processing Methods in Biomedical Text Mining (9h, co-taught with Prof. Graciela Gonzalez)
  • Biomedical Informatics, M1 Students (4th year)
    • Course & Practical work: Foundations of Biomedical Informatics Methods II, NLP Module (13h)
2010-2011
  • Licence Physics, L3 students (3rd year)
    • Course & Tutorial classes: Programming in Java (10h)
2006-2007
  • Master Mathematics-Computing Science, M1 students (4th year)
    • Course & Tutorial classes: Programming in C under Linux (18h)
  • Licence Science and Communication, L3 students (3rd year)
    • Course & Tutorial classes: Knowledge representation (39h)
  • Licence Computing Science, L2 students (2nd year)
    • Tutorial classes & Practical work: Programming in Caml (39h)
2005-2006
  • Master Mathematics-Computing Science, M1 students (4th year)
    • Course & Tutorial classes: Programming in C under Linux (18h)
  • Licence Science and Communication, L3 students (3rd year)
    • Course & Tutorial classes: Knowledge representation (39h)
  • Master Computing Science, M1 students (4th year)
    • Supervision of project management (8h)
2004-2005
  • Licence Mathematics, L1 students (1st year)
    • Tutorial classes & Practical work: Imperative programming in C (30h)
    • Supervision of multiple C programming projects (19.5h)
  • Licence Mathematics, L1 students (1st year)
    • Drawing up a business plan (19.5h)
2003-2004
  • DEUG MIAS, L1 students (1st year)
    • Tutorial classes & Practical work: Imperative programming in C (69h)
    • Supervision of multiple C programming projects

Other Activities

Software