Pablo Mendes, PhD Student

Text Mining and Information Extraction
Much of our current knowledge of diseases is buried in the vast body of published scientific literature. Biomedical researchers need efficient tools for discovering how entities in our bodies relate to one another. The inherent ambiguity of natural language makes it very hard to automatically connect all of this information and promote knowledge discovery. We investigate the use of domain models to guide the extraction of such relationships from text.

Data Exploration on the Web: Unburying your data
We investigate how domain models and Semantic Web technologies can support the dynamic exploration of networks of associations in databases and/or text. In this research, domain models (e.g., RDF schemas) describe the expected, or intensional, structure of the underlying information, and that structure can then guide exploration. As a result of this research, we have developed several open source tools that can be reused in many use cases.


DBpedia Spotlight - Shedding Light on the Web of Documents
DBpedia Spotlight recognizes 3.8M things (of 380 types) in text and associates each of them with a unique DBpedia identifier.
[ project page ]
Twarql - Streaming Annotated Tweets through SPARQL
Twarql encodes information from microblog posts as Linked Open Data, giving those interested in collectively analyzing microblog data for sensemaking the flexibility to query it through SPARQL.
[ project page ]
Scooner - Semantically Connecting Objects through Named Entities and Relationships
We spot ontology terms in text and allow the user to navigate literature by following Semantic Trails as a series of simple and complex relationships connecting entities in documents across a corpus.
I worked as a developer on the Semantic Browser and architected our framework for data exploration. I am now focused on large-scale information extraction and navigation models for text exploration.
Keywords: Named Entity Recognition, Relationship Extraction, Navigation Models, Semantic Browsers, Exploratory Search.
[ project page ]
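To make "spotting" concrete: one simple way to find ontology terms in text is greedy longest-match lookup against a dictionary of term labels. The Python sketch below assumes a tiny hypothetical lexicon with placeholder IDs (`TERM:0001` and so on) and a naive whitespace tokenizer. It is not Scooner's actual spotting method, which would also need normalization and disambiguation.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: lowercased surface form -> placeholder ontology term ID.
LEXICON: Dict[str, str] = {
    "nitric oxide": "TERM:0001",
    "oxide": "TERM:0002",
    "platelet aggregation": "TERM:0003",
    "platelet": "TERM:0004",
}


def spot_terms(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Greedy longest-match spotting over whitespace tokens.

    At each token position, try the longest n-gram first and emit the first
    one found in the lexicon; then jump past it so spots never overlap.
    """
    # Naive tokenization (an assumption): lowercase and drop '.' and ','.
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i += n  # skip the matched span
                break
        else:
            i += 1  # no term starts at this token
    return matches


if __name__ == "__main__":
    print(spot_terms("Nitric oxide inhibits platelet aggregation.", LEXICON))
```

Trying the longest n-gram first means "nitric oxide" wins over the shorter entry "oxide", and skipping past each match keeps spots from overlapping.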

Twitris - Twitter through Space, Theme and Time
Created by Karthik Gomadam, Meena Nagarajan et al., this project gathers social signals from tweets and facilitates their collective analysis through spatiotemporal interfaces.
I have been working on large-scale, Hadoop-based information extraction from tweets, as well as on the challenges posed by real-time extraction and delivery of information for situational awareness.
[ view video demo ] [ try it out ]

Cuebee - Knowledge Driven Query Formulation
Cuebee is a flexible, extensible application for querying the Semantic Web. It provides a friendly interface that guides users through formulating complex queries; no technical knowledge of query languages or the Semantic Web is required. The system composes SPARQL queries and can query multiple servers in the background. Cuebee also supports easy plug-and-play of SPARQL endpoints.
[ view poster ] [ project page ]
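As a rough illustration of query composition, the Python sketch below assembles a SPARQL SELECT query from triple patterns of the kind a user might pick step by step in a guided interface. The function `compose_select`, the `ex:` prefix and the `ex:encodes`/`ex:locatedIn` properties are hypothetical placeholders, not Cuebee's code or vocabulary. The sketch only builds the query string and does not send it to any endpoint.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def compose_select(prefixes: Dict[str, str], patterns: List[Triple], limit: int = 100) -> str:
    """Build a SPARQL SELECT query string from (subject, predicate, object) patterns.

    Terms starting with '?' are treated as variables and projected in the
    order they first appear; all other terms are copied verbatim.
    """
    variables: List[str] = []
    for triple in patterns:
        for term in triple:
            if term.startswith("?") and term not in variables:
                variables.append(term)
    projection = " ".join(variables) if variables else "*"

    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT DISTINCT {projection}")
    lines.append("WHERE {")
    lines.extend(f"  {s} {p} {o} ." for s, p, o in patterns)
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical vocabulary: the user picks "gene encodes protein",
    # then extends the query with "protein located in compartment".
    query = compose_select(
        {"ex": "http://example.org/schema#"},
        [("?gene", "ex:encodes", "?protein"),
         ("?protein", "ex:locatedIn", "?compartment")],
        limit=10,
    )
    print(query)
```

Projecting variables in first-seen order keeps the result columns in the order the user introduced them. If no pattern contains a variable, the sketch falls back to `SELECT DISTINCT *`.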

Exparql - Exploring SPARQL query results in visual interfaces
Exparql is a sister project to Cuebee: a flexible, extensible toolkit for visualizing and exploring SPARQL endpoints. It provides a set of JavaScript widgets that can display and interact with any dataset exposed through a SPARQL endpoint. [ project page ]

Semantic Browser - Browse PubMed through UMLS-connected MeSH terms
We spot ontology terms in text and allow the user to navigate literature by following Semantic Trails as a series of simple and complex relationships connecting entities in documents across a corpus.
[ view video demo ]

TcruziKB - A parasite knowledge base
The association of experimental data with domain knowledge expressed in ontologies facilitates information aggregation, meaningful querying and knowledge discovery, which aids in analyzing the large amount of interconnected data available from genome projects. TcruziKB is an ontology-based system that describes and provides access to the data available for the TcruziDB project, a genome database for the parasitic agent Trypanosoma cruzi. TcruziKB uses Cuebee as its query interface and D2R Server as its relational-to-RDF mapper.
[ view video demo ]

Cuadro - A Semantic Photo Sharing Community
Cuadro helps users describe the meaning of their photo contents, so that pictures can be organized and searched by what they depict, not only by filenames or context-free text tags.



Joachim Daiber, Max Jakob, Chris Hokamp, Pablo N. Mendes: Improving efficiency and accuracy in multilingual entity extraction. I-SEMANTICS 2013: 121-124
Pablo N. Mendes, Dirk Weissenborn, Chris Hokamp: DBpedia Spotlight at the MSM2013 Challenge. #MSM 2013: 57-61
Sebastian Hellmann, Agata Filipowska, Caroline Barrière, Pablo N. Mendes, Dimitris Kontokostas: NLP & DBpedia: An Upward Knowledge Acquisition Spiral. NLP-DBPEDIA@ISWC 2013
Sebastian Hellmann, Agata Filipowska, Caroline Barrière, Pablo N. Mendes, Dimitris Kontokostas (Eds.): Proceedings of the NLP & DBpedia workshop co-located with the 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, October 22, 2013. CEUR Workshop Proceedings 1064, 2013


Mendes, P.N., Mika, P., Zaragoza, H., Blanco, R. Measuring Website Similarity using an Entity-Aware Click Graph. ACM Conference on Information and Knowledge Management, CIKM 2012. Oct 29 - Nov 2, 2012. Maui, Hawaii, USA.
Auer, S., Bühmann, L., Lehmann, J., Hausenblas, M., Tramp, S., van Nuffelen, B., Mendes, P.N., Dirschl, C., Isele, R., Williams, H. and Erling, O. Managing the life-cycle of Linked Data with the LOD2 Stack. In Use Track paper at the International Semantic Web Conference, ISWC 2012. 11-15 November 2012, Boston, USA. (accepted)
Bizer, C., Mendes, P.N., Jentzsch, A. Topology of the Web of Data. Book chapter in De Virgilio, Guerra, Yannis (Eds.): Semantic Search over the Web, Springer, 2012 (to appear).
Mendes, P.N., Daiber, J., Rajapakse, R., Sasaki, F., Bizer, C. Evaluating the Impact of Phrase Recognition on Concept Tagging. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2012, 21-27 May 2012, Istanbul, Turkey.
Mendes, P.N., Jakob, M., Bizer, C. DBpedia for NLP: A Multilingual Cross-domain Knowledge Base. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2012, 21-27 May 2012, Istanbul, Turkey.
Héder, M., Mendes, P.N. Round-trip semantics with Sztakipedia and DBpedia Spotlight. Demo paper at the 21st International World Wide Web Conference, WWW 2012, April 16-20, 2012, Lyon, France.
Mendes, P.N., Mühleisen, H., Bizer, C. Sieve: Linked Data Quality Assessment and Fusion. Invited paper at the 1st International Workshop on Linked Web Data Management (LWDM 2011) at the 15th International Conference on Extending Database Technology, EDBT 2012, 27-30 March 2012, Berlin, Germany.


Mendes P.N., Daiber, J., Jakob, M., Bizer, C. Evaluating DBpedia Spotlight for the TAC-KBP Entity Linking Task. Proceedings of the Text Analysis Conference, TAC 2011. 14-15 November 2011, Gaithersburg, Maryland USA.
Cameron D., Kavuluru R., Bodenreider O., Mendes P.N., Sheth A.P., Thirunarayan K. Semantic Predications for Complex Information Needs in Biomedical Literature. 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011, Atlanta, GA, USA, 12-15 November 2011. (acceptance rate 19.4%) [slides]
Mendes P.N., Jakob M., García-Silva A., Bizer C. DBpedia Spotlight: Shedding Light on the Web of Documents. In the Proceedings of the 7th International Conference on Semantic Systems (I-Semantics 2011). Graz, Austria, September 2011. (best paper award) [slides] [DOI 10.1145/2063518.2063519]
García-Silva A., Jakob M., Mendes P.N., Bizer C. Multipedia: Enriching DBpedia with Multimedia information. The Sixth International Conference on Knowledge Capture (K-CAP 2011), Banff, Alberta, Canada, June 2011. [slides]
Araújo L.R., Mendes P.N., de Souza J.F. Publishing Linked Data from Brazilian Politicians on the Web. Workshop on Semantics in Governance and Policy Modelling, Extended Semantic Web Conference (ESWC 2011). May 30, 2011, Crete, Greece. [slides]
Mendes P.N., Kapanipathi P., Passant A. "Twarql: Tapping into the Wisdom of the Crowd." Triplification Challenge 2010 at the 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 1-3 September 2010. (Winner of Triplification Challenge 2010) [ paper, announcement ]
Jardim R., Mendes P.N., Dávila A.M.R., Sheth A.P. "Semantic Web application for the repositioning of drugs for neglected diseases caused by protozoans." International Workshop on Genomic Databases (IWGD'10), Búzios, Rio de Janeiro, Brazil, August 30 to September 3, 2010. (Ranked 1st by anonymous ad hoc reviewers and selected for oral presentation)
Passant A. and Mendes P.N. "sparqlPuSH: Proactive notification of data updates in RDF stores using PubSubHubbub." SFSW2010 Scripting for Semantic Web Workshop at ESWC2010. (Winner of SFSW2010 Challenge) [ paper, video, slides ]
Mendes PN, Passant A, Kapanipathi P, Sheth AP, "Linked Open Social Signals," WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010. (acceptance rate 16.6%) [project page]
Mendes PN, Kapanipathi P, Cameron D, Sheth AP, "Dynamic Associative Relationships on the Linked Open Data Web," In Proceedings of the Web Science 2010 (WebSci'10): Society On-Line, Raleigh, North Carolina, 26 & 27 April 2010. [ paper, poster ]
Christopher J. Thomas, Wenbo Wang, Pankaj Mehra, Delroy Cameron, Pablo N. Mendes, and Amit P. Sheth. "What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation." In Proceedings of the Web Science 2010 (WebSci'10): Society On-Line, Raleigh, North Carolina, 26 & 27 April 2010. [ paper ]
Cameron D, Mendes PN, Sheth AP, Chan V, "Semantics-Empowered Text Exploration for Knowledge Discovery," 48th ACM Southeast Conference, ACMSE2010, Oxford, Mississippi, April 15-17, 2010. (PDF)
Ramakrishnan C, Mendes PN, Gama RATS, Ferreira GCN, Sheth AP, "Joint Extraction of Compound Entities and Relationships from Biomedical Literature," WI2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI-08), Sydney, Australia, Dec. 9-12, 2008.
Ramakrishnan C, Mendes PN, Wang S, Sheth AP, "Unsupervised Discovery of Compound Entities for Relationship Extraction," EKAW 2008 - 16th International Conference on Knowledge Engineering and Knowledge Management: Knowledge Patterns, Acitrezza, Catania, Italy, September 29 to October 3, 2008. DOI 10.1007/978-3-540-87696-0
Mendes PN, McKnight B, Sheth AP, Kissinger JC, "TcruziKB: Enabling Complex Queries for Genomic Data Exploration," 2008 IEEE International Conference on Semantic Computing (ICSC 2008), pp. 432-439, 2008. DOI 10.1109/ICSC.2008.93
Dávila AMR, Mendes PN, Wagner G, Tschoeke DA, Cuadrat RRC, Liberman F, Matos L, Satake T, Ocaña KACS, Triana O, Cruz SMS, Jucá HCL, Cury JC, Silva FN, Geronimo GA, Ruiz M, Ruback E, Silva Jr. FP, Probst CM, Grisard EC, Krieger MA, Goldenberg S, Cavalcanti MCR, Moraes MO, Campos MLM and Mattoso M. ProtozoaDB: dynamic visualization and exploration of protozoan genomes. Nucl. Acids Res., 2007: p. gkm820.
Aguero F, Zheng W, Weatherly DB, Mendes P, Kissinger JC. TcruziDB: An integrated, post-genomics community resource for Trypanosoma cruzi. Nucleic Acids Research 34, 2006: D428-431.
Dávila AMR, Lorenzini DM, Mendes PN, Satake TS, Sousa GR, Campos LM, Mazzoni CJ, Wagner G, Pires PF, Grisard EC. GARSA: Genomic Analysis Resources for Sequence Annotation. Bioinformatics, 2005.
Guerreiro LTA, Souza SS, Wagner G, Souza EA, Mendes PN, Campos LM, Barros L, Pires PF, Campos MLM, Grisard EC, Dávila AMR. Exploring the genome of Trypanosoma vivax through GSS and in silico comparative analysis. Omics, NY, USA, v. 9, n. 1, p. 116-128, 2005.
Dávila AMR, Berriman M, Grisard EC, Kissinger J, Hertz-Fowler C, Pires PF, Barros L, Baião F, Mattoso M, Costa V, Zaverucha G, Aggarwal G, Castro PAR, Oliveira PM, Souza SS, Bernardes JS, Lorenzini DM, Guerreiro LTA, Wagner G, Ocaña KACS, Ferreira YC, Rech DH, Mendes PN, Campos LM, Mazzoni CJ, Frohlich AAM, Steindel M, Cavalcanti MC, Campos MLM. The BiowebDB Consortium: an integrative Bioinformatics approach for comparative genomics of Kinetoplastida. In: International Conference on Bioinformatics and Computational Biology, 2004, Angra dos Reis, RJ. International Conference on Bioinformatics and Computational Biology Abstracts, 2004.
Nascimento T, Mendes PN, Dávila AMR, Campos MLM, Campos LM, Pires PF, Barros L. A comparison between GUS and CHADO schemas for Genomic Databases. In: International Conference on Bioinformatics and Computational Biology, 2004, Angra dos Reis, RJ. International Conference on Bioinformatics and Computational Biology Abstracts, 2004.


IBM Research Almaden, July 2013 to present.

  • Position: Postdoctoral Research Associate
  • Activities: Evidence-based Discovery, Text Mining and Analytics

Freie Universitaet Berlin, June 2010 to October 2012

  • Position: Research Associate
  • Group: WBSG, DBpedia
  • Supervisor: Christian Bizer
  • Activities: Work Package Leader for Planet Data, researcher in LOD2, creator of DBpedia Spotlight, founder of DBpedia Portuguese.

Kno.e.sis Center, July 2007 to present

  • Mentoring Pavan Kapanipathi on spatio-temporal-thematic (STT) mining of social signals as part of the Twitris project.
  • Architect and former lead developer of the Semantic Browser, now in collaboration with Delroy Cameron.
  • Architect and lead developer of Cuebee, a knowledge-enabled query interface used in several projects at Kno.e.sis.

Yahoo! Research Barcelona, June 2009 to September 2009

Yahoo! Inc., June 2008 to September 2008

  • Position: Technical Intern
  • Group: Structured Web Search, Santa Clara, CA
  • Product: SearchMonkey
  • Supervisors: Kevin Haas & Peter Mika

Large Scale Distributed Systems Laboratory (LSDIS), August 2005 to July 2007.
Activities: Semantic Web and Web Services research applied to Social Media and Bioinformatics.

Laboratory for Genomics and Bioinformatics (LGB), January 2007 to July 2007.
Achievements: Halved the run time of the plate submission and BLAST pipelines by re-engineering and automating the processes.
Activities: Development and maintenance of several genome databases (Oracle, Java Swing, Perl), systems and network management (Linux and Windows), and general bioinformatics tasks (reading, cleaning, processing, searching, etc.).

Kissinger Lab @ CTEGD, August 2005 to January 2007.
Achievements: Publication of
Activities: Development and maintenance of TcruziDB and ApiDB using the GUS Framework. Programming platform: Java, Perl, Oracle, Web Services and Processes (Axis, WSDL, BPEL).

Service to the profession

I am a co-organizer of the Web of Linked Entities workshop series: WoLE2012 at ISWC2012 and WoLE2013 at WWW2013.

I have served on the program committees of several workshops, conferences and journals, such as IJSWIS, Semantic Web Journal, ESWC2013, HT2013, LDOW2013, ESWC 2012, COLD2012, #MSM2012, WEKEX2012, LDL'2012 (Linked Data in Linguistics 2012, DGfS), SEMAIS'2012, SDOW'2012, COLD'2012 (external reviewer), IC'2012, #MSM2011, SDOW'2011, SDOW'2010, ISWC'2010 (external reviewer).


Spring 2007
CSCI6760 - Computer Networks 4 A-
CSCI8380 - Advanced Topics in Information Systems 4 A-
CSCI8950 - Machine Learning 4 B+
Fall 2006
CSCI6470 - Algorithms 4 A-
CSCI6550 - Artificial Intelligence 4 A-
CSCI6950 - Directed Study (Advisor: Dr. Kissinger) 4 A
CSCI8990 - Research Seminar 4 S
Spring 2006
CSCI8351 - Semantic Web Services 4 A
CSCI6900 - AI and The Web 4 A
CSCI6950 - Directed Study (Advisor: Dr. Sheth) 4 A
Fall 2005
CSCI8350 - Semantic Web 4 A
CSCI6050 - Software Engineering 4 A
CSCI6370 - Database Management 4 A
