Tag: Tika

TikaOnDotNet now supports Tika 1.7

February 6, 2015 TikaOnDotNet has been updated to the latest Tika release, 1.7. As always, a NuGet package has been published. Two things were great about this release. Jason Rotello did the heavy lifting, grabbing an issue I had marked as up for grabs and getting it done. This is his second time updating the library. Thank you very much, Jason! The release also fixes a known problem where indexing MP4 video files would cause the file to be locked. Great way to end the week.

TikaOnDotNet 1.4 Released as a NuGet

July 12, 2013 A while back I shared a post about how we were able to use the excellent, yet Java-based, Tika text extraction library in our .NET-based applications. Alongside that post I also created a GitHub repo where the code for TikaOnDotNet lives and is maintained. At Dovetail Software we continue to use this library with great success on the .NET platform. Externally, I’ve gotten responses from people using the ideas in the project, but they often have problems creating their own release or getting Tika up and running in their projects. Today I gave the project some love and polish and updated it so it can be consumed as a NuGet package, which makes it really easy to use from your code base. Let’s take a look at how to use Tika in your .NET projects.…

Using the Tika Java Library In Your .Net Application With IKVM

July 2, 2010 Update: I've created a project page for TikaOnDotNet on GitHub: Tika On DotNet.

This may sound scary and heretical, but did you know it is possible to leverage Java libraries from .NET applications with no TCP sockets or web services getting caught in the crossfire? Let me introduce you to IKVM, which is frankly magic:

"IKVM.NET is an implementation of Java for Mono and the Microsoft .NET Framework. It includes the following components: a Java Virtual Machine implemented in .NET; a .NET implementation of the Java class libraries; tools that enable Java and .NET interoperability."

Using IKVM we have been able to successfully integrate our Dovetail Seeker search application with the Tika text extraction library implemented in Java. With Tika we can easily pull text out of rich documents in many supported formats.

Why Tika? Because there is nothing comparable in the .NET world as…