Technical Tidbit of the Day

Thursday, November 29, 2018

Sentiment Analysis pending patent published today

"Techniques for Sentiment Analysis of Data Using a Convolutional Neural Network and a Co-Occurrence Network"

http://pdfaiw.uspto.gov/.aiw?docid=20180341839&PageNum=1&IDKey=8234088170C0&HomeUrl=http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1%2526Sect2=HITOFF%2526d=PG01%2526p=1%2526u=%25252Fnetahtml%25252FPTO%25252Fsrchnum.html%2526r=1%2526f=G%2526l=50%2526s1=%25252220180341839%252522.PGNR.%2526OS=DN/20180341839%2526RS=DN/20180341839

Tuesday, July 3, 2018

Quickest inline d3.js in Jupyter

Everyone has their own way to use d3.js in Jupyter. Here is the shortest version I've been able to put together.

Step 1

Download d3.js (the config in Step 2 assumes the file is named d3.v5.min.js) and put it into the directory ~/.jupyter/custom

Step 2

Create a file ~/.jupyter/custom/custom.js and put the following into that file (note there is no ".js" in the quoted filepath, because RequireJS appends the extension itself):
require.config({paths:{d3:"/custom/d3.v5.min"}})

Step 3

In a Jupyter cell:

from IPython.display import display, Javascript
display(Javascript('''require(['d3'], function(d3) {
var svg = d3.select(element.get(0)).append('svg').attr('width',600).attr('height',200)
svg.append('circle').attr('cx',30).attr('cy',30).attr('r',20)
})'''))
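To feed Python data into the drawing, one option is to build the JavaScript source as a string and serialize the data with json.dumps. The following is only a sketch under assumptions: the helper name circles_js, its parameters, and the circle spacing are made up for illustration, and it relies on the same `element` variable and `d3` path configured in the steps above.

```python
import json

def circles_js(radii, width=600, height=200):
    """Return JavaScript source that draws one circle per radius with d3.

    Hypothetical helper: the name, parameters, and layout are illustrative.
    """
    data = json.dumps(list(radii))  # serialize the Python list as a JS array literal
    # Doubled braces are literal braces in the generated JavaScript.
    return f"""require(['d3'], function(d3) {{
  var data = {data};
  var svg = d3.select(element.get(0)).append('svg')
      .attr('width', {width}).attr('height', {height});
  svg.selectAll('circle').data(data).enter().append('circle')
      .attr('cx', function(d, i) {{ return 30 + i * 60; }})
      .attr('cy', 30)
      .attr('r', function(d) {{ return d; }});
}})"""

js_source = circles_js([5, 10, 20])
# In a Jupyter cell, render it with:
#   from IPython.display import display, Javascript
#   display(Javascript(js_source))
```

Keeping the string-building separate from the display call makes it easy to print js_source and inspect the generated JavaScript when something does not render.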

Friday, February 9, 2018

Minimal Scala Play

If you just need Scala Play for some quick testing/demo of Scala code, even the Scala Play Starter Example is too heavy. It has a lot of example code that is not needed and too much security for something to be run and accessed only locally.

Here is how to trim down the Scala Play Starter Example. First is the conf/application.conf file. All that is needed for the whole file is:

play.filters {
hosts { allowed = ["."] }
headers { contentSecurityPolicy = "default-src * 'unsafe-inline'" }
}

The hosts.allowed value of "." matches every host, so requests are accepted regardless of the hostname they arrive under (i.e. from external sources), and headers.contentSecurityPolicy allows things like remotely hosted JavaScript (e.g. http://code.jquery.com/jquery-3.3.1.min.js) and JavaScript inline directly in HTML elements (i.e. disable CSP and go back to 2016).

Then the conf/routes file:

GET / controllers.HomeController.index

GET /mywebservice controllers.MyWebServiceController.get(inputdata)

Specifically, you can delete the /count and /message routes, and then add whatever routes you need for web services (like /mywebservice above).

In the app directory:

rm -rf filters
rm -rf services
rm Module.scala
rm controllers/AsyncController.scala
rm controllers/CountController.scala
rm views/main.scala.html
rm views/welcome.scala.html

And then in views/index.scala.html you can just delete all the code therein and write your own regular HTML and not bother with the Twirl template language if you don't need it.

Finally, you'll need to create controllers/MyWebServiceController.scala. You can use HomeController.scala as a template and add in import play.api.libs.json._ to gain access to the Play JSON APIs for parsing and generating JSON.

controllers/MyWebServiceController.scala

package controllers

import javax.inject._
import play.api.libs.json._
import play.api.mvc._

@Singleton
class MyWebServiceController @Inject()(cc: ControllerComponents) extends AbstractController(cc) {

  // Bound to GET /mywebservice?inputdata=... by conf/routes
  def get(inputdata: String) = Action {
    val a = Json.parse(inputdata) // parse the query parameter as JSON
    val r = a // placeholder: do stuff with a
    Ok(Json.toJson(r))
  }
}
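For a quick check of the service from outside the browser, a small Python client can URL-encode the JSON into the inputdata query parameter. This is a sketch under assumptions: the base URL http://localhost:9000 (Play's usual development port) and the helper names build_url and call_service are illustrative, not part of the starter project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_url(payload, base="http://localhost:9000"):
    """Build the GET URL for /mywebservice with JSON in the inputdata parameter.

    The base URL is an assumption; adjust it to wherever Play is listening.
    """
    query = urlencode({"inputdata": json.dumps(payload)})  # percent-encodes the JSON text
    return f"{base}/mywebservice?{query}"

def call_service(payload, base="http://localhost:9000"):
    """Send the request and decode the JSON response (requires a running server)."""
    with urlopen(build_url(payload, base)) as resp:
        return json.loads(resp.read().decode("utf-8"))

url = build_url({"x": 1})
# With the Play app running: print(call_service({"x": 1}))
```

Passing JSON in a query string is fine for small local tests; larger payloads would fit better in a POST body, which would need a different route and action.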

Tuesday, November 21, 2017

1PB in 1U

Seventeen months ago I blogged about 900TB (nearly 1PB) in 1U of rack space for only $60k. There I noted that a 1U server for the new high-density SSDs wasn't commonly available. Well, that has changed. A couple of months ago, Supermicro announced their 32-bay 1U unit for 2.5" drives. With the 32TB SSDs that Samsung announced last year to be available this year, that yields 1PB.

It won't be cheap. Recall that the 900TB 4U for $60k was for spinning drives. Given that the 16TB SSDs go for nearly $12k a pop, the 32TB drive that has been slated for later this year would be at least twice that and likely much more initially. Even at $24k for each 32TB SSD, this 1U of 1PB SSD would set you back nearly $800k (32 drives × $24k = $768k).
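The capacity and cost figures follow from simple multiplication. Here is the arithmetic as a short Python snippet, using only the numbers quoted above; the $24k price per drive is the post's own lower-bound guess, not a published price.

```python
bays = 32                      # drive bays in the 1U chassis
tb_per_drive = 32              # capacity of each SSD in TB
price_per_drive_usd = 24_000   # assumed floor: twice the ~$12k of a 16TB SSD

total_tb = bays * tb_per_drive                # 1024 TB, i.e. about 1PB
total_cost_usd = bays * price_per_drive_usd   # 768000, i.e. just under $800k

print(f"{total_tb} TB for ${total_cost_usd:,}")
```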

Friday, October 27, 2017

The Spark GraphX of actual combat

Earlier this year, my book was translated into a Chinese edition. It has actually sold extremely well. I just noticed that Amazon has a product page for it, and they've given it the title The Spark GraphX of actual combat (Chinese Edition).

My hunch is that's what one gets if one translates "Spark GraphX in Action" into Chinese and then back into English.

Friday, October 20, 2017

Neo4j's query language Cypher coming to Spark

In my 2016 Spark Summit presentation Finding Graph Isomorphisms in GraphX and GraphFrames I reviewed the history of graphs in Spark, and how querying a graph in Spark GraphX requires many more lines of code than an equivalent query in Neo4j using its Cypher language. Even Spark GraphFrames, which implements a tiny, tiny subset of Cypher, requires more code than full Cypher.

Two years ago at the 2015 GraphConnect (an event sponsored by Neo4j), Ion Stoica of Databricks announced:

We look forward to bringing Cypher's graph pattern matching capabilities into the Spark stack, making graph querying more accessible to the masses.

Well, two years later, Neo4j announced yesterday:

Neo4j, a leader in connected data, announced that it has released the preview version of Cypher for Apache Spark (CAPS) language toolkit.

[...] Until now, data scientists have been using Spark and query tools like GraphX to define extensions to their graphs. Once identified, they would then re-implement and deploy that work within their applications. Now, with Cypher for Apache Spark, these scientists can iterate easier and connect adjacent data sources to their graph applications much more quickly.

[...] This announcement builds on Neo4j’s unveiling of openCypher in October 2015, as an effort to push the whole graph industry forward by tapping into the open source community and making Cypher’s evolution an open exercise while avoiding redundant research.