Introducing the New Yahoo Developer Network Website

By Amotz Maimon, Chief Architect, Yahoo

Today, we’re excited to announce the new Yahoo Developer Network (YDN) website. Supporting external developers with open source and open APIs has long been key to Yahoo’s success. With over 500,000 developers using Yahoo APIs, we needed a new home to support our developer community. The updated site significantly improves the experience for developers and advertisers using Yahoo APIs such as Search and advertising (Gemini).

What’s new?

  • Revamped experience: Based on your feedback, we’ve redesigned the site to make it easier for you to use our products. The new design is mobile-friendly, with easier-to-use navigation and structure.

  • Documentation: With our new Getting Started Guides, it’s now faster to create applications using Yahoo APIs. We’re also featuring code examples in different languages to get you up and running as quickly as possible.

  • Developer Forums: This is the place to find answers to questions that arise while developing with Yahoo APIs, or to ask your own.

  • OAuth 2.0 Support: OAuth 2.0 is the next evolution of the OAuth protocol. We now support OAuth 2.0 for newer Yahoo APIs like Gemini API.

  • Attribution Guidelines: When you build a product using Yahoo APIs, follow the attribution guidelines posted here.
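To make the OAuth 2.0 bullet concrete, here is a minimal sketch of the standard authorization-code grant (RFC 6749, section 4.1.3) in Python. The token endpoint URL, client credentials, and parameter values below are placeholders, not authoritative; consult the YDN OAuth 2.0 guide for the exact values for your application.

```python
import base64
import urllib.parse
import urllib.request

# Assumed token endpoint; verify against the YDN OAuth 2.0 documentation.
TOKEN_URL = "https://api.login.yahoo.com/oauth2/get_token"

def build_token_request(client_id, client_secret, code, redirect_uri):
    """Build a standard OAuth 2.0 authorization-code token request."""
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode()
    # The client authenticates with HTTP Basic auth over TLS.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(TOKEN_URL, data=body, headers={
        "Authorization": "Basic " + creds,
        "Content-Type": "application/x-www-form-urlencoded",
    })

# Sending the request and parsing the JSON token response would then be:
#   with urllib.request.urlopen(build_token_request(...)) as resp:
#       token = json.load(resp)  # access_token, refresh_token, expires_in
req = build_token_request("my_client_id", "my_client_secret",
                          "auth_code_from_redirect", "https://example.com/cb")
```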


Check out the new Yahoo Developer Network

We’d love to hear your thoughts. Share your feedback, suggestions, and feature requests by visiting our Forums.

Happy coding!

 

Performance improvements for photo serving | code.flickr.com

We’ve been working to make Flickr faster for our users around the world. Since the primary photo storage locations are in the US, and information on the internet travels at a finite speed, the farther away a Flickr user is from the US, the slower Flickr’s response time will be. Recently, we looked at opportunities to improve this. One improvement keeps temporary copies of recently viewed photos in locations nearer to users. The other aims to get a benefit from these caches even when a user views a photo that is not already cached.
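One plausible reading of the first improvement is a read-through regional cache: on a miss, the nearby node fetches the photo from the distant origin and keeps a temporary copy, so the next nearby viewer is served locally. The sketch below illustrates that pattern with a simple LRU policy; all names are illustrative, not Flickr’s actual implementation.

```python
from collections import OrderedDict

class RegionalPhotoCache:
    """Illustrative read-through cache for a region far from the US origin."""

    def __init__(self, origin_fetch, capacity=1000):
        self.origin_fetch = origin_fetch   # slow, long-distance fetch function
        self.capacity = capacity
        self.store = OrderedDict()         # photo_id -> bytes, in LRU order

    def get(self, photo_id):
        if photo_id in self.store:
            self.store.move_to_end(photo_id)   # cache hit: serve locally
            return self.store[photo_id]
        data = self.origin_fetch(photo_id)     # miss: go to the US origin
        self.store[photo_id] = data            # keep a temporary copy nearby
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)     # evict least recently used
        return data
```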

It’s Official, Yahoo + Flurry!

Exploring Life Without Compass

Compass is a great thing. At Flickr, we’re actually quite smitten with it. But being conscious of your friends’ friends is important (you never know who they’ll invite to your barbecue), and we’re not so sure about this “Ruby” that Compass is always hanging out with. Then there’s Ruby’s friend Bundler who, every year at the Christmas Party, tells the same stupid story about the time the police confused him with a jewelry thief. Enough is enough! We’ve got history, Compass, but we just feel it might be time to try seeing other people. 

Changes to Flickr YQL tables, both v1 and v2

We recently announced that the Flickr API is going SSL-only.
To support this move, we have also restricted the Flickr YQL tables to be available over SSL-only.

All developers using the Flickr YQL tables will need to make the following updates to their API settings by June 24, 2014:

  • Protocol: HTTPS
  • Port: 443

The domain name query.yahooapis.com will remain the same.

As of June 24, 2014, we will limit all access to Flickr YQL tables to secure SSL connections only. No Flickr API data will be accessible over HTTP from this date onwards. If you don’t switch the access protocol to HTTPS, your users will not be able to access Flickr data via your service.
Thank you for supporting us and our users in making the shift to HTTPS.
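As a quick sanity check, the only change on the client side is the scheme (port 443 is implied by HTTPS). The sketch below builds an HTTPS YQL request URL; the YQL path and the example Flickr query are illustrative, so verify the table name against your own settings.

```python
import urllib.parse

def yql_url(query, fmt="json"):
    """Build a Flickr YQL request URL over HTTPS (port 443 implied)."""
    params = urllib.parse.urlencode({"q": query, "format": fmt})
    # Before: http://query.yahooapis.com/...
    # After June 24, 2014: https:// only. The domain stays the same.
    return "https://query.yahooapis.com/v1/public/yql?" + params

# Hypothetical example query against a Flickr YQL table:
url = yql_url('select * from flickr.photos.search where text="sunset"')
```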

Go to the Flickr Developer Guide for more information.

Yahoo at Hadoop Summit, San Jose 2014

By Sumeet Singh, Sr. Director, Product Management, Hadoop


Yahoo and Hortonworks are pleased to host the 7th Annual Hadoop Summit - the leading conference for the Apache Hadoop community - on June 3-5, 2014 in San Jose, California.


Yahoo is a major open source contributor to, and one of the largest users of, Apache Hadoop. The Hadoop project is at the heart of many of Yahoo’s important business processes, and we continue to strengthen the ecosystem by working closely with key collaborators in the community to bring in more users and projects.

Join us at one of the following sessions or stop by Kiosk P9 at the Hadoop Summit to get an in-depth look at Yahoo’s Hadoop culture.


Keynote

Hadoop Intelligence – Scalable Machine Learning

Amotz Maimon (@AmotzM) – Chief Architect

"This talk will cover how Yahoo is leveraging Hadoop to solve complex computational problems with a large, cross-product feature set that needs to be computed in a fast manner.  We will share challenges we face, the approaches that we’re taking to address them, and how Hadoop can be used to support these types of operations at massive scale."


Track: Hadoop Driven Business

Day 1 (12:05 PM). Data Discovery on Hadoop – Realizing the Full Potential of Your Data

Thiruvel Thirumoolan (@thiruvel) – Principal Engineer

Sumeet Singh (@sumeetksingh) – Sr. Director of Product Management

"The talk describes an approach to manage data (location, schema knowledge and evolution, sharing and adhoc access with business rules based access control, and audit and compliance requirements) with an Apache Hive based solution (Hive, HCatalog, and HiveServer2)."

Day 1 (4:35 PM). Video Transcoding on Hadoop

Shital Mehta (@smcal75) – Architect, Video Platform

Kishore Angani (@kishore_angani) – Principal Engineer, Video Platform

"The talk describes the motivation, design and the challenges faced while building a cloud based transcoding service (that processes all the videos before they go online) and how a batch processing infrastructure has been used in innovative ways to build a transactional system requiring predictable response times."


Track: Committer

Day 1 (2:35 PM). Multi-tenant Storm Service on Hadoop Grid

Bobby Evans – Principal Engineer, Apache Hadoop PMC, Storm PPMC, Spark Committer

Andy Feng (@afeng76) – Distinguished Architect, Apache Storm PPMC

"Multi-tenancy and security are foundational to building scalable hosted platforms, and we have done exactly that with Apache Storm. The talk describes our enhancements to Storm that have allowed us to build one of the largest Storm installations in the world, offering low-latency big data platform services to all of Yahoo on common Storm clusters while sharing infrastructure components with our Hadoop platform."

Day 2 (1:45 PM). Pig on Tez – Low Latency ETL with Big Data

Daniel Dai (@daijy) – Member of Technical Staff, Hortonworks, Apache Pig PMC

Rohini Palaniswamy (@rohini_aditya) – Principal Engineer, Apache Pig PMC and Oozie Committer

"Pig on Tez aims to make ETL faster by using Tez as the execution engine, since it is a more natural fit for the query plan produced by Pig. With optimized and shorter query plan graphs, Pig on Tez delivers huge performance improvements by executing the entire script within one YARN application as a single DAG and avoiding intermediate storage in HDFS. It also employs many other optimizations made feasible by the Tez framework."


Track: Deployment and Operations

Day 1 (3:25 PM). Collection of Small Tips on Further Stabilizing your Hadoop Cluster

Koji Noguchi (@kojinoguchi) – Apache Hadoop and Pig Committer

"For the first time, the maestro shares his pearls of wisdom in a public forum. Call Koji and he will tell you if you have a slow node, misconfigured node, CPU-eating jobs, or HDFS-wasting users even in the middle of the night when he pretends he is sleeping."

Day 2 (12:05 PM). Hive on Apache Tez: Benchmarked at Yahoo! Scale

Mithun Radhakrishnan (@mithunrk), Apache HCatalog Committer

"At Yahoo, we’d like our low-latency use-cases to be handled within the same framework as our larger queries, if viable.  We’ve spent several months benchmarking various versions of Hive (including 0.13 on Tez), file-formats, and compression and query techniques, at scale.  Here, we present our tests, results and conclusions, alongside suggestions for real-world performance tuning."


Track: Future of Hadoop

Day 1 (4:35 PM). Pig on Storm

Kapil Gupta – Principal Engineer, Cloud Platforms

Mridul Jain (@mridul_jain) – Senior Principal Engineer, Cloud Platforms

"In this talk, we propose PIG as the primary language for expressing real-time stream processing logic and provide a working prototype on Storm.  We also illustrate how legacy code written for MR in PIG, can run with minimal to no changes, on Storm.  We also propose a “Hybrid Mode” where a single PIG script can express logic for both real-time streaming and batch jobs."

Day 2 (11:15 AM). Hadoop Rolling Upgrades - Taking Availability to the Next Level

Suresh Srinivas (@suresh_m_s) – Co-founder and Architect, Hortonworks, Apache Hadoop PMC

Jason Lowe – Senior Principal Engineer, Apache Hadoop PMC

"No more maintenance downtimes, coordinating with users, catch-up processing etc. for Hadoop upgrades.  The talk will describe the challenges with getting to transparent rolling upgrades, and discuss how these challenges are being addressed in both YARN and HDFS."

Day 3 (11:50 AM). Spark-on-YARN - Empower Spark Applications on Hadoop Cluster

Thomas Graves – Principal Engineer, Apache Hadoop PMC and Apache Spark Committer

Andy Feng (@afeng76) – Distinguished Architect, Apache Storm PPMC

"In this talk, we will cover an effort to empower Spark applications via Spark-on-YARN. Spark-on-YARN enables Spark clusters and applications to be deployed onto your existing Hadoop hardware (without creating a separate cluster). Spark applications can then directly access Hadoop datasets on HDFS."


Track: Data Science

Day 2 (11:15 AM). Interactive Analytics in Human Time – Lightning-Fast Analytics Using a Combination of Hadoop and In-memory Computation Engines at Yahoo

Supreeth Rao (@supreeth_) – Technical Yahoo, Ads and Data Team

Sunil Gupta (@_skgupta) – Technical Yahoo, Ads and Data Team

"Providing interactive analytics over all of Yahoo’s advertising data, across the innumerable dimensions and metrics that span advertising, has been a huge challenge. From returning results in under a second in a highly concurrent system, to computing non-additive cardinality estimates for audience segmentation analytics, the problem space is computationally expensive and has resulted in large systems in the past. We have attempted to solve this problem in many ways, with systems built on traditional RDBMSs, NoSQL stores, and commercially licensed distributed stores. In this talk, we look at how we have evolved a data tech stack that combines Hadoop with in-memory technologies."


Track: Hadoop for Business Apps

Day 3 (11:00 AM). Costing Your Big Data Operations

Sumeet Singh (@sumeetksingh) – Sr. Director of Product Management

Amrit Lal (@Amritasshwar) – Product Manager, Hadoop and Big Data

"As organizations begin to make use of large data sets, approaches to understand and manage true costs of big data will become an important facet with increasing scale of operations. Our approach explains how to calculate the total cost of ownership (TCO), develop a deeper understanding of compute and storage resources, and run the big data operations with its own P&L, full transparency in costs, and with metering and billing provisions. We will illustrate the methodology with three primary deployments in the Apache Hadoop ecosystem, namely MapReduce and HDFS, HBase, and Storm due to the significance of capital investments with increasing scale in data nodes, region servers, and supervisor nodes respectively."


For public inquiries or to learn more about the opportunities with the Hadoop team at Yahoo, reach out to us at bigdata AT yahoo-inc DOT com.